# Nonlinear manifold representations for functional data

###### Abstract

For functional data lying on an unknown nonlinear low-dimensional space, we study manifold learning and introduce the notions of manifold mean, manifold modes of functional variation and of functional manifold components. These constitute nonlinear representations of functional data that complement classical linear representations such as eigenfunctions and functional principal components. Our manifold learning procedures borrow ideas from existing nonlinear dimension reduction methods, which we modify to address functional data settings. In simulations and applications, we study examples of functional data which lie on a manifold and validate the superior behavior of manifold mean and functional manifold components over traditional cross-sectional mean and functional principal components. We also include consistency proofs for our estimators under certain assumptions.

10.1214/11-AOS936 \volume40 \issue1 2012 \firstpage1 \lastpage29

Functional manifold representations

A]\fnmsDong \snmChen\correflabel=e1]dchen@wald.ucdavis.edu and A]\fnmsHans-Georg \snmMüller\thanksreft1label=e2]mueller@wald.ucdavis.edu

t1Supported in part by NSF Grants DMS-08-06199 and DMS-11-04426.

class=AMS] \kwd62H25 \kwd62M09. Functional data analysis \kwdmodes of functional variation \kwdfunctional manifold components \kwddimension reduction \kwdsmoothing.

## 1 Introduction

Nonlinear dimension reduction methods, such as locally linear embedding row00 (), isometric mapping ten00 () and Laplacian eigenmaps bel03 (), have been successfully applied to image data in recent years. A commonly used example is the analysis of photos of a sculpture face taken under different angles and lighting conditions. The number of pixels of these images is huge, but their structure only depends on a few variables related to angle and lighting conditions. It is then advantageous to treat the observed image data as a manifold that is approximately isomorphic to a low-dimensional Euclidean space.

Unlike shape analysis ken99 () and the recent diffusion tensor imaging huc11 (), where it is assumed that the form of the manifold is known a priori, nonlinear dimension reduction methods usually are manifold-learning procedures, where the manifold is not known but it is assumed that it possesses certain features which are preserved in the observed data. For instance, locally linear embedding preserves the manifold local linear structure while isometric mapping preserves geodesic distance. Their inherent flexibility predisposes these methods for extensions to functional data, where one rarely would have prior information available about the nature of the underlying manifold.

Our goal is to explore manifold representations of functional data. Which observed sets of functions are likely to lie on a low-dimensional manifold? And how should this be taken into consideration? In contrast to multivariate data, functional data are recorded on a time or location domain, and commonly are assumed to consist of sets of smooth random functions. Auspicious examples where functional manifold approaches may lead to improved representations include time-warped functional data wan99 (), ger04 (), density functions kne01 (), and functional data with pre-determined and interpretable modes ize07 (). In such situations, the established linear functional approaches, such as cross-sectional mean and functional principal component analysis (FPCA) often fail to represent the functional data in a parsimonious, efficient and interpretable way. Manifold approaches are expected to be especially useful to represent functional data inherently lying on a low-dimensional nonlinear space.

In this paper, we develop a framework for modeling functions on unknown manifolds and propose pertinent notions, such as manifold mean, manifold modes of functional variation and functional manifold components, as elements of a functional manifold component analysis (FMCA). Manifold means complement notions of a specifically modified functional mean, such as the “structural mean” kne92 (). A major motivation for this proposal is that functional principal component plots, for example, displaying second versus first component, are quite often found to exhibit “horseshoe” shapes, that is, nonlinear dependence in the presence of uncorrelatedness (as principal components by definition are always uncorrelated). An example of this “horseshoe shape” is provided by the Berkeley growth data (see upper right panel of Figure 5). In such situations, one may wish to “unwrap” the “horseshoe” into linear structures by techniques similar to those used in nonlinear dimension reduction. When attempting to “unwrap” functional data, one encounters specific difficulties: Often the underlying smooth functions are not directly observed, but instead need to be inferred from a limited number of noise-contaminated measurements that contain the available information for each subject in the sample. To address these problems, we develop a modified ISOMAP ten00 () procedure, by adding a data-adaptive penalty to the empirical geodesic distances, and employ local smoothing to recover the manifold.

The paper is organized in the following way. In Section 2, we describe what we mean by a functional manifold, manifold mean, manifold modes of functional variation and functional manifold components. We develop corresponding estimates in Section 3 and discuss their asymptotic properties in Section 4. Sections 5 and 6 are devoted to illustrations of the proposed methodology for both simulated and real data. Detailed proofs can be found in an online supplement supp ().

## 2 Manifolds in function space

### 2.1 Preliminaries

A manifold can be expressed in terms of an atlas consisting of a group of charts , where are open sets covering and , the coordinate maps, map the corresponding onto an open subset of . Additional assumptions on are usually imposed in order to study the structure of doc92 (), hel01 ().

In this paper, we only consider “simple” functional manifolds in space, where is isomorphic to a subspace of the Euclidean space, that is, the manifold can be represented by a coordinate map , such that is bijective, and both , are continuous, in the sense that if and , ; if and , . Here, is the intrinsic dimension of the manifold . Such “simple” manifold settings have been commonly considered in the dimension reduction literature, for example in ten00 ().

For a continuous curve defined on the manifold , define the length operator

(1) |

where the supremum is taken over all partitions of the interval with arbitrary break points . We call an isometric map if for any continuous , where is similarly defined as in (1) with the norm replaced by the Euclidean norm. We say is an isometric manifold if there exists an isometric coordinate map . The isometry assumption is pragmatically desirable and can be found in many approaches ten00 (), don03 (). Conditions under which isometry holds for image data are discussed in don05 ().

We use the notation and refer to as the representation map. The manifold is naturally equipped with the distance, which, due to the nonlinearity of , is not an adequate metric ten00 (). More useful is the geodesic distance

(2) |

where the infimum is taken over all continuous paths on . The geodesic distance is the length of the shortest path on connecting the two points, and therefore is adapted to .

### 2.2 Manifold mean and manifold modes of variation

Suppose is a functional manifold of intrinsic dimension and is a representation map for . Define, with respect to a probability measure in ,

(3) |

where is the mean in the -dimensional representation space, and is the manifold mean in space. If is isometric, the manifold mean is uniquely defined for all isometric representation maps, as the following results shows.

###### Proposition 1

The expected value in equation (4) is with respect to the probability measure that is induced by the map ; see also bic07 (). Equation (4) defines the Fréchet mean for geodesic distance , and therefore does not depend on the choice of the isometric map . The motivation to consider the manifold mean is that the traditional cross-sectional mean for functional data in has significant drawbacks as a measure of location when the data indeed lie on a nonlinear functional manifold. Estimates of means, obtained by averaging observed sample curves, can be far away from the data cloud in such situations, and therefore do not represent the data in a meaningful way. Going beyond the mean, one encounters analogous problems when linearly representing such random functions in an basis, such as the Fourier, B spline or eigenfunction basis.

Consider random functions defined on a bounded domain . With and , according to Mercer’s theorem ash75 (), if the covariance function is jointly continuous in , , there is an orthonormal expansion of in terms of the eigenvalues (ordered nonincreasingly) and associated eigenfunctions ,

(5) |

By the Hilbert–Schmidt theorem gre50 (), rie90 (), can be expressed in terms of the so-called Karhunen–Loève representation,

(6) | |||||

where the are uncorrelated random variables with mean and variance , known as functional principal components (FPCs).

In the manifold case, the FPCs intrinsically lie on a -dimensional manifold. Therefore, we expect that the FPCs do not provide a parsimonious representation of . A better adapted and more compact representation can be obtained through nonlinear manifold modes of functional variation that are defined below. The established eigenfunction-based modes of functional variation cas86 (), jon92 () are

(7) |

where factors standardize the scale for different and the functional variation in the direction of eigenfunction is visualized by the changing of functional shapes as varies. However, when the functional data lie on a manifold, neither nor may belong to , so that these linear modes will not provide a sensible description of the variation in the data.

To address this problem, we define functional manifold component (FMC) vectors , , by the eigenvectors of the covariance matrix of , that is,

(8) |

where are the eigenvalues of . The manifold modes of functional variation are

(9) |

where is the mean in the -dimensional representation space according to measure , as given in (3). A distinct advantage of manifold-based modes of functional variation over the principal component based version (7) is that in (9) only finitely many modes are needed, while (7) requires potentially infinitely many components. The manifold modes are unique for the case of isometric , as shown in the following.

###### Proposition 2

Suppose and are two isometric representation maps for a functional manifold of intrinsic dimension . Let be the th manifold mode defined in (9) based on representation map , and be the th manifold mode using map . Then for all and , if the eigenvalues of and of are of multiplicity one.

For each , given the representation map , can be uniquely represented (due to the bijectivity of ) by a vector

(10) |

where and are defined in (3) and (8), respectively, is the inner product in and are uncorrelated r.v.s with mean and variance . We call the functional manifold components (FMCs) in the representation space.

## 3 Estimating functional manifolds

Suppose we observe which are noise-contaminated measurements made on independent realizations of a random function , according to the data model

Here the are the time points where the functions are sampled, and the are i.i.d. errors with mean 0 and variance . A first task is to find an approximation to the representation map based on the observed . We also require the inverse . Prior knowledge about the data may suggest a specific form for ize07 (), or one may have direct observations of . But in general, the representation map is unknown and needs to be determined from the data.

### 3.1 Inferring -dimensional manifold representations

Following ten00 (), we use the pairwise distances between observed data to obtain a map that preserves the geodesic distances. Alternative approaches include LLE row00 () and Laplacian eigenmaps bel03 (). While these methods have been developed for multivariate data, they can be adapted to functional data in a two-step procedure as follows.

In a first step, given an intrinsic dimension of , adopt the proposal of ten00 () to obtain the function only at the sample points , where , by

(11) |

Here, is the geodesic distance (2) and the minimum is taken over the vectors , formed by the values of on the functions , that is, the goal is to find vectors , that minimize (11). For this, one needs to estimate the geodesic distances, and then the minimizer is obtained by multidimensional scaling (MDS) based on estimates of cox01 (). Our asymptotic results pertain to a second step, where the assumed smoothness of is invoked to obtain global estimates for , as described in Section 3.2. As for , as determined by (11), we assume that the minimization in (11) provides values on or defines the target manifold at the sample points, that is, that , or alternatively, that .

In order to approximate geodesic distances , we first aim at estimates of the distances . For this purpose, the Karhunen–Loève representation (6) can be used to obtain fitted curves,

(12) |

Here, and are first obtained by applying local linear one-dimensional and two-dimensional smoothers to the pooled data; then eigenfunctions and eigenvalues are extracted by classical vector spectral analysis applied to a discretized version of the estimate of the covariance surface ; and then the FPCs are approximated by discretizing integrals

(13) |

or alternatively by conditional expectation (for details on these steps, see yao05 ()),

(14) |

where , , , , , and is estimated from the difference between empirical variances of and . The conditioning method (14) is the only available option if the data are sparsely sampled. To ensure that a sufficiently large number of components is included in the truncated expansion (12), one may choose by requiring a large fraction of variance explained (FVE), that is,

(15) |

for, say, , where the are estimates of the eigenvalues in (5). The resulting distances are .

Note that alternatively to representation (12), one can also directly apply local constant or local linear smoothing to obtain smooth trajectories in the case of dense and balanced designs, for example, using Nadaraya–Watson kernel estimators,

(16) |

where and are smoothing kernel and bandwidth. For the smoothing kernel one can use any standard kernel such as the standard Gaussian density function or the Epanechnikov kernel, while in practice may be chosen by cross-validation or generalized cross-validation.

Then the pairwise distances are simply . We will not explicitly explore this alternative smoothing approach in our theoretical analysis, but note that essentially the same results as those reported below hold for this alternative approach, by minor extensions of our arguments. In the implementations (simulation and data analysis), we use both approaches (12) and (16). The estimated random trajectories, obtained though (12) or (16), generally are not lying on the manifold , as they are merely approximations to the true unknown functions, due to additional noise and discrete sampling of the random trajectories. However, these estimates, owing to their consistency, will fall inside a small -neighborhood around . Asymptotic properties are discussed in Section 4.

Since the geodesic is the shortest path connecting points on a manifold, one may first connect the points inside small neighborhoods and then define the path between two far away points by moving along these small neighborhoods, and then find the geodesic by the shortest path connecting through such neighborhoods. This is essentially the idea of the ISOMAP algorithm ten00 (). The performance of this method however proved somewhat unstable in our applications, as functional data typically must be inferred from discretized and noisy observations of underlying smooth trajectories and therefore do not exactly lie on the manifold, as is assumed in ISOMAP.

In such situations, due to random scattering of the data around the manifold, the shortest path found by the ISOMAP criterion may pass through “empty areas” outside the proper data cloud. This problem can be effectively addressed by modifying the ISOMAP criterion, by additionally penalizing against paths that include sections situated within “empty regions” with few neighboring data points. Density-penalized geodesics are characterized by sequences of functions from the starting point to the end point of the geodesic, where each of the stands for one of the observed functions (with unrelated index), and are defined as

Here the parameter limits the step length, and the penalty function is determined by the density of the data cloud around and ,

where and denotes the cardinality of a set. By selecting the parameter , one can control the threshold of the local density of points, below which the penalty kicks in. The ISOMAP algorithm corresponds to the special case where .

The choice leads to “penalized ISOMAP” or P-ISOMAP, where the penalty parameter may be selected data-adaptively by cross-validation. The choice of and also of the step size parameter is discussed in Section 3.3. If the manifold is very smooth, a large and small will lead to a sufficiently good estimate of the geodesic distance. A detailed discussion of the convergence of the estimated geodesics in the framework of ISOMAP can be found at http://isomap.stanford.edu/BdSLT.pdf. For the proposed P-ISOMAP, we implement the minimization of by Dijkstra’s algorithm, which selects and the geodesic paths . The resulting estimated geodesic distance is

(18) |

where or , depending on which preliminary approximation is used for . Once these distances have been determined, an application of MDS yields , in the same way as in the standard ISOMAP method.

### 3.2 Obtaining the global map and representing sample trajectories

For any location , we find by local weighted averaging, that is,

(19) |

where is a -dimensional kernel, like the Epanechnikov kernel , with for a suitably chosen bandwidth , could be either as in (16) or as in (12), and is defined after (18). We use cross-validation to select (see Section 3.3). The asymptotic properties of (19) will be discussed in Section 4.

Specifically, as predictor of , we propose

(20) |

borrowing strength from local neighbors in the -dimensional representation space. This can be seen as an alternative to representation (12), where we use the FPCs and borrow strength from the whole data set to estimate functional mean and eigenbasis. As before, we note that (20) is not necessarily in , but will be in a small neighborhood asymptotically and in comparison with (12), (20) usually proves to be a much better predictor of for functional manifold data as shown in the simulations and applications in Section 5. Asymptotic properties are discussed in Section 4.

Definition (3) suggests to estimate the manifold mean by

(21) |

where . The FMC vectors defined in (8) are estimated by eigendecomposition of the sample covariance matrix of , that is, and are such that

where the are ordered to be nonincreasing in . From (9) and (19), we obtain estimates of the manifold modes as

(23) |

where and .

### 3.3 Selection of auxiliary parameters

We use -fold cross-validation to simultaneously choose the step size , the truncation parameter , and the smoothing bandwidth (see Sections 3.1 and 3.2). The number of candidates for and is kept small so that the cross-validation procedure runs reasonably fast. Candidates for the step size are the median distance of the th, the th and the th nearest neighbor; those for are selected such that , , and of the data with the lowest local density estimates are penalized. Each of 10 subgroups of curves denoted by is used as a validation set, one at a time, while the remaining data are used as training set.

In an initial step, we use the whole data set and a given , to determine , followed by estimation of for in the validation set, using (19) and assuming that only those in the training set are known. Denoting the value of the estimated trajectory , evaluated at time , by , the sum of squared prediction errors for the validation set is , where is the observed value of trajectory at time . The cross-validation choice is the minimizer of .

Following ten00 (), the intrinsic dimension can be chosen by the fraction of distances explained (FDE), that is,

(24) |

where we choose and , are by distance matrixes with as in (18), and where denotes the MDS solution (11) in , and is the matrix Frobenius norm, . Note that is the square root of the minimized value of (11).

## 4 Asymptotic properties

We provide the specific convergence rateof , defined in (12), under assumptions (A1)–(A5) in the Appendix. Note that condition (A3) requires that the random functions are sampled at a dense design. Our starting point is that the manifold can be well identified at the sample points through ISOMAP, or alternatively, that the ISOMAP identified manifold may be viewed as the target. The difference between the target and the identified manifold from ISOMAP is quantified by a rate that is assumed as given; if the target manifold corresponds to the manifold as identified at the sample points, we may set . The theoretical analysis aims to justify the new manifold representations that we propose, and for this it is essential to consider the behavior of the estimates across the entire function space. Therefore, our theoretical results demonstrate how to extend local behavior at the sample points to obtain global consistency of the proposed functional manifold representations.

As the convergence is for as , the rate of decline of the eigenvalues in (5) and also lower bounds on the spacing of consecutive eigenvalues, as postulated in (A4) are relevant, with a requirement of polynomially fast declining eigenvalues. Required smoothness and boundedness assumptions for are as in (A5).

###### Proposition 3

Assume (A1)–(A5) in the Appendix, and define . If there are infinitely many nonzero eigenvalues in (5), which are all of multiplicity one, then for sequences , subject to , where is a constant such that for some and where with and where is defined in (5) and is defined after (12), it holds that

(25) |

for defined in (12), where is such that for all and some .

We note that under the assumptions, . The first term on the r.h.s. of (25) is due to estimation error and the second term is due to truncation error. In the special case when there are only finitely many nonzero in (5), it can be shown that the rate in (25) simply becomes . Next we discuss the convergence of the estimates that appear in (3.2).

###### Proposition 4

###### Theorem 1

Under (A1)–(A5), (B1), (B2) and (C1)–(C3) in the Appen- dix, assume that the density function of satisfies for a specific and that is selected such that , and . Then defined in (19), using , is a consistent estimate of . Specifically, defining where and the orthonormal basis is given in (5), and defining , where as , it holds that

(29) | |||

where , and are as in assumptions (A3), (A4) and (B1).

Note that corresponds to the truncation error for . The last term is due to the estimation error as in Lemma 1. The middle term reflects the estimation error of the weights, which is influenced by the scale of the bandwidth. The first part is the optimal rate when the and are known, reflecting an intrinsically -dimensional smoothing problem. Related findings are discussed in bic07 ().

For the manifold modes, we obtain the following corollary.

###### Corollary 1

## 5 Examples and simulation study

### 5.1 Functional manifolds and isometry

To illustrate our methods and to discuss the impact of the critical isometry assumption, we consider the following three example functional manifolds: {longlist}[(iii)]

A one-dimensional functional manifold

where . This corresponds to random warping of a common shape function , which has two peaks. The time warping function is generated from the cumulative Beta distribution family and is a random parameter, , where .

A two-dimensional functional manifold

This manifold is a collection of Gaussian densities, corresponding to a shift-scale family, where , and .

Another two-dimensional functional manifold

a mixture of two peaks with randomly varying centers, where and . Note that the two peaks will merge to a larger peak when their locations are close, so this set of functions has a randomly varying number of peaks.

Functional manifolds – are illustrated in Figure 1. We note that is an isometric manifold and is approximately isometric, while is not isometric. This can be seen as follows. For functions on a differentiable isometric manifold with representation , using the definition of isometry given after (1), the condition for and any is equivalent to isometry. Therefore, the existence of a parametrization of the map for which the norms of the partial derivatives of with respect to the parameter components are constant is sufficient and necessary for to be isometric. For one-dimensional manifolds such as , one can always find such a parametrization, as long as is differentiable in the parameter and the derivative is integrable in .

For , such a parametrization does not exist, but since and for constants , and as is chosen to remain very close to , the natural parametrization approximately satisfies the condition for isometry. In contrast to and , the functional manifold is nonisometric and we include it as an example how the proposed methodology is faring when the key assumption of isometry is violated. As our considerations take place in a manifold learning framework, where the underlying manifold is unknown, an interesting aspect is to devise a data-based check to gauge the degree to which the isometry assumption can be expected to be satisfied. A natural metric for such a check is the fraction of distances explained (FDE), defined in (24). This criterion quantifies the percentage of geodesic distance that is preserved when fitting a -dimensional isometric manifold to the data. For cases where the underlying manifold is actually nonisomorphic, the fitted manifold is an isometric approximation to the true underlying manifold, obtained by minimizing the stress function in the MDS algorithm.

An informal goodness-of-fit criterion for isometry is to require FDE to be larger than 95%, and choosing the manifold with the smallest dimension that satisfies this criterion. In Table 1, values for FDE obtained for the simulated data for manifolds – under two signal-to-noise ratios (defined in the following subsection) are reported, with dimension ranging from to . The well-known fact that the stress function declines when the dimension of the projection space is increased underlies the traditional MDS-Scree Plot cox01 () and is reflected by the observed increase in the values for FDE as dimension increases.

Manifold | 1 | 2 | 3 | 4 | 5 | |
---|---|---|---|---|---|---|

0.1 | 0.999 | 0.999 | 0.999 | 0.999 | ||

0.5 | 0.993 | 0.995 | 0.995 | 0.996 | ||

0.1 | 0.988 | 0.994 | 0.996 | 0.996 | ||

0.5 | 0.971 | 0.974 | 0.978 | 0.980 | ||

0.1 | 0.932 | 0.957 | 0.977 | 0.980 | ||

0.5 | 0.906 | 0.948 | 0.955 | 0.958 | ||

Growth | 0.972 | 0.980 | 0.985 | 0.988 | ||

Yeast | 0.949 | 0.981 | 0.983 | 0.984 | ||

Mortality | 0.954 | 0.973 | 0.980 | 0.982 |

Applying the above check for isometry, we find that indeed the dimensions of the isometric manifold and the near-isometric manifold are correctly selected, while the first two dimensions of the isometric manifold approximation to the nonisometric manifold are not sufficient. Thus, the nonisometric nature of means that the dimension of the underlying functional manifold cannot be correctly identified and instead the proposed algorithm will find a higher-dimensional isometric manifold to represent . The price to pay for a suitable isometric approximation is increased dimensionality, which in this example ends up larger than 2 for the approximating isometric manifold. We note that an approximating isometric manifold can always be found, since the linear and therefore intrinsically isometric manifold of infinite dimensionality that is spanned by the eigenfunction basis contains the random functions of the sample, according to the Karhunen–Loève theorem, and is always applicable.

While we can always find a near-isometric manifold of large enough dimensionality with the proposed algorithm, when the data lie on a lower-dimensional nonisometric manifold, these approximating isometric manifolds may not be efficient, since they do not provide the lowest-dimensional possible description of the data. Nevertheless, an approximating isometric nonlinear manifold obtained by the proposed approach often will present a much improved and lower-dimensional description when compared to the alternative of classical linear basis representation. This is exemplified by the functional nonisometric manifold , which in the following subsection is shown to be much better represented by an isometric manifold than by a linear basis. So the price that the isometry assumption exacts in nonisometric situations is that the proposed approach leads to a more or less suboptimal representation, which however will often be substantially lower-dimensional than an equally adequate linear representation. We conclude that even in nonisometric situations the proposed approach can often be expected to lead to improved representations of functional data.

### 5.2 Simulation results

We simulate functional data from manifolds – as introduced in the previous subsection, aiming to study two questions. First, when the functional data lie on a manifold, whether it is isometric or not, does the proposed functional manifold approach lead to better (more parsimonious, better interpretable) representations of the data, compared to functional principal component analysis? Second, for noisy functional data that do not exactly lie on a manifold, how much improvement may one gain by adding the data-adaptive penalties implemented by P-ISOMAP, as described in Section 3.1?

For these simulations, the actual error-contaminated observations of the functional trajectories are generated as , i.i.d., , , where , equally spaced in with observations per trajectory, and the noise variance is such that the signal-to-noise ratio is or . We estimated manifold means (3), manifold modes of functional variation (9) and obtained predic-tions (20), which were compared with predictions obtained by functional principal component analysis.