Statistical Analysis of Longitudinal Data on Riemannian Manifolds

# Statistical Analysis of Longitudinal Data on Riemannian Manifolds

\setstretch

2

Modeling Longitudinal Data on Riemannian Manifolds

Xiongtao Dai, Zhenhua Lin and Hans-Georg Müller

Department of Statistics, Iowa State University, Ames, IA 50011 USA

Department of Statistics, University of California, Davis

Davis, CA 95616 USA

October 2018

ABSTRACT

\setstretch

1.1 When considering functional principal component analysis for sparsely observed longitudinal data that take values on a nonlinear manifold, a major challenge is how to handle the sparse and irregular observations that are commonly encountered in longitudinal studies. Addressing this challenge, we provide theory and implementations for a manifold version of the principal analysis by conditional expectation (PACE) procedure that produces representations intrinsic to the manifold, extending a well-established version of functional principal component analysis targeting sparsely sampled longitudinal data in linear spaces. Key steps are local linear smoothing methods for the estimation of a Fréchet mean curve, mapping the observed manifold-valued longitudinal data to tangent spaces around the estimated mean curve, and applying smoothing methods to obtain the covariance structure of the mapped data. Dimension reduction is achieved via representations based on the first few leading principal components. A finitely truncated representation of the original manifold-valued data is then obtained by mapping these tangent space representations to the manifold. We show that the proposed estimates of mean curve and covariance structure achieve state-of-the-art convergence rates. For longitudinal emotional well-being data for unemployed workers as an example of time-dynamic compositional data that are located on a sphere, we demonstrate that our methods lead to interpretable eigenfunctions and principal component scores, which are defined on tangent spaces. In a second example, we analyze the body shapes of wallabies by mapping the relative size of their body parts onto a spherical pre-shape space. Compared to standard functional principal component analysis, which is based on Euclidean geometry, the proposed approach leads to improved trajectory recovery for sparsely sampled data on nonlinear manifolds.

KEYWORDS: Longitudinal Compositional Data, Data on Spheres, Dimension Reduction, Functional Data Analysis, Principal Component Analysis, Sparse and Irregular Data. Research supported by NSF Grant DMS-1712864

## 1 Introduction

Functional data are usually considered as elements of a Hilbert space (Horvath & Kokoszka, 2012; Hsing & Eubank, 2015; Wang et al., 2016), a linear space with Euclidean geometry, where typical tools include functional principal component analysis (Kleffe, 1973; Hall & Hosseini-Nasab, 2006; Chen & Lei, 2015) and functional regression (Hall & Horowitz, 2007; Kong et al., 2016; Kneip et al., 2016). Considerably less work has been done on the analysis of nonlinear functional data, which are increasingly encountered in practice, such as -valued functional data (Telschow et al., 2016), recordings of densely sampled trajectories on the sphere, including flight trajectories (Anirudh et al., 2017; Dai & Müller, 2018), or functions residing on unknown manifolds (Chen & Müller, 2012).

Since functional data are intrinsically infinite-dimensional, dimension reduction is a necessity, and a convenient and popular tool for this is functional principal component analysis, which is geared towards linear functional data and is not suitable for functional data on nonlinear manifolds, for which Dai & Müller (2018) investigated an intrinsic Riemannian Functional Principal Component Analysis (Riemannian FPCA) for functions taking values on a nonlinear Euclidean submanifold, with a Fréchet type mean curve. The concept of Fréchet mean as a minimizer of the Fréchet function extends the classical mean in Euclidean spaces to data on Riemannian manifolds (Patrangenaru et al., 2018). Using Riemannian logarithm maps, data on manifolds can be mapped into tangent spaces identified with hyperplanes in the ambient space of the manifold. Then Riemannian FPCA can be conducted on the mapped data, where the Fréchet mean and Riemannian logarithm maps reflect the curvature of the underlying manifold, yielding representations that are intrinsic to the manifold, which is an advantage over extrinsic approaches.

A challenge is that functional data are often sparsely observed, i.e. each function is only recorded at an irregular grid consisting of a few points. Such sparse recordings are routinely encountered in longitudinal studies (Verbeke et al., 2014). For example, in a longitudinal survey of unemployed workers in New Jersey (Krueger & Mueller, 2011) that we analyze in Section 4, the number of longitudinal responses available per subject is less than 4 in more than a half of the subjects. For sparsely observed longitudinal/functional data such as these, observations need to be pooled across subjects in order to obtain sensible estimates of mean and covariance functions, as the data available for individual subjects are so sparse that meaningful smoothing is not possible. This pooling idea is at the core of the principal analysis by conditional expectation (PACE) approach (Yao et al., 2005), whereas for densely sampled functional data one can apply individual curve smoothing or cross-sectional strategies (Zhang & Chen, 2007).

An special case of longitudinal data are longitudinal compositional data, where at each time point one observes fractions or percentages for each of a fixed number of categories, which add up to one. Such data occur in many applications, eg., repeated voting, with counts transformed into percentages of votes for items, consumer preferences in terms of what fraction prefers a certain item, microbiome (Li, 2015), online prediction markets, soil or air composition over time, mood assessment, and shape analysis, where we will study data of the latter two types of longitudinal data in Section 4. While a classical approach for compositional data is to apply the simplex geometry in the form of the Aitchison geometry (Aitchison, 1986) or a variant (Egozcue et al., 2003; Talská et al., 2018), a disadvantage is that a baseline category needs to be identified, which cannot have null outcomes, due to the need to form quotients; this is is especially difficult to satisfy in longitudinal studies, where null outcomes may fluctuate between categories.

Motivated by the need to analyze sparsely sampled longitudinal data as for example found in the emotional well-being data collected in a longitudinal survey for unemployed workers in New Jersey and containing a substantial proportion of null outcomes, we develop a Riemannian principal analysis method geared towards sparsely and irregularly observed Riemannian functional data.

The main contributions of this paper are three-fold:

(1) We develop a principal component analysis for longitudinal compositional data, which we also illustrate with sparsely sampled body shape growth curves of Tammar wallabies, extending the scope of the approach of Dai & Müller (2018), which only applies to densely observed data. To our knowledge, no methods exist yet for the analysis of longitudinal data on manifolds.

(2) We extend Fréchet regression Petersen & Müller (2018) to functional data, while the approach in Petersen & Müller (2018) was restricted to nonfunctional data as dependence between repeated measurements is not taken into account.

(3) Concerning theoretical analysis, we extend the techniques developed in (Li & Hsing, 2010; Zhang & Wang, 2016) to manifold-valued data and obtain rates of uniform convergence for the mean function. The lack of a vector space structure makes this technically challenging.

To obtain intrinsic representations of the unobserved trajectories on a nonlinear Riemannian manifold from sparsely observed longitudinal data, we first pool data from all subjects to obtain estimates for the mean and covariance function, and then obtain estimates of the individual principal components and trajectories by Best Linear Unbiased Prediction (BLUP). We employ a manifold local linear smoothing approach to estimate the Fréchet mean curve, extending the approach of Petersen & Müller (2018) for sparsely observed Riemannian functional data. Local linear smoothing was originally studied in the context of Euclidean non-functional data (Fan & Gijbels, 1996) and later has been extended to curved non-functional data (Yuan et al., 2012). Observations of each function are then mapped into the tangent spaces around the estimated mean curve via Riemannian logarithm maps. As the log-mapped observations are vectors in the ambient space of the manifold, we proceed by adopting a scatterplot smoothing approach to estimate the covariance structure of the log-mapped data and then obtain a finitely truncated representation, where the principal component scores are estimated by PACE, or sometimes integration, depending on the sparseness of the observations available per function. Finally, a finite-dimensional representation for the original data is obtained by applying Riemannian exponential maps that pull the log-mapped data back to the manifold.

## 2 Methodology

### 2.1 Statistical Model

Let be a -dimensional, connected and geodesically complete Riemannian submanifold of , where and are positive integers such that . The dimension is the intrinsic dimension of the manifold , while is the ambient dimension. The Riemannian metric on , which defines a scalar product for the tangent space at each point , is induced by the canonical inner product of , and it also induces a geodesic distance function on . A brief introduction to Riemannian manifolds can be found in the appendix of Dai & Müller (2018), see also Lang (1995) and Lee (1997).

We define a -valued Riemannian random process, or simply Riemannian random process , as a -dimensional vector-valued random process defined on a compact domain such that , where we assume that the process is of second-order, in the sense that for every there exists , potentially depending on , such that the Fréchet variance is finite. For a fixed , if is a point on satisfying , then is a Fréchet mean of at . Under conditions described in Bhattacharya & Patrangenaru (2003), the Fréchet mean of a random variable on a manifold exists and is unique, which we shall assume for at all .

1. is of second-order, and the Fréchet mean curve exists and is unique.

Formally, we define the unique Fréchet mean function by

 μ(t)=argminp∈MM(p,t),t∈T. (1)

As is geodesically complete, by the Hopf–Rinow theorem, its exponential map at each is defined on the entire . To make injective, define the domain to be the interior of the collection of tangent vectors such that if is a geodesic emanating from with the direction , then is a minimizing geodesic. Then on the domain the map is injective, and its image is denoted by . The Riemannian logarithm map at , denoted by , is the inverse of restricted to . Specifically, if for some , then . To study the covariance structure of the random process on tangent spaces, we will assume

1. For some constant , , where denotes the set .

This condition requires to stay away from the cut locus of uniformly for all , which is necessary for the logarithm map to be well defined, and is not needed if is injective on for all . In the special case of a -dimensional unit sphere , if is continuous and the distribution of vanishes at an open set with positive volume that contains , (X1) holds. Under 2.1 and 1, is almost surely defined for all . We will write to denote the -valued random process and refer to as the log process of .

An important observation (Bhattacharya & Patrangenaru, 2003) is that . Furthermore, the second-order condition on passes on to , i.e., for every , where denotes the canonical Euclidean norm in . This enables us to define the covariance function of by

 Γ(s,t)=E{L(s)L(t)T},s,t∈T. (2)

This covariance function admits the eigendecomposition

 Γ(s,t)=∞∑k=1λkϕk(s)ϕTk(t),

where are orthonormal, , and . The logarithm process has the Karhunen-Loève expansion

 L(t)=∞∑k=1ξkϕk(t),

where

 ξk=∫TL(t)Tϕk(t)dt (3)

are uncorrelated random variables such that and .

A finite-truncated representation of intrinsic to the manifold is then given by

 XK(t):=Expμ(t)LK(t),LK(t)=K∑k=1ξkϕk(t) (4)

for some integer . It was demonstrated in Dai & Müller (2018) that this representation is superior in terms of trajectory approximation for densely/completely observed manifold valued functional data compared to functional principal component analysis (FPCA), which is not adapted to the intrinsic manifold curvature, and for the same reason the scores are better predictors for classification tasks when compared to traditional FPCs.

Suppose are i.i.d. realizations of a -valued Riemannian random process . To reflect the situation in longitudinal studies, we assume that each is only recorded at random time points , and each observation is furthermore corrupted by some intrinsic random noise. More specifically, we observe such that for some density supported on , and the are independent of the . Furthermore, conditional on and , the noisy observations are independent, where is independent of , with isotropic variance and . As , the assumption on implies that .

### 2.2 Estimation

For the case of sparse functional or longitudinal data that are the focus of this paper, it is not possible to estimate the mean curve using the cross-sectional approach of Dai & Müller (2018), as repeated observations at the same time are not available. Instead we develop a new method, for which we harness Fréchet regression (Petersen & Müller, 2018). Fréchet regression was developed for independent measurements, and for our purposes we need to study an extension that is valid for the case of repeated measurements.

For any and , where the kernel is a symmetric density function and is a sequence of bandwidths, with , we define the local weight function at by

 ^ωij(t,hμ)=1^σ20Khμ(Tij−t){^u2−^u1(Tij−t)},

where for , and . Defining the double-weighted Fréchet function

 Qn(y,t)=n∑i=1wimi∑j=1^ω(Tij,t,h)d2M(Yij,y),

which includes weights for individual subjects satisfying , we estimate the mean trajectory by

 ^μ(t)=argminy∈MQn(y,t).

Note that for the Euclidean special case, where , coincides with the loss function used in Zhang & Wang (2016) for linear functional data.

For the choice of the weights , two options have been studied in the Euclidean special case. One is to assign equal weight to each observation, i.e., with , used in Yao et al. (2005). The other is to assign equal weight to each subject, i.e., , as proposed in Li & Hsing (2010). We refer to the former scheme as “OBS” and to the latter as “SUBJ”, following Zhang & Wang (2016), who found that the OBS scheme is generally preferrable for non-dense functional data; the SUBJ scheme performs better for ultra-dense data; and an intermediate weighting scheme that is in between OBS and SUBJ performs at least as well as the OBS and SUBJ schemes in the Euclidean case. The latter corresponds to the choice for a constant with and , where and , and we refer to this choice as INTM.

To estimate the covariance structure, we first map the original data into tangent spaces, setting and treating as a column vector in . To smooth matrices for , we extend the scatterplot smoother (Yao et al., 2005) to matrix-valued data by finding minimizing matrices , and according to

 (^A0,^A1,^A2) (5) :=argminA0,A1,A2n∑i=1vi∑1≤j≠l≤mi∥Γijl−A0−(Tij−s)A1−(Til−t)A2∥2FKhΓ(Tij−s)KhΓ(Til−t),

where in the above weighted least squares error minimization step is the matrix Frobenius norm, is a bandwidth, and are weights with .

For the OBS weight scheme, , for the SUBJ scheme, , while for INTM, for a constant with and , where and , in analogy to Zhang & Wang (2016). We then use as obtained in (5) as an estimate of the population covariance function . For , the minimization is over symmetric matrices and symmetric semi-positive definite matrices . Estimates for the eigenfunctions and of are then obtained by the corresponding eigenfunctions and eigenvalues of .

In applications, one needs to choose appropriate bandwidths and , as well as the number of included components . To select for smoothing the mean function , we adopt a generalized cross-validation (GCV) criterion

 GCV(h)=∑ni=1∑mij=1d2M(^μ(Tij),Yij)(1−Kh(0)/N)2,

where is the total number of observations, and then choose as the minimizer of GCV. While a similar GCV strategy can be adopted to select the bandwidth for the covariance function , we propose to employ the simpler choice , which we found to perform well numerically and which is computationally efficient.

To determine the number of components included in the finite-truncated representation (4), it is sensible to consider the fraction of variation explained (FVE)

 FVE(K)=∑Kk=1λk∑∞j=1λj,ˆFVE(K)=∑Kk=1^λk∑∞j=1^λj, (6)

choosing the number of included components as the smallest such that the FVE exceeds a specified threshold ,

 K∗=min{K:FVE(K)≥γ},^K∗=min{K:ˆFVE(K)≥γ}, (7)

where common choices of are 0.90, 0.95, and 0.99.

### 2.3 Riemannian Functional Principal Component Analysis Through Conditional Expectation

The unobserved scores need to be estimated from the discrete samples or log-mapped samples . Approximating (3) by numerical integration is not feasible when the number of repeated measurements per curve is small, in analogy to the Euclidean case (Yao et al., 2005; Kraus, 2015). We therefore propose Riemannian Functional Principal Component Analysis Through Conditional Expectation (RPACE), generalizing the PACE procedure of Yao et al. (2005) for tangent-vector valued processes, where we apply best linear unbiased predictors (BLUP) to estimate the , obtaining the RFPC scores

 ~ξik=B[ξik∣Li]=λkϕTikΣ−1LiLi. (8)

Here denotes the best linear unbiased predictor. Writing for the vectorization operation, are the vectorized concatenated log-mapped observations for subject , , , and , where is the identity matrix. The entry of corresponding to is , where and denote the th or th entry in a vector or matrix , respectively. Substituting corresponding estimates for the unknown quantities in (8), we obtain plug-in estimates for ,

 ^ξik=^λk^ϕTik^Σ−1Li^Li, (9)

where ; , , and are obtained from , the minimizer of (5), and we define , where denotes the trace of a matrix . The -truncated processes

 LiK(t)=K∑k=1ξikϕk(t),XiK(t)=K∑k=1Expμ(t)(LiK(t)) (10)

are estimated by

 ^LiK(t)=K∑k=1^ξik^ϕk(t),^XiK(t)=K∑k=1Exp^μ(t)(^LiK(t)). (11)

The BLUP estimate coincides with the conditional expectation , or the best prediction of given observation , if the joint distribution of is elliptically contoured (Fang et al., 1990, Theorem 2.18), with the Gaussian distribution as the most prominent example.

## 3 Asymptotic Properties

To derive the asymptotic properties of the estimates in Section 2, in addition to conditions 2.1 and 1, we require the following assumptions.

1. The domain is compact and the manifold is a bounded submanifold of .

1. The kernel function is a Lipschitz continuous symmetric probability density function on .

1. Almost surely, the sample paths are twice continuously differentiable.

Note that the boundedness assumption on the manifold can be relaxed by imposing additional conditions on the random process , or by requiring a compact support for , . The assumptions on the manifold are satisfied for our data applications in Section 4 where the manifolds under consideration are spheres.

To state the next assumption, we define the following quantities. Let , where , , and for all by the Cauchy–Schwarz inequality. Note that the finiteness of is implied by the Lipschitz continuity of the kernel function and the compactness of the domain . Define and .

1. The Fréchet mean functions , , and exist and are unique, the latter almost surely for all .

2. The density of the random times when measurements are made is positive and twice continuously differentiable for .

Recall that denotes the tangent space at and is the Riemannian exponential map at , which maps a tangent vector onto the manifold . For , define a real-valued function , and , where is the Frechét variance function defined in Section 2.1. We assume

1. The Hessian of at is uniformly positive definite along the mean function, i.e.,

 inft∈Tλmin(∂2∂v2Gμ(t)(v,t)∣v=0)>0.

Conditions 3 is necessary to ensure a consistent estimate of the mean curve using -estimation theory, while 1 is a design density condition; both are standard in the literature (Zhang & Wang, 2016; Petersen & Müller, 2018). On a Riemannian manifold with sectional curvature at most , 3 and 2 are satisfied if the support of is within , where is a geodesic ball with center and radius (Bhattacharya & Bhattacharya, 2012); this specifically holds for longitudinal compositional data mapped to the positive orthant of a unit sphere. The next two conditions impose certain convergence rates for and , respectively. For simplicity, we shall assume , noting that results paralleling those in Zhang & Wang (2016) can be obtained for the general case.

1. and .

2. , , and .

The following result establishes the uniform convergence rate for estimates .

###### Theorem 1.

Assume conditions 2.12, 3, 3, 32 and 1 hold. Then

 supt∈Td2M(^μ(t),μ(t))=OP(h4μ+lognnmhμ+lognn). (12)

This result shows that the estimate enjoys the same rate as the one obtained in Zhang & Wang (2016) for the Euclidean case, even in the presence of curvature. The rate in (12) has three terms that correspond to three regimes that are characterized by the growth rate of relative to the sample size: (1) When , the observations per curve are sparse, and the optimal choice yields ; (2) When , corresponding to an intermediate case, the optimal choice leads to the uniform rate for ; (3) When , the observations are dense, and any choice gives rise to the uniform rate . The transition from (1) to (3) is akin to a phase transition, similar to the one observed in Hall et al. (2006).

The next result concerns the uniform rate for the estimator of , the covariance function of the log-mapped data, extending a result of Zhang & Wang (2016) for the Euclidean case to curved functional data.

###### Theorem 2.

Assume conditions 2.12, 3, 3, 32, 1 and 2 hold. Then

 sups,t∈T∥^Γ(s,t)−Γ(s,t)∥2F=OP(h4μ+h4Γ+lognnmhμ+lognn+lognnm2h2Γ+lognnmhΓ). (13)

Again, the above rate gives rise to three regimes that are determined by the growth rate of relative to the sample size: (1) When , the observations per curve are sparse, and with the optimal choice and , one has ; (2) When , with the optimal choice , the uniform rate for is ; (3) When , the observations are dense, and any choice yields the uniform rate .

Furthermore, according to Lemma 4.2 of Bosq (2000), one has . It can also be shown that , where denotes the Lebesgue measure of . Therefore, the rate for provides a convergence rate for all estimated eigenvalues . Furthermore, according to Lemma 4.3 of Bosq (2000), if and , then , where and for . Again, by utilizing the fact that , we can derive the convergence rate for . For example, if we assume polynomial decay of eigenvalue spacing, i.e., for some constants and , then where is the rate that appears on the right hand side of (13), and the term is uniform for all .

## 4 Data Applications

### 4.1 Emotional Well-Being for Unemployed Workers

We demonstrate RPACE for the analysis of longitudinal mood compositional data. These data were collected in the Survey of Unemployed Workers in New Jersey (Krueger & Mueller, 2011), conducted in the fall of 2009 and the beginning of 2010, during which the unemployment rate in the US peaked at 10% after the financial crisis of 2007–2008. A stratified random sample of unemployed workers were surveyed weekly for up to 12 weeks. Questionnaires included an entry survey, which assessed demographics, household characteristics and income, and weekly followups, including job search activities and emotional well-being. In each followup questionnaire, participants were asked to report the percentage of time they spent in each of the four moods: bad, low/irritable, mildly pleasant, and good. The overall weekly response rate was around 40%; see Krueger & Mueller (2011).

We analyzed a sample of unemployed workers enrolled in the study, who were not offered a job during the survey period. The measurement of interest is the longitudinal mood composition, where is the proportion of time a subject spent in the th mood in the previous 7 days, , recorded on day since the start of the study. The number of responses per subject ranged from 1 to 12, so the data is a mixture of very sparse and mildly sparse longitudinal observations; for 25% of all subjects only one response was recorded. As subjects responded at different days of the week, the observation time points were also irregular. The sparsity and irregularity of the observations poses difficulties for classical analyses and prevents the application of the presmooth-and-then-analyze method (Dai & Müller, 2018), motivating the application of RPACE, which is geared towards such sparse and irregularly sampled manifold-valued functional data.

We applied RPACE for the square-root transformed compositional data

 X(t)={X1(t),…,X4(t)}={√Y1(t),…,√Y4(t)},

which lie on the sphere for , since compositional data are non-negative and sum to 1, using bandwidths and days, as selected by GCV, and the Epanechnikov kernel on . The mood composition trajectories for four randomly selected subjects are displayed in the left panel of Figure 1. The solid dots denote the reported moods, which are slightly jittered vertically if they overlap, and dashed curves denote the fitted trajectories when selecting components, selected according to the FVE criterion (7) with threshold , which is a reasonable choice in view of the large sample size. A substantial proportion of the mood compositions is zero, which is no problem for the square-root transformation approach in contrast to the alternative log-ratio transformation (Aitchison, 1986), which is undefined when the baseline category is 0.

As the self-reported moods contain substantial aberrations from smooth trajectories that we view as noise, the fitted trajectories do not go through the raw observations, and are drawn towards the observations for subjects with more repeated measurements. The mean trajectory is displayed in the right panel of Figure 1, indicating that the emotional well-being of subjects tends to deteriorate as the period of unemployment lengthens, with an overall increase in the proportion of bad mood and a decrease in the proportion of good mood.

The first four eigenfunctions for mood composition trajectories are shown in Figure 2, where the first eigenfunction corresponds to the overall contrast between neutral-to-positive mood (good and mild) and negative moods (low and bad); the second eigenfunction represents emotional stability, which is a contrast between more neutral moods and extreme emotions (good and bad); the third eigenfunction corresponds to a shift of mood compositions to more positive moods, namely from bad to low and from mild to good; the fourth eigenfunction encodes an increase of positive feelings and a decrease of negative ones over time. Here it is important to note that the sign of the eigenfunctions is arbitrary and could be reversed. The first four eigenfunctions together explain 95% of the total variation.

As an example to demonstrate that the scores obtained from RFPC are useful for downstream tasks such as regression, we explored the association between the second RFPC score, corresponding to the proportion of extreme moods, and annual household income in 2008, a measure of financial stability. We extracted the RFPC scores for each subject and constructed kernel density estimates for within each income category; see Figure 3. Participants with higher household income before losing their job and thus higher financial stability tend to have higher emotion stability, as demonstrated by the right-shifted distributions of and larger means (colored dots). The relationship between prior income and emotional stability appears to be nonlinear especially for the lower income groups.

### 4.2 Wallaby Body Shape Growth

Quantifying the shapes of organisms has been a long-standing statistical and mathematical problem (Thompson, 1942; Kendall et al., 2009). We apply RPACE to analyze the longitudinal development of body shapes of a sample of Tammar wallabies (Macropus eugenii), a small macropod native to Australia (data courtesy of Dr Jeff Wood, CSIRO Biometrics Unit INRE, Canberra, and data cleaning and corrections were performed by Professor Heike Hofmann, Department of Statistics, Iowa State University, in 2008). For each of measured wallabies from two locations, longitudinal measurements of the length (in inches) of six body parts were available at age in the first 380 days after birth, for and . The measurement time points for the wallabies were highly irregular, and the number of measurements per wallaby varied from 1 to 26, with 14 wallabies having no more than 7 measurements. Typical measurement patterns with mixed sparse and dense observations for each curve are shown in the left panel of Figure 4. This measurement scheme requires methodology that can handle the high degreee of irregularity in the measurement times.

To quantify shapes of wallabies, we normalized the length measurements by the Euclidean norm, obtaining , thus emphasizing the relative size of each body part expressed as a percentage of total size, leading to longitudinal preshape data (Kendall et al., 2009) that lie on a sphere. The are shape characteristics of wallabies at their respective age. RPACE was then applied to the transformed data with bandwidths and selected by GCV, using the Epanechnikov kernel. The Fréchet mean trajectory as displayed in the right panel of Figure 4 shows that relative to the body size, the tail becomes larger, while head, arm and ear lengths become relatively smaller throughout the first year of birth; leg and foot lengths increase from birth to roughly 6 months, where relative leg length development peaks before that of foot development.

To decompose the variation of individual shape trajectories, the first three eigenfunctions are displayed in Figure 5. The first eigenfunction corresponds to an overall contrast between tail and other body part development, and the second eigenfunction has a large component in the initial tail growth. The first two eigenfunctions together explain 64% of total variation, showing that tail length is a main driving force for shape differences. Pairwise scatterplots of the first three RFPC scores are shown in Figure 6, where each point stands for a single wallaby, and their patterns indicate two different geographic locations. Shape development differences between locations were mostly reflected in the second component, corresponding to initial tail growth, while the first and third components were less dissimilar.

## 5 Simulation Studies

We demonstrate the performance of the sparse RPACE method for scenarios with varying sample size, sparsity, and manifolds, for which we choose or SO. Here is the 2-sphere and SO is the manifold consisting of the orthogonal matrices with determinant 1. For each random trajectory on , , we sample observations , , where follows a uniform distribution on . The number of observations follows a discrete uniform distribution on , where is the maximum number of observations per curve that differs between scenarios.

The sparse observations were generated according to , , with manifold-specific mean function and eigenfunctions ; RFPC scores that follow independent Gaussian distributions with mean zero and variance , for ; and independent Gaussian errors with mean and isotropic variance on the tangent space . The cumulative FVE for the first components, defined as , are 63.2%, 86.4%, 95.0%, 98.15%, 99.3%, and 99.8%, respectively. For , we set where and ; eigenfunctions , with being the rotation matrix from to , and the orthonormal cosine basis on . For , and , where is the matrix exponential and maps a vector to a skew-symmetric matrix whose lower diagonal elements (ordered by column) are . We investigated three settings with varying sparsity and sample size: Scenario 1 (baseline): , ; Scenario 2 (sparse): , ; Scenario 3 (small ): , .

Three different FPCA approaches were evaluated for these scenarios, namely an extrinsic FPCA by Chiou et al. (2014), an extrinsic multivariate FPCA via componentwise FPCA (CFPCA) by Happ & Greven (2018), and the proposed RPACE. The extrinsic FPCA is a multivariate FPCA applied to the sparse manifold-valued data as if they are objects in the ambient Euclidean space. In the CFPCA approach one first fits an FPCA to each of the components functions, and then applies a second PCA to the pooled component scores to obtain summarized scores and multidimensional eigenfunctions. For sample trajectories with small variation around the mean, the extrinsic FPCA methods (FPCA and CFPCA) can be regarded as linear approximations to RPACE. The Epanechnikov kernel was used for the smoothers, with bandwidth selected by GCV and .

Using 200 Monte Carlo experiments, we report the average Root Mean Integrated Squared Errors (RMISE) for the fitted trajectories, defined as

 RMISE= ⎷1200200∑b=11D∫10dM(^XK(t),X(t))2dt,

for in Table 1. Since the fitted trajectories using FPCA and CFPCA lie in the ambient space but not on , we projected them back to the manifold by normalizing the norm of for or the eigenvalues of the matrix representation of for . RPACE was the overall best performer across the various scenarios, especially for the more parsimonious models. Scenario 2 (sparse) is considerably more difficult than Scenario 1, and smaller models with performed better. RPACE also works well for the smaller sample size in Scenario 3.

## Appendix

#### Proofs of Main Results

###### Proof of Theorem 1.

According to Lemma 3, where all auxiliary results are given in the next section,

 supp∈Msupt∈T|~Qhμ(p,t)−M(p,t)|=O(h2μ). (14)

With Lemma 4, by similar arguments as in Theorem 3 in Petersen & Müller (2018),

 supt∈Td2M(~μ(t),μ(t))=O(h4μ) (15)

as for