Joint Clustering and Registrationof Functional Data

# Joint Clustering and Registration of Functional Data

Abstract

Curve registration and clustering are fundamental tools in the analysis of functional data. While several methods have been developed and explored for either task individually, limited work has been done to infer functional clusters and register curves simultaneously. We propose a hierarchical model for joint curve clustering and registration. Our proposal combines a Dirichlet process mixture model for clustering of common shapes, with a reproducing kernel representation of phase variability for registration. We show how inference can be carried out applying standard posterior simulation algorithms and compare our method to several alternatives in both engineered data and a benchmark analysis of the Berkeley growth data. We conclude our investigation with an application to time course gene expression.

Keywords: Curve registration; Dirichlet process, Functional data clustering; Time course microarray data.

## 1 Introduction

Functional data is often characterized by both shape and phase variability. A typical example where these two sources of variation are clearly identified and interpreted are data arising from the study of human growth. Panel (a) and (b) of Figure 3 shows growth velocity curves of 39 boys and 54 girls from the Berkeley Growth Study (Tuddenham and Snyder, 1954). An overall pattern is observed that growth velocity decelerates to zero from infancy to adulthood, with some subtle acceleration-deceleration pulses during late childhood and a prominent pubertal growth spurt. In this setting, phase variability is identified as variation in the timing of subject-specific growth. Explicit consideration of phase variability is necessary in order to obtain consistent estimation of typical growth patterns.

The formal analytical treatment of this problem has a long history in Statistics and Engineering. Initial contributions focused on curve alignment (registration) via dynamic time warping (Sakoe and Chiba, 1978; Wang and Gasser, 1997, 1999) or landmark registration (Gasser and Kneip, 1995). Model-based alternatives represent subject-specific profiles as a parametric transformation of a common smooth regression function, evaluated over random functionals of time (Lawton et al., 1972; Kneip and Gasser, 1988). Several of these methods involve a transformation of both the and axes, essentially defining the mean profile for curve as , where is a monotone transformation function accounting for phase variability. In longitudinal settings, Brumback and Lindstrom (2004) introduced a mixed effect formulation of these models, formally accounting for dependence within subject. Similary, Telesca and Inoue (2008) proposed a Bayesian hierarchical curve registration (BHCR) model allowing for posterior inference on both the shape function and transformation functions . Whereas these considerations are valid for any function argument , it is most natural to think of as a time scale. In the following, we will therefore focus on the case of functional data observed over time.

Besides technical differences, these models of curve registration share a fundamental assumption, implying that all observed functional profiles are generated through semi-parametric transformations of a common shape . While this assumption is likely to be warranted in standard applications, the increasing popularity of these methods for the analysis of more general data classes (Telesca et al., 2009, 2012) motivates a methodological extension, conceiving the possible existence of shape-invariant subgroups, with group shapes .

We are not the first to recognize a need for combing clustering and registration. Stepwise procedures, where first curves are registered and then clustered according to a chosen heuristic, have been explored in several applications (Liu and Müller, 2004; Tang and Müller, 2009; Slaets et al., 2012). Joint clustering and registration procedures have been discussed by Gaffney and Smyth (2005) and Liu and Yang (2009). While stepwise procedures are likely to provide suboptimal estimation, available joint clustering and registration techniques have only been developed under the assumption of linear shape invariant time transformations, where . Furthermore, model complexity, conceived as the number of clusters, is only treated as a nuisance parameter and fixed in a post-hock fashion via BIC or pBIC (Chou and Reichl, 1999).

We extend the BHCR model of Telesca and Inoue (2008) to allow for shape-subgroups. Our proposal is based on a reproducing kernel (B-spline) representation of both shape and time transformation functions. To relax the homomorphic assumption, we define a non-parametric prior over shape functionals via a Dirichlet process (DP) mixture (Ferguson, 1973; Antoniak, 1974; Quintana, 2006). Clustering is achieved implicitly and is interpreted in terms of shape similarities. The number of clusters is subject to direct estimation and inferences account fully for this layer of uncertainty, without the need for post-hock adjustments. Furthermore, we show how posterior simulation remains straightforward via a simple extension to standard Metropolis within Gibbs MCMC transitions.

Following Liu and Müller (2004), we show how this modeling approach is particularly useful for the analysis of time-course gene expression. While it is known that co-expressed genes are likely to be co-regulated, various regulation mechanisms, such as feedback loops and regulation cascades, may warp the timing of expression for genes involved in the same process or regulatory pathway (Weber et al., 2007). It is therefore desirable to have a model that can assign genes with similar, yet time-warped, expression profiles to the same cluster (Qian et al., 2001; Qin, 2006). In other words, it is important to have a model that is phase-variation tolerant when defining curve subgroups.

The remainder of this article is organized as follows. In section 2, we describe the the sampling model and priors. A posterior simulation strategy via MCMC is described in Section 3. In section 4, we apply the joint model to simulated datasets and compare it with single-purpose models: a clustering only model and a registration only model. In section 5 we apply the model to the Berkeley Growth Study data. In section 6, we apply the model to time course microarray data of response of human fibroblasts to serum stimulation. Finally in section 7, we conclude the paper with a critical discussion.

## 2 Model Formulation

### 2.1 Sampling Model

Let denote the observation of curve at time , where and . The sampling model is specified as follows:

 yi(t)=ci+aimi{μi(t,\boldmathϕ \unboldmathi),\boldmathθ \unboldmathi}+ϵi(t), (1)

where and is the precision parameter.

In formula (1), is the curve specific time transformation function, characterizing the latent time scale of curve , and is the curve specific shape function. The apparent lack of identifiability between and will be resolved in §2.2 by specifying a random probability functional prior for , implicitly producing functional clusters.

To achieve flexible modeling of both time transformation and shape functions, we use B-splines (De Boor, 1978). We model , where is the B-spline basis vector at time and is the curve specific basis coefficient vector. is a monotone function mapping the sampling time interval to the interval , with expansion constraint to allow the time scale to be transformed outside the observed sampling time interval . To ensure monotonicity and function image boundaries, we impose the following constraints

 (t1−Δ)≤ϕi1<…<ϕiq<ϕi(q+1)<…<ϕiQ≤(tn+Δ). (2)

We model shape functions as where is a B-spline basis vector and is the curve specific basis coefficient vector. No constraints are usually imposed on , unless specific shapes are preferred a-priori (see for example, Telesca et al. (2012)).

We note that the stochastic functionals and are now fully described by the distributions of and respectively. Identifiability of is ensured by modeling as a Dirichlet process mixture. In this setting, realizations of are discrete with probability one, with unique component vectors , (). These component vectors, in turn, define cluster specific shape functions , to which member curves are aligned through . Details are discussed in the following section.

### 2.2 Prior Model

We assume that shape function parameters and precisions to follow a Dirichlet process mixture prior. Let be a base distribution absolutely continuous with respect to the Lebesque measure on and a Dirac mass at . Using a predictive Pòlya urn scheme (Blackwell and MacQueen, 1973), we specify the prior distribution as follows:

 \boldmathθ \unboldmathi,τi|\boldmathθ \unboldmath−i,τ−i∼α(α+N−1)G0(% \boldmathθ \unboldmathi,τi)+1(α+N−1)∑j≠iδ(\boldmathθ \unboldmathj,τj), (3)

where is the set all the indices other than and is the weight parameter of the Dirichlet process model. This prior generates the shape and error precision for curve , from a mixture involving a random draw from the base density or the point mass ’s, .

Realizations from the prior in (3) define a discrete distribution, implying ties among ’s, . These ties are naturally interpreted as clusters among the curves, namely, curve and belong to the same cluster if . As a result, only unique values are observed, each of which is associated with a cluster and is denoted by , . In this setting, we can re-express formula (3) as:

 \boldmathθ \unboldmathi,τi|\boldmathθ \unboldmath∗k,τ∗k∼α(α+N−1)G0(\boldmathθ \unboldmathi,τi)+1(α+N−1)K−i∑k=1nk(−i)δ(\boldmathθ \unboldmath∗k,τ∗k), (4)

where is the size of cluster and is number of clusters when curve is excluded. The representation above implies that a complete sample of , is in one to one correspondence with a set of unique values, , , through cluster labels . Specifically, if and if is a new sample from , which means curve forms a new cluster of its own. As a result, the number of clusters is also determined implicitly.

We note that if we omit the time transformation modeling with and use time directly in the shape functions , our model reduces to standard functional clustering via Dirichlet process mixtures.

We assume that the base DP mixture density factors as , where and , a Gamma distribution with mean . The specific form of the precision matrix is determined by a second-order shrinkage process: with () where is the dimension of and (Lang and Brezger, 2004). In this setting, the product can be interpreted as a smoothing parameter for curve .

Similarly, we also use a penalized B-spline prior on the time transformation function parameters . In particular, letting be the vector associated with identity transformation so that , we assume with () where is dimension of and , implying . In the foregoing, is deterministic and is interpreted as a smoothing parameter.

Following Telesca et al. (2009), when modeling cluster specific common shape functions, we let the number of spline knots equal to the number of sampling time points. For the curve specific time transformation functions structural smoothness is imposed by their monotonicity (2), suggesting parsimony in the choice of the number of knots. In many application contexts, to equally spaced interior knots allow for enough flexibility in the representation of time transformation.

For ease of computation, we complete our model with priors and hyperpriors following principles of conditional conjugacy. Specifically, curve specific mean levels parameters are specified as and curve specific amplitude parameters . The assumption of strictly positive amplitudes is appropriate if synchronous but negatively correlated curves are to be clustered separately. Removing positivity restrictions will imply clustering of synchronous profiles. We complete our prior specifications assuming , , and . Smoothing parameters priors are specified as and . Finally, the weight parameter of the Dirichlet process mixture is defined as .

## 3 Posterior Inference

### 3.1 Posterior Simulation

Markov Chain Monte Carlo simulation from the posterior distribution is conceptually straightforward and obtained as a simple sequence of Metropolis-Hastings within Gibbs transitions.

For ease of notation, we let and . Without loss of generality, we also assume that curves are of the same length . The proposed Markov transition sequence is implemented by: (i) sampling given all other parameters, (ii) resampling given cluster indicators and all other parameters and (iii) sampling and remaining parameters from their full conditional posteriors. We outline details as follows.

(i) Sampling . The full conditional posterior of is a Dirichlet process mixture with updated mixing probabilities and components (Escobar, 1994; West et al., 1994):

 \boldmathη \unboldmathi∣\boldmathη % \unboldmath∗k,\boldmathϕ \unboldmathi,% \boldmathy \unboldmathi∼qi0Gi(\boldmathη \unboldmathi∣\boldmathϕ \unboldmathi,% \boldmathy \unboldmathi)+K−i∑k=1qikδ(\boldmathη \unboldmath∗k)qi0+K−i∑k=1qik, (5)

where is times the marginal likelihood of , is the full conditional density of and is the product of cluster size and the likelihood associated with .

To improve mixing rates, we combine the sampling of , and . Specifically, we use copies of , , one for each cluster. We update , assuming , with the Metropolis-Hastings algorithm as in Telesca and Inoue (2008), so that the appropriate time transformation is found for curve to be registered with the common shape function of cluster . We calculate each in (5) using the corresponding and . When calculating , we use the value of from the previous iteration of the Gibbs sampler.

Specifically, let and , and define the following summaries: , , and . To sample from its full conditional (5), we follow the procedure below:

1. Sample cluster membership which takes values on with probabilities proportional to .

2. If , we keep unchanged. Curve forms a new cluster, and a draw of from is obtained by first sampling and then sampling . If , we use the corresponding as a draw of and let .

(ii) Resampling given cluster indicators . After a sample of and is generated, to improve mixing rates, we update each from its full conditional , where is the set of curves in cluster (MacEachern and Müller, 1998). Furthermore, , s.t.

 \boldmathη \unboldmath∗k∣\boldmathy \unboldmathi∈Sk∼N(\boldmath% E \unboldmath−1k\boldmathμ \unboldmathk,(τ∗k\boldmathE \unboldmathk)−1)×Ga⎛⎝12∑i∈Skni+a,12⎛⎝∑i∈Sk(\boldmathy \unboldmathi−\boldmathc \unboldmathi)T(\boldmathy % \unboldmathi−\boldmathc \unboldmathi)−\boldmathμ \unboldmathTk\boldmathE \unboldmath−1k% \boldmathμ \unboldmathk⎞⎠+b⎞⎠, (6)

where and .

(iii) Sampling and all hyper parameters. To develop the full conditional of , we note that (Antoniak, 1974). Following (West, 1992), we define an auxiliary random quantity and a mixing probability :

 πx1−πx=aα+K−1N(bα−log(x)).

Conditioning on , it is easily shown that the full conditional distribution of is a mixture of gamma densities. Specifically,

 α∣x,K∼πxGa(aα+K,bα−log(x))+(1−πx)Ga(aα+K−1,bα−log(x)) (7)

The rest of the model parameters are simulated directly from their full conditional posterior distributions. Detailed results are reported in Web Appendix A.

### 3.2 Posterior Inference

We base our inference on MCMC samples from the posterior distribution of the model parameters. Inference for functional quantities is obtained by post-processing these finite-dimensional posterior samples. To get a point estimate of the clustering structure we use the maximum a-posterior (MAP) clustering.

Given posterior samples of , , posterior samples of the time transformation function at any time point can be calculated as:

 μ(j)i(t)=μ(j)i(t,\boldmathϕ \unboldmath(j)i)=\boldmathB \unboldmathTμ(t)\boldmathϕ% \unboldmath(j)i. (8)

Here, the posterior mean function provides an point estimate of , and curves are registered on the transformed time scales within each cluster.

Similar estimators are defined for cluster-specific shape functions:

 mk(t)=c0+a0\boldmathB \unboldmathTm(t)% \boldmathθ \unboldmath∗k,(k=1,…,K); (9)

and curve specific profiles:

 mi(μi(t))=ci+ai\boldmathB \unboldmathTm(%\boldmath$B$\unboldmathTμ(t)\boldmathϕ % \unboldmathi)\boldmathθ \unboldmathi. (10)

Point-wise credible intervals and functional bands are easily obtained as empirical quantiles. Alternatively, the simultaneous credible band for a function can be obtained as described in Crainiceanu et al. (2007) and Telesca and Inoue (2008).

To assess the model fit, we use the Conditional Predictive Ordinate (CPO) (Geisser and Eddy, 1979; Pettit, 1990). The theoretical CPO for curve is defined as

 p(\boldmathy \unboldmathi|\boldmathy \unboldmath−i)=∫p(\boldmathy \unboldmathi|Θ)p(Θ|% \boldmathy \unboldmath−i)dΘ, (11)

where denotes the collection of all model parameters. A Monte Carlo estimate, based on posterior draws is defined as:

 CPOi={M−1M∑j=1p(\boldmathy % \unboldmathi|Θ(j))−1}−1. (12)

Overall model fit is assessed using the log pseudo marginal likelihood (LPML), computed as:

 LPML=N∑i=1log(CPOi). (13)

## 4 A Monte Carlo Study of Engineered Data

We carry out a simulation study aimed at assessing the merits of joint clustering and registration and comparing the performance of our modeling strategy to common clustering techniques. We consider 100 datasets, each consisting of, curves in clusters. Each curve is generated as: , (if ), with , and . We simulate realizations at equidistant time points within interval . We use the following cluster specific shape functions: , , and . Cluster serves as a noise cluster with no signal. Finally, time transformations are generated as a monotone linear combination of B-spline basis, defined by one interior knot at .

We fit our model overparametrizing functional forms and fix equidistant interior knots between to to model common shapes spline bases, and interior knots at to model time transformation spline bases. Precisions and the Dirichelet mixture weight are assigned diffuse priors, (mean=1, variance=100).

To assess the joint model’s ability to simultaneous cluster and register curves, we compare the model with a registration only (BHCR) model and the clustering only model as described in section 2.2. For both the clustering only model and the registration only model, iterations is run for the MCMC with the first as burn-in. We also compare our model with model-based clustering (MCLUST) (Fraley and Raftery, 2002) and functional clustering (FCM) (James and Sugar, 2003).

Figure 1 shows the results from one of the simulations. Panel (a) shows curves color-coded by cluster membership. Panel (b) shows estimated individuals curves clustered and registered within each cluster, with superimposed cluster-specific shape functions (solid black). Panel (c) shows posterior expected cluster-specific shape functions (black) against the simulation truth (gray). The model is able to accurately recover cluster specific shapes. Panel (d)-(f) show results for three individual curves, each from one of the first three clusters. Posterior estimates of individual curves (solid black) are close to the simulation truth (solid gray) and simultaneous credible bands achieve calibrated coverage. Also shown are profile estimates from the the registration only model (dotdash) and the clustering only model (dotted). As expected, since the registration only model assumes all the curves share a common shape function, cluster-specific functional features are confounded and model fits tend to exhibit spurious features. The clustering only model is inherently highly flexible, as small sub-clusters are allowed to form and fit specific profiles. However, since there tend to be only few curves in each cluster, the loss of information results in noisier estimates.

We also compared our joint model with MCLUST and FCM in terms of both curve estimation accuracy and clustering accuracy. We apply MCLUST and FCM on the same 100 datasets, allowing for up to 10 clusters. Figure 2 summarizes comparison results. Panel (a) shows boxplots of the log pseudo marginal likelihood (LPML) comparing the three Bayesian models. In panel (b) we show boxplots of the simulation mean squared error (MSE) between the estimated and true individual curves for all five models. The joint model exhibits best performance in terms of MSE, and the three Bayesian model perform better than MCLUST and FCM. To compare the clustering performance we used adjusted Rand index (Hubert and Arabie, 1985). Panel (c) shows the boxplots of adjusted Rand indices for the four clustering models. The joint model leads to much higher indices, when compared to the other models considered. The clustering only model does as well as MCLUST and considerably better than FCM. Panel (d) shows a bar plot for the number of clusters identified by the four models. Out of the 100 datasets, the joint model identifies 4 clusters in 38 datasets and 5 clusters in 33 datasets. FCM also does well in identifying the correct cluster numbers, specifically it identifies 4 clusters in 46 datasets and 3 clusters in 27 datasets. The clustering only model and MCLUST tend to overestimate the number of clusters.

We repeated the joint clustering and registration analysis under several prior specifications, in order to assess sensitivity. While the formal task is daunting, due to the large number of parameters in the model, we have found that reasonable variations in prior choice has little impact on final inference, detailed results are reported in Web Appendix A. Clearly, different considerations may apply under different sample size scenarios.

## 5 A Cluster Analysis of the Berkeley Growth Data

We apply the proposed model to the well known Berkeley Growth data and compare it with the clustering only model, registration only model, MCLUST and FCM. As discussed in Section 1, the Berkeley Growth Study (Tuddenham and Snyder, 1954) recorded the height of 39 boys and 54 girls for 27 time points between age 2 to 18, with one measurement a year before age 9 and two measurements a year after. To construct the growth velocity curves from the original growth curves, a smoothing spline model was fitted to each growth curve, and the first degree derivatives were obtained from the model and used in our comparisons. In Figure 3(a) and (b), the growth velocity curves of boys (blue) and girls (pink) are plotted against age with superimposed cross-section means (black). Within each sex, curves have similarities in shape, while each curve shows individual time and amplitude variation. As pointed out by Ramsay and Li (1998) and Gervini and Gasser (2004), failing to account for time variability, produces inconsistent estimates of sex-specific growth velocities. Our analysis is non-standard, as we use sex as a hidden label to assess clustering performance. While illustrative, this exercise finds justification in the fact that sex is expected to explain a large portion of variation in adolescent growth patterns.

Shape functions basis are constructed fixing and placing 27 equidistant interior knots between to . To model time transformation functions, we place four interior knots at and partition the interval into five subintervals. Priors on precisions and mixture weight are set as in Sec. 4. Our inferences are based on MCMC iterations, with burn-in.

The model identifies two clusters, seemingly discriminative according to sex.If we label the first cluster as the ”boy” cluster and the second cluster as the ”girl” cluster, then out of girls are clustered correctly and out of boys are clustered correctly. The overall classification accuracy is . Estimated time transformation functions, common shape functions and registered curves are shown in Figure 3. Panel (c) and (d) show the registered curves for the clusters, superimposed by their corresponding common shape functions from the joint model (black solid), MCLUST (red), FCM (green) and the cross sectional mean curves (black dashed). Individual curves are colored by their true gender information, blue for boys and pink for girls. Therefore, pink curves in panel (c) and blue curves in panel (d) show the misclassified cases. Panel (e) and (f) show the estimated curve specific time transformation functions for the two clusters.

We compared the joint model with the clustering only model, the registration only model, MCLUST and FCM, and the results are shown in Figure 4. Panel (a) shows the boxplots of CPO of the individual growth curves by the three Bayesian models. It shows that the joint model fits the data best, followed by the registration only model and the clustering only model. When the curves are not too dramatically different, the registration only model can fit the data accurately by finding a common shape function representing all the curves well.

As a comparable measure of model fit we compute the squared error (SE) between each curve and its fitted profiles.Panel (b) shows the boxplots of the SE over all the growth curves for the five models. The joint model gives the smallest SE, and the three Bayesian models fit the data better than MCLUST and FCM in terms of SE. Panel (c) and (d) show the model fitting results of two individual curves of a boy and a girl.

Interpreting sex as a clustering lable, we compare the joint model, the clustering only model, MCLUST and FCM using adjusted Rand indices (RIs). We find the following: FMC (RI = 0.61), MCLUST (RI = 0.47), JM (RI = 0.43) and CO (RI = 0.10). By this measure FCM and MCLUST seem to outperform our joint clustering and registrations model, with FCM giving the best clustering results. We note that when fitting MCLUST, we set the candidates of cluster numbers to be between 2 to 10, because when 1 is included as a candidate, MCLUST chooses it as the optimal cluster number, which leads to an adjusted Rand index of 0. On the other hand, as shown in panel (a) and (b) in Figure 3, FMC and MCLUST seem to provide unsatisfactory estimates of cluster specific shape functions, which the joint clustering and registration model estimates consistently with the findings of Ramsay et al. (1995) and Telesca and Inoue (2008), supporting the existence of the mid growth spurts.

## 6 Response of Human Fibroblasts to Serum

In this section, we apply the joint model to time course expression data of the response of human fibroblasts to serum in a microarray experiment of genes (Iyer et al., 1999). For human fibroblasts to proliferate in culture, they require growth factors provided by fetal bovine serum (FBS). In their study, after inducing primary cultured human fibroblasts to enter a quiescent state by serum deprivation for hours, the authors stimulated fibroblasts by adding medium containing FBS. A microarray experiment was then conducted to measure temporal gene expression levels at time points, from minutes to hours after serum stimulation. Furthermore, they selected genes with substantial time course expression change in response to serum and formed clusters using K-means clustering (Eisen et al., 1998). In our analysis, we consider a subset of genes, since they are associated with clear biological function categories as described in the original paper, and this provides a standard for us to validate the biological relevance of the clustered identified by our model.

We use the same prior setup as in previous sections. To model shape functions we use a maximum expansion constraint and place interior knots at the sampling time points and equidistant points in two intervals from to and from to respectively. To estimate the time transformation functions, we place four interior knots at in the sampling interval . Our inferences are based on MCMC iterations, with burn-in.

Panel (a) of Figure 5 shows the unregistered temporal expression curves of the 78 genes selected from the microarray experiment of human fibroblasts’ response to serum. Panel (b) shows the registered expression curves which are clustered into groups. Panel (c)-(f) show the clusters of registered expression curves separately, superimposed by their cluster specific common shape functions.

As shown in Figure 5 (c), genes in cluster are down-regulated at first and reach their lowest expression levels between and hours after serum stimulation. They begin to express about hours after the serum treatment, which is also the time when the stimulated fibroblasts replicate their DNA and reenter into the cell-division cycle. Several genes in cluster are known to be involved in mediating cell cycle and proliferation, for instance, PCNA, Cyclin A, Cyclin B1, CDC2 and CDC28 kinase, as shown in Table 1. Cluster in Figure 5(d) shows similar expression pattern to cluster , except they expression level are lower than those in cluster through the time window. Genes in cluster are also involved in cell cycle and proliferation, such as LBR. Figure 5(e) shows that genes in cluster respond immediately to serum stimulation, reach their expression peaks around 10 hours later and remain induced towards the end. They are known to be transcription factors and involved in coagulation and hemostasis because of fibroblasts’ role in clot remodeling. Typical genes include PAI1, PLAUR and ID3. As shown in Figure 5(f), genes in cluster are also induced quickly by serum treatment, reach their peaks at about 2 hours, and then gradually return to a quiescent state. Several of the genes here are known to encode transcriptional factors and other proteins involved in signal transduction, such as MINOR, JUNB, CPBP, TIGF, SGK and NET1.

## 7 Discussion

We propose a Bayesian hierarchical model for joint curve registration and clustering. Compared to previous methods, our proposal comes with several advantages. First, the model provides flexible nonlinear modeling for both components of variation. The Dirichlet process mixture prior over shape functionals strikes an automatic balance between complexity and parsimony. The implied posterior identifies subgroups of homomorphic curves, without the need to specify the number of clusters a priori. Finally, the increased model flexibility is still amenable to straightforward posterior simulation via MCMC, which provides exact inferences about a rich set of quantities of interest, without the need for simplifications or approximations.

The proposed B-spline representation of both shape and time transformation functions requires the a priori specification of the number and placement of spline knots. Our experiences suggests that a set of knots reproducing the original sampling time points works well for shape functions and 1 to 4 equidistant interior knots are enough for time transformation functions, as they carry smoothing properties through monotonicity constraints. Our simulation study shows that the model is robust to different prior choices. We however maintain, that different considerations may apply to ultra-sparse or, conversely, ultra-dense data settings.

The proposed modeling strategy has potentially broad applications to functional data analysis; especially when curve registration and clustering are of joint interest, as shown in our applications. In the first case study of the Berkeley Growth Data, our model is able to accurately separate growth curves into two clusters labelled by sex, and to correctly estimate the overall growth patterns for both sexes after registering curves in each cluster. In the second case study of time course expression data of human fibroblasts’ response to serum, our model identifies fours clusters of genes involved in distinct biological functions.

The proposed estimator of the clustering structure is the MAP clustering. Because Dirichlet process mixtures fully account for stochasticity in the potential alternative assignment of individual profiles to functional groups, it is possible, in principle, that the clustering structure with the second highest posterior probability is only a little less probable than the MAP clustering, yet it provides quite a different grouping structure.

We have not detected this type of phenomenon in our analyses. However, when it happens, some care is needed in summarizing complex posterior evidence. A possible alternative strategy to MAP is based on the estimation of a pairwise probability matrix whose elements are estimated probabilities that two curves are in the same cluster. Such a matrix can be easily generated by averaging the sampled association matrices from the MCMC output. Elements of an association matrix takes values , if two corresponding curves are in the same cluster, and otherwise. Hierarchical clustering may be used subsequently as a way to explore grouping structures (Medvedovic and Sivaganesan, 2002). Alternatively, Dahl (2006) proposed a least squares clustering by selecting the sampled clustering which minimizes the sum of squared deviations of its association matrix from the pairwise probability matrix.

Finally, when covariate information is available, the proposed model is easily extended to include a dependent Dirichlet process prior, using covariates to inform clustering.

## Supplementary Materials

Supplementary information is available from the authors.

### References

1. C E Antoniak. Mixtures of dirichlet processes with applications to bayesian nonparametric problems. Ann. Statist, 2:1152–1174, 1974.
2. D. Blackwell and J.B. MacQueen. Ferguson distributions via pólya urn schemes. Ann. Statist., 1:353–355, 1973.
3. Lyndia C. Brumback and Mary J. Lindstrom. Self modeling with flexible, random time transformations. Biometrics, 60(2):461–470, 2004.
4. W Chou and W Reichl. Decision tree state typing based on penalized bayesian information criterion. In Proc. IEEE Int. Conf. Acoust., Speech, Signal Process, pages 345–348, 1999.
5. C M Crainiceanu, D Ruppert, R J Carroll, A Joshi, and B Goodner. Spatially adaptive bayesian penalized splines with heteroscedastic errorsr. Journal of Computational and Graphical Statistics, 16:265–288, 2007.
6. D. B. Dahl. Model-based clustering for expression data via a dirichlet process mixture model. In K.-A. Do, P. Müller, and M. Vannucci, editors, Bayesian Inference for Gene Expression and Proteomics, pages 201–218. Cambridge University Press, 2006.
7. C. De Boor. A Practical Guide to Splines. Berlin: Springer-Verlag, 1978.
8. M B Eisen, P T Spellman, P O Brown, and D Botstein. Cluster analysis and display of genome-wide expression patterns. volume 95, pages 14863–14868, USA, 1998.
9. M Escobar. Estimating normal means with a dirichlet process prior. Journal of the American Statistical Association, 89:268–277, 1994.
10. T.S. Ferguson. A bayesian analysis of some nonparametric problems. Ann. Statist, 1:209–230, 1973.
11. C. Fraley and A.E. Raftery. Model-based clustering discriminant analysis and density estimation. J. Amer. Statist. Assoc., 97:611–631, 2002.
12. S J Gaffney and P Smyth. Curve clustering with random effects regression mixture. In L. K. Saul, Y. Weiss, L. Bottou, and Eds., editors, Advances in Neural Information Processing Systems 17: Proceedings of the 2004 Conference, pages 473–580. MIT Press, 2005.
13. Theo Gasser and Alois Kneip. Searching for structure in curve samples. Journal of the American Statistical Association, 90:1179–1188, 1995.
14. Seymour Geisser and William F Eddy. A predictive approach to model selection. Journal of the American Statistical Association, 74(365):153–160, 1979.
15. Daniel Gervini and Theo Gasser. Self-modelling warping functions. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 66(4):959–971, 2004.
16. L Hubert and P Arabie. Comparing partitions. Journal of Classification, 2:193–218, 1985.
17. V R Iyer, M B Eisen, and P O Brown. The transcriptional program in the response of human fibroblasts to serum. Science, 283:83–87, 1999.
18. G M James and C A Sugar. Clustering for sparsely sampled functional data. J. Amer. Statist. Assoc., 98:397–407, 2003.
19. Alois Kneip and Theo Gasser. Convergence and consistency results for self-modeling nonlinear regression. The Annals of Statistics, 16:82–112, 1988.
20. Stefan Lang and Andreas Brezger. Bayesian P-splines. Journal of Computational and Graphical Statistics, 13(1):183–212, 2004.
21. W. H. Lawton, E. A. Sylvestre, and M. S. Maggio. Self modeling nonlinear regression. Technometrics, 14:513–532, 1972.
22. X Liu and H Müller. Modes and clustering for time-warped gene expression profile data. Bioinformatics, 19:1937–1944, 2004.
23. X Liu and M C K Yang. Simultaneous curve registration and cliustering for fucntional data. Computational Statistics and Data Analysis, 53:1361–1376, 2009.
24. S. N. MacEachern and P. Müller. Estimating mixture of dirichlet process models. J. Comput. Graphic. Statist., 7(2):223–338, 1998.
25. M. Medvedovic and S. Sivaganesan. Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatrics, 18:1194–1206, 2002.
26. L I Pettit. The conditional predictive ordinate for the normal distribution. Journal of the Royal Statistical Society. Series B, 52(1):175–184, 1990.
27. J Qian, M Dolled-Filhart, J Lin, H Yu, and M Gerstein. Beyond synexpression relationships: Local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions. Journal of Molecular Biology, 314:1053–1066, 2001.
28. Z. Qin. Clustering microarray gene expression data using weighted chinese restaurant process. Bioinformatics, 22:1988–1997, 2006.
29. F. A. Quintana. A predictive view of bayesian clustering. J. Statist. Plan. Infer, 136:2407–2429, 2006.
30. J. O. Ramsay and Xiaochun Li. Curve registration. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 60:351–363, 1998.
31. J. O. Ramsay, R. D. Bock, and T. Gasser. Comparison of height acceleration curves in the fels, zurich and berkeley growth data. Annals of Human Biology, 22:413–426, 1995.
32. H. Sakoe and S. Chiba. Dynamic programming optimization for spoken word recognition. IEEE Transactions of Acoustic, Speech and Signal Processing, (1):43–49, 1978.
33. L Slaets, G Claeskens, and M Hubert. Phase and amplitude-based clustering for functional data. Computational Statistics and Data Analysis, 56:2360–2374, 2012.
34. R Tang and H Müller. Time-synchronized clsutering of gene expression trajectories. Biostatistics, 10:32–45, 2009.
35. D. Telesca and L.Y.T. Inoue. Bayesian hierarchical curve registration. Journal of the American Statistical Association, 103:328–339, 2008.
36. D Telesca, L Y T Inoue, M Neira, R Etzioni, M Gleave, and C Nelson. Differential expression and network inferences through functional data modeling. Biometrics, 65(3):793–804, 2009.
37. D Telesca, E Erosheva, D A Kreager, and R L Matsueda. Modeling criminal careers as departures from a unimodal age-crime curve. the case of marijuana use. Journal of the American Statistical Association, 107:1427–1440, 2012.
38. R. D. Tuddenham and M. M. Snyder. Physical growth of califorfornia boys and girls from birth to eighteen years. University of California Publications in Child Development, 1:183–364, 1954.
39. Kongming Wang and Theo Gasser. Alignment of curves by dynamic time warping. The Annals of Statistics, 25(3):1251–1276, 1997.
40. Kongming Wang and Theo Gasser. Synchronizing sample curves nonparametrically. The Annals of Statistics, 27(2):439–460, 1999.
41. W. Weber, B. Kramer, and M. Fussenegger. A genetic timedelay circuitry in mammalian cells. Biotechnology and Bioengeneering, 98, 2007.
42. M West. Hyperparameter estimation in dirichlet process mixture models. ISDS Discussion Paper 92-A03, 1992.
43. M West, P Müller, and D Escobar. Hierarchical priors and mixture models, with applications in regression and density estimation. In A F M Smith and P Freeman, editors, in Aspects of Uncertainty: A Tribute to D. V. Lindley, pages 363–386, New York, 1994. John Wiley.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters