A deconvolution approach to estimation of a common shape in a shifted curves model

A deconvolution approach to estimation of a common shape in a shifted curves model

[ [    [ [ Université de Toulouse Institut de Mathématiques de Toulouse et CNRS
Université de Toulouse
31062 Toulouse Cedex 9
France
\printeade1
E-mail: \printead*e2
\smonth4 \syear2009\smonth1 \syear2010
\smonth4 \syear2009\smonth1 \syear2010
\smonth4 \syear2009\smonth1 \syear2010
Abstract

This paper considers the problem of adaptive estimation of a mean pattern in a randomly shifted curve model. We show that this problem can be transformed into a linear inverse problem, where the density of the random shifts plays the role of a convolution operator. An adaptive estimator of the mean pattern, based on wavelet thresholding is proposed. We study its consistency for the quadratic risk as the number of observed curves tends to infinity, and this estimator is shown to achieve a near-minimax rate of convergence over a large class of Besov balls. This rate depends both on the smoothness of the common shape of the curves and on the decay of the Fourier coefficients of the density of the random shifts. Hence, this paper makes a connection between mean pattern estimation and the statistical analysis of linear inverse problems, which is a new point of view on curve registration and image warping problems. We also provide a new method to estimate the unknown random shifts between curves. Some numerical experiments are given to illustrate the performances of our approach and to compare them with another algorithm existing in the literature.

[
\kwd
\doi

10.1214/10-AOS800 \volume38 \issue4 2010 \firstpage2422 \lastpage2464 \newproclaimassAssumption

\runtitle

Deconvolution for shifted curves

{aug}

A]\fnmsJérémie \snmBigotlabel=e1]Jeremie.Bigot@math.univ-toulouse.fr\corref and A]\fnmsSébastien \snmGadatlabel=e2]Sebastien.Gadat@math.univ-toulouse.fr

\pdfauthor

Jeremie Bigot, Sebastien Gadat

class=AMS] \kwd[Primary ]62G08 \kwd[; secondary ]42C40. Mean pattern estimation \kwdcurve registration \kwdinverse problem \kwddeconvolution \kwdMeyer wavelets \kwdadaptive estimation \kwdBesov space \kwdminimax rate.

1 Introduction

1.1 Model and objectives

In many fields of interests including biology, medical imaging, or chemistry, observations are coming from individuals curves or graylevel images. Such observations are commonly referred to as functional data, and models involving such data have been recently extensively studied in statistics (see ramsil (), ramsil02 () for a detailed introduction to functional data analysis). In such settings, it is reasonable to assume that the data at hand , satisfy the following white noise regression model:

(1)

where is a subset of , are unknown regression functions, and are independent standard Brownian motions on with representing different levels of additive noise. In many situations, the individual curves or images have a certain common structure which may lead to the assumption that they are generated from some semi-parametric model of the form

(2)

where represents an unknown shape common to all the ’s. This shape function (also called mean pattern) may depend on unknown individual random parameters , belonging to a compact set of , which model individual variations. Such a semi-parametric representation for the ’s is the so-called self-modeling regression framework (SEMOR) introduced by Kneip and Gasser kg (). Shape invariant models (SIM) are a special class of such models for which (see, e.g., kg ())

(3)

where for any , the function is a smooth diffeomorphism of and is a known function. Models such as (3) are useful to account for shape variability in time between curves (see, e.g., gerga (), liumuller ()) or in space between images, which is the well-known problem of curve registration or image warping (see glasbey () and the discussion therein for an overview, BGL (), BGV () and references therein). SIM models (3) also represent a large class of statistical models to study the difficult problem of recovering a mean pattern from a set of similar curves or images in the presence of random deformations and additive noise, which corresponds to the general setting of Grenander’s theory of shapes Gre (). The overall objective of this paper is to discuss the fundamental problem of estimating of the mean pattern which can then be used to learn nonlinear modes of variations in time or shape between similar curves or images.

1.2 Previous work on mean pattern estimation

Very few results exist in the literature on nonparametric estimation of for SIM models (3) based on noisy data from (1). The problem of estimating the common shape of a set of curves that differ only by a time transformation is usually referred to as the curve registration problem in statistics, and it has received a lot of attention over the last two decades; see, for example, big (), gaskneip (), kneipgas (), liumuller (), ramli (), wanggas (). However, in these papers, an asymptotic study as the number of curves grows to infinity is generally not considered. Estimation of the shape function for SEMOR models related to (1) and (2) is studied in kg () with a double asymptotic in the number of curves and the number of observed time points per curve. In the simplest case of shifted curves, various approaches have been developed. Based on a model with a fixed number of curves, semiparametric estimation of the deformation parameters and nonparametric estimation of the shape function is proposed in mazaloubgam () and vimond (). A generalization of this approach for the estimation of scaling, rotation and translation parameters for two-dimensional images is proposed in BGV (). Estimation of a common shape for randomly shifted curves and asymptotic in is also considered in ronn (). There is also a huge literature in image analysis on mean pattern estimation, and some papers have recently addressed the problem of estimating the common shape of a set of similar images with asymptotic in the number of images; see, for example, AAT (), BGL (), ma () and references therein. However, in all the above cited papers rates of convergence and optimality of the proposed estimators for have not been studied.

1.3 A benchmark model for nonparametric estimation of a mean pattern

The simplest SIM model is the case of randomly shifted curves, namely

that has recently received some attention in the statistical literature loubcast (), mazaloubgam (), ronn (), vimond (). In this paper, it will thus be assumed that we observe realizations of noisy and randomly shifted curves coming from the following Gaussian white noise model:

(4)

where is the unknown mean pattern of the curves, are independent standard Brownian motions on , the ’s represent levels of noise which may vary from curve to curve, and the ’s are unknown random shifts independent of the ’s. The aim of this paper is to study some statistical aspects related to the problem of estimating , and to propose new methods of estimation.

Model (4) is realistic in many situations where it is reasonable to assume that the observed curves represent replications of almost the same process and when a large source of variation in the experiments is due to transformations of the time axis. Such a model is commonly used in many applied areas dealing with functional data such as neuroscience IRT () or biology ronn (). More generally, the model (4) represents a kind of benchmark model for studying the problem of recovering the mean pattern in SIM models. The results derived in this paper show that the model (4), although simple, already provides some new insights on the statistical aspects of mean pattern estimation.

The function is assumed to be periodic with period , and the shifts are supposed to be independent and identically distributed (i.i.d.) random variables with density with respect to the Lebesgue measure on . Our goal is to estimate nonparametrically the shape function on as the number of curves goes to infinity.

Let be the space of squared integrable functions on with respect to , and denote by the squared norm of a function . Assume that represents some smoothness class of functions (e.g., a Sobolev or a Besov ball), and let be some estimator of the common shape , that is, a measurable function of the random processes . For some , the risk of the estimator is defined to be

where the above expectation is taken with respect to the law of . In this paper, we propose to investigate the optimality of an estimator by introducing the following minimax risk

where the above infimum is taken over the set of all possible estimators in model (4). One of the main contributions of this paper is to derive asymptotic lower and upper bounds for which, to the best of our knowledge, has not been considered before.

Indeed, we show that there exist constants , a sequence of reals tending to infinity, and an estimator such that

However, the construction of may depend on unknown quantities such as the smoothness of , and such estimates are therefore called nonadaptive. Since it is now recognized that wavelet decomposition is a powerful tool to derive adaptive estimators (see, e.g., DJKP95jrssb ()), a second contribution of this paper is thus to propose wavelet-based estimators that attain a near-minimax rate of convergence in the sense there exits a constant such that

1.4 Main result

Minimax risks will be derived under particular smoothness assumptions on the density . The main result of this paper is that the difficulty of estimating is quantified by the decay to zero of the Fourier coefficients of the density of the shifts defined as

(5)

for . Depending how fast these Fourier coefficients tend to zero as , the reconstruction of will be more or less accurate. This comes from the fact that the expected value of each observed process is given by

This expected value is thus the convolution of by the density which makes the problem of estimating an inverse problem whose degree of ill-posedness and associated minimax risk depend on the smoothness assumptions on .

This phenomenon is a well-known fact in deconvolution problems (see, e.g., JKPR04jrssb (), sapatinaspensky (), PV99aos ()), and more generally for linear inverse problems as studied in cavgopitsy (). In this paper, the following type of assumption on is considered. {ass} The Fourier coefficients of have a polynomial decay, that is, for some real , there exist two constants such that for all .

In standard inverse problems, such as deconvolution, the optimal rate of convergence we can expect from an arbitrary estimator typically depends on such smoothness assumptions. The parameter is usually referred to as the degree of ill-posedness of the inverse problem, and it quantifies the difficult of inverting the convolution operator. The following theorem shows that a similar phenomenon holds for the minimax risk associated to model (4). Note that to simplify the presentation, all the theoretical results are given for the simple setting where the level of noise is the same for all curves, that is, for all and some . Finally, one also needs the following assumption on the decay of the density . {ass} There exists a constant and a real such that the density satisfies .

Note that Assumption 1.4 is not a very restrictive condition as is supposed to be an integrable function on .

Theorem 1

Suppose that the smoothness class is a Besov ball of radius with and smoothness parameter (a precise definition of Besov spaces will be given later on). Suppose that satisfies Assumptions 1.4 and 1.4. Let and assume that . If , then

Hence, Theorem 1 shows that under Assumption 1.4 the minimax rate is of polynomial order of the sample size , and that this rate deteriorates as the degree of ill-posedness increases. Such a behavior is well known for standard periodic deconvolution in the white noise model JKPR04jrssb (), sapatinaspensky (), and Theorem 1 shows that a similar phenomenon holds for the model (4). To the best of our knowledge, this is a new result which makes a connection between mean pattern estimation and the statistical analysis of deconvolution problems.

1.5 Fourier analysis and an inverse problem formulation

Let us first remark that the model (4) exhibits some similarities with periodic deconvolution in the white noise model as described in JKPR04jrssb (). For , let us define the following density function:

(6)

Note that exists for all provided has a sufficiently fast decay at infinity; see Assumption 1.4. Since is periodic with period 1, one has

and note that . Hence, if one defines and , then taking the mean of the equations in (4) yields the model

with

(8)

and where is a standard Brownian motion .

The model (1.5) differs from the periodic deconvolution model investigated in JKPR04jrssb () by the error term . Asymptotically, is a Gaussian variable, so this suggests to use the wavelet thresholding procedures developed in JKPR04jrssb () to derive upper bounds for the minimax risk. However, it should be noted that the additive error term significantly complicates the estimating procedure as the variance of clearly depends on the unknown function . Moreover, deriving lower bounds for the minimax risk in models such as (1.5) is significantly more difficult than in the standard white noise model without the additive term .

Now let us formulate models (4) and (1.5) in the Fourier domain. Supposing that , we denote by its Fourier coefficients for , namely . The model (4) can then be rewritten as

with , and where are i.i.d. variables, that is, complex Gaussian variables with zero mean and such that .

Thus, we can compute the sample mean of the th Fourier coefficient over the curves as

(10)

with , and where the ’s are i.i.d. complex Gaussian variables with zero mean and such that . The average Fourier coefficients in equation (10) can thus be viewed as a set of observations which is very close to a sequence space formulation of a statistical inverse problem as described, for example, in cavgopitsy (). As in model (1.5), the additive error term is asymptotically Gaussian, however its variance is which is obviously unknown as it depends on .

If we assume that the density of the random shifts is known, one can perform a deconvolution step by taking

(11)

to estimate the Fourier coefficients of since, for large , is close to by the strong law of large numbers.

Based on the ’s, two types of estimators are studied. The simplest one uses spectral cut-off with a cutting frequency depending on the smoothness assumptions on , and is thus nonadaptive. The second estimator is based on wavelet thresholding and is shown to be adaptive using the procedure developed in JKPR04jrssb (). Note that part of our results are presented for the case where the coefficients are known. Such a framework is commonly used in nonparametric regression and inverse problems to obtain consistency results and to study asymptotic rates of convergence, where it is generally supposed that the law of the additive error is Gaussian with zero mean and known variance ; see, for example, JKPR04jrssb (), sapatinaspensky (), cavgopitsy (). In model (4), the random shifts may be viewed as a second source of noise and for the theoretical analysis of this problem the law of this other random noise is also supposed to be known.

1.6 An inverse problem with unknown operator

If the density is unknown, one can view the problem of estimating in model (4) as a deconvolution problem with unknown eigenvalues which complicates significantly the estimation procedure. Such a framework corresponds to the general setting of an inverse problem with a partially unknown operator. Recently, some papers have addressed this problem; see, for example, cavraim (), EfromovichK01 (), hoffreiss (), neu97 (), assuming that an independent sample of noisy eigenvalues or noisy operator is available which allows an estimation of the ’s. However, such an assumption is not applicable to our model (4). Therefore, we introduce a new method for estimating is the case of an unknown density which leads to a new class of estimators to recover a mean pattern.

1.7 Organization of the paper

In Section 2, we consider a linear but nonadaptive estimator based on spectral cut-off. In Section 3, a nonlinear and adaptive estimator based on wavelet thresholding is studied in the case of known density , and upper bound for the minimax risk are studied over Besov balls. In Section 4, we derive lower bounds for the minimax risk. In Section 5, it is explained how one can estimate the mean pattern when the density is unknown. Finally, in Section 6, some numerical examples are proposed to illustrate the performances of our approach and to compare them with another algorithm proposed in the literature. All proofs are deferred to a technical Appendix at the end of the paper.

2 Linear estimation of the common shape and upper bounds for the risk for Sobolev balls

2.1 Risk decomposition

For , a linear estimator of the ’s is given by , where is a sequence of nonrandom weights called a filter. An estimator of is then obtained via the inverse Fourier transform , and thanks to the Parseval’s relation, the risk of this estimator is given by . The problem is then to choose the sequence in an optimal way. The following proposition gives the bias-variance decomposition of .

Proposition 1

For any given nonrandom filter , the risk of the estimator can be decomposed as

(12)

Note that the decomposition (12) does not correspond exactly to the classical bias-variance decomposition for linear inverse problems. Indeed, the variance term in (12) differs from the classical expression of the variance for linear estimator in statistical inverse problems which would be in our notation . Hence, contrary to classical inverse problems, the variance term of the risk depends also on the Fourier coefficients of the unknown function to recover.

2.2 Linear estimation

Let us introduce the following smoothness class of functions which can be identified with a periodic Sobolev ball:

for some constant and some smoothness parameter , where . Now consider a linear estimator obtained by spectral cut-off, that is, for a projection filter of the form for some integer . For an appropriate choice of , the following proposition gives the asymptotic behavior of the risk .

Proposition 2

Assume that belongs to for some real and , and that satisfies Assumption 1.4, that is, polynomial decay of the ’s. Then, if is chosen such that , then there exists a constant not depending on such that as

The above choice for depends on the smoothness of the function which is generally unknown in practice and such a spectral cut-off estimator is thus called nonadaptive. Moreover, the result is only suited for smooth functions since Sobolev balls for are not suited to model shape functions which may have singularities such as points of discontinuity.

3 Nonlinear estimation with Meyer wavelets and upper bounds for the risk for Besov balls

Wavelets have been successfully used for various inverse problems donoho (), and for the specific case of deconvolution Meyer wavelets, a special class of band-limited functions introduced in M92 (), have recently received special attention in nonparametric regression: see JKPR04jrssb () and sapatinaspensky ().

3.1 Wavelet decomposition and the periodized Meyer wavelet basis

This wavelet basis is derived through the periodization of the Meyer wavelet basis of (see JKPR04jrssb () for further details on its construction). Denote by and the Meyer scaling and wavelet functions at scale and location . For any function of , its wavelet decomposition can be written as: , where , and denotes the usual coarse level of resolution. Moreover, the squared norm of is given by . It is well known that Besov spaces can be characterized in terms of wavelet coefficients (see, e.g., JKPR04jrssb ()). Let denote the usual smoothness parameter, then for the Meyer wavelet basis and for a Besov ball of radius with , one has that with the respective above sums replaced by maximum if or .

Meyer wavelets can be used to efficiently compute the coefficients and by using the Fourier transform. Indeed, thanks to the Plancherel’s identity, one obtains that

(13)

where denote the Fourier coefficients of and . As Meyer wavelets are band-limited, is a finite subset set of with (see JKPR04jrssb ()), and fast algorithms for computing the above sum have been proposed in K94 () and rast07 (). The coefficients can be computed analogously with instead of and instead of .

Hence, the noisy Fourier coefficients given by (11) can be used to quickly compute the following empirical wavelet coefficients of as

(14)

3.2 Nonlinear estimation via hard-thresholding

It is well known that adaptivity can be obtained by using nonlinear estimators based on appropriate thresholding of the estimated wavelet coefficients (see, e.g., DJKP95jrssb ()). A nonlinear estimator by hard-thresholding is defined by

(15)

where the ’s are appropriate thresholds (positive numbers), and is the finest resolution level used for the estimator. As shown by JKPR04jrssb (), for periodic deconvolution the choice for and the thresholds typically depends on the degree of ill-posedness of the problem. Following Theorem 1 in JKPR04jrssb (), to derive rate of convergence for one has to control moments of order and of and the probability of deviation of from .

Proposition 3

Let , , and . Assume that satisfies Assumptions 1.4 and 1.4. Then there exist positive constants and such that for any , and all , and .

Proposition 3 shows that the variance of the empirical wavelet coefficients is proportional to which comes from the amplification of the noise by the inversion of the convolution operator. The choice of the threshold is done by controlling the probability of deviation of the empirical wavelet coefficients from the true wavelet coefficient which is given by the following proposition.

Proposition 4

Let , and . Suppose that satisfies Assumption 1.4. Let . For and define the following threshold:

(16)

with . Then, for all sufficiently large ,

Note that the level-dependent threshold (16) corresponds to the usual universal thresholds for deconvolution problem based on wavelet decomposition as in JKPR04jrssb (). Then, using the thresholds , we finally arrive at the following theorem which gives an upper bound for the minimax risk over a large class of Besov balls.

Theorem 2

Assume that satisfies Assumptions 1.4 and 1.4. Let and be the largest integers such that and . Let be the nonlinear estimator obtained by hard-thresholding with the above choice for and , and using the thresholds defined by (16) with . Let , and . Let , , and assume that .

If , then

If , then

In standard periodic deconvolution in the white noise model (see, e.g., JKPR04jrssb ()), there exist two different upper bounds for the minimax rate which are usually referred to as the dense case [] when the hardest functions to estimate are spread uniformly over , and the sparse case [] when the worst functions to estimate have only one nonvanishing wavelet coefficient. Theorem 2 shows that a similar phenomenon holds for the model (4), and to the best of our knowledge, this is a new result.

4 Minimax lower bound

The following theorem gives an asymptotic lower bound on the minimax risk for a large class of Besov balls.

Theorem 3

Let , and . Suppose that satisfies Assumption 1.4. Let . Assume that and .

If and (dense case), there exits a constant depending only on such that

In the dense case, the hardest functions to estimate are spread uniformly over the interval , and the proof is based on an adaptation of Assouad’s cube technique (see, e.g., Lemma 10.2 in HKPT ()) to the specific setting of model (4). Lower bounds for minimax risk are classically derived by controlling the probability for the likelihood ratio (in the statistical model of interested) of being strictly greater than some constant uniformly over an appropriate set of test functions. To derive Theorem 3, we show that one needs to control the expectation over the random shifts of the likelihood ratio associated to model (4), and not only the likelihood ratio itself. Hence, the proof of Theorem 3 is not a direct and straightforward adaptation of Assouad’s cube technique or Lemma 10.1 in HKPT () as used classically in a standard white noise model to derive minimax risk in nonparametric deconvolution in the dense case. For more details, we refer to the proof of Theorem 3 in the Appendix.

Deriving minimax risk in the dense case for the model (4) is rather difficult and the proof is quite long and technical. In the sparse case, finding lower bounds for the minimax rate is also a difficult task. We believe that this could be done by adapting to model (4) a result by korotsy () which yields a lower bound for a specific problem of distinguishing between a finite number of hypotheses (see Lemma 10.1 in HKPT ()). However, this is far beyond the scope of this paper and we leave this problem open for future wok.

5 Estimating when the density is unknown

Obviously, assuming that the density of the shifts is known is not very realistic in practice. However, estimating when the density is unknown falls into the setting of inverse problems with an unknown operator which is a difficult problem. Recently, some papers cavraim (), EfromovichK01 (), hoffreiss (), neu97 () have considered nonparametric estimator for inverse problem with a partially unknown operator, by assuming that an independent sample of noisy eigenvalues is available which allows to build an estimator of the ’s. In the settings of these papers, the distribution of the noisy eigenvalues sample is supposed to be known (typically Gaussian). However, in model (4), such assumptions are not realistic, and therefore a data-based estimator of has to be found. For this purpose, we propose to make a connection between mean pattern estimation in model (4) and well-known results on Frechet mean estimation for manifold-valued variables; see, for example, batach1 (), batach2 ().

5.1 Frechet mean for functional data

Suppose that denote i.i.d. random variables taking their values in a vector space . As is a linear space (with addition well defined), an estimator of a mean pattern for the ’s is given by the usual linear average . However, in many applications, some geometric and statistical considerations may lead to the assumption that two vectors in are considered to be the same if they are equal up to certain transformations which are represented by the action of some group on the space . A well-known example (see batach1 (), batach2 () and references therein) is the case where , the space of points in the plane , and is generated by composition of scaling, rotations and translations of the plane, namely

for , with . In this setting, two vectors represent the same shape if

which leads to Kendall’s shape space consisting of the equivalent classes of shapes in under the action of scaling, rotations and translations (see, e.g., batach1 (), batach2 () and references therein). Since the space is a nonlinear manifold, the usual linear average does not fall into due to the fact that the Euclidean distance is not meaningful to represent shape variations. A better notion of empirical mean of shapes in is given by (see, e.g., batach1 ()): . More generally, Frechet frech () has extended the notion of averaging to general metric spaces via mean squared error minimization in the following way: if are i.i.d. random variables in a general metric space , with a distance , then the Frechet mean of a collection of such data points is defined as the minimizer (not necessarily unique) of the sum-of-squared distances to each of the data points, that is

Now let us return to the randomly shifted curve model (4). Define as the translation group acting on periodic functions with period 1 by

Let be functions (possibly random) in . Following the definition of Frechet mean, a notion of average for functional data taking into account the action of the translation group would be

If the ’s are noisy curves generated from the randomly shifted curve model (4), a presmoothing step of the observed curves seems natural to compute a consistent Frechet mean estimate. In the case of the translation group, this smoothing step and the definition of Frechet mean can be expressed in the Fourier domain as

(17)

where and is some frequency cut-off parameter whose choice will be discussed later. A smoothed Frechet mean is then given by . Then define the following criterion for :

(18)

and remark that the computation of can be made in two steps since it can be checked that , where

(19)

Therefore, computing the Frechet mean of the smoothed curves requires minimization of the above criteria which automatically yields estimators of the random shifts in model (4). This allows us to construct an estimator of the common shape by in the case of an unknown density , and the estimates of the random shifts can be used to estimate the density itself and the eigenvalues . The goal of this section is thus to study some statistical properties of such a two-step procedure which, to the best of our knowledge, has not been considered before in the setting of model (4) and in connection with Frechet mean for functional data. Moreover, it will be shown in our numerical experiments that the criterion (19) can be minimized using a gradient-descent algorithm which leads to a new and fast method for estimating in the case of an unknown density .

5.2 Upper bound for the estimation of the shifts

Recall that our model (4) in the Fourier domain is

(20)

where are i.i.d. variables, the random shifts , are i.i.d. variables with density , and . Model (20) is clearly nonidentifiable, as for any , one can replace the ’s by and the ’s by without changing the formulation of the model. Let us thus introduce the following identifiability conditions. {ass} The density has a compact support included in the interval and has zero mean, that is, is such that . {ass} The unknown shape function is such that .

Let . Using the identifiability condition given by Assumption 5.2, it is natural to define estimators of the true shifts as

that is, by considering the estimators that minimize the empirical criterion on the constrained set . Then the following theorem holds.

Theorem 4

Suppose that Assumptions 5.2 and 5.2 hold. Then for any

(21)

with ,, where is a positive constant depending only on the shape function and the frequency cut-off parameter ,

and

Theorem 4 provides an upper bound (in probability) for the consistency of the estimators of the true random shifts , using the standard squared distance. Note that since the minimum of