# The limit distribution of the $L_\infty$-error of Grenander-type estimators

###### Abstract

Let $f$ be a nonincreasing function defined on $[0,1]$. Under standard regularity conditions, we derive the asymptotic distribution of the supremum norm of the difference between $f$ and its Grenander-type estimator on sub-intervals of $[0,1]$. The rate of convergence is found to be of order $(n/\log n)^{-1/3}$ and the limiting distribution to be Gumbel.

DOI: 10.1214/12-AOS1015. Volume 40, issue 3 (2012), pages 1578-1608.

Cécile Durot (cecile.durot@gmail.com), Vladimir N. Kulikov (vladimir.kulikov@asr.nl) and Hendrik P. Lopuhaä (h.p.lopuhaa@tudelft.nl)

AMS subject classifications: Primary 62E20, 62G20; secondary 62G05, 62G07.

Keywords: Supremum distance, extremal limit theorem, least concave majorant, monotone density, monotone regression, monotone failure rate.

## 1 Introduction

After the derivation of the nonparametric maximum likelihood estimator (NPMLE) of a monotone density and a monotone failure rate by Grenander grenander1956 (), and the least squares estimator of a monotone regression function by Brunk brunk1958 (), it took some time before the distribution theory for such estimators entered the literature. The limiting distribution of the NPMLE of a decreasing density on $[0,\infty)$ at a fixed point in the interior of the support was established by Prakasa Rao prakasarao1969 (). Similar results were obtained for the NPMLE of a monotone failure rate in prakasarao1970 () and for an estimator of a monotone regression function in brunk1970 (). Woodroofe and Sun woodroofe-sun1993 () showed that the NPMLE of a decreasing density is inconsistent at zero. The behavior at the boundary has been further investigated in kulikov-lopuhaa2006 (), balabdaouietal (). Smooth estimation has been studied in mammen1991 () for monotone regression curves, and in vdvaart-vdlaan2003 () for monotone densities; see also eggermont-lariccia2000 () and anevski-hossjer2006 (). The limit distribution of the NPMLE of a decreasing function in the Gaussian white noise model was obtained in zilberburg2007 (). Related likelihood ratio based techniques have been investigated in banerjeewellner2001 (), pal2009 ().

Groeneboom groeneboom1985 () reproved the result in prakasarao1969 () by introducing a new approach based on inverses. This approach has become a cornerstone in deriving pointwise asymptotics of several shape constrained nonparametric estimators, for example, for the distribution function of interval censored observations (see groeneboom-wellner1992 ()) or for estimators of a monotone density and a monotone hazard under random censoring (see huang-wellner1995 ()); see also huang-zhang1994 () for the limiting distribution of the NPMLE of a monotone density under random censoring and lopuhaanane2011 () for similar results on isotonic estimators for a monotone baseline hazard in the Cox proportional hazards model. The limit distribution of these estimators involves an argmax process connected with two-sided Brownian motion with a parabolic drift. This process has been studied extensively in groeneboom1989 (), where it is also claimed that the approach based on inverses should be sufficiently general to deal with global measures of deviation, such as the $L_1$-distance or the supremum distance between the estimator and the monotone function of interest. Indeed, the limiting distribution of the $L_1$-distance between a decreasing density and its NPMLE was obtained in groeneboom-hooghiemstra-lopuhaa1999 (), and a similar result can be found in durot2002 () in the monotone regression setup. These results were extended to general $L_p$-distances in kulikov-lopuhaa2005 () and durot2007 (). In durot2007 (), the limiting distribution of $L_p$-distances is obtained in a very general framework that includes, among others, the monotone density case, monotone regression and monotone failure rate.

Little to nothing is known about the behavior of the supremum distance. In jonker-vdvaart2001 (), the rate of the supremum distance is established in a semi-parametric model for censored observations, and it is suggested that the same rate should hold in the monotone density case. In hooghiemstra-lopuhaa1998 () an extremal limit theorem has been obtained for suprema of the process over increasing intervals. However, a long-standing open problem remains, although this problem has important statistical applications: what is the limiting distribution of the supremum distance between a monotone function and its isotonic estimator? Indeed, while pointwise confidence intervals for a decreasing density, a monotone regression function or a monotone hazard are available using the limiting distribution of the isotonic estimator at the fixed point, nonparametric confidence bands have remained a formidable challenge; they could be built if the limiting distribution of the supremum distance between a monotone function and its isotonic estimator were known. It is the purpose of this paper to settle this question in the same general framework as considered in durot2007 (). The precise construction of a nonparametric confidence band requires additional technicalities that are beyond the scope of the present paper. It is only briefly discussed here, and details are deferred to a separate paper.

We consider Grenander-type estimators $\hat f_n$ for decreasing functions $f$ with compact support, say $[0,1]$. These are estimators that are defined as the left-hand slope of the least concave majorant of an estimator for the primitive of $f$. This setup includes Grenander’s grenander1956 () estimator of a monotone density, Brunk’s brunk1958 () estimator for a monotone regression function, as well as the estimator for a monotone failure rate under random censoring, considered in huang-wellner1995 (). We obtain the rate of convergence for the supremum of $|\hat f_n-f|$ over subintervals of $[0,1]$. The rate is shown to be of the order $(\log n/n)^{1/3}$, even on subintervals that grow toward $[0,1]$, as long as one stays sufficiently far away from the boundaries, so that the inconsistency at the boundaries (see, e.g., woodroofe-sun1993 ()) does not dominate the supremum. The rate that we obtain coincides with the one suggested in jonker-vdvaart2001 () for Grenander’s grenander1956 () estimator for a decreasing density, but it is now proven rigorously in a more general setting under optimal conditions on the boundaries of the intervals over which the supremum is taken. Moreover, we show that the rate is sharp. Our main result is Theorem 2.2, in which we show that a suitably standardized supremum of $|\hat f_n-f|$ converges in distribution to a standard Gumbel random variable.

Our results are obtained following the same sort of approach as that used in groeneboom1985 (), huang-wellner1995 (), groeneboom-hooghiemstra-lopuhaa1999 (), durot2002 (), durot2007 (), among others. We first establish corresponding results for the supremum distance between the inverses of $\hat f_n$ and $f$, and then transfer them to the supremum distance between $\hat f_n$ and $f$ themselves. A major difference with deriving asymptotics of $L_p$-distances is that in those cases one can benefit from the linearity of the integral and handle several approximations pointwise with Markov’s inequality. This is no longer possible with suprema. With suprema, to transfer results for inverses to results for $\hat f_n$, a key ingredient is a precise uniform bound on the spacings between consecutive jump points of $\hat f_n$.

The paper is organized as follows. In Section 2, we list the assumptions under which our results can be obtained and state our main results concerning the rate of convergence and the limiting distribution of the supremum distance between $\hat f_n$ and $f$. We also briefly discuss the construction of confidence bands. We formulate corresponding results for the supremum distance between the inverses of $\hat f_n$ and $f$ in Section 3. This is the heart of the proof, which is carried out in Section 4. Finally, in Section 5, we provide a uniform bound on the spacings between consecutive jump points of $\hat f_n$ and then transfer the results obtained in Section 3 for the inverses of $\hat f_n$ and $f$ to the supremum distance between the functions themselves.

To limit the length of the paper, the rigorous proofs of several preliminary results needed for the proofs in Sections 4 and 5 have been put in a supplement durotkulikovlopuhaa2012 ().

## 2 Assumptions and main results

Based on $n$ independent observations, we aim at estimating a function $f\colon[0,1]\to\mathbb{R}$ subject to the constraint that it is nonincreasing. Assume we have at hand a cadlag (right continuous with finite left-hand limits at every point) stepwise estimator $F_n$ of

$$F(t)=\int_0^t f(u)\,du,\qquad t\in[0,1],$$

with finitely many jump points. In the case of i.i.d. observations with a common density function $f$, a typical example is the empirical distribution function $F_n$ with discontinuity points located at the observations. In the following, we shall consider the monotone estimator $\hat f_n$ of $f$ as defined in durot2007 (), that is, the estimator $\hat f_n$ is the left-hand slope of the least concave majorant of $F_n$ with

$$\hat f_n(0)=\lim_{t\downarrow 0}\hat f_n(t).$$

As detailed in Section 2.1 below, this definition generalizes well-known monotone estimators, such as the Grenander estimator of a nonincreasing density, or the least-squares estimator of a monotone regression function. It should be noted that $\hat f_n$ is nonincreasing, left-continuous and piecewise constant. We are interested in the limiting behavior of the supremum distance between the monotone estimator $\hat f_n$ and the function $f$.
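The construction just described (left-hand slope of the least concave majorant of a step estimator) can be illustrated for the monotone density case. The following is a minimal sketch, not the authors' implementation: it assumes distinct observations in the open interval (0, 1), takes the empirical distribution function as the step estimator, and the function names `lcm_left_slopes` and `grenander` are hypothetical. The value returned at t = 0 is a convention of this sketch.

```python
from bisect import bisect_left


def lcm_left_slopes(xs, ys):
    """Least concave majorant of the points (xs[i], ys[i]), xs strictly increasing.

    Returns (knots, slopes): the majorant is linear with slope slopes[j]
    on the interval (knots[j], knots[j + 1]].
    """
    hull = []  # indices of the vertices of the majorant found so far
    for i in range(len(xs)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # cross >= 0 means that point b lies on or below the chord from a to i
            cross = (xs[b] - xs[a]) * (ys[i] - ys[a]) - (ys[b] - ys[a]) * (xs[i] - xs[a])
            if cross < 0:
                break
            hull.pop()
        hull.append(i)
    knots = [xs[i] for i in hull]
    slopes = [
        (ys[hull[j + 1]] - ys[hull[j]]) / (xs[hull[j + 1]] - xs[hull[j]])
        for j in range(len(hull) - 1)
    ]
    return knots, slopes


def grenander(sample):
    """Slope-of-majorant estimator for distinct observations in (0, 1).

    Returns (estimate, knots, slopes), where estimate(t) is defined for t in [0, 1].
    """
    obs = sorted(sample)
    n = len(obs)
    # cumulative diagram of the empirical distribution function on [0, 1]
    xs = [0.0] + obs + [1.0]
    ys = [0.0] + [(i + 1) / n for i in range(n)] + [1.0]
    knots, slopes = lcm_left_slopes(xs, ys)

    def estimate(t):
        # left-continuous: the slope on (knots[j], knots[j + 1]] is used at t;
        # at t = 0 the first slope is returned (a convention of this sketch)
        j = max(bisect_left(knots, t) - 1, 0)
        return slopes[j]

    return estimate, knots, slopes
```

For the sample `[0.4, 0.1, 0.9, 0.3]` the majorant has vertices at 0, 0.1, 0.4, 0.9 and 1, and the resulting slopes are nonincreasing and integrate to one, as expected for a density estimate.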

### 2.1 Uniform rate of convergence

We first show that the rate of convergence of $\hat f_n$ to $f$ in terms of the supremum distance is of order $(\log n/n)^{1/3}$. To this end, we make the following assumptions. Unless stated otherwise, for a function $h$ defined on $[0,1]$, we write $\|h\|_\infty=\sup_{t\in[0,1]}|h(t)|$.

(A1) The function is decreasing and differentiable on with

(A2) Let be either a Brownian bridge or a Brownian motion. There exist , , and versions of and such that

for all . Moreover, is increasing and differentiable on with and .

(A3) There exists such that for all and ,

These conditions are similar to the ones used in durot2007 (). Assumption (A1) is completely the same as the one in durot2007 (). Assumption (A2) is similar to (A4) in durot2007 (), but now we only require and bounds on the first derivative of . Here we can relax the condition on , because in the current situation the error terms have to be of smaller order than instead of in durot2007 (). The existence of , as imposed in (A4) in durot2007 (), is not needed to establish Theorem 2.1. Finally, assumption (A3) is equal to (A2) in durot2007 (). Assumption (A2) in durot2007 () is no longer needed, since we are able to obtain sufficient bounds on particular tail probabilities with our current assumptions (A1)–(A2). See Lemma 6.4 and also the proof of Lemma 6.10 in durotkulikovlopuhaa2012 ().

A typical example that falls into the above framework is the problem of estimating a nonincreasing density on . Assume we observe i.i.d. random variables with common nonincreasing density function , and let be the corresponding empirical distribution function. In this case, the monotone estimator of coincides with the Grenander estimator. Assumption (A1) is equal to the ones in groeneboom-hooghiemstra-lopuhaa1999 (), kulikov-lopuhaa2005 (), durot2007 (), and is standard when studying -distances between and . The existence of a second derivative of is not needed to obtain Theorem 2.1. In the monotone density model, assumption (A2) is satisfied for all , with being the distribution function corresponding to and a Brownian bridge, due to the Hungarian embedding of komlosmajortusnady1975 (). From Theorem 6 in durot2007 () it follows that assumption (A3) holds in the monotone density model. Another example that falls into the above framework is the problem of estimating a monotone regression function. Assume for instance that we observe , , where the ’s are i.i.d. centered random variables with a finite variance , and is nonincreasing. Let be the partial sum process given by

In this case, the monotone estimator of coincides with the Brunk estimator. Assumption (A1) is equal to the ones in durot2002 (), durot2007 () and is standard when studying -distances in this model. Assumption (A2) is satisfied for all such that with and a Brownian motion, due to embedding of sak1985 (). Thus, (A2) is satisfied in the above regression model provided . From Theorem 5 in durot2007 () it follows that assumption (A3) holds in the above regression model. Other examples of statistical models that fall in the above framework, with corresponding and , are discussed in durot2007 ().
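For the regression example, the least-squares fit under a monotonicity constraint can be computed with the pool-adjacent-violators algorithm, whose output coincides with the left-hand slopes of the least concave majorant of the cumulative sum diagram of the responses. The sketch below is illustrative only: it assumes the responses are listed in the order of equally spaced design points, and the function name `pava_nonincreasing` is hypothetical.

```python
def pava_nonincreasing(y):
    """Least-squares nonincreasing fit to the sequence y (pool adjacent violators)."""
    blocks = []  # each block is [sum of responses, number of responses]
    for value in y:
        blocks.append([value, 1])
        # merge while the previous block mean is smaller than the last block mean,
        # which would violate the nonincreasing constraint
        while (
            len(blocks) >= 2
            and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)  # every point of a block receives the block mean
    return fit
```

For example, the responses `[3, 1, 2, 0]` violate monotonicity at the second and third positions, which are pooled into a common value.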

The uniform rate of convergence of to for general Grenander-type estimators is given in the following theorem.

###### Theorem 2.1

Assume (A1), (A2) and (A3). Let and be sequences of positive numbers such that

(1) |

for some that do not depend on . Then,

The rate in Theorem 2.1 coincides with the one found for the maximum likelihood estimator in a semi-parametric model for censored data by Jonker and van der Vaart jonker-vdvaart2001 (), who suggest that this rate should also hold for Grenander’s grenander1956 () estimator for a decreasing density. They consider and constant, which is a slightly stronger assumption than the one in Theorem 2.1. Note that condition (1) in Theorem 2.1 is sharp. If , for some , then converges in distribution, according to Theorem 3.1(i) in kulikov-lopuhaa2006 (), so that

In fact, for sequences such that , it can be shown similarly that converges in distribution, which would yield .

### 2.2 Limiting distribution

Whereas the previous theorem only provides a bound on the rate of convergence, it is nevertheless crucial for deriving the actual asymptotics of the supremum norm of on suitable intervals. For this purpose, we need an additional Hölder assumption on and .

(A4) The function in (A2) is twice differentiable and there exist and such that for all

(2) |

The condition on in assumption (A4) is a bit stronger than the one in durot2007 (). This is needed to guarantee that the difference between the values of at and its nearest point of jump of is negligible. The condition on in assumption (A4) is the same as (4) in durot2007 (), who already observed that the existence of , as assumed in groeneboom-hooghiemstra-lopuhaa1999 (), kulikov-lopuhaa2005 (), is no longer needed. Note that in the monotone density model , in which case (A4) reduces to a Hölder condition on only. In the monotone regression model, is linear so that (A4) again reduces to a Hölder condition on only.

In order to formulate the limit distribution, we need the following definition:

(3) |

where is a standard two-sided Brownian motion on originating from zero, and argmax denotes the greatest location of the maximum. For fixed , properly scaled versions of converge in distribution to the random variable (see, e.g., prakasarao1969 () or groeneboom1985 ()). Moreover, serves as the limit process for properly scaled versions of (see, e.g., Theorem 3.2 in groeneboom-hooghiemstra-lopuhaa1999 ()), where and are the inverse functions of and respectively, as defined in Section 3 below. Properties of the process can be found in groeneboom1989 (); for example, the process is a stationary process. According to Corollary 3.4 in groeneboom1989 (), the tails of the density of satisfy the following expansion:
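The argmax of a two-sided Brownian motion with parabolic drift can be approximated on a grid, which may help build intuition for the limit process. The sketch below rests on an assumption: it takes the process to be of the form $\operatorname{argmax}_{u}\{W(u+c)-u^2\}$, the form commonly used in this literature, and this should be checked against definition (3). The function name `argmax_parabolic_drift`, the truncation to a bounded window and the grid step are choices of this sketch, so the output is only an approximation.

```python
import math
import random


def argmax_parabolic_drift(c=0.0, half_width=3.0, step=0.01, rng=None):
    """Grid approximation of the greatest maximiser of W(u + c) - u^2.

    W is a two-sided Brownian motion with W(0) = 0; the search is restricted
    to (approximately) |u| <= half_width. The form of the process is an
    assumption of this sketch.
    """
    rng = rng or random.Random()
    sd = math.sqrt(step)
    k_lo = math.floor((c - half_width) / step)
    k_hi = math.ceil((c + half_width) / step)
    w = {0: 0.0}  # W on the time grid t = k * step
    for k in range(1, max(k_hi, 0) + 1):  # right branch, t > 0
        w[k] = w[k - 1] + rng.gauss(0.0, sd)
    for k in range(-1, min(k_lo, 0) - 1, -1):  # independent left branch, t < 0
        w[k] = w[k + 1] + rng.gauss(0.0, sd)
    best_u, best_val = None, -math.inf
    for k in range(k_lo, k_hi + 1):
        u = k * step - c
        val = w[k] - u * u
        if val >= best_val:  # ">=" keeps the greatest location in case of ties
            best_u, best_val = u, val
    return best_u
```

Repeated calls with different values of `c` give draws whose distribution should not depend on `c`, in line with the stationarity mentioned above, up to discretization and truncation error.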

(4) |

as , where and are positive constants.

We now present the main result of this paper. It states that the limit distribution of the supremum distance between and , if properly normalized, is Gumbel. By we mean , as .

###### Theorem 2.2

Assume that (A1), (A2), (A3) and (A4) hold. Consider fixed. Then, for any sequence of real numbers and both satisfying

(5) |

we have that for any ,

as , where

(6) |

with

and and taken from (4).

### 2.3 Confidence bands

Our main motivation for proving Theorem 2.2 is to build confidence bands for a monotone function . Indeed, this theorem ensures that for any , with probability tending to , we have

simultaneously for all Combining this with either plug-in estimators of and or bootstrap methods would provide a confidence band for , at the price of additional technicalities. Indeed, the use of plug-in estimators for the derivatives and may lead to inaccurate intervals for small sample sizes , so that bootstrap methods should be preferable. But it is known that the standard bootstrap typically does not work for Grenander-type estimators; see kos2008 (), banerjee-sen-woodroofe2010 (). Thus, we shall use a smoothed bootstrap, which will raise the question of the choice of the smoothing parameter. In view of all this, we believe that the precise construction of a confidence band is beyond the scope of the present paper and is deferred to a separate paper.
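Since the standardized supremum distance has a standard Gumbel limit, the critical value entering such a band is a Gumbel quantile. The following sketch only evaluates the standard Gumbel distribution function $x\mapsto\exp(-e^{-x})$ and its inverse; the normalizing constants of Theorem 2.2 and the estimation of the unknown derivatives are not covered, and the function names are hypothetical.

```python
import math


def gumbel_cdf(x):
    """Distribution function of the standard Gumbel law: exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


def gumbel_quantile(p):
    """Inverse of gumbel_cdf for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -math.log(-math.log(p))
```

For instance, `gumbel_quantile(0.95)` is approximately 2.97, which is the value the standardized supremum would have to stay below for an asymptotic 95% band.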

Note that the conditions of Theorem 2.2 do not cover the supremum distance over the whole interval $[0,1]$. However, this is to be expected. For instance, consider the monotone density model. This model is one of the examples that are covered by our general setup (see Section 2.1) and it is well known that the Grenander estimator in this model is inconsistent at 0 and 1 (see, e.g., woodroofe-sun1993 ()). Therefore, a distributional result can only be expected if the supremum is taken over subintervals of $[0,1]$ that do not include 0 and 1. Let us notice, however, that we can obtain a confidence band for on any sub-interval with fixed (by considering ), and that the largest interval on which our result allows us to build a confidence band is , where and similarly, . In order to obtain a confidence band on the whole interval $[0,1]$, we would have to slightly modify the Grenander-type estimator in order to make it consistent near the boundaries. For instance, we conjecture that, if we consider either the modified estimator in kulikov-lopuhaa2006 () or the penalized estimator in woodroofe-sun1993 () instead of , then the limit distribution of the supremum distance between this modified estimator and over the whole interval $[0,1]$ is the same as the limit distribution of the supremum distance between and over the largest interval allowed in Theorem 2.2. Thus, such modified estimators would provide a confidence band for over the whole interval $[0,1]$. As mentioned above, the precise construction of confidence bands is deferred to a separate paper, as is the precise study of modified estimators at the boundaries.

## 3 The inverse process

To establish Theorems 2.1 and 2.2, we use the same approach as in groeneboom1985 (), groeneboom-hooghiemstra-lopuhaa1999 (), durot2002 (), durot2007 (). We first obtain analogous results (i.e., rate of convergence and limit distribution) for the supremum between the inverses of and , and then transfer them to the supremum between the functions and themselves. Let be the upper version of defined as follows: and for every ,

Let denote the (generalized) inverse of , defined for by , with the convention that the supremum of an empty set is zero. This is illustrated in Figure 1 below. From Figure 1, it can be seen that the value maximizes , so that

(7) |

The advantage of characterizing the inverse process by (7), is that in this way, it is more tractable than the estimator itself, as being the argmax of a relatively simple process. It is the purpose of this section to establish results analogous to Theorems 2.1 and 2.2 for the inverse process.
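The argmax characterization of the inverse process can be checked numerically on a toy cumulative diagram. The sketch below is an illustration under stated assumptions: the diagram corresponds to an empirical distribution function with jumps at 0.1, 0.3, 0.4 and 0.9 on $[0,1]$, the knots and slopes of its least concave majorant were worked out by hand for this example, and the names `inverse_by_argmax` and `inverse_by_slopes` are hypothetical. For a step function the maximum of the drifted process is attained at a jump point or an endpoint, so the argmax is searched over those points only.

```python
# Toy cumulative diagram: endpoints 0 and 1 plus the jump points of a step function.
XS = [0.0, 0.1, 0.3, 0.4, 0.9, 1.0]
YS = [0.0, 0.25, 0.5, 0.75, 1.0, 1.0]

# Least concave majorant of the diagram above (computed by hand for this example):
# linear with slope SLOPES[j] on (KNOTS[j], KNOTS[j + 1]].
KNOTS = [0.0, 0.1, 0.4, 0.9, 1.0]
SLOPES = [2.5, 5.0 / 3.0, 0.5, 0.0]


def inverse_by_argmax(xs, ys, a):
    """Greatest maximiser of ys[i] - a * xs[i] over the diagram points."""
    best_x, best_val = xs[0], ys[0] - a * xs[0]
    for x, y in zip(xs[1:], ys[1:]):
        val = y - a * x
        if val >= best_val:  # ">=" keeps the greatest location in case of ties
            best_x, best_val = x, val
    return best_x


def inverse_by_slopes(knots, slopes, a):
    """sup{t : slope at t >= a}, with the supremum of the empty set equal to 0."""
    result = 0.0
    for j, s in enumerate(slopes):
        if s >= a:
            result = knots[j + 1]
    return result
```

On this example both functions return the same value for levels such as `a = 3.0`, `1.0`, `0.2` and `0.0`, which is the agreement between the generalized inverse and the argmax that the text relies on.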

Let denote the (generalized) inverse function of . In Theorems 3.1 and 3.2, we give an upper bound for the rate of convergence of to , and an extremal limit result for the supremum distance between and . We derive the limit distribution of the supremum distance between and in Corollary 3.1.

###### Theorem 3.1

Assume that (A1) and (A2) hold. Then

###### Theorem 3.2

Assume that (A1), (A2) and (A4) hold, and define for the normalizing function

(8) |

Let fixed, and let and be sequences such that , and for sufficiently large. Define

(9) |

Then

(10) |

for any sequence such that in such a way that , where denotes the density of , as defined in (3).

The expansion in (4) allows us to provide a precise expansion of [see (4)] and to derive the following corollary from Theorem 3.2. According to this corollary, the limit distribution of is Gumbel.

###### Corollary 3.1

In order to transfer the results for to , we establish Lemma 5.2. This lemma does require conditions on sequences and that are stronger than the ones in Theorem 2.2. However, once we have established the limit distribution for such sequences, we will show that Theorem 2.2 can be extended to more general sequences satisfying (5).

## 4 Proofs of Theorems 3.1 and 3.2 and Corollary 3.1

We suppose in the sequel that assumptions (A1) and (A2) are fulfilled, and we denote by , , positive real numbers that depend only on , , , [and possibly on under the additional assumption (A4)]. These real numbers may change from one line to the other. We write and , for any real numbers and .

In order to deal simultaneously with the cases where is a Brownian bridge or a Brownian motion [see assumption (A2)], we shall make use of the representation

(11) |

where is a standard Brownian motion, if is a Brownian motion and a standard Gaussian variable that is independent of , in case is a Brownian bridge. To prove Theorem 3.1, we need some preliminary results on the tail probabilities of and its supremum. These results can be found in Supplement B in durotkulikovlopuhaa2012 (). A first result, which is similar to Lemmas 2, 3 and 4 in durot2007 (), is that there exist and such that for all and ,

(12) |

In particular, for all , this implies that . See Lemma 6.4 in durotkulikovlopuhaa2012 (). This is not sufficient to obtain Theorem 3.1, but it will be used for its proof.

Proof of Theorem 3.1. Recall that for all , for and is nonincreasing and takes values in . Hence, we can write

(13) |

and

(14) |

This means that

Therefore, to prove Theorem 3.1 it suffices to show that

According to Lemma 6.5 in durotkulikovlopuhaa2012 (), the bound in (12) can be extended such that for any ,

where . The latter upper bound tends to zero as for all since by assumption. This completes the proof of Theorem 3.1.

We suppose in the sequel that in addition to (A1) and (A2), assumption (A4) is fulfilled. The first step in proving Theorem 3.2 is to approximate an adequately normalized version of by the location of the maximum of a Brownian motion with parabolic drift. To this end define

(15) |

where

(16) |

with taken from representation (11). Then for and satisfying the conditions of Theorem 3.2, we obtain

where is defined by (9), and is taken from (A4). See Lemma 6.6 in durotkulikovlopuhaa2012 ().

Next, we proceed with localization. The purpose of this is that localized versions of and , can be approximated by independent random variables, if and are in disjoint intervals that are suitably separated. First note that the location of the maximum of a process is invariant under addition of constants or multiplication by . Therefore, from (7) it follows that for all we have

(17) |

where

(18) |

for every fixed, is the standard Brownian motion defined by

(19) |

with defined by (11), and

(20) |

where is taken from representation (11), and for all and ,

(21) |

For all , we define the localized version of by

(22) |

We find that

for any that satisfies . See Lemma 6.7 in durotkulikovlopuhaa2012 ().

Finally, using the fact that, roughly speaking,

for all close enough to , we bound from above and below by the absolute value of the following quantities:

(23) |

and

(24) |

where and are defined in (18) and (19), is chosen sufficiently close to , and where is a sequence of positive numbers that converges to zero as , which is to be chosen suitably. The purpose of this is that when we will vary over a small interval and fix to be the midpoint of this interval, we will obtain variables that are defined with the same drift,

and the Brownian motion only depending on . The case of is similar.

For , and satisfying the conditions of Theorem 3.2, we obtain

and

for any that satisfies , where is defined by (9) and in (24) and (23). See Lemma 6.8 in durotkulikovlopuhaa2012 ().

Note that in order to obtain the above approximations, we use the following lemma, which is a variation on Lemma 2.1 in kulikov-lopuhaa2006 (). Although very simple, it turns out to be a very useful tool to compare locations of maxima.

###### Lemma 4.1

Let be an interval. Let and be real valued functions defined on such that there exists with

Assume that both and are achieved. Denoting by an arbitrary point where the maximum is achieved, we have

Proof. Suppose the maximum of is achieved at , so that for all . It is assumed that for all such that we have . Therefore,

for all such that . It follows that the maximum of cannot be achieved at such a point , which means that

This completes the proof by definition of .

To relate the suprema of and with maxima of independent random variables, we will partition the interval into a union of disjoint intervals and of alternating length, and a remainder interval , in such a way that the length of the small blocks is

(25) |

and the length of the big blocks is . More precisely, for , where

(26) |

let

(27) |

and let , so that and

(28) |

Now, suppose that , and satisfy the conditions of Theorem 3.2 and let be a sequence of independent processes, all distributed like given in (3). Then, using scaling properties of the Brownian motion, we can build (possibly dependent) copies , of , such that

(29) |

where