Logconcave density estimation with symmetry or modal constraints
Abstract
We study nonparametric maximum likelihood estimation of a logconcave density that is known to satisfy a further constraint: either (a) its mode is known, or (b) it is known to be symmetric about a fixed point.
We develop asymptotic theory for both constrained logconcave maximum likelihood estimators (MLE’s), including consistency, global rates of convergence, and local limit distribution theory.
In both cases, we find the MLE's pointwise limit distribution both at the constrained point (either the known mode or the known center of symmetry) and at other fixed points.
Software to compute the constrained estimators is available in the R package logcondens.mode.
The symmetryconstrained MLE is particularly useful for location estimation, and the modeconstrained MLE is useful for modal regression; the modeconstrained MLE can also be used to form a likelihood ratio test for the location of the mode. These problems are studied in separate papers. In particular, in a separate paper we show that, under a curvature assumption, the likelihood ratio statistic for the location of the mode can be used for hypothesis tests or confidence intervals that depend on neither tuning parameters nor nuisance parameters.
Running head: Modeconstrained MLE
URL: http://www.stat.washington.edu/jaw/
Supported in part by NSF Grant DMS-1104832 and a University of Minnesota Grant-in-Aid award; and in part by NSF Grants DMS-1104832 and DMS-1566514 and NIAID grant 2R01 AI29196804.
AMS subject classifications: Primary 62G07; secondary 62G05, 62G20.
Keywords: mode, consistency, convergence rate, empirical processes, convex optimization, logconcave, shape constraints, symmetric.
Contents
 1 Introduction and overview
 2 Maximum likelihood estimator finite sample properties: unconstrained and modeconstrained
 3 Simulation results
 4 Hellinger consistency and rates
 5 Local limit processes and limiting distributions at fixed points
 6 Outlook and further problems
 7 Proof sketches and outlines
 8 Proofs
 9 Technical lemmas
1 Introduction and overview
The classes of logconcave densities on the real line (and in higher dimensions) are of great importance in statistics for a variety of reasons, including their many natural closure properties: closure under convolution, affine transformations, convergence in distribution, and marginalization. These densities are also unimodal, and the classes serve as important nonparametric generalizations of the class of Gaussian distributions.
Nonparametric estimation in the unconstrained classes of logconcave densities has developed rapidly in the past 10–15 years. Existence of the maximum likelihood estimator for logconcave densities on the real line was established by Walther [2002], while Pal, Woodroofe and Meyer [2007] established consistency. Dümbgen and Rufibach [2009] gave rates of convergence in certain uniform metrics and provided efficient algorithms based on “active set” methods (see also Dümbgen and Rufibach [2011]). Balabdaoui, Rufibach and Wellner [2009] established pointwise limit distribution theory for the MLE's, while Doss and Wellner [2016a] established rates of convergence of the MLE in the Hellinger metric. There has also been rapid progress in estimation of logconcave densities in higher dimensions; see e.g. Cule, Samworth and Stewart [2010], Cule and Samworth [2010a], Dümbgen, Samworth and Schuhmacher [2011], Seregin and Wellner [2010], and Han and Wellner [2016].
Interesting uses of the unconstrained logconcave MLE’s in more complicated models, mostly in mixture modeling and clustering, have been considered by Chang and Walther [2007], Eilers and Borgdorff [2007], Walther [2009], and Cule and Samworth [2010b].
On the other hand, for a number of important statistical problems it is of great interest to understand estimation in several important subclasses of the class of all logconcave densities on .

For testing that a logconcave density on the real line is symmetric about a known point, we need to know how to estimate the logconcave density both with and without the constraint of symmetry.

For the basic problem of estimation of location with a symmetric error density, it is important to know how to estimate a symmetric logconcave density with mode (and median and mean) at the center of symmetry.

For inference about the mode of a logconcave density it is necessary to understand how to estimate a logconcave density with a known mode (but without the constraint of symmetry).
Once the properties of nonparametric estimators within these subclasses are understood, the estimators can be used to develop statistical methods with known properties for other, more complex statistical problems. For example, the basic procedures we study here can be viewed as building blocks for, among other applications:
 (a) Testing the hypothesis of symmetry of a logconcave density.
 (b) Estimation of the location of a symmetric logconcave density.
 (c) Inference about the mode of a logconcave density.
 (d) Nonparametric modal regression (as in Chen et al. [2016], but using logconcavity).
 (e)
 (f) Modal clustering (as in Chacón [2016], but using logconcavity).
 (g) Estimation of a spherically symmetric multivariate logconcave density, which is pursued in Xu and Samworth [2017].
 (h) Inference about the center of an elliptical multivariate distribution based on the assumption of a logconcave underlying shape.
Thus our focus here is on estimation of a logconcave density in two important subclasses.
Let denote the class of all logconcave densities on the real line .
The two subclasses we study here are:
(1) The class of all logconcave densities with mode at a fixed number .
(2) The class of all logconcave densities symmetric about .
We rely on the methods and properties developed here for the subclass to derive new inference procedures for the mode in Doss and Wellner [2016b]. The subclass has already been used in Balabdaoui and Doss [2018] to study semiparametric mixture models. The methods developed here for are also being used in an ongoing study by Laha [2018] of efficient estimation of a location parameter in the classical semiparametric symmetric location model with the (very natural) assumption of a symmetric logconcave error distribution. In ongoing work, Doss (2017) studies new methods for modal regression based on estimation in the class .
Thus our main goals here are the following:

To show that the modeconstrained and symmetryconstrained MLE's exist and to provide useful characterizations thereof.

To establish useful finitesample properties of the constrained estimators.

To establish consistency of the constrained MLE's with respect to the Hellinger metric.

To establish local rates of convergence of the constrained estimators and their (pointwise) asymptotic distributions.

To establish global rates of convergence of the constrained estimators.
Here is a brief summary of the paper. In Section 2 we show that the constrained estimators exist and satisfy useful characterizations. Section 3 provides plots of the constrained estimators and compares them to each other and to the unconstrained maximum likelihood estimators. In Section 4 we summarize results concerning consistency and global rates of convergence, while Section 5 addresses local rates of convergence and limiting distributions at fixed points. Section 6 summarizes some problems and difficulties concerning extensions to higher dimensions. All proofs are given in Sections 7–9.
2 Maximum likelihood estimator finite sample properties: unconstrained and modeconstrained
2.1 Notation and terminology
Several classes of concave functions will play a central role in this paper. In particular, we let
(2.1) 
and, for any fixed ,
(2.2) 
is the class of concave functions on with mode at . We also let
(2.3) 
Here proper and closed concave functions are as defined in Rockafellar [1970], pages 24 and 50. We will follow the convention that all concave functions are defined on all of and take the value off of their effective domains where (Rockafellar [1970], page 40). The classes of unconstrained and constrained logconcave densities are then
where is Lebesgue measure on . We let be the observations, independent and identically distributed with density with respect to Lebesgue measure. Here we assume throughout that and frequently that for some or . We let denote the order statistics of the ’s, and write for the order statistics of . We let denote the empirical measure, let denote the empirical distribution function, and let denote the empirical distribution function of .
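The empirical distribution function just introduced appears throughout the characterizations below. As a minimal, self-contained illustration (a Python sketch of our own, purely for concreteness; the paper's software is the R package logcondens.mode):

```python
def ecdf(data):
    """Return the empirical distribution function t -> (1/n) #{i : X_i <= t}."""
    xs = sorted(data)
    n = len(xs)
    return lambda t: sum(1 for x in xs if x <= t) / n
```

For a sample of size n, the returned function is a right-continuous step function with jumps of size 1/n at the order statistics.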
We define the loglikelihood criterion function by
(2.4) 
where we have used the standard device of including the Lagrange term to avoid the normalization constraints involved in the classes and , as in Silverman [1982], Dümbgen and Rufibach [2009], and other current literature.
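Concretely, the standard adjusted criterion maps a candidate logdensity ψ to the average of ψ over the data minus the integral of e^ψ; its maximizer automatically has e^ψ integrating to one, so the explicit normalization constraint can be dropped. A minimal numerical sketch (Python; the grid-based trapezoid integration is our illustration, not the algorithm used by the package):

```python
import math

def adjusted_loglik(psi, data, grid):
    """Silverman-type adjusted log-likelihood criterion:
    (1/n) * sum_i psi(X_i)  -  integral of exp(psi(t)) dt,
    with the integral computed by the trapezoid rule on `grid`."""
    mean_term = sum(psi(x) for x in data) / len(data)
    integral = 0.0
    for a, b in zip(grid[:-1], grid[1:]):
        integral += 0.5 * (b - a) * (math.exp(psi(a)) + math.exp(psi(b)))
    return mean_term - integral
```

For example, with ψ the log of the standard normal density, the integral term is 1, so the criterion equals the average log-likelihood of the data minus 1.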
We will denote the unconstrained MLE’s of , , and by , , and respectively. The corresponding constrained estimators with mode and symmetric estimators with mode at will be denoted by , , , and , , respectively. Thus
Before proceeding to results concerning existence and uniqueness of the constrained estimators and , we first explain some undesirable properties of “naive” constrained estimators based on the unconstrained MLE’s and .
2.2 Naive Estimators
We can easily construct “naive” estimators under our two classes of constraints. For instance, a naive modeconstrained estimator based on the unconstrained logconcave MLE is , where is the mode of . Then indeed has mode . Let . Unfortunately, these estimators have quite undesirable properties. For example, when , we can see that
since by Theorem 3.6 of Balabdaoui, Rufibach and Wellner [2009], so the first summand is by Corollary 2.2 of Balabdaoui, Rufibach and Wellner [2009] (and its proof, see their (4.34)). Thus, away from the mode, this naive estimator in fact converges at a slower rate than .
Similarly, a naive symmetric estimator can be constructed. Let . Then is symmetric about its mode . (It is not necessarily a bona fide density integrating to 1, but its integral converges to 1.) Again, unfortunately, a similar analysis shows that if , then
since . Since and , we again see that the naive estimator converges at a slower rate than .
In summary, naive plugin estimation for the mode and symmetry constraints does not work. The poor performance of these and other “naive” or “plugin” estimators motivates study of the constrained MLE’s, which we now pursue.
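The naive constructions just discussed can be sketched as follows (Python, for illustration only; `f_hat` stands for any unconstrained density estimate, and the symmetrization shown, averaging the estimate with its reflection about its mode, is one natural reading of the elided display):

```python
def shift_to_mode(f_hat, mode_hat, m):
    """Naive modeconstrained estimator: translate f_hat so its mode lies at m."""
    return lambda x: f_hat(x + mode_hat - m)

def symmetrize_about_mode(f_hat, mode_hat):
    """Naive symmetric estimator: average f_hat with its reflection about its
    mode.  The result is symmetric about mode_hat but need not integrate to 1."""
    return lambda x: 0.5 * (f_hat(x) + f_hat(2.0 * mode_hat - x))
```

Both constructions trivially enforce the constraint, which is exactly why their slow convergence away from the mode, inherited from the unconstrained estimator's behavior at its own mode, is disappointing.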
2.3 The unconstrained and the constrained MLE’s
To develop theory for the modeconstrained estimators , , and it will be helpful to consider modeaugmented data with or as follows:
 (1)

If for some then for and .
 (2)

If for some (where and ), then we define for , , and for . In this case
and .
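The modeaugmentation device above amounts to inserting the fixed mode into the ordered sample when it is not already a data point; a minimal sketch (Python, with the insertion done by the standard bisect module):

```python
import bisect

def augment_with_mode(data, m):
    """Return the mode-augmented order statistics: the sorted data with the
    fixed mode m inserted, unless m already equals one of the data points."""
    xs = sorted(data)
    if m in xs:
        return xs
    bisect.insort(xs, m)
    return xs
```

The augmented sample has n + 1 points when the mode is not a data point and n points otherwise, matching the two cases distinguished above.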
Theorem 2.1.
The following statements hold almost surely when are i.i.d. from a density on .

(Doss [2013b]) For the modeconstrained MLE exists and is unique. It is piecewise linear with knots at the ’s and domain . If is not a data point, then at least one of or is .

The constrained MLE exists for and is unique. It is piecewise linear with knots contained in the set of points
, and is for . If , then .
The previous result shows that the MLE's exist. Unfortunately, there is no closed-form expression for them. However, since they are solutions to optimization problems, they satisfy certain optimality conditions; the next two theorems provide systems of inequalities and equalities that characterize the MLE's.
Theorem 2.2.

(Doss [2013b]) Suppose that . Then is the MLE over if and only if
(2.5) for all such that for some .

Suppose that and . Then is the MLE over if and only if
(2.6) for all such that for some .
To state the second characterization theorem for the MLE’s, we first introduce some further notation and definitions. For a continuous and piecewise linear function we define its knots to be
Note that , , and are all continuous and piecewise linear functions (with , in the case of and , and with , in the case of ), and we have
Now suppose that is piecewise linear with knots at the (modeaugmented) data, let , and assume that . For define
(2.7) 
Definition 2.3.
With considered as a possible knot of we say that is a left knot (or LK) if and that is a right knot (or RK) if . We say that is not a knot (or NK) if . All other knots are considered to be left knots (LKs) or right knots (RKs) depending on whether they are strictly smaller or strictly larger than .
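Since the estimators are continuous and piecewise linear, their knots are simply the points at which the slope changes. A sketch of extracting knots from breakpoint values (Python; the numerical tolerance `tol` is our own device, not part of the definition):

```python
def knots(xs, ys, tol=1e-12):
    """Knots of the continuous piecewise-linear interpolant through (xs, ys):
    the interior breakpoints at which the slope changes."""
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    return [xs[i] for i in range(1, len(slopes)) if abs(slopes[i] - slopes[i - 1]) > tol]
```

A breakpoint at which the two adjacent slopes agree is, in this sense, not a knot, which is the distinction the LK/RK/NK classification above refines at the mode.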
Theorem 2.4.

is the MLE if and only if satisfies, with and ,
Remark 2.5.
Remark 2.6.
These characterization theorems have two important corollaries. (Recall that denotes the empirical distribution function of the ’s.)
Corollary 2.7.
(MLE’s related to at knot points) Each of the following holds almost surely.

on .

on .

on .
Now for any distribution function on let and .
Corollary 2.8.
(Mean and variance inequalities)
A. and .
B. and .
Because does not have mode , and because has mode only if , we cannot make comparisons between the means and variances of and .
3 Simulation results
Software to compute the modeconstrained estimator, and also to implement the likelihood ratio test and corresponding confidence intervals studied in Doss and Wellner [2016b], is available in the package logcondens.mode [Doss, 2013a] in R [R Core Team, 2016]. Here we illustrate the existence and characterization results on simulated data in Figure 1. There are two columns of four plots: the left column includes the modeconstrained logconcave MLE, and the right column includes the symmetric logconcave MLE. The data points are represented by vertical hash lines along the bottom of each plot. The density, log density, and distribution function are plotted in the top three rows, with the unconstrained logconcave MLE in red and the true (unknown) function in black. On the left the modeconstrained MLE is in blue, and on the right the symmetric MLE is in blue. The empirical df is plotted in green in the third row. In the last row, we plot and in blue to illustrate Theorem 2.4(2) (left plot) and Theorem 2.4(3) (right plot), and in red to illustrate Theorem 2.4(1). In all the plots, dashed vertical red lines give and dashed vertical blue lines give knots of the constrained estimator (these frequently overlap). The solid blue line is the specified mode value for the modeconstrained MLE. Plots from further simulations can be found on pages 10–13 of Doss and Wellner [2016c].
Figure 2 gives plots of (“SC”), (“MC”), (“UC”), and (“E”). The left and right plots each show one simulation, with sample size and , respectively, from a distribution. The plots show the improvements of and over . The MC and UC lines are indistinguishable when , since one needs to plot locally near the mode to see differences between and when is large.
4 Hellinger consistency and rates
Pal, Woodroofe and Meyer [2007] showed that the unconstrained MLE’s are a.s. consistent in the Hellinger metric where , and their methods also yield consistency for the MLE’s over any subclass for which the MLE’s exist and satisfy
This nicely includes the subclass when ; i.e. the mode has been correctly specified. Further consistency results are due to Cule and Samworth [2010a], Rufibach [2006], and Dümbgen and Rufibach [2009]. To the best of our knowledge, this is the first treatment of the consistency and global rate properties of the constrained estimators.
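The Hellinger distance between two densities f and g is H(f, g) = {(1/2) ∫ (√f − √g)² dλ}^{1/2}, the global metric used throughout this section. A minimal numerical sketch (Python; the trapezoid rule on a user-supplied grid is our illustration):

```python
import math

def hellinger(f, g, grid):
    """H(f, g) = sqrt( (1/2) * integral of (sqrt f - sqrt g)^2 ),
    with the integral computed by the trapezoid rule on `grid`."""
    total = 0.0
    for a, b in zip(grid[:-1], grid[1:]):
        ha = (math.sqrt(f(a)) - math.sqrt(g(a))) ** 2
        hb = (math.sqrt(f(b)) - math.sqrt(g(b))) ** 2
        total += 0.5 * (b - a) * (ha + hb)
    return math.sqrt(0.5 * total)
```

For example, for the uniform densities on [0, 1] and [0, 2], the closed form H² = 1 − ∫ √(fg) = 1 − √(1/2) can be used as a check on the numerical value.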
Theorem 4.1.
Remark 4.2.
Kim and Samworth [2016] extend Part A of Theorem 4.1 by upper bounding the maximal risk of : their Theorem 5 implies that is (considering squared Hellinger rather than Hellinger distance). They also provide a matching lower bound: their Theorem 1 implies
for some , where the infimum is over all (measurable) estimators of . Neither upper nor lower bounds for the (Hellinger) minimax risk are known for either of the constrained density classes we consider in the present paper, although we conjecture that is the minimax rate of convergence in both cases.
Remark 4.3.
When , then we can show that where satisfies
see Figure 4 of Doss and Wellner [2016c]. Similarly, when , then we can show that where satisfies
but we will not pursue this here since our goal in this paper is to understand the null hypothesis (or correctly specified) behavior of the constrained estimators , , and . See Doss and Wellner [2016b] for some initial steps concerning the power of the likelihood ratio test based on when .
In addition to considering Hellinger distance, one can consider the sup norm (on compact sets) as a metric for global convergence. It turns out that the proofs in Doss and Wellner [2016b] rely crucially on knowing the rate of supnorm convergence for (as well as for ). Thus we study the supnorm rate of convergence for in that paper. In Theorem 4.1 of that paper we find, when the true logdensity satisfies a Hölder condition of order and , that the rate of convergence is on compact sets interior to the support of .
5 Local limit processes and limiting distributions at fixed points
Our goal in this section is to describe the limiting distributions of our estimators, both unconstrained and constrained, at fixed points at which the true density satisfies a curvature condition. We also compare and contrast the behavior of the three different estimators.
5.1 The limit processes, unconstrained and constrained
We first need to introduce the local limit processes that we need to treat the local (at a single point or in a neighborhood of a point) limiting distributions of the estimators, unconstrained and constrained. For all of our estimators (including the unconstrained and the two different modeconstrained estimators), the limit distributions are not Gaussian; rather, they are defined in terms of so-called invelope processes of integrated Brownian motion. We first recall the invelope process related to the limit distribution for the unconstrained estimators; this process was first presented and studied in Groeneboom, Jongbloed and Wellner [2001a] (and shown to yield the limit distribution in several convex function estimation problems in Groeneboom, Jongbloed and Wellner [2001b]). Let be a twosided standard Brownian motion starting at and for any let
(5.1) 
Theorem 5.1 (Groeneboom, Jongbloed and Wellner [2001a]).
Let , , and be as in (5.1).
Then there
exists an almost surely uniquely defined random continuous function
satisfying the following conditions:
(i) The function is everywhere below :
(ii) has a concave second derivative.
(iii) satisfies
The random variables and give the universal component of the limit distribution of and ; see Theorem 5.5, below.
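The driving process in (5.1) is, in the cited literature, of the form Y(t) = ∫₀ᵗ W(s) ds + t⁴ (the display is elided in this copy; we take that form from Groeneboom, Jongbloed and Wellner [2001a], whose title refers to the invelope of integrated Brownian motion plus t⁴). A one-sided simulation sketch (Python; the Euler discretization is our own illustration, not a construction from the paper):

```python
import math
import random

def simulate_Y(tmax, n, seed=0):
    """Euler-scheme sample path of Y(t) = \int_0^t W(s) ds + t^4 on [0, tmax].
    The two-sided process of the text uses an independent copy for t < 0."""
    rng = random.Random(seed)
    dt = tmax / n
    w, integral, path = 0.0, 0.0, [0.0]
    for k in range(1, n + 1):
        integral += w * dt                      # left-point rule for \int_0^t W
        w += rng.gauss(0.0, math.sqrt(dt))      # Brownian increment over dt
        path.append(integral + (k * dt) ** 4)
    return path
```

Simulated paths of this kind are how the non-Gaussian limit laws described below are typically visualized or tabulated.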
Theorem 5.1 concerns a process , related to the unconstrained concave estimation problem. In the mode constrained estimation problem, , instead of having one process we have two, one for the lefthand side of (negative axis) and one for the righthand side of (positive axis). (Here, corresponds to the mode , by a translation.) The definitions of the left and righthand processes depend on a random starting point for the corresponding integrals involved, which we will eventually denote and (this is made clear in (8.29), (8.30), and (8.31), below). To define and , we must define rigorously the possible ‘bend points’ of . To describe the situation exactly, we also will define ‘bend points’ and , satisfying and , where the inequality may or may not be strict; these bend points arise in (5.10) below.
Theorem 5.2.
Assume , are random processes where . Define the ‘bend points’ by
(5.2) 
Next, define
(5.3)  
(5.4)  
(5.5) 
Let be a standard twosided Brownian motion with , and for let
(5.6) 
With these definitions, we assume that:
(i) and and
(5.7) 
(ii)
(5.8)  
(5.9) 
(iii)
(5.10) 
Then, and are unique, as are and .
Theorem 5.2 shows that processes with the given properties are unique; that they exist will follow from Section 5.3.
If is, in fact, piecewise linear, then is just the last knot point of with . By Theorem 23.1 of Rockafellar [1970], a finite, concave function on such as has well-defined right and left derivatives at all of ; the specification of left and right derivatives in the definitions of and is for concreteness but is not necessary, since we consider all .
The distinction between and depends only on the behavior of at , and can be understood by considering the case where the infima in (5.4) and in (5.5) are actually minima (the infima are attained). In that case, we see that can be thought of as the smallest “rightknot” in the sense that has a strictly negative slope to the right of . And can be thought of as the smallest positive knot. Note that (by concavity) all positive knots are rightknots, so that . Note that the infimum defining in (5.5) is taken over knots that are strictly larger than , so that (when the infima are attained) we have . On the other hand, if is a rightknot then , so that then and are distinct. If is not a rightknot, then we will have . These statements are slightly complicated by the fact that and are defined as infima rather than minima, but the intuitive differences are captured by the previous description. Corresponding statements hold for and .
The distinction between the two sets of knot pairs is important because many of our arguments depend on constructing “perturbations” of , and we can use different types of perturbations at each pair. This means that the different knot pairs have different properties: if we replace by in (5.7), then that display may not hold, and similarly, if we replace by in (5.10), then that display may not hold. The following lemma holds for but not necessarily for .
Lemma 5.3.
With the definitions and assumptions as in Theorem 5.2,
(5.11) 
Now we introduce the appropriate limit processes for the symmetric modeconstrained estimators. The characterization is similar to that for the modeconstrained (but not symmetric) processes, but since the symmetric limit process is defined only on a half-line, the processes are not the same.
Note that we always let . Explicitly excluding in the definition of (including in the set of knots) is somewhat redundant, since by definition is not defined, but we make the exclusion explicit. This definition seems to be the appropriate one-sided analog of the twosided case: in the twosided modeconstrained case we may have as a “knot” even though it is not a rightknot, and this entails only that , which is effectively tautologically true in this onesided setting. Thus, just as in the twosided case, we do not generally have that .
Theorem 5.4.
Assume is a random process on , and assume that . Define
For , let be a onesided standard Brownian motion with , let , and let
Suppose that and
(i)
(5.12) 
(ii)