Methods for Estimation of Convex Sets

Victor-Emmanuel Brunel (vebrunel@mit.edu)
Massachusetts Institute of Technology, Department of Mathematics

Abstract

In the framework of shape constrained estimation, we review methods and works done in convex set estimation. These methods mostly build on stochastic and convex geometry, empirical process theory, functional analysis, linear programming, extreme value theory, etc. The statistical problems that we review include density support estimation, estimation of the level sets of densities or depth functions, nonparametric regression, etc. We focus on the estimation of convex sets under the Nikodym and Hausdorff metrics, which require different techniques and, quite surprisingly, lead to very different results, in particular in density support estimation. Finally, we discuss computational issues in high dimensions.

MSC subject classifications: Primary 62-02, 62G05.

Keywords: Convex body, set estimation, Nikodym metric, Hausdorff metric, support function.

1 Preliminaries

1.1 Introduction

In nonparametric inference, the unknown object of interest cannot be described in terms of a finite number of parameters. Examples include density estimation, nonparametric and high dimensional regression, support estimation, etc. Since the number of observations is finite, it is necessary to make assumptions on the object of interest in order to make statistical inference meaningful. Two types of assumptions are most common: smoothness assumptions and shape constraints. A smoothness assumption usually imposes differentiability up to some fixed order, with bounded derivatives (the reader can find an introduction to the estimation of smooth density or regression functions in [Tsy09, Chapter 1]; [MT95] imposes smoothness assumptions on the boundary of the support of an unknown density or on the boundary of an unknown set in image reconstruction from random observations). Shape constraints instead impose conditions such as monotonicity, convexity, log-concavity, etc. (e.g., [KS16] assumes log-concavity of the unknown density; [CGS15] imposes monotonicity or more general shape constraints on the unknown regression function; [KST95a] imposes a monotonicity or a convexity constraint on the boundary of the support of the unknown density; [KT94, Bru16] impose convexity on the support of the unknown distribution).

Smoothness is a quantitative condition, whereas a shape constraint is usually qualitative. Smoothness classes of functions or sets depend on meta parameters, such as the number of existing derivatives or upper bounds on some functional norms. However, in statistical applications, these meta parameters are unlikely to be known to the practitioner. Yet, statistical inference usually requires choosing tuning parameters that depend on these meta parameters. One way to overcome this issue is to randomize the tuning parameters and apply data driven adaptive procedures such as cross validation. However, such procedures are often technical and computationally costly. By contrast, shape constraints usually do not introduce extra parameters, which makes them particularly attractive.

Many different shape constraints can be imposed on sets. For instance, [Tsy94, KST95b, KST95a] consider boundary fragments, which are the subgraphs of positive functions defined on a hypercube (or, more generally, on a metric space). Shape constraints on such sets directly translate into shape constraints on their edge functions. For general sets, convexity is probably the simplest shape constraint, even though it leads to a very rich field in geometry. Convexity can be extended to the notion of $r$-convexity, where an $r$-convex set is the complement of the union of open Euclidean balls of radius $r$, $r>0$ (see, e.g., [ML93] and [RC07, PL08] for set estimation under $r$-convexity and, more generally, [Cue09, CFPL12] for broader shape constraints in set estimation). Informally, convexity is the limit of $r$-convexity as $r$ goes to $\infty$. In set estimation, if it is assumed that the unknown set is $r$-convex for some $r>0$, the meta parameter $r$ may also be unknown to the practitioner and [RCSN16] defines a data-driven procedure that adapts to $r$. In the present article, we only focus on convexity, which is a widely treated shape constraint in statistics. On top of convexity, two additional constraints are common in statistics: the rolling ball condition and standardness. A convex set $G$ is said to satisfy the $r$-rolling ball condition ($r>0$) if, for all $x$ on the boundary of $G$, there is a Euclidean ball $B(a,r)$ such that $x\in B(a,r)\subseteq\bar G$, where $\bar G$ is the closure of $G$ (see [Wal97, Wal99] for characterizations of the rolling ball condition, connections with $r$-convexity and statistical applications in set estimation). An equivalent condition is that the complement of $G$ has reach at least $r$. The reach of a set is the supremum of all positive numbers $t$ such that any point within a distance $t$ of that set has a unique metric projection onto the closure of that set (see [Thä08, Definition 11]). A convex set $G$ is called $\nu$-standard ($\nu\in(0,1)$) if for all $x$ on its boundary, $|B(x,\varepsilon)\cap G|\ge\nu|B(x,\varepsilon)|$, for all $\varepsilon>0$ small enough. This roughly means that the set does not have peaks.

In general, two main types of convex bodies are distinguished in the literature.

• Convex bodies with smooth boundary: The boundary $\partial G$ of a convex body $G$ is smooth if for all $x\in\partial G$, $G$ has a unique supporting hyperplane that contains $x$. In that case, let $H_x$ be the unique supporting hyperplane containing $x$ and let $n_x$ be the unit vector orthogonal to $H_x$ and pointing towards the inside of $G$. Identify the $(d-1)$-dimensional linear subspace $H_x-x$ with $\mathbb{R}^{d-1}$; then, every $y\in\partial G$ that is in some neighborhood of $x$ can be written uniquely as $y=x+z+f_x(z)n_x$, where $z\in\mathbb{R}^{d-1}$ and $f_x$ is a nonnegative convex function defined in a neighborhood of $0$ in $\mathbb{R}^{d-1}$. If the Hessian of $f_x$ at $0$ is positive definite, $G$ is said to have positive curvature at $x$. Otherwise, $G$ has zero curvature at $x$.

• Convex polytopes: A convex polytope (in short, a polytope) is the convex hull of finitely many points in $\mathbb{R}^d$. By the Minkowski–Weyl theorem, a polytope can also be represented as the intersection of finitely many closed halfspaces. The supporting hyperplane of a polytope $P$ containing $x\in\partial P$ is unique if $x$ is not in a $k$-dimensional face of $P$ for some $k\le d-2$, and $P$ has zero curvature at all such boundary points $x$.

We refer the readers who are interested in learning more about convex bodies to [Sch93], and to [Zie95] for a comprehensive study of convex polytopes.

In the field of nonparametric statistics, the problem of set estimation arose essentially with the works [Gef64] (On a geometric estimation problem) and [Che76], which deal with the estimation of the support of a density in a general setup. A simple and natural estimator of the support of an unknown density was introduced in [DW80], where the estimator is defined as the union of small Euclidean balls centered around the data points. In fact, this estimator is equal to the support of a kernel density estimator for the kernel that is the indicator function of the Euclidean unit ball.
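As an illustration of the union-of-balls estimator described above, the following Python sketch tests whether a point belongs to the union of closed Euclidean balls centered at the data points. It is a minimal sketch under our own assumptions: the function names, the sample (uniform in the unit disk) and the radius value are hypothetical and are not taken from [DW80].

```python
import math
import random


def devroye_wise_contains(sample, radius, point):
    """Return True if `point` lies in the union of closed balls B(X_i, radius)."""
    return any(math.dist(x, point) <= radius for x in sample)


def uniform_in_unit_disk(rng):
    """Draw one point uniformly from the unit disk by rejection sampling."""
    while True:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            return (x, y)


rng = random.Random(0)
sample = [uniform_in_unit_disk(rng) for _ in range(500)]
radius = 0.15  # illustrative smoothing radius (an assumption, not a recommended choice)

# Every data point belongs to the estimator; a point far from the disk does not.
print(devroye_wise_contains(sample, radius, sample[0]))
print(devroye_wise_contains(sample, radius, (3.0, 3.0)))
```

In practice the radius plays the role of a bandwidth and has to shrink with the sample size; the fixed value above is only for demonstration.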

The scope of this survey is the estimation of convex sets. We aim to give an exposition of several methods that build on stochastic and convex geometry, empirical process theory, functional analysis, linear programming, order statistics and extreme value theory, etc. Different models associated with the estimation of convex sets include density support estimation [KT93a, KST95b, KST95a, Bru16, Bru18b, Bru], density level set estimation [Har87, Pol95, Tsy97], inverse problems in density support estimation [BKY], estimation of the support of a regression function [KT93b, Tsy94, Bru13], estimation of the level sets of the Tukey depth function [Bru18a], estimation of support functions [GKM06, Gun12], etc.

Throughout this survey, a set estimator is a set-valued statistic, i.e., a set which depends on the observed random variables. A precise definition would be necessary in order to rule out measurability issues. However, in order to keep the focus on convex set estimation, we rather choose not to mention these issues, and all probabilities (resp. expectations) should be understood as outer probabilities (resp. expectations). For detailed accounts on set-valued random variables, we refer to [Mol05].

Before going more into the details, let us introduce some notation and definitions.

1.2 Notation and Definitions

In the sequel, $d$ is a positive integer, standing for the ambient dimension. For a positive integer $p$, the closed $p$-dimensional Euclidean ball with center $a$ and radius $r$ is denoted by $B_p(a,r)$. If $p=d$, we may omit the subscript $p$. The $(d-1)$-dimensional unit sphere is denoted by $\mathbb{S}^{d-1}$ and the volume of the $d$-dimensional unit Euclidean ball is denoted by $\beta_d$. The Euclidean norm in $\mathbb{R}^d$ is denoted by $\|\cdot\|$, the Euclidean distance is $\rho$ and we write $\langle\cdot,\cdot\rangle$ for the canonical dot product.

A convex body is a compact and convex set with nonempty interior. We denote by $\mathcal{K}_d$ the collection of all convex bodies in $\mathbb{R}^d$ and by $\mathcal{K}_d^{(1)}$ the collection of all convex bodies included in $B(0,1)$. The support function of a convex body $G$ is defined as $h_G(u)=\max_{x\in G}\langle u,x\rangle$, for all $u\in\mathbb{R}^d$: For unit vectors $u$, it is the signed distance from the origin to the supporting hyperplane of $G$ with outer normal $u$.
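When a convex body is a polytope given by its vertices, the maximum that defines the support function is attained at a vertex, so the support function reduces to a finite maximum. The Python sketch below illustrates this; the function name `support_function` and the unit square example are our own hypothetical choices.

```python
import math


def support_function(vertices, u):
    """Support function of conv(vertices) at u: the maximum of <u, v> over the vertices v."""
    return max(sum(ui * vi for ui, vi in zip(u, v)) for v in vertices)


# Unit square [0, 1]^2, given by its four vertices.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

print(support_function(square, (1.0, 0.0)))    # direction of the first axis
print(support_function(square, (-1.0, 0.0)))   # opposite direction
diagonal = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))
print(support_function(square, diagonal))      # direction of the main diagonal
```

For a unit direction, the returned value is the signed distance from the origin to the supporting hyperplane of the square with that outer normal.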

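The volume of a measurable set $A\subseteq\mathbb{R}^d$ is denoted by $|A|$.

The Nikodym distance between two measurable sets $G_1,G_2\subseteq\mathbb{R}^d$ is the volume of their symmetric difference: $|G_1\triangle G_2|$. The Hausdorff distance between any two sets $G_1,G_2\subseteq\mathbb{R}^d$ is defined as $d_H(G_1,G_2)=\inf\{\varepsilon>0: G_1\subseteq G_2+\varepsilon B(0,1),\ G_2\subseteq G_1+\varepsilon B(0,1)\}$.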
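To make the two metrics concrete, the following Python sketch computes both distances for axis-parallel boxes, a toy class of convex bodies chosen here only because both quantities are then available in closed form. The box representation and all function names are assumptions of this sketch. For the Hausdorff distance, we use the fact that the distance to a convex set is a convex function, so its maximum over a polytope is attained at a vertex.

```python
import math
from itertools import product

# A box is a pair (lower, upper) of coordinate tuples.


def volume(box):
    """Volume of a box; zero if the box is empty."""
    lower, upper = box
    return math.prod(max(h - l, 0.0) for l, h in zip(lower, upper))


def nikodym(box_a, box_b):
    """Volume of the symmetric difference of two boxes."""
    (la, ha), (lb, hb) = box_a, box_b
    intersection = (tuple(map(max, la, lb)), tuple(map(min, ha, hb)))
    return volume(box_a) + volume(box_b) - 2.0 * volume(intersection)


def dist_to_box(point, box):
    """Euclidean distance from a point to a box (clamp each coordinate)."""
    lower, upper = box
    nearest = [min(max(p, l), h) for p, l, h in zip(point, lower, upper)]
    return math.dist(point, nearest)


def hausdorff(box_a, box_b):
    """Hausdorff distance between two boxes, via their vertices."""
    def one_sided(src, dst):
        return max(dist_to_box(v, dst) for v in product(*zip(*src)))
    return max(one_sided(box_a, box_b), one_sided(box_b, box_a))


A = ((0.0, 0.0), (1.0, 1.0))
B = ((0.5, 0.0), (1.5, 1.0))   # A shifted by 0.5 along the first axis
print(nikodym(A, B), hausdorff(A, B))
```

In this example, the shift of 0.5 changes half of the area of each box but moves the boundary by only 0.5, which illustrates that the two metrics measure different features of the discrepancy.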

The cardinality of a finite set $A$ is denoted by $\#A$.

When i.i.d. random points $X_1,\dots,X_n\in\mathbb{R}^d$ have a density $f$ with respect to the Lebesgue measure, we denote by $\mathbb{P}_f$ their joint distribution and by $\mathbb{E}_f$ the corresponding expectation operator, where we omit the dependency on $n$ for simplicity. When $f$ is the uniform density on a compact set $G$, we simply write $\mathbb{P}_G$ and $\mathbb{E}_G$. The convex hull of $X_1,\dots,X_n$ is denoted by $\hat K_n$.

In this article, most, if not all, set-valued estimators are polytopes that depend on a finite random sample. Nonetheless, in order to be consistent with the literature, we reserve the name random polytope for $\hat K_n$ only.

1.3 Outline

In order to assess the quality of a set estimator, the Nikodym and the Hausdorff metrics are most commonly used. Depending on which of these two metrics is to be used, the techniques in estimation of convex sets may differ a lot.

Section 2 is devoted to the estimation of convex sets under Nikodym-type metrics, especially in density support estimation. We first review essential properties of random polytopes and we relate them to the problem of support estimation under the Nikodym metric. We also recall well known results on the covering numbers of classes of convex bodies and show how these can be used in order to obtain deviation inequalities in convex support estimation. Then, we review extensions of these results to the estimation of density level sets under convexity and we discuss other convex set estimation problems under the Nikodym metric.

In Section 3, we switch to the estimation of convex bodies under the Hausdorff metric. An elementary, yet essential result, stated in Lemma 2, shows that the Hausdorff distance between two convex bodies can be computed through their respective support functions. We review important properties related to the support functions of convex bodies and we show how they apply to the estimation of convex sets under the Hausdorff metric.

Finally, in Section 4, we briefly discuss the computational aspects of convex set estimation in high dimensions. We show, through two examples, how to reduce the computational cost without affecting the rate of convergence of convex set estimators.

2 Estimation of convex sets under the Nikodym metric

2.1 Random polytopes and density support estimation

The most common representation of random polytopes consists of taking the convex hull of $n$ i.i.d. random points in $\mathbb{R}^d$. Stochastic and convex geometry have provided powerful tools to understand the properties of random polytopes, since the seminal works [RS63, RS64]. In these two papers, $d=2$ and the random polygon is the convex hull of $n$ i.i.d. random points with the uniform distribution in a planar convex body. The expectation of the missing area and of the number of vertices of the random polygon are computed, up to negligible terms as $n$ goes to infinity. The results substantially depend on the structure of the boundary of the support. Namely, the expected missing area decreases significantly faster when the support is itself a polygon than when its boundary has positive curvature everywhere. The missing area is exactly the Nikodym distance between the random polygon and the support of the random points. Hence, [RS63, RS64] give an approximate value of the risk of the random polygon as an estimator of the convex support. Later, much effort has been devoted to extending these results to higher dimensions, starting with [Efr65], which proves integral formulas for the expected missing volume, surface area, number of vertices, etc. in dimension 3. Among the most general results, a groundbreaking one is due to [BL88]. Define the $\varepsilon$-floating body of a convex body $G$ as the set of points $x\in G$ such that any closed halfspace $H$ containing $x$ has an intersection with $G$ whose volume is at least a fraction $\varepsilon$ of the total volume of $G$, i.e., satisfies $|G\cap H|\ge\varepsilon|G|$, where $\varepsilon\in(0,1)$ (see [Dup22, Bla23, SW90]). The $\varepsilon$-wet part of $G$, denoted by $G(\varepsilon)$, is the complement of the $\varepsilon$-floating body of $G$ in $G$. If one thinks of $G$ as an iceberg seen from above, the floating body is the part of $G$ that is above the surface of the water, whereas the wet part is the immersed part of the iceberg.

Theorem 1 ([BL88]).

Let $G\in\mathcal{K}_d$ have volume one. Then,

\[ c_1|G(1/n)|\le\mathbb{E}_G\big[|G\setminus\hat K_n|\big]\le c_2(d)|G(1/n)|,\quad\forall n\ge n_0(d), \]

where $c_1$ is a universal positive constant, $c_2(d)$ is a positive constant that depends on $d$ only and $n_0(d)$ is a positive integer that depends on $d$ only.

As a consequence, computing the expected missing volume of $\hat K_n$ asymptotically reduces to computing the volume of the $(1/n)$-wet part of $G$, which is no longer a probabilistic question. In addition, it is also known [BL88] that if $G$ has volume one and $\varepsilon$ goes to zero, $|G(\varepsilon)|$ is of the order at least $\varepsilon(\ln(1/\varepsilon))^{d-1}$ and at most $\varepsilon^{2/(d+1)}$. The former rate is achieved when $G$ is a polytope whereas the latter rate is achieved when $G$ has a smooth boundary with positive curvature everywhere.
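The contrast between polytopal and smooth supports can be observed numerically in the plane. The Python sketch below (our own illustration, not taken from the cited works) estimates by Monte Carlo the expected fraction of missing area of the convex hull of $n$ uniform points, in the unit square and in the unit disk. The hull is computed with Andrew's monotone chain algorithm and its area with the shoelace formula; all function names and the numbers of repetitions are hypothetical choices.

```python
import math
import random


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            # Pop while the last two points and p do not make a left turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def sample_square(n, rng):
    return [(rng.random(), rng.random()) for _ in range(n)]


def sample_disk(n, rng):
    points = []
    while len(points) < n:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            points.append((x, y))
    return points


def mean_missing_fraction(sampler, total_area, n, reps, rng):
    """Monte Carlo average of the missing area of the hull, divided by total_area."""
    fractions = [
        1.0 - polygon_area(convex_hull(sampler(n, rng))) / total_area
        for _ in range(reps)
    ]
    return sum(fractions) / reps


rng = random.Random(0)
for n in (100, 1000):
    square = mean_missing_fraction(sample_square, 1.0, n, 20, rng)
    disk = mean_missing_fraction(sample_disk, math.pi, n, 20, rng)
    print(n, square, disk)
```

With $d=2$, the theory above predicts a missing fraction of order $(\ln n)/n$ for the square and $n^{-2/3}$ for the disk, so the gap between the two columns should widen as $n$ grows.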

In fact, when $G$ is a smooth convex body with positive curvature everywhere, it is shown in [Sch94] that

\[ n^{2/(d+1)}\,\mathbb{E}_G\left[\frac{|G\setminus\hat K_n|}{|G|}\right]\longrightarrow c(d,G),\quad n\to\infty, \tag{1} \]

where $c(d,G)$ is an explicit positive constant that depends on $d$ and $G$ and that is affine invariant in $G$, i.e., $c(d,TG)=c(d,G)$ for all invertible affine transformations $T$. [Sch94] actually shows that this convergence holds for all convex bodies $G$, by noting that all convex bodies have a unique supporting hyperplane at almost all their boundary points (e.g., almost all boundary points of a polytope lie on a $(d-1)$-dimensional face), and where $c(d,G)$ is equal to zero if and only if $G$ has zero curvature almost everywhere (e.g., if $G$ is a polytope).

An interesting result, due to [Gro74], shows that the quantity $\mathbb{E}_G[|G\setminus\hat K_n|]/|G|$ is maximum when $G$ is an ellipsoid. In that case, it can be derived from [Sch94] that the constant $c(d,G)$ in (1) is of the order , as $d$ becomes large. As a consequence, when the dimension $d$ becomes too large, the random polytope $\hat K_n$ performs poorly as an estimator of $G$ in the worst case, because it suffers the curse of dimensionality, both in the rate $n^{-2/(d+1)}$ and in the constant factor $c(d,G)$. Yet, it is known that the rate $n^{-2/(d+1)}$ cannot be improved in a minimax sense. The following result is proven in [Bru16] where, for two sequences $(a_n)$ and $(b_n)$ of positive numbers, we write $a_n\lesssim_\theta b_n$ if $a_n\le c(\theta)b_n$ for all $n$, for some positive constant $c(\theta)$ that depends on a parameter $\theta$. In the sequel, we also write $\lesssim$ with no subscript if the involved constant is universal.

Theorem 2 ([Bru16]).

The following inequalities hold:

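\[ n^{-\frac{2}{d+1}}\lesssim_d\inf_{\tilde G_n}\sup_{G\in\mathcal{K}_d}\mathbb{E}_G\left[\frac{|G\triangle\tilde G_n|}{|G|}\right]\le\sup_{G\in\mathcal{K}_d}\mathbb{E}_G\left[\frac{|G\triangle\hat K_n|}{|G|}\right]\lesssim_d n^{-\frac{2}{d+1}}, \]

where the infimum is taken over all estimators $\tilde G_n$ based on $n$ i.i.d. observations.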

As a consequence, the random polytope $\hat K_n$ is rate optimal over the class $\mathcal{K}_d$ in a minimax sense, with respect to the Nikodym metric. The upper bound in Theorem 2 is a direct consequence of Theorem 1, together with Groemer’s result [Gro74]: It suffices to evaluate the volume of the $(1/n)$-wet part of a Euclidean ball of volume one. However, it is not clear that $\hat K_n$ is optimal in terms of the constant factors that become exponentially large with the dimension. Note that $\hat K_n\subseteq G$ with probability one, hence, $\hat K_n$ always underestimates the support $G$. This is why the estimation of $G$ through a dilation of $\hat K_n$ could be appealing. It has been considered, e.g., in [RR77] in the planar case for Poisson polytopes, and in [Moo84] for , but only heuristics are given in the general case, except for the estimation of the volume of $G$ in [BR16]. [BR16] poses the question of the performance of a dilated version of $\hat K_n$ compared to that of $\hat K_n$ itself, but the question remains open.
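The effect of the dimension on the rate $n^{-2/(d+1)}$ can be made concrete with a short computation. The Python sketch below returns the sample size needed for this rate to fall below a target value; it deliberately ignores the dimension-dependent constant factors, so it is only an optimistic back-of-the-envelope illustration, and the function name is hypothetical.

```python
import math


def sample_size_for_rate(target, d):
    """Smallest integer n with n ** (-2 / (d + 1)) <= target, for 0 < target < 1.

    Constant factors are ignored and the result is subject to floating-point rounding.
    """
    return math.ceil(target ** (-(d + 1) / 2))


# Sample sizes needed to bring the rate below 0.1 in several dimensions.
for d in (2, 5, 10, 20):
    print(d, sample_size_for_rate(0.1, d))
```

The required sample size equals `target ** (-(d + 1) / 2)`, hence grows exponentially with the dimension for any fixed target.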

Note that the lower bound in Theorem 2 is also used in log-concave density estimation. The uniform density on any convex body is log-concave, and for any two convex bodies and of volume , the corresponding uniform densities and satisfy , where stands for the norm with respect to the Lebesgue measure in . Hence, some proof techniques for lower bounds on minimax risks in [KS16] are based on similar arguments as those used to prove the lower bound in Theorem 2.

As we already mentioned earlier, an attractive feature of most shape constraints is that no meta parameters are needed to describe the objects of interest, unlike in smoothness classes. Nonetheless, classes of functions or sets with a shape constraint usually contain parametric subclasses that correspond to simpler structures, which may depend on meta parameters. For instance, classes of monotone (resp. convex) functions contain piecewise constant (resp. affine) functions. A desirable property of an estimator is adaptation to these simpler structures: If the unknown object belongs to a parametric subclass, then the rate of convergence of the estimator should be nearly as good as that of an estimator that would not be agnostic to that simpler structure. In recent years, considerable effort has been put into understanding these automatic adaptive features of shape constrained estimators [KGS17, CL15, Zha02, CGS15, Bel18, HWCS17, HW16].

Turning to the case of convex set estimation, the class $\mathcal{K}_d$ contains subclasses of polytopes with a bounded number of vertices, whose support functions are therefore piecewise linear with a bounded number of pieces, each piece corresponding to a vertex.

For the estimation of the convex support of a uniform distribution, the random polytope $\hat K_n$ is the maximum likelihood estimator on the class $\mathcal{K}_d$. Indeed, the likelihood function is given by $L(G')=|G'|^{-n}\prod_{i=1}^n\mathbf{1}(X_i\in G')$, for all $G'\in\mathcal{K}_d$, which is maximized when $G'=\hat K_n$ (note that $\hat K_n\in\mathcal{K}_d$ with probability $1$ as long as $n\ge d+1$). Recall that, as a consequence of Theorem 2, in the Nikodym metric, $\hat K_n$ estimates $G$ at the speed $n^{-2/(d+1)}$ in the worst case, i.e., when $G$ has a smooth boundary. When $G$ is a polytope, Theorem 1 implies that $\hat K_n$ estimates $G$ at a much faster speed, namely, $(\ln n)^{d-1}/n$. A more refined (but not uniform in $G$) result was proven in [BB93]. For a polytope $P$, let $T(P)$ be the number of flags of $P$, i.e., the number of increasing sequences $F_0\subset F_1\subset\dots\subset F_{d-1}$ of faces of $P$ where $F_k$ is a $k$-dimensional face of $P$, $k=0,\dots,d-1$. For example, $T(P)=2^d\,d!$ if $P$ is the $d$-dimensional hypercube, or $T(P)=(d+1)!$ if $P$ is the $d$-dimensional simplex.
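The maximum likelihood property of the convex hull follows from a one-line argument, which we spell out here for convenience. The notation ($L$ for the likelihood, $G'$ for a candidate convex body, $X_1,\dots,X_n$ for the sample and $\hat K_n$ for its convex hull) is the one we assume throughout this sketch.

```latex
The likelihood of a candidate convex body $G'$ is
\[
  L(G') = \frac{1}{|G'|^{n}} \prod_{i=1}^{n} \mathbf{1}(X_i \in G').
\]
If $L(G') > 0$, then $G'$ is a convex set that contains $X_1,\dots,X_n$,
hence it contains their convex hull: $\hat K_n \subseteq G'$. Therefore,
\[
  |G'| \ge |\hat K_n|
  \quad\Longrightarrow\quad
  L(G') = |G'|^{-n} \le |\hat K_n|^{-n} = L(\hat K_n).
\]
```

In words, among all convex bodies that contain the sample, the convex hull has the smallest volume and hence the largest likelihood.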

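Theorem 3 ([BB93]).

Let $P\subseteq\mathbb{R}^d$ be a polytope. Then,

\[ \lim_{n\to\infty}\frac{n}{(\ln n)^{d-1}}\,\mathbb{E}_P\left[\frac{|P\setminus\hat K_n|}{|P|}\right]=\frac{T(P)}{(d+1)^{d-1}(d-1)!}. \]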

In particular, if $G$ is a polytope, then there is a significant gain in the speed of convergence of $\hat K_n$, which becomes nearly parametric up to logarithmic factors. In other words, $\hat K_n$ adapts to polytopal supports. However, its rate still suffers the curse of dimensionality because of the $(\ln n)^{d-1}$ factor. In [Bru16], it is shown that this rate is not optimal over subclasses of polytopes with given number of vertices in a minimax sense. The idea is that $\hat K_n$ maximizes the likelihood function over the class of all convex bodies, which would be too rich if it were known in advance that $G$ is a polytope with a given number of vertices. If $G$ has at most $r$ vertices, where $r$ is known a priori, [Bru16] considers the maximum likelihood estimator over the corresponding subclass of polytopes. Namely, denote by $\mathcal{P}^{(r)}$ the class of all polytopes with at most $r$ vertices. The maximum likelihood estimator of $G$ in the class $\mathcal{P}^{(r)}$ is defined as $\hat P_n^{(r)}\in\arg\min\{|P|: P\in\mathcal{P}^{(r)},\ X_1,\dots,X_n\in P\}$: It is a polytope with at most $r$ vertices that contains $X_1,\dots,X_n$ and has minimum volume. Note that, unlike $\hat K_n$, the maximum likelihood estimator $\hat P_n^{(r)}$ may not be uniquely defined. However, the rate of this estimator no longer suffers the curse of dimensionality when $G\in\mathcal{P}^{(r)}$.

Theorem 4 ([Bru16]).

Let . Then,

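\[ \frac{1}{n}\lesssim_d\inf_{\tilde G_n}\sup_{P\in\mathcal{P}^{(r)}}\mathbb{E}_P\left[\frac{|P\triangle\tilde G_n|}{|P|}\right]\le\sup_{P\in\mathcal{P}^{(r)}}\mathbb{E}_P\left[\frac{|P\triangle\hat P_n^{(r)}|}{|P|}\right]\lesssim_d\frac{r\ln n}{n}, \]

where the infimum is taken over all estimators $\tilde G_n$ based on $n$ i.i.d. observations.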

In [Bru16], a better lower bound is proven when , namely,

\[ \inf_{\tilde G_n}\sup_{P\in\mathcal{P}^{(r)}}\mathbb{E}_P\left[\frac{|P\triangle\tilde G_n|}{|P|}\right]\gtrsim\frac{r}{n}. \]

The proof of the upper bound in Theorem 4 builds on a simple discretization of the class , obtained by considering polytopes with vertices on a finite grid in , and applying similar methods to those presented in Section 2.4.

The estimator is not computable in practice, but it gives a benchmark for the optimal rate in estimation of , under the Nikodym metric. It is still not clear whether the logarithmic factor could be dropped in the upper bound (see [Bru16, Section 3.2]). A drawback of is that it requires the knowledge of , whereas is completely agnostic to the facial structure of . In order to fix this issue, [Bru16] proposes a fully adaptive procedure and defines an estimator that is agnostic of the facial structure of and yet performs at the same rate as when for some integer , and as for general supports (see [Bru16] and [Bru14] for more details). However, the estimators and are not computationally tractable, and when the dimension is not too large, the convex hull is a more realistic estimator of .

2.3 More results on Random Polytopes

Even though this survey focuses on the statistical aspects of random polytopes, it is worth mentioning many works that have tackled other probabilistic and geometric properties, which are indirectly related to the statistical estimation of the support and pose new statistical challenges.

In [RS63, RS64], the expected number of vertices of $\hat K_n$ is computed in the planar case, up to some negligible terms as $n\to\infty$. [Efr65] shows a very elegant identity which relates the missing volume of $\hat K_n$ and its number of vertices. It can be stated in a very general setup as follows. Given a sequence $X_1,X_2,\dots$ of i.i.d. random points from some arbitrary probability measure $\mu$ in $\mathbb{R}^d$, let $\hat K_n$ be the convex hull of $X_1,\dots,X_n$ and $N_n$ be the number of vertices of $\hat K_n$, for $n\ge1$. Then, for all $n\ge1$,

\[ \mathbb{E}\big[1-\mu(\hat K_n)\big]=\frac{\mathbb{E}[N_{n+1}]}{n+1}. \]

When $\mu$ is the uniform probability measure on a convex body $G$, this identity becomes $\mathbb{E}_G\big[|G\setminus\hat K_n|/|G|\big]=\mathbb{E}_G[N_{n+1}]/(n+1)$. Extensions of this identity to higher moments of $|G\setminus\hat K_n|$ can be found in [Buc05].
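The identity above follows from a short exchangeability argument, which we sketch here under the additional assumption (ours, made to avoid ties) that the sampled points are almost surely pairwise distinct, e.g., when the underlying probability measure $\mu$ has no atoms. The symbols $X_i$, $N_{n+1}$, $\hat K_n$ and $\mu$ denote the sample points, the number of vertices of the hull of $n+1$ points, the hull of the first $n$ points and the sampling distribution.

```latex
A point $X_i$ is a vertex of the convex hull of $X_1,\dots,X_{n+1}$ if and
only if it does not belong to the convex hull of the other $n$ points. Hence,
\[
  N_{n+1} = \sum_{i=1}^{n+1}
    \mathbf{1}\big(X_i \notin \mathrm{conv}\{X_j : j \neq i\}\big).
\]
Since the points are i.i.d., all terms have the same expectation, so
\[
  \mathbb{E}[N_{n+1}]
  = (n+1)\,\mathbb{P}\big(X_{n+1} \notin \hat K_n\big)
  = (n+1)\,\mathbb{E}\big[1 - \mu(\hat K_n)\big],
\]
where the last equality is obtained by conditioning on $X_1,\dots,X_n$.
```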

In [Rei03], more results about the random polytope $\hat K_n$, involving variance bounds, are proven using Efron–Stein jackknife inequalities [ES81]. Very importantly, [Rei03] compares the random polytope $\hat K_n$ to best polytopal approximations of smooth convex bodies. Let $G$ be a smooth convex body and let $G^*_N$ be a polytope with at most $N$ vertices, included in $G$, with minimum missing volume $|G\setminus G^*_N|$. With probability one,

\[ \frac{|G\setminus G^*_{N_n}|}{|G\setminus\hat K_n|}\longrightarrow c_d,\quad n\to\infty, \]

where $c_d$ is a positive constant that only depends on the dimension $d$. Moreover, [Rei03] shows that $c_d\to1$ as $d\to\infty$. This shows that in high dimensions, with probability $1$, $\hat K_n$ performs nearly as well as the best approximating inscribed polytope with the same number of vertices, as $n$ becomes large.

Central limit theorems for the volume, number of vertices, or, more generally, number of -dimensional faces for , of random polytopes are proven in [Rei05, Par11, Par12]. A worth mentioning technique that is used in the proofs of these central limit theorems could be called Poissonization-depoissonization. The idea is to first consider a Poisson polytope, defined as the convex hull of a Poisson point process [BR10] supported on a convex body, with growing intensity. These are somewhat easier to work with, and it is shown that their behavior is close enough to that of the random polytope . Hence, the central limit theorems are first proven for the Poisson polytope, and the results are transferred to the random polytope by a depoissonization step. At a high level, this idea relies on the fact that if has volume one and if is a Poisson point process with constant intensity supported on , then is a Poisson random variable with parameter , hence, and with high probability, and conditional on , are i.i.d. random points uniformly distributed in .
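The fact underlying Poissonization can be turned into a simple sampling scheme: draw a Poisson number of points, then draw that many i.i.d. uniform points. The Python sketch below does this on the unit square (which has volume one); the function names are hypothetical, and the Poisson variable is generated by counting the arrivals of a rate-one Poisson process in an interval, since the standard library has no Poisson sampler.

```python
import random


def poisson(mean, rng):
    """Poisson variable: number of arrivals of a rate-one Poisson process in [0, mean]."""
    count = 0
    arrival = rng.expovariate(1.0)
    while arrival <= mean:
        count += 1
        arrival += rng.expovariate(1.0)
    return count


def poisson_points_unit_square(intensity, rng):
    """Homogeneous Poisson point process with the given intensity on [0, 1]^2."""
    size = poisson(intensity, rng)
    # Conditionally on `size`, the points are i.i.d. uniform in the square.
    return [(rng.random(), rng.random()) for _ in range(size)]


rng = random.Random(0)
points = poisson_points_unit_square(1000, rng)
print(len(points))  # random, concentrated around the intensity
```

Taking the convex hull of `points` gives a Poisson polytope in the square; its number of generating points fluctuates around the intensity with standard deviation equal to the square root of the intensity.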

Asymptotic properties of the intrinsic volumes of the random polytope $\hat K_n$ are studied in [Bà92, Rei04, BHH08] under different assumptions on the boundary of the underlying convex body. The intrinsic volumes of a convex body can be defined through the Steiner formula [Sch93, Section 4.1]. For $G\in\mathcal{K}_d$ and $\varepsilon\ge0$, let $G^\varepsilon$ be the set of all points that are within a distance at most $\varepsilon$ of $G$. The Steiner formula states that $|G^\varepsilon|$ is a polynomial of degree $d$ in $\varepsilon$. Namely, one can write, for all $\varepsilon\ge0$,

\[ |G^\varepsilon|=\sum_{j=0}^{d}\beta_{d-j}\,v_j(G)\,\varepsilon^{d-j}, \tag{2} \]

where is called the -th intrinsic volume of , for . For instance, is the volume of , is its surface area, is its mean width and . In [BHH08], it is shown that if is a smooth convex body satisfying the -rolling ball condition, then for all , as , where is a positive constant that depends on both the dimension and . In particular, the plug-in estimator is a consistent estimator of , and it converges at the same rate as the rate of convergence of in the Nikodym metric. Whether the plug-in estimator is an optimal estimator of in a minimax sense is not known in general, except when , when the answer is negative. [Gay97] considers the general problem of minimax estimation of the volume of the support of an unknown density, not necessarily uniform. In the particular case of the uniform density on an unknown convex body , a sample splitting procedure is applied in order to correct the plug-in estimator . It is shown that the minimax risk for the estimation of the volume of is of order , and this rate of convergence is attained by the explicit estimator given in [Gay97]. The estimation of the volume of is also tackled in [BR16], where the same Poissonization-depoissonization procedure as mentioned above is used in order to obtain an estimator of based on a dilation of the random polytope .

2.4 Convex bodies and covering numbers

Covering numbers provide a powerful tool to describe the complexity of a class. In empirical process theory, they are often used in order to bound the statistical performance of an estimator in expectation or with high probability, when the estimator is obtained by optimizing a criterion, such as the likelihood function.

Consider the problem of estimating the support $G$ of a uniform distribution, with $G\in\mathcal{K}_d$. Because the support of the likelihood function (see Section 2.2) depends on the unknown parameter itself, it is not valid to take its logarithm, and the problem cannot be approached directly through the lens of empirical process theory. However, tools such as covering numbers can still be borrowed from that theory in order to prove deviation inequalities for $\hat K_n$.

Without loss of generality, one can assume that for some . This guarantees that , which is a bounded class of convex bodies, and that is uniformly bounded from below. This is due to John’s theorem (e.g., see [Bal92]) and affine equivariance of . John’s theorem (e.g., see [Bal92]) implies the existence an invertible affine transformation and a point with . Moreover, if we rather denote by the convex hull of , then, . Since are i.i.d. uniform random points in , are i.i.d. uniform random points in , and . As a consequence, the rescaled risk is bounded from above by and we only need to bound uniformly on instead of the whole unbounded class .

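Let $\varepsilon>0$ and let $\delta$ be a metric on $\mathcal{K}_d^{(1)}$ (e.g., Nikodym or Hausdorff distance). An $\varepsilon$-net of $\mathcal{K}_d^{(1)}$ with respect to the metric $\delta$ is a set $\mathcal{N}\subseteq\mathcal{K}_d^{(1)}$ such that for all $G\in\mathcal{K}_d^{(1)}$, there is $G'\in\mathcal{N}$ with $\delta(G,G')\le\varepsilon$. The $\varepsilon$-covering number of $\mathcal{K}_d^{(1)}$ with respect to $\delta$ is the minimum cardinality of an $\varepsilon$-net of $\mathcal{K}_d^{(1)}$. The following theorem is an upper bound for the $\varepsilon$-covering number of $\mathcal{K}_d^{(1)}$ with respect to the Hausdorff distance. By [Bru, Lemma 2], the Nikodym distance is dominated by the Hausdorff distance uniformly on $\mathcal{K}_d^{(1)}$: $|G_1\triangle G_2|\le c_d\,d_H(G_1,G_2)$, for all $G_1,G_2\in\mathcal{K}_d^{(1)}$, where $d_H$ is the Hausdorff distance and $c_d$ is a positive constant that depends on $d$ only. This result is a direct consequence of the Steiner formula for convex bodies (see Lemma 2). Hence, the following theorem also implies an upper bound for the $\varepsilon$-covering number of $\mathcal{K}_d^{(1)}$ with respect to the Nikodym metric.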
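The notion of a net can be illustrated on a toy example. The Python sketch below builds a net of a finite point cloud in the plane by a greedy pass; the point cloud with the Euclidean metric is only a stand-in (our assumption) for a class of convex bodies with a metric, and the function name is hypothetical. The cardinality of the resulting net is an upper bound on the covering number of the cloud, not necessarily the minimum.

```python
import math
import random


def greedy_net(points, eps, dist=math.dist):
    """Greedy eps-net: keep a point only if it is farther than eps from all kept points."""
    net = []
    for p in points:
        if all(dist(p, q) > eps for q in net):
            net.append(p)
    return net


rng = random.Random(0)
cloud = [(rng.random(), rng.random()) for _ in range(2000)]
for eps in (0.4, 0.2, 0.1):
    print(eps, len(greedy_net(cloud, eps)))
```

By construction, every point of the cloud is within distance `eps` of the net, and the points of the net are pairwise more than `eps` apart, so the greedy net is also a packing.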

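Theorem 5 ([Bro76]).

Let $\varepsilon\in(0,1)$. The $\varepsilon$-covering number of $\mathcal{K}_d^{(1)}$ with respect to the Hausdorff distance is at most $c_1\exp\big(c_2\,\varepsilon^{-(d-1)/2}\big)$, for some positive constants $c_1$ and $c_2$ that depend on $d$.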

We also refer to Section 8.4 in [Dud14] for more details on metric entropy for classes of convex sets. Building on this theorem, combined with standard techniques from M-estimation and empirical processes (see, e.g., [VdV00, VdG98]), [Bru] proves the following deviation inequality for $\hat K_n$, which holds uniformly for all $G\in\mathcal{K}_d$.

Theorem 6 ([Bru]).

There exist positive constants and such that the following holds. Let and be an integer. For all ,

\[ \frac{|G\setminus\hat K_n|}{|G|}\le a_1 n^{-\frac{2}{d+1}}+\frac{x}{n} \]

with -probability at least .

Using the same techniques, more general deviation inequalities are proven in [Bru], when the density of the ’s is not uniform, but only supported on a convex body . For all measurable sets and all densities on , denote by . Note that when is the uniform density on .

Theorem 7 ([Bru]).

There exist positive constants and , that depend on only, such that the following holds. Let and be an integer. Let and be a density supported in , with almost everywhere, for some positive number . Let be i.i.d. random points with density and be their convex hull. Then,

\[ d_f(G,\hat K_n)\le C_1(M+1)\,n^{-2/(d+1)}+\frac{x}{n} \]

with probability at least .

It is not known whether a similar upper bound would hold without the assumption that almost everywhere. This open problem amounts to the following open question. Let be any probability measure supported in a convex body . Do there exist positive constants and that only depend on , such that the -covering of with respect to the metric is bounded from above by , for all ? If has a bounded density with respect to the Lebesgue measure, the answer is positive, and it is a consequence of Theorem 8 below.

In the uniform case, concentration inequalities for were proven in [Vu05], using geometric techniques. However, constants were not explicit and depended on the support , hence, could not be used in a minimax approach.

2.5 Application of empirical process theory to the estimation of density level sets

In this section, we show how similar ideas as in Section 2.4 can be used to estimate density level sets under a convexity restriction, in the Nikodym metric. The level sets of a density in are the sets , for . Estimation of density level sets and, more specifically, of convex level sets, has been tackled, e.g., in [Har87, Pol95, Tsy97]. As pointed by [Har87], estimation of density level sets may be useful in cluster analysis. It arises as a natural tool in testing for multimodality [MS91] and, more recently, it has been explored under the lens of topological data analysis [Was16, CM17]. Notice that the -level set of a density is its support, so support estimation is a particular case of density level set estimation. However, in this section we only treat the case of positive levels , where empirical process theory has proven to be a successful tool.

Let such that . The excess mass of a measurable set is defined as . Simple algebra shows that , for all measurable sets . The empirical excess mass of a set , given a sample , is naturally defined as . Hence, the main idea to estimate is to maximize over , where is a given class of measurable sets. In this section, we assume that and we take . For instance, convexity of is ensured if is log-concave or, more generally, quasiconcave. If is the uniform density on a convex body , then for all : In that case, support estimation is equivalent to level set estimation, for small levels and the methods presented here could be applied to estimate itself. In what follows, is a fixed number and we define the estimator .
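In dimension one, convex sets are intervals and the maximization of the empirical excess mass can be carried out by brute force. The Python sketch below is an illustration under our own assumptions: the empirical excess mass of an interval $C$ at level $\lambda$ is taken to be the fraction of sample points in $C$ minus $\lambda|C|$, and the function name is hypothetical. It suffices to search over intervals whose endpoints are sample points, since shrinking an interval to the smallest one containing the same sample points can only increase this criterion.

```python
import random


def max_excess_mass_interval(sample, level):
    """Maximize (fraction of points in [a, b]) - level * (b - a) over intervals [a, b].

    Returns (best value, (a, b)); the search is quadratic in the sample size.
    """
    xs = sorted(sample)
    n = len(xs)
    best_value, best_interval = 0.0, None  # the empty set has excess mass 0
    for i in range(n):
        for j in range(i, n):
            value = (j - i + 1) / n - level * (xs[j] - xs[i])
            if value > best_value:
                best_value, best_interval = value, (xs[i], xs[j])
    return best_value, best_interval


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(300)]
value, interval = max_excess_mass_interval(sample, level=0.2)
print(value, interval)
```

For the standard Gaussian density and level 0.2, the true level set is approximately the interval [-1.18, 1.18], which gives a reference for the printed interval.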

In order to achieve consistency, an assumption is usually made about the behavior of $f$ around the boundary of its $\lambda$-level set. Namely, $f$ should not be too flat near the boundary of this level set. The assumption proposed in [Pol95] takes the following form, where $\mu$ is the continuous probability measure on $\mathbb{R}^d$ with density $f$.

Assumption 1.

There exist positive constants $c$ and $\gamma$ such that

\[ \mu(\{x \in \mathbb{R}^d : |f(x) - \lambda| < \eta\}) \le c\,\eta^{\gamma}, \]

for all $\eta > 0$ small enough.

Assumption 1, also known as the margin condition, is usually imposed in discriminant analysis [MT99, LM15], statistical learning [Tsy04], level set estimation (a stronger assumption is proposed in [Tsy97], see Assumption 2 below) or density support estimation [Bru].

In [Pol95], the notion of covering number with inclusion, slightly different from that of covering number, is used to prove the main results.

Definition 1 (Covering number with inclusion).

Let be a class of measurable subsets of , a probability distribution in and . The -covering number of with inclusion with respect to is the smallest integer such that there exists a collection of measurable sets, with , satisfying the following: For all , there exist with and . It is denoted by and is called the metric entropy with inclusion of the class with respect to .

Note that in this definition, need not be included in . Also note that a similar notion, called metric entropy with bracketing, is widely used in function estimation, especially in empirical process theory (e.g., see [VdV00, Section 19.2]). Let be a normed space of real-valued functions defined on a set and let . For any two functions , the bracket is defined as the set of all functions satisfying for all . For all , the -bracketing number of with respect to is the smallest number of brackets with needed to cover . It is denoted by and is called the metric entropy with bracketing of the class with respect to . It is easy to see that for every class of measurable sets , if we let , then and differ by at most a factor , where , for all measurable, bounded functions . The following estimate is available for the class :

Theorem 8 ([Dud14]).

Let $\mu$ be a continuous probability measure on with a density $f$ with respect to the Lebesgue measure. Assume that $f \le M$ almost everywhere, where $M > 0$ is a given number. Then, as $\varepsilon \to 0$,

\[ \ln N_I(\varepsilon, \mathcal{K}_d^{(1)}, \mu) \lesssim_{d,M} \varepsilon^{-\frac{d-1}{2}}. \]

Together with this estimate, [Pol95, Theorem 3.7] yields the following result.

Theorem 9 ([Pol95]).

Assume that . There exists a constant such that the following holds with probability tending to one, as $n$ goes to infinity. Let $\mu$ be a probability measure on with a bounded density $f$ with respect to the Lebesgue measure and let Assumption 1 hold. Let and let . Then,

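\[ \mu(\hat G_n \triangle G_\lambda) \le \begin{cases} c(2)\, n^{-\frac{2\gamma}{3\gamma+4}} & \text{if } d = 2, \\ c(3)\, n^{-\frac{\gamma}{2\gamma+2}} \ln n & \text{if } d = 3, \\ c(d)\, n^{-\frac{2\gamma}{(\gamma+1)(d+1)}} & \text{if } d \ge 4. \end{cases} \]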
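As a quick sanity check of the three regimes in Theorem 9, the sketch below evaluates the exponent of $n$ in the bound, ignoring the constants and the logarithmic factor in dimension three. The helper name `theorem9_exponent` is hypothetical, and exact rational arithmetic is used only so that the comparisons are not blurred by rounding.

```python
from fractions import Fraction


def theorem9_exponent(d, gamma):
    """Return a such that the bound of Theorem 9 is of order n^(-a).

    Constants are ignored, as is the ln(n) factor present when d = 3.
    `gamma` is the margin exponent of Assumption 1.
    """
    if d < 2:
        raise ValueError("the bound is stated for d >= 2")
    g = Fraction(gamma)
    if d == 2:
        return 2 * g / (3 * g + 4)
    if d == 3:
        return g / (2 * g + 2)
    return 2 * g / ((g + 1) * (d + 1))


if __name__ == "__main__":
    for d in (2, 3, 4, 10):
        print(d, [str(theorem9_exponent(d, g)) for g in (1, 2, 10)])
```

In particular, the expression used for $d \ge 4$ evaluated at $d = 3$ coincides with the exponent of the case $d = 3$, so the only discontinuity between these two regimes is the logarithmic factor. For fixed $d$, the exponent increases with $\gamma$.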

In fact, this theorem is stated under more general assumptions than convexity of the level sets. If the level sets belong to a class of sets with metric entropy with inclusion of order , for some exponent (e.g., for the class ), the rates given in the theorem depend on , and . It is noticeable that the exponent in the metric entropy with inclusion of the class matches that of the class of sets with twice differentiable boundaries in some sense (see [Dud14]).

Note that the estimator defined in [Pol95] is a polytope, and its vertices are sample points. Indeed, for all , , where is the convex hull of the sample points contained in . In the two dimensional case, [Har87] designs an algorithm to compute . To the best of our knowledge, there is no algorithm to compute or an approximation of in higher dimensions.

Optimality of the upper bounds in the above theorem is not proven in [Pol95]. However, [Tsy97] proves lower bounds for the minimax risk in both Nikodym and Hausdorff metrics. In the Nikodym metric, the lower bounds proven by [Tsy97] match the upper bounds given in the above theorem only for (up to a logarithmic factor when ), and they are faster when . The estimation of convex level sets in the Hausdorff metric requires completely different techniques. It has been tackled in [Sag79, Tsy97]. In [Sag79], the author considers both level sets corresponding to a given level and level sets with given probability content (see also [CPP13] for the estimation of level sets with given probability content); the results are then applied to the estimation of the mode of the density, by considering the smallest estimated level set. Note that a control of the estimated level sets in the Nikodym metric could not yield consistent estimation of the mode, since two sets can have a very small Nikodym distance if they both have very small volumes, even if they are far apart from each other in space. Optimal rates for the estimation of convex density level sets in both Nikodym and Hausdorff metrics are given in [Tsy97] when and they are extended to higher dimensions. More generally, [Tsy97] proves optimal rates for density level sets whose boundaries satisfy some smoothness condition. In fact, it is noticed that if satisfies for some , then the boundary of is Lipschitz, in the sense that the radial function of , defined as , is Lipschitz. For completeness, we include the precise statement and its proof here.

Lemma 1.

Let satisfy for some . Then, the radial function satisfies , for all .

Proof.

Let be the polar body of , defined as . By standard properties of polar bodies (see [Sch93, Chapter 1]), one has and the radial function is the reciprocal of the support function of : , for all . Subadditivity of support functions yields , for all with . Since , , for all . This proves that is -Lipschitz. Now, since we also have that , , for all . Hence, is -Lipschitz. ∎
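The duality used in the proof can be checked numerically on a simple example. The sketch below assumes a planar polytope written as $K = \{x : \langle a_i, x\rangle \le 1,\ i = 1, \ldots, m\}$, bounded and with the origin in its interior, so that its polar body is the convex hull of the $a_i$ and the support function of the polar body at $u$ is $\max_i \langle a_i, u\rangle$. The radial function of $K$ is taken in its usual sense, the largest $t \ge 0$ with $tu \in K$, and is computed independently by bisection. All function names are illustrative.

```python
import math


def in_polytope(x, normals):
    """Membership test for K = {x : <a, x> <= 1 for every a in normals}."""
    return all(a[0] * x[0] + a[1] * x[1] <= 1.0 for a in normals)


def radial_by_bisection(u, normals, upper=1e6, iters=100):
    """Largest t >= 0 with t * u in K, found by bisection on t.

    Valid because K is convex and contains the origin, so membership
    of t * u is monotone in t. `upper` must exceed the diameter of K.
    """
    lo, hi = 0.0, upper
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if in_polytope((mid * u[0], mid * u[1]), normals):
            lo = mid
        else:
            hi = mid
    return lo


def polar_support(u, normals):
    """Support function at u of conv(normals), the polar body of K."""
    return max(a[0] * u[0] + a[1] * u[1] for a in normals)


if __name__ == "__main__":
    square = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]  # K = [-1, 1]^2
    for theta in (0.0, math.pi / 6, math.pi / 4, 2.5):
        u = (math.cos(theta), math.sin(theta))
        print(theta, radial_by_bisection(u, square), 1.0 / polar_support(u, square))
```

On the square, the two printed columns agree up to numerical precision, which is the identity between the radial function of a convex body and the reciprocal of the support function of its polar body.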

First, [Tsy97] computes the optimal rates for star-shaped density level sets with smooth radial functions. Standard techniques from functional estimation are used, such as local polynomial approximations. Then, the author tackles the problem of estimating convex level sets. As shown in Lemma 1, the case of convex level sets is included in the case of star-shaped level sets with Lipschitz radial functions. Hence, the optimal rates for convex sets are not larger than the ones corresponding to Lipschitz radial functions. Perhaps surprisingly, in the Hausdorff metric, convexity of the level set does not make the problem easier than just the Lipschitz property of its radial function, since [Tsy97] shows that the optimal rate under convexity matches the optimal rate under just the Lipschitz assumption, up to logarithmic factors. In the Nikodym metric, the situation is very different: [Tsy97] proves that at least in dimension , the optimal rate for convex sets is actually much faster than in the case of Lipschitz radial functions; this is a consequence of Theorem 9 above. It can be seen easily that the same holds when , and [Tsy97] suggests that this holds in arbitrary dimension, without giving a proof. Hence, in the Nikodym metric, convexity does contribute and improves the optimal rate obtained under the Lipschitz assumption.

[Tsy97] does not use exactly the same margin condition as [Pol95], but makes the following assumption. Let $f$ be a density in $\mathbb{R}^d$ and let $G_\lambda$ be its level set with level $\lambda$. Assume that $G_\lambda$ is star-shaped around the origin, and let $r_\lambda$ be its radial function.

Assumption 2.

Let with , . Then, for all and such that ,

\[ b_1 \le \frac{|f(ru) - \lambda|}{|r - r_\lambda(u)|^{\nu}} \le b_2. \]

Roughly, Assumption 2 is stronger than Assumption 1 if one takes . Under Assumption 2, [Tsy97] characterizes the optimal rates for the estimation of a convex level set that satisfies with when and suggests the following extensions to higher dimensions: in the Nikodym metric and (up to a logarithmic factor) in the Hausdorff metric. In the Nikodym metric, the upper bound follows directly from [Pol95] when but [Tsy97] does not give a proof for larger . For arbitrary , the rates suggested in [Tsy97] are actually faster than the upper bounds given in [Pol95]. In the Hausdorff metric and for any , as explained above, the upper bound follows directly from the Lipschitz case, by Lemma 1.

When dealing with level sets with smooth radial functions in arbitrary dimension, [Tsy97] proves that the minimax rates are exactly given by in the Nikodym metric and in the Hausdorff metric, where is a smoothness parameter that roughly corresponds to the number of bounded derivatives of the radial function (e.g., corresponds to the Lipschitz case). It is noticeable that for convex level sets, the minimax rate in the Hausdorff metric matches the one that corresponds to smoothness , as discussed above (and as predicted by Lemma 1), whereas in the Nikodym metric, the minimax rate for convex level sets matches the rate that corresponds to smoothness . This complements the remark we made earlier: The exponent in the metric entropy with inclusion for convex bodies is the same as for sets with twice differentiable boundary (see [Pol95] and [Dud14] for more details), and the corresponding minimax rates match. However, note that even though the boundary of any convex body is twice differentiable almost everywhere, the class of convex bodies with , where , contains polytopes with arbitrarily many vertices, which have very non-smooth boundaries, together with convex bodies with smooth boundaries and positive curvature everywhere, which yet can take arbitrarily large values.

Finally, note that the rates and given in [Tsy97] match (up to logarithmic factors) those obtained in the estimation of the support of a uniform distribution, i.e., at the limit . In the Nikodym metric, the minimax rate of estimation of convex bodies is (see Theorem 2 above), whereas in the Hausdorff metric, it is , as shown in Theorem 11 (with ), see [Bru18b].
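The gap between the two metrics, already noted in connection with mode estimation, can be made concrete in dimension one. The sketch below represents closed intervals as pairs of endpoints, takes the Nikodym distance to be the Lebesgue measure of the symmetric difference, and uses the fact that the Hausdorff distance between two closed bounded intervals is the larger of the distances between matching endpoints. The function names and the numerical values are illustrative assumptions.

```python
def nikodym_distance(a, b):
    """Lebesgue measure of the symmetric difference of intervals a and b."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    return (a[1] - a[0]) + (b[1] - b[0]) - 2.0 * overlap


def hausdorff_distance(a, b):
    """Hausdorff distance between closed bounded intervals a and b."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


if __name__ == "__main__":
    eps = 1e-3
    small_near_zero = (0.0, eps)
    small_near_ten = (10.0, 10.0 + eps)
    # Two tiny intervals that are far apart: close in the Nikodym metric,
    # far in the Hausdorff metric.
    print(nikodym_distance(small_near_zero, small_near_ten))
    print(hausdorff_distance(small_near_zero, small_near_ten))
```

As the common length of the two intervals shrinks, their Nikodym distance tends to zero while their Hausdorff distance stays close to ten, which is why a bound in the Nikodym metric alone does not locate a small level set.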

2.6 Convex support estimation in nonparametric regression

Let the following model hold:

\[ Y_i = f(X_i) + \xi_i, \quad i = 1, \ldots, n, \]

where are deterministic or random points in , are i.i.d. random variables, with mean zero, independent of and . In this section, we are interested in the estimation of the support of , i.e., the closure of the set . Throughout the section, we assume that is a convex body included in . In [Bru13], the function is the indicator function of : for , otherwise. The design points are i.i.d., uniformly distributed in and the ’s are sub-Gaussian, i.e., , for all , where need not be known. [Bru13] considers a least squares estimator , where is a -net of , with respect to the Nikodym metric and . The following upper bound is shown in [Bru13], in which stands for the joint distribution of the sample with .
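Before turning to the bound, the least squares criterion can be made concrete in a one-dimensional toy version of this model, sketched here under simplifying assumptions that are not those of [Bru13]: the design lies in $[0,1]$, candidate sets are intervals with endpoints on a regular grid (playing the role of the net), and the search is exhaustive. When the regression function is the indicator of a set, including a design point $X_i$ in a candidate set changes the squared loss by $(Y_i - 1)^2 - Y_i^2 = 1 - 2Y_i$, so minimizing the sum of squares amounts to maximizing the sum of $2Y_i - 1$ over the design points that the candidate contains. The function name `least_squares_interval` and the data are hypothetical.

```python
def least_squares_interval(xs, ys, m):
    """Least squares fit of an indicator of [i/m, j/m], with 0 <= i < j <= m.

    Minimizing sum_k (y_k - 1{x_k in C})^2 over candidates C is the same
    as maximizing sum over x_k in C of (2 * y_k - 1).
    """
    weights = [2.0 * y - 1.0 for y in ys]
    best_val, best_interval = float("-inf"), None
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            a, b = i / m, j / m
            val = sum(w for x, w in zip(xs, weights) if a <= x <= b)
            if val > best_val:  # strict inequality keeps the first maximizer
                best_val, best_interval = val, (a, b)
    return best_interval


if __name__ == "__main__":
    # Noisy observations of the indicator of an interval (toy data).
    xs = [0.05, 0.25, 0.45, 0.65, 0.85]
    ys = [0.1, 0.9, 1.1, 0.8, -0.2]
    print(least_squares_interval(xs, ys, m=5))
```

The exhaustive search over grid intervals is only feasible because the dimension is one; it illustrates the criterion, not a practical procedure in higher dimensions.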

Theorem 10 ([Bru13]).

There exist three positive constants that depend on and and a positive integer that depends on only, such that the following holds:

For all , all and all ,

 |^Gn△G|≤C1n−2d