The wet part and the floating convex body
In the late 1980s, Bárány and Larman [BL88] (see [BBSW07] for a review) proposed a generalization of the results on the asymptotic expected missing volume of K̂_n that were previously known only in particular cases. They made no assumption on the dimension d, on symmetry properties of K, or on the structure of its boundary. They considered the ε-wet part of K, as defined by Dupin [Dup22] in fluid mechanics, and later by Blaschke [Bla23]. We call a cap of K the intersection of a closed halfspace of R^d with K.
Definition 2.1. Let K ∈ K_d be a convex body and ε ∈ (0, 1). The ε-wet part of K, denoted by K(ε), is the union of all caps of K of volume less than or equal to ε|K|. The ε-floating body of K is K \ K(ε).
To understand this definition and why it was introduced in fluid mechanics, let us set d = 2. One should imagine that R^2 is an ocean, with an iceberg in it. That iceberg is seen from above, and K is what is seen of the iceberg, i.e., its projection on the horizontal plane. The part of the iceberg inside the water is the wet part of K, and the floating part of the iceberg is the floating body of K.
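As an illustration, the wet part can be computed explicitly for the unit disk, where rotational symmetry makes K(ε) an annulus and the floating body a concentric disk. The sketch below is ours, not from the text: it assumes ε < 1/2 (so the defining caps are at most half-disks) and recovers |K(ε)| by solving for the radius of the floating body.

```python
import math

def cap_area(r):
    # Area of the cap of the unit disk cut off by a chord at distance r
    # from the centre, 0 <= r <= 1.
    return math.acos(r) - r * math.sqrt(1.0 - r * r)

def wet_part_area(eps):
    # For the unit disk, a point x lies in K(eps) iff the smallest cap
    # containing x (whose chord passes through x, orthogonal to the radius)
    # has area <= eps * |K| = eps * pi.  The floating body is then the
    # concentric disk of radius r_star with cap_area(r_star) = eps * pi,
    # so |K(eps)| = pi * (1 - r_star**2).  Assumes eps < 1/2.
    target = eps * math.pi
    lo, hi = 0.0, 1.0                  # cap_area is decreasing on [0, 1]
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cap_area(mid) > target:
            lo = mid
        else:
            hi = mid
    r_star = 0.5 * (lo + hi)
    return math.pi * (1.0 - r_star * r_star)
```

As ε decreases, the wet part shrinks towards the boundary circle, consistent with K(ε) being a thin collar near the boundary for small ε.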
Bárány and Larman [BL88] proved that if K has volume one, then the expected missing volume of K̂_n, i.e., E_K[|K \ K̂_n|], is of the same order as the volume of the 1/n-wet part.

Theorem 2.1 ([BL88]). Assume that K ∈ K_d has volume 1. Then

c_1 |K(1/n)| ≤ E_K[|K \ K̂_n|] ≤ c_2(d) |K(1/n)|,   for all n ≥ n_0(d).
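A quick Monte Carlo experiment illustrates Theorem 2.1 in the polygonal case: for the unit square, |K(1/n)| is of order (ln n)/n, and by the classical Rényi–Sulanke asymptotics the expected missing area of the convex hull of n uniform points has the same order. The following sketch uses only the standard library; the helper names are ours.

```python
import math
import random

def hull_area(pts):
    # Convex hull via Andrew's monotone chain, then the shoelace formula.
    pts = sorted(set(pts))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def expected_missing_area(n, trials=200, seed=0):
    # Monte Carlo estimate of E_K[|K \ hull(X_1, ..., X_n)|]
    # for K = [0, 1]^2 and X_i i.i.d. uniform on K.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [(rng.random(), rng.random()) for _ in range(n)]
        total += 1.0 - hull_area(sample)
    return total / trials
```

For the unit square, the ratio expected_missing_area(n) / ((ln n)/n) should stabilize near the Rényi–Sulanke constant 8/3 as n grows.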
Estimation of convex bodies
In Theorem 2.8, we gave an upper bound on the minimax risk R_n(K_d^(1)) and on the minimax weighted risk Q_n(K_d). These two upper bounds, of order n^{-2/(d+1)}, were obtained using the convex hull estimator K̂_n, whose risk on the class K_d^(1) and weighted risk on the class K_d are both of that order. In this section, we show that n^{-2/(d+1)} is exactly the rate of both R_n(K_d^(1)) and Q_n(K_d).
Theorem 2.11 ([Bru14a]). There exists a positive integer n_0(d), depending on d, such that the following statements hold.
• The minimax risk on the class K_d^(1) satisfies

c_1(d) n^{-2/(d+1)} ≤ R_n(K_d^(1)) ≤ c_2(d) n^{-2/(d+1)},   for all n ≥ n_0(d),

for some positive constants c_1(d) and c_2(d) depending on d only. In addition, the convex hull estimator attains the minimax rate of convergence on K_d^(1).
• The minimax weighted risk on the class K_d satisfies

c_3(d) n^{-2/(d+1)} ≤ Q_n(K_d) ≤ c_4(d) n^{-2/(d+1)},   for all n ≥ n_0(d),

for some positive constants c_3(d) and c_4(d) depending on d only, with c_3(d) = c_1(d). In addition, the convex hull estimator attains the minimax rate of convergence on K_d with respect to the minimax weighted risk.
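The rate n^{-2/(d+1)} deteriorates quickly as the dimension grows. As a back-of-the-envelope illustration (a crude sketch of ours, ignoring the constants c_i(d)), inverting the rate gives the sample size needed before the bound drops below a prescribed level δ:

```python
import math

def sample_size_for(delta, d):
    # Invert the minimax rate n^{-2/(d+1)}: the bound falls below delta
    # once n >= delta^{-(d+1)/2}.  Constants c_i(d) are ignored.
    return delta ** (-(d + 1) / 2)
```

With δ = 0.01, the required n grows from 10^3 in dimension 2 to 10^10 in dimension 9, one face of the curse of dimensionality.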
Adaptation and misspecification
In the previous sections, we proposed estimators that depend heavily on the structure of the boundary of the unknown support. In particular, when the support was assumed to be polytopal with at most r vertices, for some known integer r, our estimator was also, by construction, a polytope with at most r vertices. We now construct an estimator that relies on no knowledge other than the convexity of the unknown support and the fact that it is included in [0, 1]^d. This estimator achieves the same rate as the estimators of Section 2.2.1 in the polytopal case, namely r ln n/n, where r is the unknown number of vertices of the support, and the same rate as the convex hull estimator K̂_n when the support is not polytopal, or is polytopal with a large number of vertices. Note that if the unknown support is a polytope with r vertices, where r is larger than n^{(d-1)/(d+1)}/ln n, then the rate of convergence of the risk of K̂_n, which is at most of order n^{-2/(d+1)}, is smaller than the rate r(ln n)/n of the upper bound on the risk of P̂_n^(r).

The idea we develop here is inspired by Lepski's method [Lep91]. The classes P_r^(1), r ≥ d + 1, are nested, that is, P_r^(1) ⊆ P_{r'}^(1) for r ≤ r'. Hence, if the true value of r is unknown, it seems better to overestimate than to underestimate it. Intuitively, a polytope with more vertices than P can always fit P well, while the opposite may be impossible (e.g., any triangle can be well approximated by some quadrilateral, but a square cannot be well approximated by a triangle). We use this idea to select an estimator among the preliminary estimators P̂_n^(r), r ≥ d + 1, and K̂_n.
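The selection step can be sketched as a generic Lepski-type rule (this is our schematic sketch, not the exact construction of this section): among nested candidates indexed by r, keep the smallest r whose estimator agrees, up to penalty terms, with every richer candidate. The toy example summarizes each estimator by a single number; all names are ours.

```python
def select_lepski(estimators, penalty, distance):
    # estimators: dict mapping r (number of vertices) to a fitted estimator.
    # Keep the smallest r whose estimator is within penalty(r) + penalty(s)
    # of every estimator with s > r vertices; fall back to the largest r.
    rs = sorted(estimators)
    for r in rs:
        if all(distance(estimators[r], estimators[s]) <= penalty(r) + penalty(s)
               for s in rs if s > r):
            return r
    return rs[-1]

# Toy illustration: each "estimator" is one number, the distance is the
# absolute difference, and the penalty grows linearly in r.
fits = {3: 0.50, 4: 0.12, 5: 0.10, 6: 0.095}
chosen = select_lepski(fits, penalty=lambda r: 0.05 * r,
                       distance=lambda a, b: abs(a - b))
```

Here r = 3 is rejected because it disagrees too much with the richer fits, while r = 4 already agrees with all larger candidates, so overfitting beyond r = 4 is avoided.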
Estimation of log-concave densities
Model (DS) consists of estimating the convex support of a uniform density in R^d. Such densities are log-concave, i.e., their logarithm, taking values in [−∞, ∞), is a concave function. Estimation of log-concave densities in R^d is a particular case of density estimation under shape constraints. It is a challenging domain, and log-concave densities are of particular interest for their properties, which are useful in many applications. First of all, many important and useful densities are log-concave, such as the Gaussian, logistic, Gumbel, and Laplace densities.
Their level sets are convex. They are unimodal, and their convolution with any unimodal density is unimodal. In addition, the class of log-concave densities is closed under weak convergence. In some recent leading works [CS10, CSS10, DW13], the maximum likelihood estimator of a log-concave density has been studied. In particular, this estimator is proved to exist and to be unique, and it is a tent function: it is supported on the convex hull of the sample, and the subgraph of its logarithm on its support is a convex polytope. Its consistency and its rate of convergence, measured with the Hellinger distance, have been studied, mostly in small dimensions, d = 1, 2. In case of misspecification, i.e., when the true density is not log-concave, under mild assumptions the maximum likelihood estimator converges to the projection of that density, with respect to the Kullback-Leibler divergence, onto the class of log-concave densities.

Some questions related to our work remain open. In particular, how many vertices does the graph of the maximum likelihood estimator have? More generally, how many k-faces does it have, for k = 1, . . . , d − 1? Is it possible, similarly to Efron's identity, to connect the risk of this estimator to its geometric properties? Does this estimator achieve the minimax rate of convergence, measured with the Hellinger distance between densities? Or with an L_p distance?
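The defining property is easy to probe numerically. The sketch below (names ours) checks discrete log-concavity via second differences of the log-density on an equally spaced grid: the Gaussian passes, while the Cauchy density, whose log-density is convex for |x| > 1, fails.

```python
import math

def is_log_concave_on_grid(logpdf, xs, tol=1e-9):
    # Discrete proxy for concavity of log f: on an equally spaced grid,
    # every second difference must be non-positive (up to tolerance).
    vals = [logpdf(x) for x in xs]
    return all(vals[i - 1] - 2.0 * vals[i] + vals[i + 1] <= tol
               for i in range(1, len(vals) - 1))

grid = [i * 0.1 for i in range(-50, 51)]

def gauss_log(x):
    # Standard Gaussian log-density: concave everywhere.
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def cauchy_log(x):
    # Standard Cauchy log-density: convex for |x| > 1, so not log-concave.
    return -math.log(math.pi * (1.0 + x * x))
```

A grid check of this kind only certifies concavity up to the grid resolution; it is an illustration, not a proof.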
Table of contents:
1 Introduction
1.1 Why set estimation?
1.2 Definitions and Notation
1.2.1 Set-valued estimators
1.2.2 Notation and first definitions
1.2.3 The minimax setup
1.2.4 Adaptation and misspecification
1.3 The statistical models
2 Estimation of the convex or polytopal support of a uniform density
2.1 Random polytopes
2.1.1 The wet part and the floating convex body
2.1.2 Convergence of the random convex hull
2.1.3 Asymptotic facial structure
2.2 Minimax estimation
2.2.1 Estimation of polytopes
2.2.2 Estimation of convex bodies
2.2.3 Adaptation and misspecification
2.3 Further results and open problems
2.3.1 A universal deviation inequality for the random convex hull
2.3.2 Moments of the number of vertices of the convex hull estimator
2.3.3 Efron’s identity revisited
2.3.4 Non uniform distributions
2.3.5 Estimation of log-concave densities
2.4 Proofs
2.5 Appendix to Chapter 2
2.5.1 Lemmata
2.5.2 Proofs of Lemmata and others
3 A nonparametric regression model
3.1 Estimation of polytopes
3.1.1 Upper bound
3.1.2 Lower bound
3.2 Estimation of convex bodies
3.3 Adaptation and misspecification
3.4 Discussion
3.5 The one-dimensional case and the change point problem
3.5.1 Detection of a segment
3.5.2 Estimation of a segment
3.5.3 Conclusion and discussion
3.6 Proofs
3.7 Appendix to Chapter 3
3.7.1 Lemmata
3.7.2 Proof of the lemmata
4 Estimation of functionals
4.1 The density support model
4.1.1 Estimation of the volume of a convex body
4.1.2 Estimation of the area of a triangle
4.1.3 Estimation of intrinsic volumes
4.2 The regression model
5 Conclusion, contributions