# Uniform bounds for norms of sums of independent random functions

University of Haifa and Université Aix–Marseille I

Department of Statistics, University of Haifa, Haifa 31905, Israel

Laboratoire d’Analyse, Topologie and Probabilités, Université Aix-Marseille 1, 39, rue F. Joliot-Curie, 13453 Marseille, France

Received April 2009; revised February 2010.

###### Abstract

In this paper, we develop a general machinery for finding explicit uniform probability and moment bounds on sub-additive positive functionals of random processes. Using the developed general technique, we derive uniform bounds on the {\mathbb{L}}_{s}-norms of empirical and regression-type processes. Usefulness of the obtained results is illustrated by application to the processes appearing in kernel density estimation and in nonparametric estimation of regression functions.

Alexander Goldenshluger (goldensh@stat.haifa.ac.il) and Oleg Lepski (lepski@cmi.univ-mrs.fr)

A. Goldenshluger was supported by the ISF Grant 389/07. O. Lepski was supported by the Grant ANR-07-BLAN-0234.

DOI: 10.1214/10-AOP595. Volume 39, issue 6 (2011), pages 2318–2384.

AMS subject classifications: Primary 60E15; secondary 62G07, 62G08.

Keywords: empirical processes, concentration inequalities, kernel density estimation, regression.

## 1 Introduction

### 1.1 General setting

Let \mathfrak{S} and \mathfrak{H} be linear topological spaces, (\Omega,\mathfrak{A},\mathrm{P}) be a complete probability space, and let \xi_{\theta}\dvtx\Omega\to\mathfrak{S}, \theta\in\mathfrak{H} be a family of random mappings. In the sequel, \xi_{\bullet}(\omega) is assumed linear and continuous on \mathfrak{H} for any \omega\in\Omega. Let \Psi\dvtx\mathfrak{S}\to{\mathbb{R}}_{+} be a given sub-additive functional. Suppose that there exist functions A\dvtx\mathfrak{H}\to{\mathbb{R}}_{+}, B\dvtx\mathfrak{H}\to{\mathbb{R}}_{+} and U\dvtx\mathfrak{H}\to{\mathbb{R}}_{+} such that

$$
\mathrm{P}\{\Psi(\xi_{\theta})-U(\theta)\geq z\}\leq g\biggl(\frac{z^{2}}{A^{2}(\theta)+B(\theta)z}\biggr)\qquad\forall\theta\in\mathfrak{H}, \tag{1}
$$

where g\dvtx{\mathbb{R}}_{+}\to{\mathbb{R}}_{+} is a function decreasing monotonically to zero.
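As a small numerical illustration of how an inequality of the form (1) is typically used, the sketch below evaluates its right-hand side and checks the elementary fact that the level z=\sqrt{y}A+yB makes the argument of g at least y. The choice g(x)=e^{-x} and the function names are assumptions made only for this example; the text requires only that g decrease to zero.

```python
import math

def bernstein_arg(z, A, B):
    """Argument z^2 / (A^2 + B z) of g on the right-hand side of (1)."""
    return z * z / (A * A + B * z)

def tail_bound(z, A, B, g=lambda x: math.exp(-x)):
    """Right-hand side of (1). The default g(x) = exp(-x) is an assumption."""
    return g(bernstein_arg(z, A, B))

def deviation_level(y, A, B):
    """z = sqrt(y) * A + y * B; then z^2 >= y * (A^2 + B z), so g is evaluated at >= y."""
    return math.sqrt(y) * A + y * B

# Illustrative values of A and B (hypothetical, not taken from the paper).
for y in (0.5, 1.0, 4.0):
    z = deviation_level(y, A=2.0, B=0.3)
    print(y, round(z, 4), bernstein_arg(z, 2.0, 0.3) >= y, tail_bound(z, 2.0, 0.3))
```

The check rests on the identity z^{2}-y(A^{2}+Bz)=y^{3/2}AB\geq 0 for this choice of z, so that g evaluated at the argument in (1) is at most g(y).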

Let \Theta be a fixed subset of \mathfrak{H}. In this paper, under rather general assumptions on U,A,B and \Theta, we establish uniform probability and moment bounds of the following type: for any \epsilon\in(0,1), y>0 and some q\geq 1

\begin{align}
\mathrm{P}\Bigl\{\sup_{\theta\in\Theta}\bigl[\Psi(\xi_{\theta})-u_{\epsilon}\bigl(1+\sqrt{y}\lambda_{A}+y\lambda_{B}\bigr)U(\theta)\bigr]\geq 0\Bigr\} &\leq P_{\epsilon,g}(y), \tag{2}\\
\mathrm{E}\sup_{\theta\in\Theta}\bigl[\Psi(\xi_{\theta})-u_{\epsilon}\bigl(1+\sqrt{y}\lambda_{A}+y\lambda_{B}\bigr)U(\theta)\bigr]_{+}^{q} &\leq E_{\epsilon,g}(y). \tag{3}
\end{align}

Here \lambda_{A} and \lambda_{B} are quantities completely determined by U,A,\Theta and U,B,\Theta, respectively, and the inequalities (2) and (3) hold if these quantities are finite; P_{\epsilon,g}(\cdot) and E_{\epsilon,g}(\cdot) are continuous functions that decrease to zero and are completely determined by \epsilon and g; and the factor u_{\epsilon} is such that u_{\epsilon}\to 1 as \epsilon\to 0. We present explicit expressions for all quantities appearing in (2) and (3).

In order to derive (2) and (3) from (1), we assume that the set \Theta is the image of a totally bounded set in some metric space under a continuous mapping. Namely, if (\mathfrak{Z},\mathrm{d}) is a metric space, and {\mathbb{Z}} is a totally bounded subset of (\mathfrak{Z},\mathrm{d}) then we assume that there exists a continuous mapping \phi from \mathfrak{Z} to \mathfrak{H} such that

$$
\Theta=\{\theta\in\mathfrak{H}\colon\theta=\phi[\zeta],\zeta\in{\mathbb{Z}}\}. \tag{4}
$$

Let N_{{\mathbb{Z}},\mathrm{d}}(\delta), \delta>0 be the minimal number of balls of radius \delta in the metric \mathrm{d} needed to cover {\mathbb{Z}}. The inequalities (2) and (3) are proved under some condition that relates N_{{\mathbb{Z}},\mathrm{d}}(\cdot) and g(\cdot). It is worth mentioning that in particular examples the parametrization \Theta=\phi[{\mathbb{Z}}] is often natural, while the metric \mathrm{d} may have a rather unusual form.
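To make the covering number N_{{\mathbb{Z}},\mathrm{d}}(\delta) concrete, here is a minimal sketch that builds a greedy \delta-cover of a finite point set. It is only an illustration under stated assumptions: the point set, the metric and the function name `greedy_cover` are hypothetical, and the greedy count is an upper bound on the minimal covering number, not the minimal number itself.

```python
import math

def greedy_cover(points, delta, dist):
    """Return centers of a greedy delta-cover of `points` under the metric `dist`.

    A point becomes a new center only if it is farther than delta from every
    center chosen so far, so every point ends up within delta of some center.
    The number of centers is an upper bound on the minimal covering number.
    """
    centers = []
    for p in points:
        if all(dist(p, c) > delta for c in centers):
            centers.append(p)
    return centers

# Hypothetical example: the grid {0, 1, ..., 1000} with the absolute-value metric.
grid = range(1001)
centers = greedy_cover(grid, 100, lambda a, b: abs(a - b))
print(len(centers), math.log(len(centers)))  # cover size and the corresponding entropy bound
```

The logarithm of the cover size plays the role of the \delta-entropy that enters the conditions relating the covering numbers to g.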

Inequalities (2) and (3) can be considered as a refinement of usual bounds on the tail distribution of suprema of random functions. In particular, probability and moment bounds for \sup_{\theta\in\Theta}\Psi(\xi_{\theta}) can be easily derived from (2) and (3). The well-known concentration results deal with deviation of the supremum of a random process from the expectation of this supremum, and estimation of the expectation is a separate rather difficult problem. In contrast, in this paper we develop explicit uniform bounds on the whole trajectory \{\Psi(\xi_{\theta}),\theta\in\Theta\}. The inequality in (1) provides the basic step in the development of such uniform probability bounds. The usual technique is based on the chaining argument that repeatedly applies inequality in (1) to increments of the considered random process [see, e.g., Ledoux and Talagrand (1991) and van der Vaart and Wellner (1996), Section 2.2].

The most interesting phenomena can be observed when a sequence of random mappings \{\xi^{(n)}_{\theta},\theta\in\mathfrak{H}\}, n\in{\mathbb{N}}^{*} is considered. There exists a class of problems where the quantities \lambda_{A} and \lambda_{B} depend on n, and \lambda_{A}\to 0, \lambda_{B}\to 0 as n\to\infty. Under these circumstances, one can choose y=y_{n}\to\infty and \epsilon=\epsilon_{n}\to 0 such that

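$$
P_{\epsilon_{n},g}(y_{n})\to 0,\qquad E_{\epsilon_{n},g}(y_{n})\to 0,\qquad n\to\infty, \tag{5}
$$

and, at the same time,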

$$
u_{\epsilon_{n}}\bigl(1+\sqrt{y_{n}}\lambda_{A}+y_{n}\lambda_{B}\bigr)U(\cdot)\to U(\cdot),\qquad n\to\infty. \tag{6}
$$

The relation in (5) means that u_{\epsilon_{n}}(1+\sqrt{y_{n}}\lambda_{A}+y_{n}\lambda_{B})U(\cdot) is indeed a uniform upper bound for \Psi(\xi^{(n)}_{\theta}) on \Theta, while (6) indicates that for large n this uniform bound is nearly as good as a nonuniform bound U(\cdot) given in (1). Typically for a fixed y>0, we have P_{\epsilon,g}(y)\to\infty and E_{\epsilon,g}(y)\to\infty as \epsilon\to 0; therefore, in order to get (5) and (6), \epsilon_{n}\to 0 and y_{n}\to\infty should be calibrated in an appropriate way.

The general setting outlined above includes important specific problems that are in the focus of the present paper. We consider sequences of random mappings that are sums of real-valued random functions defined on some measurable space (here the parameter n\in{\mathbb{N}}^{*} is the number of summands). We are interested in uniform bounds on the norms of such random functions; thus the sub-additive functional of interest \Psi is the {\mathbb{L}}_{s}-norm, s\geq 1. First, the nonuniform bound (1) is established, and then the inequalities of the type (2) and (3) are derived. It is shown that (5) and (6) hold under mild assumptions on the parametric set \Theta. We also discuss sharpness of the nonuniform inequality in (1).

### 1.2 Norms of sums of independent random functions

Let (\mathcal{T},\mathfrak{T},\tau) and (\mathcal{X},\mathfrak{X},\nu) be \sigma-finite spaces, and let \mathcal{X} be a separable Banach space. Consider an \mathcal{X}-valued random element X defined on the complete probability space (\Omega,\mathfrak{A},\mathrm{P}) and having the density f with respect to the measure \nu. Let \varepsilon be a real random variable defined on the same probability space, independent of X and having a symmetric distribution.

For any (\mathfrak{T}\times\mathfrak{X})-measurable function w on \mathcal{T}\times\mathcal{X} and for any t\in\mathcal{T}, n\in{\mathbb{N}}^{*}, define the random functions

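$$
\xi_{w}(t):=\sum_{i=1}^{n}\bigl[w(t,X_{i})-\mathbb{E}w(t,X)\bigr],\qquad\eta_{w}(t):=\sum_{i=1}^{n}w(t,X_{i})\varepsilon_{i}, \tag{7}
$$

where (X_{i},\varepsilon_{i}), i=1,\ldots,n, are independent copies of (X,\varepsilon). Put for 1\leq s<\infty

$$
\|\xi_{w}\|_{s,\tau}:=\biggl[\int|\xi_{w}(t)|^{s}\tau({d}t)\biggr]^{1/s},\qquad\|\eta_{w}\|_{s,\tau}:=\biggl[\int|\eta_{w}(t)|^{s}\tau({d}t)\biggr]^{1/s}.
$$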

We are interested in uniform bounds of the type (2) and (3) for \|\xi_{w}\|_{s,\tau} and \|\eta_{w}\|_{s,\tau} when w\in\mathcal{W}, where \mathcal{W} is a given set of (\mathfrak{T}\times\mathfrak{X})-measurable functions. This setup is a specific case of the general framework with \Psi(\cdot)=\|\cdot\|_{s,\tau}, \theta=w and \Theta=\mathcal{W}. More precisely, if \psi_{w} denotes either \xi_{w} or \eta_{w}, and if {\mathbb{P}} is the probability law of X_{1},\ldots,X_{n} (when \xi_{w} is studied) or of (X_{1},\varepsilon_{1}),\ldots,(X_{n},\varepsilon_{n}) (when \eta_{w} is studied), then we want to find a functional U(\psi_{w})=U_{\psi}(w,f) such that (1) holds and

\begin{align}
{\mathbb{P}}\Bigl\{\sup_{w\in\mathcal{W}}\bigl[\|\psi_{w}\|_{s,\tau}-u_{\epsilon}\bigl(1+\sqrt{y}\lambda_{A}+y\lambda_{B}\bigr)U_{\psi}(w,f)\bigr]\geq 0\Bigr\} &\leq P_{\epsilon,g}(y), \tag{8}\\
\mathbb{E}\sup_{w\in\mathcal{W}}\bigl[\|\psi_{w}\|_{s,\tau}-u_{\epsilon}\bigl(1+\sqrt{y}\lambda_{A}+y\lambda_{B}\bigr)U_{\psi}(w,f)\bigr]_{+}^{q} &\leq E_{\epsilon,g}(y),\qquad q\geq 1. \tag{10}
\end{align}

Note that \{\xi_{w},w\in\mathcal{W}\} is the empirical process. In the sequel, we refer to \{\eta_{w},w\in\mathcal{W}\} as the regression-type process as it naturally appears in nonparametric estimation of regression functions. In the regression context, X_{i} are the design variables, \varepsilon_{i} are the random noise variables.

Uniform probability and moment bounds for empirical processes are a subject of vast literature; see, for example, Alexander (1984), Talagrand (1994), van der Vaart and Wellner (1996), Massart (2000), Bousquet (2002), Giné and Koltchinskii (2006) among many others. Such bounds play an important role in establishing the laws of iterated logarithm and central limit theorems [see, e.g., Alexander (1984) and Giné and Zinn (1984)]. However, we are not aware of works studying uniform bounds of the type (8) and (10) satisfying (5) and (6) for the {\mathbb{L}}_{s}-norms of such processes.

Apart from the purely probabilistic interest, development of uniform bounds on the {\mathbb{L}}_{s}-norms of processes \{\xi_{w},w\in\mathcal{W}\} and \{\eta_{w},w\in\mathcal{W}\} is motivated by problems of adaptive estimation arising in nonparametric statistics. In particular, the processes \{\xi_{w},w\in\mathcal{W}\} and \{\eta_{w},w\in\mathcal{W}\} represent stochastic errors of linear estimators with the weight w in the density estimation and nonparametric regression models, respectively. Uniform bounds on the error process are key technical tools in the development of virtually all adaptive estimation procedures [see, e.g., Barron, Birgé and Massart (1999), Devroye and Lugosi (2001), Cavalier and Golubev (2006), Goldenshluger and Lepski (2008) and Golubev and Spokoiny (2009)].

The kernel density estimator process is a particular case of the empirical process \{\xi_{w},w\in\mathcal{W}\} that was frequently studied in the probabilistic literature. It is associated with the weight function w given by

$$
w(t,x)=\frac{1}{n\prod_{i=1}^{d}h_{i}}K\biggl(\frac{t-x}{h}\biggr),\qquad x\in\mathcal{X}={\mathbb{R}}^{d},\ t\in\mathcal{T}={\mathbb{R}}^{d}, \tag{11}
$$

where K\dvtx{\mathbb{R}}^{d}\to{\mathbb{R}} is a kernel, h=(h_{1},\ldots,h_{d}) is the bandwidth vector, and u/v denotes the coordinate-wise division for u,v\in{\mathbb{R}}^{d}. Limit laws for the {\mathbb{L}}_{s}-norms of the kernel density estimators were derived in Beirlant and Mason (1995); Dümbgen and Fatalov (2002) study exact asymptotics for the large/moderate deviation probabilities. Giné, Mason and Zaitsev (2003) investigate weak convergence of the {\mathbb{L}}_{1}-norm kernel density estimator process indexed by a class of kernels under entropy conditions. For other closely related work, see Einmahl and Mason (2000), Giné, Koltchinskii and Zinn (2004), Giné and Nickl (2008) and references therein. We remark that the kernel density estimator process is naturally parametrized by \mathcal{W}=\mathcal{K}\times\mathcal{H}, where \mathcal{H} is a set of bandwidths and \mathcal{K} is a family of kernels. The convolution kernel density estimator process will also be studied in Section 3.4.
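The weight (11) can be written down directly in code. The following is a minimal sketch, assuming a product Gaussian kernel for K (an illustrative choice, not one made in the text); the names `gaussian_kernel`, `weight` and `kde` are hypothetical. Summing w(t,X_{i}) over the sample gives the usual kernel density estimate at t.

```python
import math

def gaussian_kernel(u):
    """Product standard Gaussian kernel on R^d (an illustrative choice of K)."""
    d = len(u)
    return math.exp(-0.5 * sum(v * v for v in u)) / (2.0 * math.pi) ** (d / 2.0)

def weight(t, x, h, n, K=gaussian_kernel):
    """w(t, x) from (11): K((t - x) / h) / (n * prod h_i), division coordinate-wise."""
    u = [(ti - xi) / hi for ti, xi, hi in zip(t, x, h)]
    return K(u) / (n * math.prod(h))

def kde(t, sample, h):
    """Kernel density estimate at t: the sum of w(t, X_i) over the sample."""
    n = len(sample)
    return sum(weight(t, x, h, n) for x in sample)

# Hypothetical one-dimensional sample; points are given as coordinate lists.
sample = [[-1.0], [0.5], [2.0]]
print(kde([0.0], sample, h=[0.7]))
```

Because the factor 1/n is already built into w, the sum over the sample needs no further normalization.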

The inequalities (8) and (10) are useful for constructing statistical procedures provided that the following requirements are met.

(i) Explicit expression for U_{\psi}(w,f). Typically, the bound U_{\psi}(w,f) is directly involved in the construction of statistical procedures; thus, it should be explicitly given.

(ii) Minimal assumptions on \mathcal{W}. This condition is dictated by a variety of problems where the inequalities (8) and (10) can be applied. In particular, the sets \mathcal{W} may have a complicated structure (see, e.g., examples in Section 3.4).

(iii) Minimal assumptions on f. The probability measure {\mathbb{P}} (and the expectation \mathbb{E}) as well as the right-hand sides of (8) and (10) are determined by the density f. Therefore, we want to establish (8) and (10) under weak assumptions on f. In particular, we would like to emphasize that all our results are established for the set of all probability densities uniformly bounded by a given constant. No regularity conditions are supposed.

(iv) Minimal assumptions on the distribution of \varepsilon. If the process \{\eta_{w},w\in\mathcal{W}\} is considered, then the probability measure {\mathbb{P}} (and the expectation \mathbb{E}) is also determined by the distribution of \varepsilon. Therefore, we would like to have (8) and (10) under mild assumptions on this distribution. We will see that the function g given in (1) depends on the distribution tail of \varepsilon.

Let us briefly discuss some consequences of requirement (i) for the process \{\xi_{w},w\in\mathcal{W}\}. Using the Talagrand concentration inequality, we prove that (1) holds with U_{\xi}(w,f)=\mathbb{E}\|\xi_{w}\|_{s,\tau}, on the space of functions \mathfrak{H}=\{w\dvtx\sup_{x\in\mathcal{X}}\|w(\cdot,x)\|_{s,\tau}<\infty\}. However, this bound cannot be used in statistical problems for at least two reasons.

First, it is implicit and a reasonably sharp explicit upper bound \overline{U}_{\xi}(w,f) on U_{\xi}(w,f) should be used instead. Sometimes if the class \mathcal{W} is not so complex (e.g., \mathcal{W}=\mathcal{K}\times\mathcal{H}) one can find a constant c independent of w, f and n such that

$$
c\overline{U}_{\xi}(w,f)\leq U_{\xi}(w,f)\leq\overline{U}_{\xi}(w,f).
$$

In such cases, \overline{U}_{\xi}(w,f) can be regarded as a sharp bound on U_{\xi}(w,f). We note, however, that establishing the above inequalities requires additional assumptions on \mathcal{W} and f and nontrivial technical work. It seems that for more complex classes \mathcal{W} the problem of finding an “optimal” upper estimate for U_{\xi}(w,f) cannot be solved in the framework of probability theory. Contrary to that, theory of adaptive nonparametric estimation is equipped with the optimality criterion, and an upper bound \overline{U}_{\xi}(w,f) can be regarded as sharp if it leads to the optimal statistical procedure. Thus, sharpness of \overline{U}_{\xi}(w,f) can be assessed through accuracy analysis of the resulting statistical procedure.

Second, U_{\xi}(w,f) [and presumably its sharp upper bound \overline{U}_{\xi}(w,f)] depends on f. In the density estimation context where the process \{\xi_{w},w\in\mathcal{W}\} appears, f is the parameter to be estimated. Therefore, bounds depending on f cannot be used in construction of estimation procedures. A natural idea is to replace U_{\xi}(w,f) by its empirical counterpart \hat{U}_{\xi}(w) whose construction is based only on the observations X_{1},\ldots,X_{n}. We adopt this strategy and establish the corresponding inequality

$$
\mathbb{E}\sup_{w\in\mathcal{W}}\bigl[\|\xi_{w}\|_{s,\tau}-v_{\epsilon}\bigl(1+\sqrt{y}\lambda_{A}+y\lambda_{B}\bigr)\hat{U}_{\xi}(w)\bigr]^{q}_{+}\leq\tilde{E}_{\epsilon,g}(y),\qquad q\geq 1, \tag{12}
$$

where \tilde{E}_{\epsilon,g}(\cdot) differs from E_{\epsilon,g}(\cdot) in (10) only by some absolute multiplicative factor, and, therefore, satisfies (5) if (6) holds for \overline{U}_{\xi}(w,f). Here v_{\epsilon} is bounded by some absolute constant and completely determined by \epsilon and \mathcal{W}. We provide an explicit expression for v_{\epsilon}.

Thus, requirement (i) leads to a new type of uniform bounds that are random. A natural question about sharpness of these bounds arises. In order to give an answer to this question, we prove that under mild assumptions on the class of weights \mathcal{W} one can choose \epsilon\,{=}\,\epsilon_{n}\,{\to}\,0 and y_{n}\,{\to}\,\infty as n\,{\to}\,\infty so that

and there exists \tilde{\epsilon}_{n}\to 0, n\to\infty such that for any subset \mathcal{W}_{0}\subseteq\mathcal{W} and any q\geq 1

$$
\mathbb{E}\Bigl[\sup_{w\in\mathcal{W}_{0}}\hat{U}_{\xi}(w)\Bigr]^{q}\leq\Bigl[(1+\tilde{\epsilon}_{n})\sup_{w\in\mathcal{W}_{0}}\overline{U}_{\xi}(w,f)\Bigr]^{q}+R_{n}(\mathcal{W}_{0}),
$$

where the remainder term R_{n}(\mathcal{W}_{0}) is asymptotically negligible in the sense that for any \ell>0 one has \limsup_{n\to\infty}\sup_{f\in\mathcal{F}}\sup_{\mathcal{W}_{0}\subseteq\mathcal{W}}[n^{\ell}R_{n}(\mathcal{W}_{0})]=0. Here \mathcal{F} denotes the set of all probability densities uniformly bounded by a given constant [see (33)]. These results show that in asymptotic terms the random uniform bound is almost as good as the nonrandom one, and there is no loss of sharpness due to the use of the random uniform bound.

### 1.3 Summary of results and organization of the paper

In this paper, we develop a general machinery for finding uniform upper bounds on sub-additive positive functionals of sums of independent random functions. We start with the general setting as outlined in Section 1.1 above, and establish inequalities of the type (2) and (3) (see Proposition 2). Proofs of these results are based on the chaining and slicing/peeling techniques. The distinctive feature of our approach is that \Theta is assumed to be an image of a subset {\mathbb{Z}} of a metric space under some continuous mapping \phi, that is, \Theta=\phi({\mathbb{Z}}) as in (4). Then chaining on \Theta is performed according to the distance induced on \Theta by the mapping \phi.

Section 3 is devoted to a systematic study of the {\mathbb{L}}_{s}-norm of the empirical process \{\xi_{w},w\in\mathcal{W}\}. First, we derive an inequality on the tail probability of \|\xi_{w}\|_{s,\tau} for an individual function w\in\mathcal{W} (see Theorem 1 in Section 3.1). Here we use the Bernstein inequality for empirical processes proved by Bousquet (2002) and inequalities for norms of integral operators. Then in Section 3.2 we proceed with establishing uniform bounds. In Theorem 2 of Section 3.2.1, we derive uniform nonrandom bounds for \|\xi_{w}\|_{s,\tau}, w\in\mathcal{W} that hold for all s\geq 1. In the case s>2, the nonrandom bound depends on the density f; therefore, for s>2 we construct a random bound and present the corresponding result in Theorem 3. Theorems 2 and 3 hold for classes of weights \mathcal{W} satisfying rather general conditions. In Section 3.3, we specialize the results of Theorems 2 and 3 to the classes \mathcal{W} of weights depending on the difference of their arguments. This allows us to derive explicit nonrandom and random uniform bounds on \|\xi_{w}\|_{s,\tau} under conditions on the weights which can be easily interpreted. The corresponding results are given in Theorems 4 and 5. We also present some asymptotic corollaries which demonstrate sharpness of the derived uniform bounds. Section 3.4 applies the results of Theorems 4 and 5 to special examples of the set \mathcal{W}. In particular, we consider the kernel density estimator process given by (11), and the convolution kernel density estimator processes. It turns out that the corresponding results can be formulated in a unified way, and they are presented in Theorem 7.

In Section 4, we study the {\mathbb{L}}_{s}-norm of the regression-type processes \{\eta_{w},w\in\mathcal{W}\} given in (7). First, we derive an inequality on the tail probability of \|\eta_{w}\|_{s,\tau} for an individual function w\in\mathcal{W} (Theorem 8 in Section 4.1). This theorem is proved under two different types of conditions on the tail probability of the random variable \varepsilon. In Section 4.2, we present a nonrandom uniform bound for \|\eta_{w}\|_{s,\tau} for all s\geq 1 over the class of weights depending on the difference of their arguments. The corresponding result is given in Theorem 9, and some asymptotic results that follow from Theorem 9 are formulated in Corollary 7. Sections 5–10 contain proofs of the main results of this paper. Proofs of auxiliary lemmas are given in the Appendix.

## 2 Uniform bounds in general setting

In this section, we establish uniform probability bounds for the supremum of a general sub-additive functional of a random process from the probability inequality for the individual process.

Let \mathfrak{S} and \mathfrak{H} be linear topological spaces, (\Omega,\mathfrak{A},\mathrm{P}) be a complete probability space, and let \xi_{\theta}\dvtx\Omega\,{\to}\,\mathfrak{S},\theta\,{\in}\,\mathfrak{H} be a family of random mappings such that:

• \xi_{\bullet}(\omega) is linear and continuous on \mathfrak{H} for any \omega\in\Omega;

• \xi_{\theta}(\cdot) is \mathfrak{A}-measurable for any \theta\in\mathfrak{H}.

Let \Psi\dvtx\mathfrak{S}\to{\mathbb{R}}_{+} be a given sub-additive functional, and \Theta be a fixed subset of \mathfrak{H}.

###### Assumption 1

There exist functions A\dvtx\mathfrak{H}\to{\mathbb{R}}_{+}, B\dvtx\mathfrak{H}\to{\mathbb{R}}_{+}, U\dvtx\mathfrak{H}\to{\mathbb{R}}_{+} and g\dvtx{\mathbb{R}}_{+}\to{\mathbb{R}}_{+} such that:

(i) for any z>0

$$
\mathrm{P}\{\Psi(\xi_{\theta})-U(\theta)\geq z\}\leq g\biggl(\frac{z^{2}}{A^{2}(\theta)+B(\theta)z}\biggr)\qquad\forall\theta\in\mathfrak{H};
$$

(ii) the function g is monotonically decreasing to 0;

(iii) 0<r:=\inf_{\theta\in\Theta}U(\theta)\leq\sup_{\theta\in\Theta}U(\theta)=:R\leq\infty.

Condition (i) is a Bernstein-type probability inequality on \Psi(\xi_{\theta}) for a fixed \theta\in\mathfrak{H}. In particular, in examples of Sections 3 and 4 we have g(x)=ce^{-x^{\alpha}} and g(x)=cx^{-p} for some c,\alpha,p>0. Based on Assumption 1, our goal is to derive uniform probability and moment bounds of the type (2) and (3). For this purpose, we suppose that the set \Theta is parametrized in a special way; this assumption facilitates the use of the standard chaining technique and leads to quite natural conditions on the functions U,A and B.

###### Assumption 2

Let (\mathfrak{Z},\mathrm{d}) be a metric space, and let {\mathbb{Z}} be a totally bounded subset of (\mathfrak{Z},\mathrm{d}). There exists a continuous mapping \phi from \mathfrak{Z} to \mathfrak{H} such that

$$
\Theta=\{\theta\in\mathfrak{H}\colon\theta=\phi[\zeta],\zeta\in{\mathbb{Z}}\}.
$$

###### Remark

In statistical applications the set \Theta is parametrized in a natural way. For instance, if, as in the introduction section, \Psi(\cdot)=\|\cdot\|_{s,\tau} and \xi_{\theta}=\xi_{w} with w given by (11), then \Theta is parametrized by the kernel and bandwidth (K,h)\in\mathcal{K}\times\mathcal{H}. The distance \mathrm{d} on \mathcal{K}\times\mathcal{H} may have a rather special form.

Let Z be a subset of {\mathbb{Z}}. Define the following quantities:

\begin{align}
\varkappa_{U}(Z) &:= \sup_{\zeta_{1},\zeta_{2}\in Z}\frac{U(\phi[\zeta_{1}]-\phi[\zeta_{2}])}{\mathrm{d}(\zeta_{1},\zeta_{2})}\vee\sup_{\zeta\in Z}U(\phi[\zeta]), \tag{13}\\
\Lambda_{A}(Z) &:= \sup_{\zeta_{1},\zeta_{2}\in Z}\frac{A(\phi[\zeta_{1}]-\phi[\zeta_{2}])}{\mathrm{d}(\zeta_{1},\zeta_{2})}\vee\sup_{\zeta\in Z}A(\phi[\zeta]), \tag{14}\\
\Lambda_{B}(Z) &:= \sup_{\zeta_{1},\zeta_{2}\in Z}\frac{B(\phi[\zeta_{1}]-\phi[\zeta_{2}])}{\mathrm{d}(\zeta_{1},\zeta_{2})}\vee\sup_{\zeta\in Z}B(\phi[\zeta]). \tag{15}
\end{align}

Let N_{Z,\mathrm{d}}(\delta) denote the minimal number of balls of radius \delta in the metric \mathrm{d} needed to cover the set Z, and let \mathcal{E}_{Z,\mathrm{d}}(\delta)=\ln[N_{Z,\mathrm{d}}(\delta)] be the \delta-entropy of Z. For any y>0 and \epsilon>0, put

$$
L^{(\epsilon)}_{g,Z}(y)=g(y)+\sum_{k=1}^{\infty}[N_{Z,\mathrm{d}}(\epsilon 2^{-k})]^{2}g(9y2^{k-3}k^{-2}).
$$
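The series defining L^{(\epsilon)}_{g,Z}(y) is easy to evaluate numerically once g and the covering numbers are specified. The sketch below does this under two assumptions made only for illustration: g(x)=e^{-x}, and a hypothetical polynomial covering number N(\delta)=\lceil(\mathrm{diam}/\delta)^{\dim}\rceil. The function names are likewise hypothetical, and the sum is truncated, which is harmless here because the terms decay faster than geometrically.

```python
import math

def covering_number(delta, dim=1, diam=1.0):
    """Hypothetical polynomial covering number: ceil((diam / delta)^dim), at least 1."""
    return max(1, math.ceil((diam / delta) ** dim))

def L_series(y, eps, g=lambda x: math.exp(-x), N=covering_number,
             max_terms=60, tol=1e-16):
    """Truncated value of g(y) + sum_k N(eps 2^-k)^2 g(9 y 2^(k-3) / k^2)."""
    total = g(y)
    for k in range(1, max_terms + 1):
        term = N(eps * 2.0 ** (-k)) ** 2 * g(9.0 * y * 2.0 ** (k - 3) / k ** 2)
        total += term
        # Stop once the terms are negligible relative to the running sum.
        if k > 10 and term < tol * total:
            break
    return total

for y in (1.0, 10.0, 50.0):
    print(y, L_series(y, eps=0.5))
```

With these illustrative choices the value exceeds one for small y and becomes informative only for larger y, which matches the role of y as a deviation level.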

### Key propositions

The next two statements are the main results of this section. Define

where \Lambda_{A} and \Lambda_{B} are given in (14) and (15).

###### Proposition 1

Suppose that Assumptions 1 and 2 hold, and let Z be a subset of {\mathbb{Z}} such that \sup_{\zeta,\zeta^{\prime}\in Z}\mathrm{d}(\zeta,\zeta^{\prime})\leq\epsilon/4 and \varkappa_{U}(Z)<\infty. Then for all y>0 and \epsilon>0 one has

$$
\mathrm{P}\Bigl\{\sup_{\zeta\in Z}\Psi\bigl(\xi_{\phi[\zeta]}\bigr)\geq(1+\epsilon)[\varkappa_{U}(Z)+C^{*}(y,Z)]\Bigr\}\leq L^{(\epsilon)}_{g,Z}(y).
$$

###### Remark

Inspection of the proof of Proposition 1 shows that continuity of \xi_{\bullet} on \mathfrak{H} can be replaced by the assumption that \Psi(\xi_{\bullet}) is continuous \mathrm{P}-almost surely on \phi[{\mathbb{Z}}] in the distance \mathrm{d}. The latter assumption is often easier to verify in specific problems.

Define

$$
\Psi_{u}^{*}(y,Z):=\sup_{\zeta\in Z}\bigl\{\Psi\bigl(\xi_{\phi[\zeta]}\bigr)-uC^{*}(y)U(\phi[\zeta])\bigr\},\qquad y>0, \tag{17}
$$

where Z is a subset of {\mathbb{Z}}, u\geq 1 is a constant, and C^{*}(\cdot) is the function defined below in (21). We derive bounds on the tail probability and the qth moment of the random variable \Psi_{u}^{*}(y,{\mathbb{Z}}). Note that \Psi_{u}^{*}(y,{\mathbb{Z}}) is \mathfrak{A}-measurable for given y and u because the mapping \zeta\mapsto\xi_{\phi[\zeta]} is \mathrm{P}-almost surely continuous, and {\mathbb{Z}} is a totally bounded set. For the same reason, the supremum taken over any subset of {\mathbb{Z}} is measurable as well.

With r and R defined in Assumption 1(iii), for any a\in[r,R] consider the following subsets of {\mathbb{Z}}:

$$
{\mathbb{Z}}_{a}:=\{\zeta\in{\mathbb{Z}}\colon a/2<U(\phi[\zeta])\leq a\}. \tag{18}
$$

In words, for given a\in[r,R], {\mathbb{Z}}_{a} is the slice of the parameter values \zeta\in{\mathbb{Z}} for which the function U(\phi[\zeta]) takes values between a/2 and a.

In what follows, the quantities \varkappa_{U}({\mathbb{Z}}_{a}), \Lambda_{A}({\mathbb{Z}}_{a}), \Lambda_{B}({\mathbb{Z}}_{a}) and L_{g,{\mathbb{Z}}_{a}}^{(\epsilon)}(y) will be considered as functions of a\in[r,R]. That is why, with slight abuse of notation, we will write

$$
\varkappa_{U}(a):=\varkappa_{U}({\mathbb{Z}}_{a}),\qquad L_{g}^{(\epsilon)}(y,a):=L_{g,{\mathbb{Z}}_{a}}^{(\epsilon)}(y). \tag{19}
$$

Put also

and let the function C^{*}(\cdot) in (17) be defined as

###### Proposition 2

Suppose that Assumptions 1 and 2 hold, and let \varkappa_{U}({\mathbb{Z}})<\infty. If

$$
\varkappa_{U}(a)\leq a\qquad\forall a\in[r,R], \tag{22}
$$

and if u_{\epsilon}=2^{\epsilon}(1+\epsilon) then for any \epsilon\in(0,1], y>0 and any q\geq 1 one has

\begin{align}
\mathrm{P}\{\Psi_{u_{\epsilon}}^{*}(y,{\mathbb{Z}})\geq 0\} &\leq N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)\sum_{j=0}^{[\epsilon^{-1}\log_{2}(R/r)-1]_{+}}L^{(\epsilon)}_{g}\bigl(y,r2^{\epsilon(j+1)}\bigr), \tag{23}\\
\mathrm{E}[\Psi_{u_{\epsilon}}^{*}(y,{\mathbb{Z}})]^{q}_{+} &\leq N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)\bigl[u_{\epsilon}C^{*}(y)\bigr]^{q}\sum_{j=0}^{[\epsilon^{-1}\log_{2}(R/r)-1]_{+}}\bigl[r2^{\epsilon(j+1)}\bigr]^{q}J^{(\epsilon)}_{g}\bigl(y,r2^{\epsilon(j+1)}\bigr), \tag{24}
\end{align}

where J^{(\epsilon)}_{g}(z,a):=q\int_{1}^{\infty}(x-1)^{q-1}L^{(\epsilon)}_{g}(zx,a)\,{d}x.

###### Remark

1. Proposition 1 establishes an upper bound on the tail probability of the supremum of \Psi(\xi_{\phi[\zeta]}) over an arbitrary subset of {\mathbb{Z}} contained in a ball of radius \epsilon/8 in the metric \mathrm{d}. The proof of Proposition 2 applies this bound to balls Z of radius \epsilon/8 that form a covering of {\mathbb{Z}}. Each ball Z is divided into slices on which the value of U(\phi[\zeta]) is roughly the same. Then the supremum over {\mathbb{Z}} is bounded by the sum of suprema over the slices. This simple technique is often used in the literature on empirical processes where it is referred to as peeling or slicing [see, e.g., van de Geer (2000), Section 5.3, and Giné and Koltchinskii (2006)].

2. Note that Proposition 2 holds for any distance \mathrm{d} on \mathfrak{Z}. Therefore, if \varkappa_{U}(a) is proportional to a, condition (22) can be enforced by rescaling the distance \mathrm{d}.

We now present a useful bound that can be easily derived from the bounds (23) and (24) of Proposition 2. Let

$$
L^{(\epsilon)}_{g}:=\sum_{k=1}^{\infty}[N_{{\mathbb{Z}},\mathrm{d}}(\epsilon 2^{-k})]^{2}\sqrt{g(9\cdot 2^{k-3}k^{-2})}. \tag{25}
$$

Note that for all Z\subseteq{\mathbb{Z}} and y\geq 1

$$
L^{(\epsilon)}_{g,Z}(y)\leq g(y)+L^{(\epsilon)}_{g}\sqrt{g(y)},
$$

because \inf_{k\geq 1}2^{k}k^{-2}=8/9 and g is monotone decreasing. Therefore, we arrive at the following corollary to Proposition 2.

###### Corollary 1

If the assumptions of Proposition 2 hold, and L^{(\epsilon)}_{g}<\infty then for all y\geq 1 and \epsilon\in(0,1]

\begin{align*}
\mathrm{P}\{\Psi_{u_{\epsilon}}^{*}(y,{\mathbb{Z}})\geq 0\} &\leq N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)[1\vee\epsilon^{-1}\log_{2}(R/r)]\bigl[g(y)+L^{(\epsilon)}_{g}\sqrt{g(y)}\bigr],\\
\mathrm{E}[\Psi_{u_{\epsilon}}^{*}(y,{\mathbb{Z}})]^{q}_{+} &\leq N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)[2^{2\epsilon}R(1+\epsilon)C^{*}(y)]^{q}[2^{q\epsilon}-1]^{-1}J^{(\epsilon)}_{g}(y),
\end{align*}

where J^{(\epsilon)}_{g}(z)=q\int_{1}^{\infty}(x-1)^{q-1}[g(zx)+L^{(\epsilon)}_{g}\sqrt{g(zx)}]\,{d}x.
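Two ingredients of Corollary 1 can be checked numerically. The sketch below first verifies the identity \inf_{k\geq 1}2^{k}k^{-2}=8/9 with exact rational arithmetic, and then compares the integral J^{(\epsilon)}_{g}(z) with a closed form that holds under the assumption g(x)=e^{-x} (an illustrative choice, not the general g of the text), namely \Gamma(q+1)[e^{-z}z^{-q}+L^{(\epsilon)}_{g}e^{-z/2}(z/2)^{-q}]. The function names and the numerical value used for L^{(\epsilon)}_{g} are hypothetical.

```python
import math
from fractions import Fraction

# 2^k / k^2 is increasing for k >= 3, so a finite range suffices to locate the infimum.
ratios = {k: Fraction(2 ** k, k * k) for k in range(1, 60)}
k_min = min(ratios, key=ratios.get)
print(k_min, ratios[k_min])

def J_closed_form(z, q, L_g):
    """J_g(z) for the assumed g(x) = exp(-x), via the Gamma integral."""
    return math.gamma(q + 1.0) * (
        math.exp(-z) / z ** q + L_g * math.exp(-z / 2.0) / (z / 2.0) ** q
    )

def J_numeric(z, q, L_g, upper=60.0, steps=200_000):
    """Midpoint rule for q * int_1^upper (x-1)^(q-1) [g(zx) + L_g sqrt(g(zx))] dx."""
    h = (upper - 1.0) / steps
    total = 0.0
    for i in range(steps):
        x = 1.0 + (i + 0.5) * h
        gx = math.exp(-z * x)
        total += (x - 1.0) ** (q - 1.0) * (gx + L_g * math.sqrt(gx))
    return q * h * total

print(J_closed_form(3.0, 2.0, 1.5), J_numeric(3.0, 2.0, 1.5))
```

The closed form follows from the substitution u=x-1, which turns each term into a Gamma integral; for other choices of g the integral generally has to be evaluated numerically.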

## 3 Uniform bounds for norms of empirical processes

Based on the results obtained in Propositions 1 and 2, in this section we develop uniform bounds for the family \{\|\xi_{w}\|_{s,\tau},w\in\mathcal{W}\}, where \xi_{w} is defined in (7). The first step here is to check Assumption 1. For this purpose, we establish an exponential inequality for \|\xi_{w}\|_{s,\tau} when the function w\in\mathcal{W} is fixed. Next, using Corollary 1 we derive a nonrandom uniform bound and establish corresponding inequalities of the type (8) and (10) satisfying requirements (i)–(iv) of the Introduction. We also develop a random uniform bound based on X_{1},\ldots,X_{n} and derive an inequality of the type (12).

To proceed, we need the following assumption.

###### Assumption 3

Let \overline{\mathcal{X}} be a countable dense subset of \mathcal{X}. For any \varepsilon>0 and any x\in\mathcal{X}, there exists \overline{x}\in\overline{\mathcal{X}} such that

$$
\|w(\cdot,x)-w(\cdot,\overline{x})\|_{s,\tau}\leq\varepsilon.
$$

In the sequel, we consider only the sets \mathcal{W} of (\mathfrak{T}\times\mathfrak{X})-measurable functions satisfying Assumption 3. Let

$$
\nu^{\prime}({d}x)=f(x)\nu({d}x),
$$

and for any s\in[1,\infty] define

\begin{align}
\Sigma_{s}(w,f) &:= \biggl[\int\|w(t,\cdot)\|^{s}_{2,\nu^{\prime}}\tau({d}t)\biggr]^{1/s} \tag{26}\\
&\phantom{:}= \biggl[\int\biggl(\int|w(t,x)|^{2}f(x)\nu({d}x)\biggr)^{s/2}\tau({d}t)\biggr]^{1/s}, \notag\\
M_{s,\tau,\nu^{\prime}}(w) &:= \sup_{x\in\mathcal{X}}\|w(\cdot,x)\|_{s,\tau}\vee\sup_{t\in\mathcal{T}}\|w(t,\cdot)\|_{s,\nu^{\prime}}, \notag\\
M_{s}(w) &:= M_{s,\tau,\nu}(w). \notag
\end{align}

Let c_{1}(s):=15s/\ln s, s>2, c_{2}(s) be the constant appearing below in inequality (83) of Lemma 3, and define

 c_{3}(s):=c_{1}(s)\vee c_{2}\bigl(s/(s-1)\bigr)\qquad\forall s>2,

 c_{*}(s):=\cases{0,&\quad$1\leq s<2$,\cr 1,&\quad$s=2$,\cr c_{3}(s),&\quad$s>2$.} (27)

It is worth mentioning that c_{1}(s) is the best known constant in the Rosenthal inequality [see Johnson, Schechtman and Zinn (1985)], and in many particular examples c_{2}(s)=1 (see Lemma 3 below). Although c_{1}(s) is defined for s>2 only, it will be convenient to set c_{1}(s)=1 if s\in[1,2]. We use this convention in what follows without further mention.
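As a rough numerical illustration of (27), the constants can be tabulated as follows. This is only a sketch: the function names are ours, and `c2` is a placeholder for the constant of Lemma 3, which is not specified in this section; it is set to 1 by default, in line with the remark that c_{2}(s)=1 in many examples.

```python
import math


def c1(s: float) -> float:
    """c_1(s) = 15 s / ln s for s > 2, with the convention c_1(s) = 1 on [1, 2]."""
    if s < 1:
        raise ValueError("s must be at least 1")
    return 1.0 if s <= 2 else 15.0 * s / math.log(s)


def c_star(s: float, c2=lambda r: 1.0) -> float:
    """c_*(s) from (27); c2 is a hypothetical stand-in for the constant of Lemma 3."""
    if s < 1:
        raise ValueError("s must be at least 1")
    if s < 2:
        return 0.0
    if s == 2:
        return 1.0
    # c_3(s) = c_1(s) v c_2(s / (s - 1))
    return max(c1(s), c2(s / (s - 1.0)))
```

With the default placeholder, c_{*}(s)=c_{1}(s) for every s>2, because c_{1}(s)>1 there.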

### 3.1 Probability bounds for fixed weight function

For any w\in\mathcal{W}, we define

 \rho_{s}(w,f):=\cases{\bigl[\sqrt{n}\Sigma_{s}(w,f)\bigr]\wedge[4n^{1/s}M_{s}(w)],&\quad$s<2$,\cr\sqrt{n}M_{2}(w),&\quad$s=2$,\cr c_{1}(s)\bigl[\sqrt{n}\Sigma_{s}(w,f)+2n^{1/s}M_{s}(w)\bigr],&\quad$s>2$,} (28)

 \omega^{2}_{s}(w,f):=\cases{M^{2}_{s}(w)[14n+96n^{1/s}],&\quad$s<2$,\cr 6nM^{2}_{1,\tau,\nu^{\prime}}(w)+24\sqrt{n}M^{2}_{2}(w),&\quad$s=2$,}

and if s>2 then we set

 \omega^{2}_{s}(w,f):=6c_{3}(s)\bigl[nM^{2}_{{2s}/({s+2}),\tau,\nu^{\prime}}(w)+4\sqrt{n}\Sigma_{s}(w,f)M_{s}(w)+8n^{1/s}M^{2}_{s}(w)\bigr].
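
The case distinction in (28) may be easier to read in code. The sketch below evaluates \rho_{s}(w,f) from quantities supplied by the caller; the function name and argument names are ours, and the values of \Sigma_{s}(w,f) and M_{s}(w) are assumed to have been computed elsewhere.

```python
import math


def rho_s(n: int, s: float, sigma_s: float, m_s: float) -> float:
    """rho_s(w, f) from (28), given Sigma_s(w, f) and M_s(w).

    For s = 2 the argument m_s is read as M_2(w).
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    if s < 2:
        # minimum of the density-dependent and the density-free term
        return min(math.sqrt(n) * sigma_s, 4.0 * n ** (1.0 / s) * m_s)
    if s == 2:
        return math.sqrt(n) * m_s
    c1_s = 15.0 * s / math.log(s)
    return c1_s * (math.sqrt(n) * sigma_s + 2.0 * n ** (1.0 / s) * m_s)
```

For s<2 the minimum stays finite even when `sigma_s` is `math.inf`, in which case only the term 4n^{1/s}M_{s}(w) is active.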
###### Theorem 1

Let s\in[1,\infty) be fixed, and suppose that Assumption 3 holds. If M_{s}(w)<\infty, then for any z>0

 {\mathbb{P}}\{\|\xi_{w}\|_{s,\tau}\geq\rho_{s}(w,f)+z\}\leq\exp\biggl\{-\frac{z^{2}}{({1}/{3})\omega_{s}^{2}(w,f)+({4}/{3})c_{*}(s)M_{s}(w)z}\biggr\}, (30)

where c_{*}(s) is given in (27).
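
To get a feeling for the right-hand side of (30), one may evaluate it numerically. The following is a minimal sketch; the function name is ours, and \omega^{2}_{s}(w,f), c_{*}(s) and M_{s}(w) are assumed to be given.

```python
import math


def theorem1_tail(z: float, omega_sq: float, c_star_s: float, m_s: float) -> float:
    """Right-hand side of (30): exp{-z^2 / (omega^2 / 3 + (4/3) c_* M_s z)}."""
    if z <= 0:
        raise ValueError("z must be positive")
    denom = omega_sq / 3.0 + 4.0 * c_star_s * m_s * z / 3.0
    if denom <= 0:
        raise ValueError("the denominator must be positive")
    return math.exp(-z * z / denom)
```

When `c_star_s` is zero the exponent is quadratic in z (the sub-Gaussian regime); otherwise the term linear in z dominates the denominator for large z and the decay is exponential in z.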

###### Remark

Because c_{*}(s)=0 for s\in[1,2), the distribution of the random variable \|\xi_{w}\|_{s,\tau} has a sub-Gaussian tail. In this case, similar bounds can be obtained from the inequalities given in Pinelis (1990), Theorem 2.1, Pinelis (1994), Theorems 3.3–3.5, and Ledoux and Talagrand (1991), Section 6.3. In particular, Theorem 1.2 of Pinelis (1990) gives the upper bound \exp\{-z^{2}/2nM_{s}^{2}(w)\} which is better by a constant factor than our upper bound in (1) whenever s\in[1,2). However, if s\geq 2 then the cited results are not accurate enough in the sense that the corresponding bounds do not satisfy relations (5) and (6) of the Introduction. It seems that only the concentration principle leads to tight upper bounds; that is why we use this unified method in our derivation.

It is obvious that the upper bound of Theorem 1 remains valid if we replace \rho_{s}(w,f), \omega^{2}_{s}(w,f) and M_{s}(w) by their upper bounds. The next result can be derived from Theorem 1 in the case s\in[1,2).

###### Corollary 2

Let s\in[1,2) be fixed, and suppose that Assumption 3 holds. If M_{s}(w)<\infty then for every z>0 and for all n\geq 1

 {\mathbb{P}}\{\|\xi_{w}\|_{s,\tau}\geq 4n^{1/s}M_{s}(w)+z\}\leq\exp\biggl\{-\frac{z^{2}}{37nM^{2}_{s}(w)}\biggr\}.

The result of the corollary is valid without any conditions on the density f. Moreover, neither the bound for \|\xi_{w}\|_{s,\tau}, nor the right-hand side of the inequality depend on f. It is important to realize that the probability inequality of Corollary 2 is sharp in some cases. In particular, it is not too difficult to construct a density f such that \Sigma_{s}(w,f)=+\infty for any function w satisfying rather general assumptions. In this case, the established inequality seems to be sharp. On the other hand, for any density f satisfying a moment condition \sqrt{n}\Sigma_{s}(w,f) can be bounded from above, up to a numerical constant, by \sqrt{n}M_{2}(w) which is typically much smaller than n^{1/s}M_{s}(w).
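
Corollary 2 can be inverted to obtain a threshold that \|\xi_{w}\|_{s,\tau} exceeds with probability at most \delta: setting the right-hand side equal to \delta gives z=\sqrt{37nM^{2}_{s}(w)\ln(1/\delta)}. The sketch below carries out this arithmetic; the function name and the level `delta` are ours and serve only as an illustration.

```python
import math


def corollary2_threshold(n: int, s: float, m_s: float, delta: float):
    """Return (threshold, z) such that the bound of Corollary 2 equals delta."""
    if not (1 <= s < 2):
        raise ValueError("Corollary 2 covers s in [1, 2) only")
    if not (0 < delta < 1):
        raise ValueError("delta must lie in (0, 1)")
    z = math.sqrt(37.0 * n * m_s ** 2 * math.log(1.0 / delta))
    threshold = 4.0 * n ** (1.0 / s) * m_s + z
    return threshold, z
```

Neither the threshold nor the level involves the density f, in agreement with the discussion above.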

Several useful bounds can be derived from Theorem 1. In particular, it is shown at the end of the proof of Theorem 1 that for all s\geq 2 and p\geq 1

Using these inequalities, we arrive at the following result.

###### Corollary 3

Let s>2 be fixed, and suppose that Assumption 3 holds. If M_{s}(w)<\infty, then for every z>0 and for all n\geq 1

where \tilde{\rho}_{s}(w,f):=c_{1}(s)[\sqrt{n}M_{2}(w)\|\sqrt{f}\|_{s,\nu}+2n^{1/s}M_{s}(w)] and

 \tilde{\omega}^{2}_{s}(w,f):=6c_{3}(s)\bigl\{n[1\vee\|f\|_{\infty}]^{({s+2})/{s}}M^{2}_{{2s}/({s+2})}(w)+4\sqrt{n}M_{2}(w)M_{s}(w)\bigl\|\sqrt{f}\bigr\|_{s,\nu}+8n^{1/s}M^{2}_{s}(w)\bigr\}.

### 3.2 Uniform bounds

Theorem 1 together with Corollaries 2 and 3 ensures that Assumption 2 is fulfilled for \|\xi_{w}\|_{s,\tau}. In this section, we use Proposition 2 together with Theorem 1 in order to derive bounds on \|\xi_{w}\|_{s,\tau} that are uniform over \mathcal{W}.

Following the general setting of Section 2, we assume that \mathcal{W} is a parametrized set of weights, that is,

 \mathcal{W}=\{w:w=\phi[\zeta],\zeta\in{\mathbb{Z}}\}, (32)

where {\mathbb{Z}} is a totally bounded subset of some metric space (\mathfrak{Z},\mathrm{d}). Thus, any w\in\mathcal{W} can be represented as w=\phi[\zeta] for some \zeta\in{\mathbb{Z}}. Recall that N_{{\mathbb{Z}},\mathrm{d}}(\delta), \delta>0 stands for the minimal number of balls of radius \delta in the metric \mathrm{d} needed to cover the set {\mathbb{Z}}, and \mathcal{E}_{{\mathbb{Z}},\mathrm{d}}(\delta)=\ln[N_{{\mathbb{Z}},\mathrm{d}}(\delta)] is the \delta-entropy of {\mathbb{Z}}.

The next assumption requires that the mapping \zeta\mapsto\phi[\zeta]=w be continuous in the supremum norm.

###### Assumption

For every \varepsilon>0, there exists \gamma>0 such that for all \zeta_{1},\zeta_{2}\in{\mathbb{Z}} satisfying \mathrm{d}(\zeta_{1},\zeta_{2})\leq\gamma one has

 {\sup_{t}\sup_{x}}|w_{1}(t,x)-w_{2}(t,x)|\leq\varepsilon,

where w_{1}(t,x)=\phi[\zeta_{1}](t,x) and w_{2}(t,x)=\phi[\zeta_{2}](t,x).

Because \xi_{w} is linear in w, this assumption along with Assumption 3 guarantees that all the considered objects are measurable.

Let \mathcal{F} be the class of all probability densities uniformly bounded by constant \mathrm{f}_{\infty},

 \mathcal{F}:=\biggl\{p\dvtx{\mathbb{R}}^{d}\to{\mathbb{R}}\dvtx p\geq 0,\int p=1,\|p\|_{\infty}\leq\mathrm{f}_{\infty}<\infty\biggr\}. (33)

It is easily seen that the inequalities of Theorem 2 and Corollary 3 can be made uniform with respect to the class \mathcal{F}. Indeed, the bound of Corollary 3 remains valid if one replaces \|f\|_{\infty} and \|\sqrt{f}\|_{s,\nu} by \mathrm{f}_{\infty} and \mathrm{f}_{\infty}^{1/2-1/s}, respectively. From now on, we suppose without loss of generality that \mathrm{f}_{\infty}\geq 1.

#### 3.2.1 Uniform nonrandom bound

Theorem 1 together with Corollaries 2 and 3 shows that Assumption 2 is fulfilled for \|\xi_{w}\|_{s,\tau} with g(x)=e^{-x},

 U(w)=U_{\xi}(w,f):=\cases{4n^{1/s}M_{s}(w),&\quad$s\in[1,2)$,\cr\sqrt{n}M_{2}(w),&\quad$s=2$,\cr c_{1}(s)\bigl[\sqrt{n}\Sigma_{s}(w,f)+2n^{1/s}M_{s}(w)\bigr],&\quad$s>2$;} (34)

 A^{2}(w)=A_{\xi}^{2}(w):=\cases{37nM^{2}_{s}(w),&\quad$s<2$,\cr 2\mathrm{f}^{2}_{\infty}nM^{2}_{1}(w)+8\sqrt{n}M^{2}_{2}(w),&\quad$s=2$,\cr 2c_{3}(s)\mathrm{f}^{2}_{\infty}\bigl[nM^{2}_{{2s}/({s+2})}(w)+4\sqrt{n}M_{2}(w)M_{s}(w)+8n^{1/s}M^{2}_{s}(w)\bigr],&\quad$s>2$;}

and B(w)=B_{\xi}(w):=\frac{4}{3}c_{*}(s)M_{s}(w), where c_{*}(s) is defined in (27).

Put

 r_{\xi}:=\inf_{w\in\mathcal{W}}U_{\xi}(w,f),\qquad R_{\xi}:=\sup_{w\in\mathcal{W}}U_{\xi}(w,f). (35)

Let \varkappa_{U_{\xi}}(\cdot) be given by (19) with U=U_{\xi}, and

where \Lambda_{A} and \Lambda_{B} are defined in (20); see also (21).

###### Theorem 2

Let s\geq 1 be fixed, (32) hold, and let f\in\mathcal{F} if s\geq 2. Let Assumption 3.2 be fulfilled. If \varkappa_{U_{\xi}}(a)\leq a for all a\in[r_{\xi},R_{\xi}] then for any y\geq 1 and \epsilon\in(0,1] one has

 {\mathbb{P}}\Bigl\{\sup_{w\in\mathcal{W}}[\|\xi_{w}\|_{s,\tau}-u_{\epsilon}C_{\xi}^{*}(y)U_{\xi}(w,f)]\geq 0\Bigr\}\leq\frac{1}{\epsilon}N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)[1\vee\log_{2}(R_{\xi}/r_{\xi})]\bigl[1+L^{(\epsilon)}_{\exp}\bigr]e^{-y/2},

 \mathbb{E}\sup_{w\in\mathcal{W}}[\|\xi_{w}\|_{s,\tau}-u_{\epsilon}C_{\xi}^{*}(y)U_{\xi}(w,f)]_{+}^{q}\leq\frac{2^{q(\epsilon+1)}u_{\epsilon}^{q}}{2^{q\epsilon}-1}\Gamma(q+1)N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)[R_{\xi}C_{\xi}^{*}(1)]^{q}\bigl[1+L^{(\epsilon)}_{\exp}\bigr]e^{-y/2},

where u_{\epsilon}=2^{\epsilon}(1+\epsilon), \Gamma(\cdot) is the gamma-function and

 L^{(\epsilon)}_{\exp}=\sum_{k=1}^{\infty}\exp\{2\mathcal{E}_{{\mathbb{Z}},\mathrm{d}}(\epsilon 2^{-k})-(9/16)2^{k}k^{-2}\}. (36)

The proof follows immediately by application of Corollary 1, and noting that for g(x)=e^{-x} the quantity L_{g}^{(\epsilon)} is given by the above formula [cf. (25)], while J_{g}^{(\epsilon)}(\cdot) for g(x)=e^{-x} is bounded as follows

 J_{g}^{(\epsilon)}(z)=q\int_{1}^{\infty}(x-1)^{q-1}\bigl[e^{-zx}+L_{g}^{(\epsilon)}\sqrt{e^{-zx}}\bigr]\,{d}x\leq\Gamma(q+1)\bigl[1+L^{(\epsilon)}_{\exp}\bigr](2/z)^{q}e^{-z/2}.
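
The series (36) can be evaluated numerically once an entropy bound is available. The sketch below assumes, purely for illustration, an entropy bound of the form \mathcal{E}_{{\mathbb{Z}},\mathrm{d}}(\delta)\leq C+\delta^{-\beta} with \beta\in(0,1), which is the shape of condition (W4) in Section 3.3; the function name, the truncation level `terms` and the default C=0 are our choices. It therefore returns the logarithm of an upper bound on L^{(\epsilon)}_{\exp}, computed by a log-sum-exp to avoid overflow.

```python
import math


def log_L_exp(eps: float, beta: float, c_entropy: float = 0.0, terms: int = 200) -> float:
    """Log of sum_{k=1}^{terms} exp{2 E(eps 2^-k) - (9/16) 2^k / k^2}, with E(d) = C + d^-beta."""
    if not (0 < eps <= 1 and 0 < beta < 1):
        raise ValueError("need eps in (0, 1] and beta in (0, 1)")
    exponents = []
    for k in range(1, terms + 1):
        delta = eps * 2.0 ** (-k)
        entropy = c_entropy + delta ** (-beta)  # assumed entropy bound
        exponents.append(2.0 * entropy - (9.0 / 16.0) * 2.0 ** k / k ** 2)
    top = max(exponents)
    # log-sum-exp: terms far below the maximum underflow harmlessly to zero
    return top + math.log(sum(math.exp(e - top) for e in exponents))
```

Since \beta<1, the term 2^{k}k^{-2} eventually dominates 2^{k\beta}, so the truncated sum stabilizes; the value can nevertheless be very large when \beta is close to 1 or \epsilon is small.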
###### Remark

It is instructive to compare the results of Theorem 2 with those of Theorem 1 (and Corollaries 2 and 3). The uniform bound on \|\xi_{w}\|_{s,\tau} in Theorem 2 is determined by the individual bound U_{\xi}(w,f) for a fixed weight w\in\mathcal{W}, and by the function C_{\xi}^{*}(\cdot) which, in its turn, is computed on the basis of A_{\xi}(w), B_{\xi}(w) and U_{\xi}(w,f). The function C_{\xi}^{*}(\cdot) depends on the parametrization (32) and on the distance \mathrm{d} on \mathfrak{Z} via the quantities \Lambda_{A} and \Lambda_{B} [see (20)]. The right-hand sides of the inequalities in Theorem 2 depend on massiveness of the set of weights \mathcal{W} as measured by the entropy \mathcal{E}_{{\mathbb{Z}},\mathrm{d}}(\cdot). Note also that these bounds decrease exponentially in y.

#### 3.2.2 Uniform random bound

The uniform nonrandom bounds on \|\xi_{w}\|_{s,\tau} given in Theorem 2 depend on the density f via U_{\xi}(w,f). As discussed in the Introduction, this does not allow one to use this bound in statistical problems. Our goal is to recover the statement of Theorem 2 (up to some numerical constants) with the unknown quantity U_{\xi}(w,f) replaced by its estimator \hat{U}_{\xi}(w). Note also that U_{\xi}(w,f) of Theorem 2 depends on f only if s>2; here the quantity depending on f is \Sigma_{s}(w,f).

Assume that the conditions of Theorem 2 are satisfied, and let s>2. For any t\in\mathcal{T} define

 \hat{\Sigma}_{s}(w):=\|S_{w}\|_{s,\tau},\qquad S^{2}_{w}(t):=\frac{1}{n}\sum_{i=1}^{n}w^{2}(t,X_{i}), (37)

 \hat{U}_{\xi}(w):=c_{1}(s)\bigl[\sqrt{n}\hat{\Sigma}_{s}(w)+2n^{1/s}M_{s}(w)\bigr]. (38)
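
The quantities (37) and (38) are computable from the sample. The following sketch approximates them in the simplest setting we could think of: \mathcal{T}\subset{\mathbb{R}}, \tau the Lebesgue measure, and the {\mathbb{L}}_{s}-norm replaced by a Riemann sum on a uniform grid. The function names, the grid and the requirement that M_{s}(w) be supplied by the caller are assumptions made for illustration only.

```python
import math


def sigma_hat(w, xs, s: float, t_grid) -> float:
    """Riemann-sum approximation of ||S_w||_{s,tau} in (37) on a uniform 1-D grid."""
    n = len(xs)
    dt = t_grid[1] - t_grid[0]
    total = 0.0
    for t in t_grid:
        s2 = sum(w(t, x) ** 2 for x in xs) / n  # S_w^2(t)
        total += s2 ** (s / 2.0) * dt
    return total ** (1.0 / s)


def u_hat(w, xs, s: float, t_grid, m_s: float) -> float:
    """Random bound (38) for s > 2; m_s stands for M_s(w) and is supplied by the caller."""
    if s <= 2:
        raise ValueError("(38) is defined for s > 2")
    n = len(xs)
    c1_s = 15.0 * s / math.log(s)
    return c1_s * (math.sqrt(n) * sigma_hat(w, xs, s, t_grid) + 2.0 * n ** (1.0 / s) * m_s)
```

For the indicator weight w(t,x)=1\{|t-x|\leq 1/2\} and a single observation, S_{w} is the indicator of an interval of length one, so \hat{\Sigma}_{s}(w) should be close to 1 for any s.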

It is easily seen that \hat{U}_{\xi}(w) is a reasonable estimate of U_{\xi}(w,f) because under mild assumptions for any fixed t\in\mathcal{T} by the law of large numbers

Moreover,

 |\hat{\Sigma}_{s}(w)-\Sigma_{s}(w,f)|^{2}\leq\bigl\|S_{w}-\|w(\cdot,\cdot)\|_{2,\nu^{\prime}}\bigr\|^{2}_{s,\tau}
 \leq\bigl\|\sqrt{\bigl|S^{2}_{w}-\|w(\cdot,\cdot)\|^{2}_{2,\nu^{\prime}}\bigr|}\bigr\|^{2}_{s,\tau}
 =\bigl\|S^{2}_{w}-\|w(\cdot,\cdot)\|^{2}_{2,\nu^{\prime}}\bigr\|_{{s}/{2},\tau}
 =\Biggl\|\frac{1}{n}\sum_{i=1}^{n}[w^{2}(\cdot,X_{i})-\mathbb{E}w^{2}(\cdot,X)]\Biggr\|_{{s}/{2},\tau}.

Thus, for any s>2 we have

 |\hat{\Sigma}_{s}(w)-\Sigma_{s}(w,f)|\leq\sqrt{\frac{\|\xi_{w^{2}}\|_{{s}/{2},\tau}}{n}}, (39)

that is, the difference between \hat{\Sigma}_{s}(w) and \Sigma_{s}(w,f) is controlled in terms of \|\xi_{w^{2}}\|_{s/2,\tau}. The idea now is to use Theorem 2 in order to find a nonrandom upper bound on \|\xi_{w^{2}}\|_{s/2,\tau}. One can expect that this bound will be much smaller than \Sigma_{s}(w,f) provided that the function w is small enough. If this is true then \hat{\Sigma}_{s}(w) approximates \Sigma_{s}(w,f) well, and it can be used instead of \Sigma_{s}(w,f) in the definition of the upper bound on \|\xi_{w}\|_{s,\tau} that is uniform over \mathcal{W}.
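
The chain of inequalities leading to (39) rests on the reverse triangle inequality in {\mathbb{L}}_{s}(\tau), on the identity \|\sqrt{|g|}\|^{2}_{s,\tau}=\|g\|_{s/2,\tau}, and on the elementary pointwise bound |\sqrt{a}-\sqrt{b}|\leq\sqrt{|a-b|} for a,b\geq 0, applied with a=S^{2}_{w}(t) and b=\|w(t,\cdot)\|^{2}_{2,\nu^{\prime}}. The last bound follows from |\sqrt{a}-\sqrt{b}|^{2}\leq|\sqrt{a}-\sqrt{b}|(\sqrt{a}+\sqrt{b})=|a-b|. A small numerical check of this step, with names chosen by us for illustration:

```python
import math


def sqrt_gap_holds(a: float, b: float, tol: float = 1e-12) -> bool:
    """Check |sqrt(a) - sqrt(b)| <= sqrt(|a - b|) for a, b >= 0, up to rounding."""
    if a < 0 or b < 0:
        raise ValueError("a and b must be nonnegative")
    return abs(math.sqrt(a) - math.sqrt(b)) <= math.sqrt(abs(a - b)) + tol


# Exhaustive check over a small grid spanning several orders of magnitude.
grid = [0.0, 1e-6, 0.1, 0.5, 1.0, 2.0, 10.0, 1e6]
all_ok = all(sqrt_gap_holds(a, b) for a in grid for b in grid)
```

Equality holds when one of the two numbers is zero, which is why a small tolerance is used.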

In order to control uniformly \|\xi_{w^{2}}\|_{s/2,\tau} by applying Theorem 1 and Corollary 1, we need the following definitions. Put

 \tilde{U}(w^{2}):=\cases{4n^{2/s}M_{s/2}(w^{2}),&\quad$s\in(2,4)$,\cr c_{1}(s/2)\bigl[\mathrm{f}^{1/2}_{\infty}\sqrt{n}M_{2}(w^{2})+2n^{2/s}M_{s/2}(w^{2})\bigr],&\quad$s\geq 4$;}

 \tilde{A}^{2}(w^{2}):=\cases{37nM^{2}_{s/2}(w^{2}),&\quad$s\in(2,4)$,\cr 2c_{3}(s/2)\mathrm{f}^{2}_{\infty}\bigl[nM^{2}_{{2s}/({s+4})}(w^{2})+4\sqrt{n}M_{2}(w^{2})M_{s/2}(w^{2})+8n^{2/s}M^{2}_{s/2}(w^{2})\bigr],&\quad$s\geq 4$;}

and \tilde{B}(w^{2}):=\frac{4}{3}c_{*}(s/2)M_{s/2}(w^{2}), where c_{*}(\cdot) is given in (27).

For any subset Z\subseteq{\mathbb{Z}}, let \varkappa_{\tilde{U}}(Z), \Lambda_{\tilde{A}}(Z), and \Lambda_{\tilde{B}}(Z) be given by (13)–(15) with U=\tilde{U}, A=\tilde{A} and B=\tilde{B}. With r_{\xi} and R_{\xi} defined in (35), let

 {\mathbb{Z}}_{a}=\{\zeta\in{\mathbb{Z}}\dvtx a/2

and we set

 \varkappa_{\tilde{U}}(a):=\varkappa_{\tilde{U}}({\mathbb{Z}}_{a}),\qquad\lambda_{\tilde{A}}=\sup_{a\in[r_{\xi},R_{\xi}]}a^{-2}\Lambda_{\tilde{A}}({\mathbb{Z}}_{a}),\qquad\lambda_{\tilde{B}}=\sup_{a\in[r_{\xi},R_{\xi}]}a^{-2}\Lambda_{\tilde{B}}({\mathbb{Z}}_{a}) (41)

[cf. (19) and (20)]. It is important to emphasize here that in the definition of \varkappa_{\tilde{U}}, \lambda_{\tilde{A}} and \lambda_{\tilde{B}} we use the same set {\mathbb{Z}}_{a} as in the definition of \varkappa_{U_{\xi}}(\cdot), \Lambda_{A_{\xi}}(\cdot) and \Lambda_{B_{\xi}}(\cdot).

The next result establishes a random uniform bound on \|\xi_{w}\|_{s,\tau}.

###### Theorem 3

Let s>2 be fixed, (32) hold, Assumption 3.2 be fulfilled, and

 \varkappa_{U_{\xi}}(a)\leq a\qquad\forall a\in[r_{\xi},R_{\xi}]. (42)

Let \epsilon\in(0,1] be fixed, and suppose that there exists a positive number \gamma<[4c_{1}(s)(1+\epsilon)]^{-1} such that

 \varkappa_{\tilde{U}}(a)\leq(\gamma a)^{2}\qquad\forall a\in[r_{\xi},R_{\xi}]. (43)

If y_{\gamma} denotes the root of the equation

 \sqrt{y}\lambda_{\tilde{A}}+y\lambda_{\tilde{B}}=\gamma^{2}, (44)

and if y_{\gamma}>1 then:

(i) For every y\in[1,y_{\gamma}] one has

 \mathbb{E}\sup_{w\in\mathcal{W}}\{\|\xi_{w}\|_{s,\tau}-\overline{u}_{\epsilon}(\gamma)C_{\xi}^{*}(y)\hat{U}_{\xi}(w)\}^{q}_{+}\leq T_{1,\epsilon}[C_{\xi}^{*}(y)]^{q}\exp\{-y/2\},

where \overline{u}_{\epsilon}(\gamma):=u_{\epsilon}[1-4c_{1}(s)(1+\epsilon)\gamma]^{-1} and u_{\epsilon}=2^{\epsilon}(1+\epsilon).

(ii) For any subset \mathcal{W}_{0}\subseteq\mathcal{W}, one has

 \mathbb{E}\Bigl[\sup_{w\in\mathcal{W}_{0}}\hat{U}_{\xi}(w)\Bigr]^{q}\leq[1+4c_{1}(s)(1+\epsilon)\gamma]^{q}\sup_{w\in\mathcal{W}_{0}}[U_{\xi}(w,f)]^{q}+T_{2,\epsilon}\Bigl[\sqrt{n}\sup_{w\in\mathcal{W}_{0}}M_{s}(w)\Bigr]^{q}\exp\{-y_{\gamma}/2\}.

The explicit expressions for the constants T_{1,\epsilon} and T_{2,\epsilon} are given in the beginning of proof of the theorem.
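
Equation (44) is a quadratic equation in \sqrt{y}, so y_{\gamma} is available in closed form: \sqrt{y_{\gamma}}=2\gamma^{2}/[\lambda_{\tilde{A}}+(\lambda^{2}_{\tilde{A}}+4\lambda_{\tilde{B}}\gamma^{2})^{1/2}]. The sketch below computes y_{\gamma} and the factor \overline{u}_{\epsilon}(\gamma); the function names are ours, and \lambda_{\tilde{A}}, \lambda_{\tilde{B}} and c_{1}(s) are taken as given inputs.

```python
import math


def y_gamma(lam_a: float, lam_b: float, gamma: float) -> float:
    """Positive root y of sqrt(y) * lam_a + y * lam_b = gamma^2, see (44)."""
    if lam_a < 0 or lam_b < 0 or gamma <= 0 or (lam_a == 0 and lam_b == 0):
        raise ValueError("need lam_a, lam_b >= 0 (not both zero) and gamma > 0")
    g2 = gamma * gamma
    # Rationalized root of lam_b t^2 + lam_a t - g2 = 0; it also covers lam_b = 0.
    t = 2.0 * g2 / (lam_a + math.sqrt(lam_a ** 2 + 4.0 * lam_b * g2))
    return t * t


def u_bar(eps: float, gamma: float, c1_s: float) -> float:
    """u-bar_eps(gamma) = 2^eps (1 + eps) / [1 - 4 c_1(s) (1 + eps) gamma]."""
    denom = 1.0 - 4.0 * c1_s * (1.0 + eps) * gamma
    if denom <= 0:
        raise ValueError("gamma must be smaller than 1 / [4 c_1(s) (1 + eps)]")
    return 2.0 ** eps * (1.0 + eps) / denom
```

The theorem is applicable only when the returned root exceeds 1, which requires \lambda_{\tilde{A}}+\lambda_{\tilde{B}}<\gamma^{2}.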

###### Remark

1. Theorem 3 requires two sets of conditions: conditions of Theorem 2, and conditions on behavior of the functions \varkappa_{\tilde{U}}(\cdot), \Lambda_{\tilde{A}}(\cdot) and \Lambda_{\tilde{B}}(\cdot) on the slices {\mathbb{Z}}_{a} defined through U_{\xi}(w,f).

2. The parameter \gamma controls closeness of \hat{U}_{\xi}(\cdot) to U_{\xi}(\cdot,f): the smaller \gamma, the closer the random bound \hat{U}_{\xi}(\cdot) to the nonrandom one U_{\xi}(\cdot,f) [see (3)]. In this case, we do not lose much if U_{\xi}(\cdot,f) is replaced by its empirical counterpart \hat{U}_{\xi}(w). Clearly, it is possible to choose \gamma small and simultaneously to keep y_{\gamma} large only if \lambda_{\tilde{A}} and \lambda_{\tilde{B}} are small enough. Fortunately, this is the case in many examples.

3. Note also that when \gamma approaches [4c_{1}(s)(1+\epsilon)]^{-1} the parameter \overline{u}_{\epsilon}(\gamma) increases to infinity [clearly, we want to keep \overline{u}_{\epsilon}(\gamma) as close to one as possible]. Thus, the assumption \gamma<[4c_{1}(s)(1+\epsilon)]^{-1} is important; this poses a restriction on the parameter set \mathcal{W}. We conjecture that the following condition is necessary: for given s\geq 2 there exists a universal constant, say, c(s), such that \gamma<c(s).

The next corollary to Theorem 3 will be useful in what follows.

###### Corollary 4

The statements of Theorem 3 remain valid if U_{\xi}(w,f) and \hat{U}_{\xi}(w) are redefined as \max\{U_{\xi}(w,f),\sqrt{n}M_{2}(w)\} and \max\{\hat{U}_{\xi}(w),\sqrt{n}M_{2}(w)\}, respectively.

### 3.3 Uniform bounds for classes of weights depending on the difference of arguments

As we have seen, the results and assumptions in Theorems 2 and 3 are stated in terms of the quantities (such as \lambda_{\tilde{A}}, \lambda_{\tilde{B}}, y_{\gamma}) that are given implicitly. In particular, additional computations are still necessary in order to apply Theorems 2 and 3 in specific problems. In this section, we specialize the results of Theorems 2 and 3 for the classes of weights \mathcal{W} depending on the difference of arguments. Under natural and easily interpretable assumptions on the class of such weights, we derive explicit uniform bounds on the norms of empirical processes.

Throughout this section, \mathcal{X}=\mathcal{T}={\mathbb{R}}^{d}, \tau=\nu=\operatorname{mes} is the Lebesgue measure and we write \|\cdot\|_{s} instead of \|\cdot\|_{s,\tau}. In this section, the class of weights \mathcal{W} is a set of functions from {\mathbb{R}}^{d}\times{\mathbb{R}}^{d} to {\mathbb{R}} of the following form

 \mathcal{W}=\{w(t-x),w\in\mathcal{V}\}, (46)

where \mathcal{V} is a given set of d-variate functions. For the sake of notational convenience, we will identify the weight w\in\mathcal{W} with the d-variate function w\in\mathcal{V} in the definition of the process \xi_{w} and the quantities such as U_{\xi}, A_{\xi}, B_{\xi}, etc. Thus, when we write w\in\mathcal{W} we mean the weight w(\cdot-\cdot) while w\in\mathcal{V} denotes the corresponding d-variate function; this should not lead to confusion.

Let (\mathfrak{Z},\mathrm{d}) be a fixed metric space; as before, we suppose that \mathcal{V} is parametrized by the parameter \zeta\in\mathfrak{Z}, that is,

 \mathcal{V}=\{w\dvtx w=\phi[\zeta],\zeta\in{\mathbb{Z}}\}, (47)

where {\mathbb{Z}} is a totally bounded subset of the metric space (\mathfrak{Z},\mathrm{d}). Recall that N_{{\mathbb{Z}},\mathrm{d}}(\delta), \delta>0 is the number of the balls of radius \delta in the metric \mathrm{d} that form a minimal covering of the set {\mathbb{Z}}.

We need the following assumptions on the class of weights \mathcal{W} (the functional set \mathcal{V}).

###### Assumption (W)

(W1) The Lebesgue measure of support of all functions from \mathcal{V} is finite, that is,

 \mu_{*}:=\sup_{w\in\mathcal{V}}\operatorname{mes}\{\operatorname{supp}(w)\}<\infty. (48)

(W2) There exist real numbers \alpha_{1}\in(0,1) and \alpha_{2}\in(0,1) such that

 \operatorname{mes}\{x\in{\mathbb{R}}^{d}\dvtx|w(x)|\geq\alpha_{1}\|w\|_{\infty}\}\geq\alpha_{2}\operatorname{mes}\{\operatorname{supp}(w)\}\qquad\forall w\in\mathcal{V}.

(W3) There exists a real number \mu\geq 1 such that

(W4) There exists a real number \beta\in(0,1) such that

 \sup_{\delta\in(0,1)}\{\ln[N_{{\mathbb{Z}},\mathrm{d}}(\delta)]-\delta^{-\beta}\}=:C_{{\mathbb{Z}}}(\beta)<\infty.

###### Remark

We will show that Assumption (W2) is fulfilled if \mathcal{V} is a set of smooth functions. Assumption (W3) together with (W2) allows one to establish relations between {\mathbb{L}}_{p}-norms of functions from \mathcal{V}; this will be extensively used in what follows. Assumption (W4) is a usual entropy condition. In particular, (W4) ensures that the quantity L^{(\epsilon)}_{\exp} in (36) is finite.

In addition to Assumption (W), we will need the following assumption on the properties of the mapping \phi in (47). For p\geq 1, put

 0<\underline{\mathrm{w}}_{p}:=n^{1/p}\inf_{w\in\mathcal{V}}\|w\|_{p}\leq n^{1/p}\sup_{w\in\mathcal{V}}\|w\|_{p}=:\overline{\mathrm{w}}_{p}<\infty (49)

and define

 {\mathbb{Z}}_{p}(b):=\{\zeta\in{\mathbb{Z}}\dvtx n^{1/p}\|\phi[\zeta]\|_{p}\leq b\},\qquad b\in[\underline{\mathrm{w}}_{p},\overline{\mathrm{w}}_{p}]. (50)

###### Assumption (L)

The mapping \phi in (47) satisfies the following conditions:

• if s\in[1,2) then

 \sup_{\zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{s}(b)}\frac{n^{1/s}\|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_{s}}{\mathrm{d}(\zeta_{1},\zeta_{2})}\leq b\qquad\forall b\in[\underline{\mathrm{w}}_{s},\overline{\mathrm{w}}_{s}],

• if s\geq 2 then

 \sup_{\zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{2}(b)}\frac{\sqrt{n}\|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_{2}}{\mathrm{d}(\zeta_{1},\zeta_{2})}\leq b\qquad\forall b\in[\underline{\mathrm{w}}_{2},\overline{\mathrm{w}}_{2}]. (51)

We note that Assumption (L) guarantees continuity of \|\xi_{w}\|_{s} on \phi[{\mathbb{Z}}] for any s\leq 2. The same property for s>2 follows from Lemma 7. This, in view of Remark 2, replaces Assumption 3.2.

The next statement presents the uniform moment bound on \|\xi_{w}\|_{s} when s\in[1,2], and \mathcal{W} is given by (46).

###### Theorem 4

Let the class of weights \mathcal{W} be defined by (46), and let (47) and Assumptions (W1), (W4) and (L) hold.

(i) If s\in[1,2) then for all n\geq 1, z\geq[\sqrt{37}/2]n^{1/2-1/s}, and \epsilon\in(0,1] one has

 \mathbb{E}\sup_{w\in\mathcal{W}}[\|\xi_{w}\|_{s}-4u_{\epsilon}(1+z)n^{1/s}\|w\|_{s}]_{+}^{q}\leq T_{3,\epsilon}n^{q/s}\exp\biggl\{-\frac{2z^{2}}{37}n^{(2/s)-1}\biggr\}.

(ii) If f\in\mathcal{F} then for all n\geq 1, z\geq\sqrt{8[\mu_{*}\mathrm{f}^{2}_{\infty}+4n^{-1/2}]}, and \epsilon\in(0,1] one has

 \mathbb{E}\sup_{w\in\mathcal{W}}\bigl[\|\xi_{w}\|_{2}-u_{\epsilon}(1+z+z^{2}/12)\sqrt{n}\|w\|_{2}\bigr]_{+}^{q}\leq T_{4,\epsilon}n^{q/2}\exp\biggl\{-\frac{z^{2}}{16[\mathrm{f}_{\infty}^{2}\mu_{*}+4n^{-1/2}]}\biggr\}.

The explicit expressions for the constants T_{3,\epsilon} and T_{4,\epsilon} are given in the beginning of the proof of the theorem.

The bound of Theorem 4 is nonrandom because U_{\xi}(w,f) does not depend on f whenever s\in[1,2]. The proof of this statement is based on application of Theorem 2.

Now we proceed with the case s>2. Here we need some further notation. Given p\geq 2, let m_{p}\in(0,1] be such that

 \sup_{b\in[\underline{\mathrm{w}}_{2},\overline{\mathrm{w}}_{2}]}b^{-1}\sup_{\zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{2}(b)}\frac{n^{1/p}\|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_{p}}{[\mathrm{d}(\zeta_{1},\zeta_{2})]^{m_{p}}}=:C_{p}<\infty. (52)

Existence of m_{p}\in(0,1] such that (52) holds is ensured by Lemma 7 given in Section 8.2. In particular, it is shown there that if Assumptions (W) and (L) hold then m_{p} can be taken equal to 2/p. We note also that m_{2}=1 and C_{2}=1 by Assumption (L).

Following (37), (38) and Corollary 4, we set

 \hat{U}_{\xi}(w)=c_{1}(s)\Biggl\{\sqrt{n}\Biggl\|\Biggl[\frac{1}{n}\sum_{i=1}^{n}w^{2}(\cdot-X_{i})\Biggr]^{1/2}\Biggr\|_{s}+2n^{1/s}\|w\|_{s}\Biggr\},

 \breve{U}_{\xi}(w):=\max\bigl\{\hat{U}_{\xi}(w),\sqrt{n}\|w\|_{2}\bigr\}, (53)

 \overline{U}_{\xi}(w):=\max\bigl\{U_{\xi}(w,f),\sqrt{n}\|w\|_{2}\bigr\}.

Put also

 C_{\xi}^{*}(y)=1+2\vartheta_{0}\bigl\{\sqrt{y}\bigl[\mu_{*}^{1/s}+n^{-1/(2s)}\bigr]+yn^{-1/s}\bigr\}, (54)

 m:=\cases{1\wedge m_{s},&\quad$s\in(2,4)$,\cr 1\wedge m_{s}\wedge m_{s/2},&\quad$s\geq 4$,}

where \vartheta_{0}:=5c_{1}(s)[C_{s}\vee 1]\mathrm{f}_{\infty}\alpha_{1}^{-1}\alpha_{2}^{-1/2}, \alpha_{1} and \alpha_{2} are given in Assumption (W2), and m_{p} and C_{p} are defined in (52).

###### Theorem 5

Let Assumptions (W) and (L) hold, and assume that f\in\mathcal{F}. Suppose that (W3) is fulfilled with \mu>[64c^{2}_{1}(s)]^{({s\wedge 4})/({s\wedge 4-2})}, and (W4) is fulfilled with \beta<m. Let \gamma=\mu^{1/(s\wedge 4)-1/2}, and

 y_{*}:=\cases{\vartheta_{1}n^{4/s-1},&\quad$s\in(2,4)$,\cr\vartheta_{2}\mu^{-1/2}[\mu_{*}^{2/s}+n^{-1/s}]^{-2},&\quad$s\geq 4$,} (55)

with constants \vartheta_{1} and \vartheta_{2} specified explicitly in the proof; then for any s>2 and y\in[1,y_{*}] one has

 \mathbb{E}\sup_{w\in\mathcal{W}}\{\|\xi_{w}\|_{s}-\overline{u}_{\epsilon}(\gamma)C_{\xi}^{*}(y)\breve{U}_{\xi}(w)\}^{q}_{+}\leq T_{5,\epsilon}n^{q/2}[C_{\xi}^{*}(y)]^{q}\exp\{-y/2\},

where \overline{u}_{\epsilon}(\cdot) is defined in Theorem 3. In addition, if \mathcal{W}_{0}\subseteq\mathcal{W} is an arbitrary subset of \mathcal{W} then

 \mathbb{E}\Bigl[\sup_{w\in\mathcal{W}_{0}}\breve{U}_{\xi}(w)\Bigr]^{q}\leq\Bigl\{\bigl[1+4c_{1}(s)(1+\epsilon)\mu^{{1}/({s\wedge 4})-{1}/{2}}\bigr]\sup_{w\in\mathcal{W}_{0}}\overline{U}_{\xi}(w)\Bigr\}^{q}+T_{6,\epsilon}n^{{q(s-2)}/({2s})}\exp\{-y_{*}/2\}.

The explicit expressions for the constants T_{5,\epsilon} and T_{6,\epsilon} are given in the proof.

Theorem 5 establishes random uniform bounds on the norms of empirical processes in terms of the parameters determining the class \mathcal{W}. In particular, the parameters \mu and \mu_{*} play an important role. Theorem 5 leads to a number of powerful asymptotic results that demonstrate sharpness of the proposed random bound.
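
The condition on \mu in Theorem 5 can be related to the condition on \gamma in Theorem 3 by direct arithmetic. With p=s\wedge 4 and \gamma=\mu^{1/p-1/2}, the requirement \mu>[64c^{2}_{1}(s)]^{p/(p-2)} is equivalent to \gamma<[8c_{1}(s)]^{-1}, which in turn gives \gamma<[4c_{1}(s)(1+\epsilon)]^{-1} for every \epsilon\in(0,1]. The sketch below checks this numerically; the function names are ours.

```python
import math


def c1(s: float) -> float:
    """c_1(s) = 15 s / ln s for s > 2."""
    if s <= 2:
        raise ValueError("this sketch covers s > 2 only")
    return 15.0 * s / math.log(s)


def mu_threshold(s: float) -> float:
    """Lower bound [64 c_1(s)^2]^{p/(p-2)} on mu, with p = min(s, 4)."""
    p = min(s, 4.0)
    return (64.0 * c1(s) ** 2) ** (p / (p - 2.0))


def gamma_from_mu(mu: float, s: float) -> float:
    """gamma = mu^{1/p - 1/2}, with p = min(s, 4)."""
    p = min(s, 4.0)
    return mu ** (1.0 / p - 0.5)
```

The thresholds are numerically very large (of order 10^{10} for s=4), so the condition is a genuine restriction on the class of weights.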

###### Corollary 5

Let the assumptions of Theorem 5 hold, and let s>2 be fixed. There exist positive constants k_{i}=k_{i}(s), i=1,2,3 such that if

 \mu=\mu_{n}\asymp[\ln n]^{k_{1}},\qquad\mu_{*}=\mu_{*,n}\asymp[\ln n]^{-k_{2}},\qquad\epsilon=\epsilon_{n}\asymp[\ln n]^{-k_{3}},\qquad n\to\infty,

then for all \ell>0 and q\geq 1

 \lim_{n\to\infty}\sup_{f\in\mathcal{F}}n^{\ell}\mathbb{E}\sup_{w\in\mathcal{W}}[\|\xi_{w}\|_{s}-(1+3\epsilon_{n})\breve{U}_{\xi}(w)]_{+}^{q}=0,

 \mathbb{E}\Bigl[\sup_{w\in\mathcal{W}_{0}}\breve{U}_{\xi}(w)\Bigr]^{q}\leq\Bigl[(1+\epsilon_{n})\sup_{w\in\mathcal{W}_{0}}\overline{U}_{\xi}(w,f)\Bigr]^{q}+R_{n}(\mathcal{W}_{0}),

where \limsup_{n\to\infty}\sup_{f\in\mathcal{F}}\sup_{\mathcal{W}_{0}\subseteq\mathcal{W}}[n^{\ell}R_{n}(\mathcal{W}_{0})]=0.

The explicit expressions for the constants k_{i}>0, i=1,2,3 are easily computed from Theorem 5.

###### Remark

Corollary 5 shows that if the class of weights \mathcal{W} is such that \mu=\mu_{n} and \mu_{*}=\mu_{*,n}, and if \epsilon is set to be \epsilon=\epsilon_{n}, then (1+3\epsilon_{n})\breve{U}_{\xi}(w) is a uniform random bound on \|\xi_{w}\|_{s} which is asymptotically almost as good as the nonrandom bound \overline{U}_{\xi}(w,f) depending on f. Thus, in asymptotic terms, there is no loss in sharpness of the random uniform bound in comparison with the nonrandom bound that depends on f.

### 3.4 Specific problems

In this section, we consider the processes \xi_{w} corresponding to special classes of weights \mathcal{W} that arise in kernel density estimation. Using results of Theorems 4 and 5, we derive uniform bounds on the norms of these processes. As in Section 3.3, here \mathcal{X}=\mathcal{T}={\mathbb{R}}^{d}, and \nu and \tau are both the Lebesgue measure.

Let \mathcal{K} be a given set of real functions defined on {\mathbb{R}}^{d} and suppose that \mathcal{K} is a totally bounded set with respect to the {\mathbb{L}}_{\infty}-norm. Let \mathcal{H}:=\bigotimes_{i=1}^{d}[h^{\min}_{i},h^{\max}_{i}], where the vectors h^{\min}=(h^{\min}_{1},\ldots,h^{\min}_{d}), h^{\max}=(h^{\max}_{1},\ldots,h^{\max}_{d}), 0<h^{\min}_{i}\leq h^{\max}_{i}\leq 1, \forall i=1,\ldots,d are fixed.

For any h\in\mathcal{H} define V_{h}:=\prod_{i=1}^{d}h_{i}, and endow the set \mathcal{H} with the following distance:

 \Delta_{\mathcal{H}}(h,h^{\prime})=\max_{i=1,\ldots,d}\ln\biggl(\frac{h_{i}\vee h_{i}^{\prime}}{h_{i}\wedge h_{i}^{\prime}}\biggr). (56)

In order to verify that \Delta_{\mathcal{H}} is indeed a distance on \mathcal{H} it suffices to note that the function (x,y)\mapsto\ln(x\vee y)-\ln(x\wedge y), x>0,y>0 satisfies all axioms of distance on {\mathbb{R}}_{+}\setminus\{0\}.
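
Since \ln[(x\vee y)/(x\wedge y)]=|\ln x-\ln y|, the distance (56) is the sup-norm distance between the vectors of log-bandwidths. The sketch below implements it and also gives a simple upper bound on the covering number of \mathcal{H} obtained from a product grid in the log-scale; the function names and the product-grid construction are ours and are meant only as an illustration.

```python
import math


def delta_h(h, h_prime) -> float:
    """Distance (56): max_i |ln h_i - ln h'_i| for bandwidth vectors with positive entries."""
    if len(h) != len(h_prime) or len(h) == 0:
        raise ValueError("bandwidth vectors must be nonempty and of equal length")
    if min(h) <= 0 or min(h_prime) <= 0:
        raise ValueError("bandwidths must be positive")
    return max(abs(math.log(a) - math.log(b)) for a, b in zip(h, h_prime))


def covering_upper_bound(h_min, h_max, delta: float) -> int:
    """Upper bound on the number of delta-balls in delta_h needed to cover the box H.

    Each coordinate is an interval of length ln(h_max_i / h_min_i) in the log-scale,
    which is covered by ceil(length / (2 delta)) intervals of radius delta.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    count = 1
    for lo, hi in zip(h_min, h_max):
        length = math.log(hi) - math.log(lo)
        count *= max(1, math.ceil(length / (2.0 * delta)))
    return count
```

The bound grows only logarithmically in the ratios h^{\max}_{i}/h^{\min}_{i}, which is the practical advantage of measuring bandwidths in the log-scale.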

We will be interested in the following classes of weights \mathcal{W} and the corresponding processes \xi_{w}.

#### Kernel density estimator process

With any K\in\mathcal{K} and h\in\mathcal{H}, we associate the weight function

 w(t-x)=n^{-1}K_{h}(t-x):=(nV_{h})^{-1}K[(t-x)/h].

As before, u/v, u,v\in{\mathbb{R}}^{d}, stands for the coordinate-wise division (u_{1}/v_{1},\ldots,u_{d}/v_{d}).

The weight w is naturally parametrized by K and h so that we put

We equip {\mathbb{Z}}^{(1)} with the family of distances \{\mathrm{d}^{(1)}_{\vartheta}(\cdot,\cdot),\vartheta>0\} defined by

 \mathrm{d}_{\vartheta}^{(1)}(\zeta,\zeta^{\prime})=\vartheta\max\{\|K-K^{\prime}\|_{\infty},\Delta_{\mathcal{H}}(h,h^{\prime})\},\qquad\zeta=(K,h),\qquad\zeta^{\prime}=(K^{\prime},h^{\prime}),\qquad\vartheta>0.

Obviously, {\mathbb{Z}}^{(1)} is a totally bounded set with respect to \mathrm{d}^{(1)}_{\vartheta} for any \vartheta>0.

The corresponding family of random fields is

and we are interested in bounds on the {\mathbb{L}}_{s}-norm of this process uniform over the class of weights

 \mathcal{W}^{(1)}:=\bigl\{w(\cdot-\cdot)=n^{-1}K_{h}(\cdot-\cdot)\dvtx(K,h)\in{\mathbb{Z}}^{(1)}\bigr\}.

We note that \xi^{(1)}_{w} is the stochastic error of the kernel density estimator associated with the kernel K\in\mathcal{K} and bandwidth h\in\mathcal{H}. According to Theorems 4 and 5, for the process \{\xi_{w},w\in\mathcal{W}^{(1)}\}, the uniform bounds on \|\xi_{w}\|_{s} should be based on the following functionals. Define

 U_{\xi}^{(1)}(w):=\cases{4(nV_{h})^{1/s-1}\|K\|_{s},&\quad$s\in[1,2)$,\cr(nV_{h})^{-1/2}\|K\|_{2},&\quad$s=2$.}
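
The expression for s\in[1,2) agrees with the general form 4n^{1/s}\|w\|_{s} once w=n^{-1}K_{h} is substituted, because \|K_{h}\|_{s}=V_{h}^{1/s-1}\|K\|_{s}. A minimal sketch of the computation follows; the function name is ours, and the norm \|K\|_{s} of the kernel is assumed to be supplied by the caller.

```python
import math


def u1_bound(n: int, h, k_norm_s: float, s: float) -> float:
    """U^(1)_xi(w) for the kernel weight w = K_h / n and s in [1, 2]."""
    v_h = math.prod(h)  # V_h = h_1 * ... * h_d
    if n <= 0 or v_h <= 0:
        raise ValueError("n and all bandwidths must be positive")
    if 1 <= s < 2:
        return 4.0 * (n * v_h) ** (1.0 / s - 1.0) * k_norm_s
    if s == 2:
        return (n * v_h) ** (-0.5) * k_norm_s
    raise ValueError("this sketch covers s in [1, 2] only")
```

For s=1 the bound equals 4\|K\|_{1} regardless of n and h, while for s=2 it is of the usual order (nV_{h})^{-1/2}.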

For s>2, we put

 U^{(1)}_{\xi}(w,f):=c_{1}(s)\biggl[n^{-1/2}\biggl(\int\biggl[\int K^{2}_{h}(t-x)f(x)\,{d}x\biggr]^{s/2}\,{d}t\biggr)^{1/s}+2(nV_{h})^{1/s-1}\|K\|_{s}\biggr],

 \hat{U}^{(1)}_{\xi}(w):=c_{1}(s)\Biggl[n^{-1/2}\Biggl(\int\Biggl[n^{-1}\sum_{i=1}^{n}K^{2}_{h}(t-X_{i})\Biggr]^{s/2}\,{d}t\Biggr)^{1/s}+2(nV_{h})^{1/s-1}\|K\|_{s}\Biggr],

and finally

 \overline{U}{}^{(1)}_{\xi}(w,f):=\max\bigl[U^{(1)}_{\xi}(w,f),(nV_{h})^{-1/2}\|K\|_{2}\bigr], (59)

 \breve{U}^{(1)}_{\xi}(w):=\max\bigl[\hat{U}^{(1)}_{\xi}(w),(nV_{h})^{-1/2}\|K\|_{2}\bigr].

#### Convolution kernel density estimator process

For any (K,h)\in{\mathbb{Z}}^{(1)} and (Q,\mathfrak{h})\in{\mathbb{Z}}^{(1)}, we define

 w(t-x)=n^{-1}[K_{h}\ast Q_{\mathfrak{h}}](t-x), (60)

where {\mathbb{Z}}^{(1)} is defined in (57), and \ast stands for the convolution on {\mathbb{R}}^{d}. Put

and define the family of distances on {\mathbb{Z}}^{(2)} as

 \mathrm{d}^{(2)}_{\vartheta}(z,z^{\prime})=\vartheta\max\{\|K-K^{\prime}\|_{\infty}\vee\|Q-Q^{\prime}\|_{\infty},\Delta_{\mathcal{H}}(h,h^{\prime})\vee\Delta_{\mathcal{H}}(\mathfrak{h},\mathfrak{h}^{\prime})\},\qquad\vartheta>0, (61)

where z=[(K,h),(Q,\mathfrak{h})], z^{\prime}=[(K^{\prime},h^{\prime}),(Q^{\prime},\mathfrak{h}^{\prime})], z,z^{\prime}\in{\mathbb{Z}}^{(2)}. Obviously, {\mathbb{Z}}^{(2)} is a totally bounded set with respect to the distance \mathrm{d}_{\vartheta}^{(2)} for any \vartheta>0.

The corresponding family of random fields is

 \xi^{(2)}_{w}(t):=\xi_{\phi_{2}[z]}(t)=\frac{1}{n}\sum_{i=1}^{n}\{[K_{h}\ast Q_{\mathfrak{h}}](t-X_{i})-\mathbb{E}[K_{h}\ast Q_{\mathfrak{h}}](t-X)\},\qquad z\in{\mathbb{Z}}^{(2)}, (63)

and we are interested in a uniform bound on \|\xi^{(2)}_{w}\|_{s} over

 \mathcal{W}^{(2)}:=\bigl\{w(\cdot-\cdot)=n^{-1}[K_{h}\ast Q_{\mathfrak{h}}](\cdot-\cdot)\dvtx[(K,h),(Q,\mathfrak{h})]\in{\mathbb{Z}}^{(2)}\bigr\}.

The random field \xi_{w} with w given by (60) appears in the context of multivariate density estimation. In particular, the uniform bounds on \|\xi_{w}\|_{s} are instrumental in the construction of a selection rule for the family of kernel estimators parametrized by \mathcal{K}\times\mathcal{H} [see Goldenshluger and Lepski (2009)]. Theorems 4 and 5 suggest basing the uniform bounds on the following quantities. Define

 U^{(2)}_{\xi}(w):=\cases{4n^{1/s-1}\|K_{h}\ast Q_{\mathfrak{h}}\|_{s},&\quad$s\in[1,2)$,\cr n^{-1/2}\|K_{h}\ast Q_{\mathfrak{h}}\|_{2},&\quad$s=2$.}

For s>2, we put

 \begin{aligned}
 U^{(2)}_{\xi}(w,f)&:=c_{1}(s)\biggl[n^{-1/2}\biggl(\int\biggl[\int[K_{h}\ast Q_{\mathfrak{h}}]^{2}(t-x)f(x)\,{d}x\biggr]^{s/2}\,{d}t\biggr)^{1/s}+2n^{1/s-1}\|K_{h}\ast Q_{\mathfrak{h}}\|_{s}\biggr],\\
 \hat{U}^{(2)}_{\xi}(w)&:=c_{1}(s)\Biggl[n^{-1/2}\Biggl(\int\Biggl[n^{-1}\sum_{i=1}^{n}[K_{h}\ast Q_{\mathfrak{h}}]^{2}(t-X_{i})\Biggr]^{s/2}\,{d}t\Biggr)^{1/s}+2n^{1/s-1}\|K_{h}\ast Q_{\mathfrak{h}}\|_{s}\Biggr];
 \end{aligned}

and finally

 \begin{aligned}
 \overline{U}{}^{(2)}_{\xi}(w,f)&:=\max\bigl[U^{(2)}_{\xi}(w,f),n^{-1/2}\|K_{h}\ast Q_{\mathfrak{h}}\|_{2}\bigr],\\
 \breve{U}^{(2)}_{\xi}(w)&:=\max\bigl[\hat{U}^{(2)}_{\xi}(w),n^{-1/2}\|K_{h}\ast Q_{\mathfrak{h}}\|_{2}\bigr].
 \end{aligned} (64)

Theorems 4 and 5 can be used to establish upper bounds on the norms of the processes \xi^{(i)}_{w}, i=1,2. For this purpose, Assumptions 3.3 and 3.3 should be verified for the classes of weights \mathcal{W}^{(i)}, i=1,2, defined above. To this end, we introduce conditions on the family of kernels \mathcal{K} that imply Assumptions 3.3 and 3.3. These conditions are rather natural and easily verifiable; they can be weakened in several ways, but we do not pursue this issue here and prefer to keep the calculations as simple as possible.

(K1) The family \mathcal{K} is a subset of the isotropic Hölder ball of functions \mathbb{H}_{d}(1,L_{\mathcal{K}}) with the exponent 1 and the Lipschitz constant L_{\mathcal{K}}, that is,

 |K(x)-K(y)|\leq L_{\mathcal{K}}|x-y|\qquad\forall x,y\in{\mathbb{R}}^{d},\ \forall K\in\mathcal{K},

where |\cdot| denotes the Euclidean distance. Moreover, any function K from \mathcal{K} is compactly supported and, without loss of generality, \operatorname{supp}(K)\subseteq[-1/2,1/2]^{d} for all K\in\mathcal{K}.

(K2) There exist real numbers \mathrm{k}_{1}>0 and \mathrm{k}_{\infty}<\infty such that

 \mathrm{k}_{1}\leq\biggl|\int K(t)\,{d}t\biggr|\leq\|K\|_{\infty}\leq\mathrm{k}_{\infty}\qquad\forall K\in\mathcal{K}.

Without loss of generality, we will assume that \mathrm{k}_{\infty}\geq 1 and \mathrm{k}_{1}\leq 1.

(K3) The set \mathcal{K} is a totally bounded set with respect to the {\mathbb{L}}_{\infty}-norm, and there exists a real number \beta_{\mathcal{K}}\in(0,1) such that the entropy \mathcal{E}_{\mathcal{K}}(\cdot) of \mathcal{K} satisfies

 \sup_{\delta\in(0,1)}[\mathcal{E}_{\mathcal{K}}(\delta)-\delta^{-\beta_{\mathcal{K}}}]=:C_{\mathcal{K}}<\infty.

Several remarks on the above assumptions are in order. First, we note that Assumptions (K1) and (K3) are not completely independent. In fact, if we suppose that \mathcal{K}\subset\mathbb{H}_{d}(\alpha,L_{\mathcal{K}}) with some \alpha>d, then Assumption (K3) is automatically fulfilled with \beta_{\mathcal{K}}=d/\alpha. On the other hand, all our results remain valid if \mathcal{K}\subset\mathbb{H}_{d}(\alpha,L_{\mathcal{K}}) with some \alpha>0. Observe also that the condition |{\int K(t)\,{d}t}|\geq\mathrm{k}_{1} of Assumption (K2) is not restrictive at all because for kernel estimators \int K(t)\,{d}t=1. Therefore, the first inequality in (K2) is satisfied with \mathrm{k}_{1}=1.

Remark. It is easy to check that Assumption (K1) implies Assumption 3.2 in Section 3.2 and Assumption 2 in Section 2.

Now we apply Theorems 4 and 5 to the families of random fields given by (58) and (63). We present the results for the processes \{\xi_{\phi_{1}[\zeta]},\zeta\in{\mathbb{Z}}^{(1)}\} and \{\xi_{\phi_{2}[z]},z\in{\mathbb{Z}}^{(2)}\} in a unified way.

#### 3.4.1 Case s\in[1,2]. Uniform nonrandom bounds

In order to derive the uniform upper bounds for s\in[1,2], we use Theorem 4. Obviously, Assumption 3.4 implies Assumptions (W1) and (W4). Thus, in order to apply Theorem 4, we need to verify Assumption 3.3. This is done in Lemma 9 given in Section 9. Thus, Theorem 4 is directly applicable, and nonasymptotic bounds can be straightforwardly derived from this theorem; one needs only to recalculate the constants appearing in the statements of the theorem.

We note that the quantity \mu_{*} defined in (48) satisfies \mu_{*}\leq V_{h^{\max}} for the set of weights \mathcal{W}^{(1)} and \mu_{*}\leq 2^{d}V_{h^{\max}} for the set of weights \mathcal{W}^{(2)}. If we assume that V_{h^{\max}}\to 0 as n\to\infty, then we can establish some asymptotic results, one of which is given in the next theorem.

###### Theorem 6

If Assumption 3.4 holds, then for all s\in[1,2), q\geq 1, \ell>0 and \epsilon\in(0,1)

 \lim_{n\to\infty}n^{\ell}\sup_{f\in\mathcal{F}}\mathbb{E}\sup_{w\in\mathcal{W}^{(i)}}\bigl[\bigl\|\xi^{(i)}_{w}\bigr\|_{s}-(1+\epsilon)U^{(i)}_{\xi}(w)\bigr]_{+}^{q}=0,\qquad i=1,2.

If Assumption 3.4 holds and V_{h^{\max}}=o(1/\ln n) as n\to\infty, then for all q\geq 1, \ell>0 and \epsilon\in(0,1)

 \lim_{n\to\infty}n^{\ell}\sup_{f\in\mathcal{F}}\mathbb{E}\sup_{w\in\mathcal{W}^{(i)}}\bigl[\bigl\|\xi^{(i)}_{w}\bigr\|_{2}-(1+\epsilon)U^{(i)}_{\xi}(w)\bigr]_{+}^{q}=0,\qquad i=1,2.

The proof of the theorem is omitted; it is a straightforward consequence of Theorem 4 and Lemma 9 given below in Section 9.

#### 3.4.2 Case s>2. Uniform random bounds

In the case s>2, the uniform bounds are derived from Theorem 5. To state these results, we need the following notation. Define

 \begin{aligned}
 \vartheta_{0}^{(1)}&:=10c_{1}(s)\mathrm{f}_{\infty}\bigl[L_{\mathcal{K}}\sqrt{d}/\mathrm{k}_{1}\bigr]^{d/2},\\
 \vartheta_{0}^{(2)}&:=10c_{1}(s)\mathrm{f}_{\infty}\bigl[2^{d+2}\sqrt{d}L_{\mathcal{K}}\mathrm{k}_{\infty}/\mathrm{k}_{1}^{2}\bigr]^{d/2}.
 \end{aligned} (65)

The next two quantities, A_{\mathcal{H}} and B_{\mathcal{H}}, are completely determined by the bandwidth set \mathcal{H}:

 A_{\mathcal{H}}:=\prod_{j=1}^{d}\ln(h_{j}^{\max}/h_{j}^{\min}), (66)

 B_{\mathcal{H}}:=\log_{2}(V_{h^{\max}}/V_{h^{\min}})=\sum_{j=1}^{d}\log_{2}(h_{j}^{\max}/h_{j}^{\min}).

For y>0 put

 C_{\xi,i}^{*}(y):=1+2\vartheta_{0}^{(i)}\bigl\{\sqrt{y}\bigl(\bigl[2^{d(i-1)}V_{h^{\max}}\bigr]^{1/s}+n^{-1/(2s)}\bigr)+yn^{-1/s}\bigr\},\qquad i=1,2. (68)

Define also

 y_{*}^{(i)}:=\cases{\vartheta_{1}^{(i)}n^{4/s-1},&\quad$s\in(2,4)$,\cr\vartheta_{2}^{(i)}(nV_{h^{\min}})^{-1/2}\bigl[\bigl(2^{d(i-1)}V_{h^{\max}}\bigr)^{2/s}+n^{-1/s}\bigr]^{-2},&\quad$s\geq 4$,}

where explicit expressions for the constants \vartheta_{1}^{(i)},\vartheta_{2}^{(i)}, i=1,2 are given in the proof of Theorem 7.

###### Theorem 7

Let Assumption 3.4 hold, f\in\mathcal{F}, and let \max_{j=1,\ldots,d}|h^{\max}_{j}|\leq 1. For i=1,2 assume that

 nV_{h^{\min}}>[64c^{2}_{1}(s)]^{({s\wedge 4})/({s\wedge 4-2})}\bigl[2^{d+2}\sqrt{d}L_{\mathcal{K}}\mathrm{k}_{\infty}/\mathrm{k}_{1}^{2}\bigr]^{d(i-1)}. (69)

If \gamma:=(nV_{h^{\min}})^{1/(s\wedge 4)-1/2}, then for any s>2, y\in[1,y_{*}^{(i)}] and for i=1,2 one has

 \begin{aligned}
 &\mathbb{E}\sup_{w\in\mathcal{W}^{(i)}}\bigl\{\bigl\|\xi_{w}^{(i)}\bigr\|_{s}-\overline{u}_{\epsilon}(\gamma)C_{\xi,i}^{*}(y)\breve{U}^{(i)}_{\xi}(w)\bigr\}^{q}_{+}\\
 &\qquad\leq\tilde{T}^{(i)}_{1,\epsilon}(1+A_{\mathcal{H}})^{2i}(1+B_{\mathcal{H}})n^{q/2}[C_{\xi,i}^{*}(y)]^{q}e^{-y/2},
 \end{aligned}

where \overline{u}_{\epsilon}(\cdot) is defined in Theorem 3, and \breve{U}_{\xi}^{(i)}(w) are defined in (59) and (64).

In addition, for any subset \mathcal{W}_{0}\subseteq\mathcal{W}^{(i)}, any s>2 and for i=1,2 one has

 \begin{aligned}
 \mathbb{E}\Bigl[\sup_{w\in\mathcal{W}_{0}}\breve{U}^{(i)}_{\xi}(w)\Bigr]^{q}&\leq\bigl[1+4c_{1}(s)(1+\epsilon)(nV_{h^{\min}})^{{1}/({s\wedge 4})-{1}/{2}}\bigr]^{q}\sup_{w\in\mathcal{W}_{0}}\bigl\{\overline{U}{}^{(i)}_{\xi}(w,f)\bigr\}^{q}\\
 &\quad{}+\tilde{T}_{2,\epsilon}^{(i)}(1+A_{\mathcal{H}})^{2i}(1+B_{\mathcal{H}})n^{{q(s-2)}/({2s})}\exp\bigl\{-y_{*}^{(i)}/2\bigr\}.
 \end{aligned}

The explicit expressions for the constants \tilde{T}_{1,\epsilon}^{(i)} and \tilde{T}_{2,\epsilon}^{(i)} are given in the proof.

We emphasize that the upper bounds of Theorem 7 are nonasymptotic. The constants \vartheta_{1}^{(i)}, \vartheta_{2}^{(i)}, \tilde{T}_{1,\epsilon}^{(i)} and \tilde{T}_{2,\epsilon}^{(i)} are written down explicitly in the proof of the theorem; they are completely determined through the quantities L_{\mathcal{K}}, \mathrm{k}_{1}, \mathrm{k}_{\infty}, C_{\mathcal{K}} and \beta_{\mathcal{K}} appearing in Assumption 3.4, and the constant c_{1}(s) in the Rosenthal inequality.

Remark. Condition (69) is not restrictive because the standard assumption on the bandwidth set \mathcal{H} in kernel density estimation is that nV_{h^{\min}}\to\infty as n\to\infty.

The bounds established in Theorem 7 can be used to derive asymptotic (as n\to\infty) results under general assumptions on the set of bandwidths \mathcal{H}. One such result is given in the next corollary.

###### Corollary 6

Let s>2 be fixed, Assumption 3.4 hold, and f\in\mathcal{F}. There exist positive constants k_{1,i}=k_{1,i}(s), k_{2,i}=k_{2,i}(s) and k_{3,i}=k_{3,i}(s), i=1,2, such that if

 V_{h^{\max}}\asymp[\ln n]^{-k_{1,i}},\qquad nV_{h^{\min}}\asymp[\ln n]^{k_{2,i}},\qquad\epsilon=\epsilon_{n}\asymp[\ln n]^{-k_{3,i}},\qquad n\to\infty,

then for all \ell>0, q\geq 1

 \lim_{n\to\infty}\sup_{f\in\mathcal{F}}n^{\ell}\mathbb{E}\sup_{w\in\mathcal{W}^{(i)}}\bigl[\bigl\|\xi^{(i)}_{w}\bigr\|_{s}-(1+3\epsilon_{n})\breve{U}^{(i)}_{\xi}(w)\bigr]_{+}^{q}=0.

In addition, for any subset \mathcal{W}_{0}\subseteq\mathcal{W}^{(i)} one has

 \mathbb{E}\Bigl[\sup_{w\in\mathcal{W}_{0}}\breve{U}^{(i)}_{\xi}(w)\Bigr]^{q}\leq\Bigl[(1+\epsilon_{n})\sup_{w\in\mathcal{W}_{0}}\overline{U}{}^{(i)}_{\xi}(w,f)\Bigr]^{q}+R_{n}^{(i)}(\mathcal{W}_{0}),

where \limsup_{n\to\infty}\sup_{f\in\mathcal{F}}\sup_{\mathcal{W}_{0}\subseteq\mathcal{W}^{(i)}}[n^{\ell}R_{n}^{(i)}(\mathcal{W}_{0})]=0, i=1,2.

We remark that explicit expressions for the constants k_{1,i} and k_{2,i}, i=1,2, are easily derived from Theorem 7.

## 4 Uniform bounds for norms of regression-type processes

In this section, we use Proposition 2 in order to derive uniform bounds for the family \|\eta_{w}\|_{s,\tau}, w\in\mathcal{W}; we recall that

 \eta_{w}(t)=\sum_{i=1}^{n}w(t,X_{i})\varepsilon_{i},

see (7). First, we verify Assumption 2 by establishing an analogue of Theorem 1 for a fixed weight function w\in\mathcal{W} [see Theorem 8 below]. It turns out that the corresponding inequality depends heavily on the tail probability of the random variable \varepsilon. In other words, we prove that Assumption 2 is fulfilled with a function g that is determined by the rate at which the tail probability of \varepsilon decreases. Next, under Assumptions 3.3 and 3.3, we derive uniform bounds using Corollary 1; this leads to an analogue of Theorem 4 for the regression-type processes.

### 4.1 Probability bounds for fixed weight function

We consider two types of moment conditions on the distribution of \varepsilon.

The distribution of \varepsilon is symmetric, and one of the following two conditions is fulfilled:

(E1) there exist constants \alpha>0, v>0 and b>0 such that

 {\mathbb{P}}\{|\varepsilon|\geq x\}\leq v\exp\{-bx^{\alpha}\}\qquad\forall x>0;

(E2) there exist constants p\geq[s\vee 2] and P>0 such that

 \mathbb{E}|\varepsilon|^{p}\leq P.

Let \sigma_{\varepsilon}^{2}:=\mathbb{E}\varepsilon^{2} and e_{s}:=(\mathbb{E}|\varepsilon|^{s})^{1/s}. For any w\in\mathcal{W} define

 \varrho_{s}(w,f):=\cases{\sigma_{\varepsilon}\bigl\{\sqrt{n}\Sigma_{s}(w,f)\wedge 4n^{1/s}M_{s}(w)\bigr\},&\quad$s<2$,\cr\sigma_{\varepsilon}\sqrt{n}M_{2}(w),&\quad$s=2$,\cr c_{1}(s)\bigl[\sigma_{\varepsilon}\sqrt{n}\Sigma_{s}(w,f)+2n^{1/s}e_{s}M_{s}(w)\bigr],&\quad$s>2$,}

 \varpi^{2}_{s}(w,f):=\cases{M^{2}_{s}(w)[(6\sigma_{\varepsilon}^{2}+8)n+96\sigma_{\varepsilon}n^{1/s}],&\quad$s<2$,\cr 6\sigma_{\varepsilon}^{2}nM^{2}_{1,\tau,\nu^{\prime}}(w)+24\sigma_{\varepsilon}\sqrt{n}M^{2}_{2}(w),&\quad$s=2$,}

and if s>2 then we set

 \varpi^{2}_{s}(w,f):=6c_{3}(s)\bigl[\sigma_{\varepsilon}^{2}nM^{2}_{{2s}/({s+2}),\tau,\nu^{\prime}}(w)+4\sigma_{\varepsilon}\sqrt{n}\Sigma_{s}(w,f)M_{s}(w)+8e_{s}n^{1/s}M^{2}_{s}(w)\bigr].

In the above formulas, we use the notation introduced at the beginning of Section 3; the formulas should be compared with (28) and (3.1).

The next theorem is the analogue of Theorem 1 for the regression-type processes.

###### Theorem 8
(i) Suppose that Assumption (E1) holds, and for x>0 define the functions

 \begin{aligned}
 G_{1}(x)&:=(1+nv)g_{\alpha,b}(x),\\
 g_{\alpha,b}(x)&:=\cases{\exp\bigl\{-|x|\wedge|b^{1/\alpha}x|^{\alpha/(2+\alpha)}\bigr\},&\quad$s<2$,\cr\exp\bigl\{-|x|\wedge|b^{1/\alpha}x|^{\alpha/(1+\alpha)}\bigr\},&\quad$s\geq 2$.}
 \end{aligned} (70)

Then for all s\in[1,\infty) and z>0 one has

 {\mathbb{P}}\{\|\eta_{w}\|_{s,\tau}\geq\varrho_{s}(w,f)+z\}\leq G_{1}\biggl(\frac{z^{2}}{({1}/{3})\varpi_{s}^{2}(w,f)+({4}/{3})c_{*}(s)M_{s}(w)z}\biggr),

where c_{*}(\cdot) is given in (27).

(ii) Suppose that Assumption (E2) holds, and for x>0 define the function

 G_{2}(x):=(1+nP)\times\cases{(x^{-1}p\ln[1+p^{-1}x])^{p/2},&\quad$s<2$,\cr(x^{-1}p\ln[1+p^{-1}x])^{p},&\quad$s\geq 2$.}

Then for all s\in[1,\infty) and z>0 one has

 {\mathbb{P}}\{\|\eta_{w}\|_{s,\tau}\geq\varrho_{s}(w,f)+z\}\leq G_{2}\biggl(\frac{z^{2}}{({1}/{3})\varpi_{s}^{2}(w,f)+({4}/{3})c_{*}(s)M_{s}(w)z}\biggr).
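The argument of G_{1} and G_{2} in Theorem 8 has the Bernstein form z^{2}/(a+bz). As a rough illustration, the elementary bound below separates the two deviation regimes. The shorthand a and b is introduced only for this sketch and is not notation from the paper; both are assumed positive.

```latex
% Shorthand for this sketch only (not the paper's notation); a, b > 0 assumed.
\begin{align*}
a &:= \tfrac13\,\varpi_s^2(w,f), \qquad b := \tfrac43\,c_*(s)\,M_s(w),\\
\frac{z^2}{a+bz} &\ge \min\Bigl\{\frac{z^2}{2a},\ \frac{z}{2b}\Bigr\}
\qquad\text{for all } z>0,
\end{align*}
% since a + bz <= 2a when bz <= a, and a + bz <= 2bz otherwise.
```

So for z\leq a/b the argument grows at least like z^{2}/(2a), and for larger z at least like z/(2b); the functions G_{1} and G_{2} then translate this growth into a tail decay governed by the distribution of \varepsilon.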

### 4.2 Uniform bound

Theorem 8 guarantees that Assumption 2 holds with the function g being either G_{1} or G_{2}. This result is the basis for the derivation of uniform bounds, and the general machinery presented in the previous sections can be fully applied here. In this section, we restrict ourselves to uniform bounds over the classes of weights depending on the difference of arguments. In other words, under Assumptions 3.3, 3.3 and (E1) we prove an analogue of Theorem 4 for the regression-type processes.

A natural assumption in the regression model where the process \{\eta_{w},w\in\mathcal{W}\} appears is that the design variable X is distributed on a bounded interval of {\mathbb{R}}^{d}, that is, the density f is compactly supported. This will be assumed throughout this section.

Let \mathcal{I}\subset{\mathbb{R}}^{d} be a bounded interval, \mathcal{T}=\mathcal{X}=\mathcal{I}, and let \tau=\nu=\operatorname{mes} be the Lebesgue measure. For the sake of brevity, we write \alpha_{*}=\alpha_{1}^{-1}\alpha_{2}^{-1/2} where \alpha_{1} and \alpha_{2} appear in Assumption (W2). Define

 \mathrm{a}:=\max\bigl(\sigma_{\varepsilon}\sqrt{\operatorname{mes}(\mathcal{I})},c_{1}(s)[\sigma_{\varepsilon}\mathrm{f}_{\infty}^{1/2}+2e_{s}\alpha_{*}]\bigr),\qquad\mathrm{c}_{n}:=\tfrac{4}{3}c_{*}(s)\alpha_{*}n^{-1/s};

 \mathrm{b}^{2}_{n}:=\cases{\bigl[2\sigma_{\varepsilon}^{2}+\frac{8}{3}+32\sigma_{\varepsilon}n^{1/s-1}\bigr]\mu_{*}^{2/s-1},&\quad$s<2$,\cr 2\mathrm{f}^{2}_{\infty}\mu_{*}+8n^{-1/2},&\quad$s=2$,\cr 2c_{3}(s)\mathrm{f}^{2}_{\infty}[\sigma^{2}_{\varepsilon}\mu_{*}^{2/s}+(4\sigma_{\varepsilon}\alpha_{*}+8e_{s}\alpha_{*}^{2})n^{-1/s}],&\quad$s>2$.}

###### Theorem 9

Let Assumptions 3.3 and (E1) hold. Suppose f\in\mathcal{F}, and assume that (51) is valid for all s\geq 1. Let Assumption (W4) be fulfilled with \beta<\alpha/(2+\alpha), if s<2, and with \beta<\alpha/(1+\alpha) if s\geq 2. Then for all s\geq 1, q\geq 1 and y>1 one has

 \begin{aligned}
 &\mathbb{E}\sup_{w\in\mathcal{W}}\bigl[\|\eta_{w}\|_{s}-\mathrm{a}u_{\epsilon}\bigl(1+2\sqrt{y}\mathrm{b}_{n}+2y\mathrm{c}_{n}\bigr)\sqrt{n}\|w\|_{2}\bigr]_{+}^{q}\\
 &\qquad\leq T_{n,\epsilon}\bigl[1+2\sqrt{y}\mathrm{b}_{n}+2y\mathrm{c}_{n}\bigr]^{q}[g_{\alpha,b}(y)]^{1/4},
 \end{aligned}

where u_{\epsilon}=2^{\epsilon}(1+\epsilon), g_{\alpha,b}(\cdot) is defined in (70), and the explicit expression of the constant T_{n,\epsilon} is given at the beginning of the proof of the theorem.

The following asymptotic result is an immediate consequence of Theorem 9.

###### Corollary 7

Let the assumptions of Theorem 9 hold. For any \alpha>0 there exists a constant \mathrm{c}=\mathrm{c}(\alpha)>0 such that if \mu_{*}\asymp[\ln n]^{-\mathrm{c}} then for all s\geq 1, q\geq 1, \epsilon\in(0,1) and \ell>0

 \lim_{n\to\infty}n^{\ell}\sup_{f\in\mathcal{F}}\mathbb{E}\sup_{w\in\mathcal{W}}\bigl[\|\eta_{w}\|_{s}-(1+\epsilon)\mathrm{a}\sqrt{n}\|w\|_{2}\bigr]_{+}^{q}=0.

The explicit expression for \mathrm{c}(\alpha) is easily derived from Theorem 9.

## 5 Proofs of Propositions 1 and 2

### 5.1 Proof of Proposition 1

Let Z_{k}, k\in{\mathbb{N}} be an \epsilon 2^{-k-3}-net of Z, and let z_{k}(\zeta), \zeta\in Z denote the element of Z_{k} closest to \zeta in the metric \mathrm{d}.

The continuity of the mapping \zeta\mapsto\xi_{\phi[\zeta]} guarantees that \mathrm{P}-almost surely the following relation holds for any \zeta\in Z:

 \xi_{\phi[\zeta]}=\xi_{\phi[\zeta^{(0)}]}+\sum_{k=0}^{\infty}\bigl[\xi_{\phi[z_{k+1}(\zeta)]}-\xi_{\phi[z_{k}(\zeta)]}\bigr], (71)

where \zeta^{(0)} is an arbitrary fixed element of Z and z_{0}(\zeta)=\zeta^{(0)}, \forall\zeta\in Z.

Note also that independently of \zeta for all k\geq 0

 \mathrm{d}(z_{k+1}(\zeta),z_{k}(\zeta))\leq\epsilon 2^{-k-2}. (72)

We get from sub-additivity of \Psi, (71) and (72) that for any \zeta\in Z

 \begin{aligned}
 \Psi\bigl(\xi_{\phi[\zeta]}\bigr)&\leq\Psi\bigl(\xi_{\phi[\zeta^{(0)}]}\bigr)+\frac{\pi^{2}}{6}\sum_{k=0}^{\infty}p_{k}(k+1)^{2}\Psi\bigl(\xi_{\phi[z_{k+1}(\zeta)]}-\xi_{\phi[z_{k}(\zeta)]}\bigr)\\
 &\leq\Psi\bigl(\xi_{\phi[\zeta^{(0)}]}\bigr)+\frac{\pi^{2}}{6}\sup_{k\geq 0}\mathop{\sup_{(z,z^{\prime})\in Z_{k+1}\times Z_{k}:}}_{\mathrm{d}(z,z^{\prime})\leq\epsilon 2^{-k-2}}(k+1)^{2}\Psi\bigl(\xi_{\phi[z]}-\xi_{\phi[z^{\prime}]}\bigr),
 \end{aligned}

where p_{k}:=6/(\pi^{2}(k+1)^{2}) and \sum_{k=0}^{\infty}p_{k}=1. Since \xi_{\bullet} is linear, \xi_{\phi[z]}-\xi_{\phi[z^{\prime}]}=\xi_{\phi[z]-\phi[z^{\prime}]} for all z,z^{\prime}\in{\mathbb{Z}}, and we obtain from the last inequality and the triangle inequality for probabilities that

 \begin{aligned}
 &\mathrm{P}\Bigl\{\sup_{\zeta\in Z}\Psi\bigl(\xi_{\phi[\zeta]}\bigr)\geq(1+\epsilon)[\varkappa_{U}(Z)+C^{*}(y,Z)]\Bigr\}\\
 &\qquad\leq\mathrm{P}\bigl\{\Psi\bigl(\xi_{\phi[\zeta^{(0)}]}\bigr)\geq\varkappa_{U}(Z)+C^{*}(y,Z)\bigr\}\\
 &\qquad\quad{}+\sum_{k=0}^{\infty}\mathop{\sum_{(z,z^{\prime})\in Z_{k+1}\times Z_{k}:}}_{\mathrm{d}(z,z^{\prime})\leq\epsilon 2^{-k-2}}\mathrm{P}\biggl\{\Psi\bigl(\xi_{\phi[z]-\phi[z^{\prime}]}\bigr)\geq\frac{6\epsilon[\varkappa_{U}(Z)+C^{*}(y,Z)]}{\pi^{2}(k+1)^{2}}\biggr\}\\
 &\qquad=:I_{1}+I_{2}.
 \end{aligned} (74)
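The passage from the weighted series to the supremum in the chaining bound, and from there to (74), rests on two elementary steps. They can be sketched as follows for a generic nonnegative sequence x_{k}; here x_{k} stands for \Psi(\xi_{\phi[z_{k+1}(\zeta)]}-\xi_{\phi[z_{k}(\zeta)]}), and the abbreviation X is introduced only for this sketch.

```latex
% x_k >= 0 generic; p_k = 6 / (pi^2 (k+1)^2) as in the proof.
\begin{align*}
\sum_{k\ge 0}p_k &= \frac{6}{\pi^2}\sum_{m\ge 1}\frac{1}{m^2}=1 ,\\
\sum_{k\ge 0} x_k
  &= \sum_{k\ge 0} p_k\,\frac{x_k}{p_k}
  \le \sup_{k\ge 0}\frac{x_k}{p_k}
  = \frac{\pi^2}{6}\,\sup_{k\ge 0}\,(k+1)^2 x_k .
\end{align*}
% Sketch-only abbreviation: X := \varkappa_U(Z) + C^*(y,Z).
% If  \Psi(\xi_{\phi[\zeta^{(0)}]}) < X  and
%     (\pi^2/6)(k+1)^2 \Psi(\xi_{\phi[z]-\phi[z']}) < \epsilon X
% for every k and every admissible pair (z,z'), then
%     \sup_{\zeta} \Psi(\xi_{\phi[\zeta]}) < (1+\epsilon) X .
```

Taking complements, the event on the left-hand side of (74) is contained in the union of the events on the right-hand side, and the bound follows by summing their probabilities.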

In view of (13) and because \zeta^{(0)}\in Z, we have that U(\phi[\zeta^{(0)}])\leq\varkappa_{U}(Z). Therefore, we get from Assumption 2(i) and monotonicity of the function g that for any y>0

 \begin{aligned}
 I_{1}&\leq\mathrm{P}\bigl\{\Psi\bigl(\xi_{\phi[\zeta^{(0)}]}\bigr)-U\bigl(\phi\bigl[\zeta^{(0)}\bigr]\bigr)\geq C^{*}(y,Z)\bigr\}\\
 &\leq g\biggl(\frac{[C^{*}(y,Z)]^{2}}{A^{2}(\phi[\zeta^{(0)}])+B(\phi[\zeta^{(0)}])C^{*}(y,Z)}\biggr)\\
 &\leq g\biggl(\frac{[C^{*}(y,Z)]^{2}}{\Lambda^{2}_{A}(Z)+\Lambda_{B}(Z)C^{*}(y,Z)}\biggr)\leq g(y).
 \end{aligned}

In order to get the last inequality, we have used monotonicity of g and the fact that for any y>0

 \frac{[C^{*}(y,Z)]^{2}}{\Lambda^{2}_{A}(Z)+\Lambda_{B}(Z)C^{*}(y,Z)}=\frac{[\sqrt{y}\Lambda_{A}(Z)+y\Lambda_{B}(Z)]^{2}}{\Lambda_{A}^{2}(Z)+\Lambda_{B}(Z)[\sqrt{y}\Lambda_{A}(Z)+y\Lambda_{B}(Z)]}\geq y.
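The last inequality is a short piece of algebra. A minimal worked version, with the shorthand a and b introduced here only for the computation:

```latex
% Shorthand for this computation only; a, b >= 0.
\begin{align*}
a &:= \sqrt{y}\,\Lambda_A(Z), \qquad b := y\,\Lambda_B(Z), \qquad C^*(y,Z)=a+b,\\
y\,\bigl[\Lambda_A^2(Z)+\Lambda_B(Z)\,C^*(y,Z)\bigr]
  &= a^2 + b\,(a+b) = a^2+ab+b^2 \;\le\; (a+b)^2 = [C^*(y,Z)]^2 .
\end{align*}
```

Dividing both sides by \Lambda_{A}^{2}(Z)+\Lambda_{B}(Z)C^{*}(y,Z) gives the stated lower bound y; the slack is exactly the cross term ab.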

By (13), if z,z^{\prime}\in Z and \mathrm{d}(z,z^{\prime})\leq\epsilon 2^{-k-2} then

 U(\phi[z]-\phi[z^{\prime}])\leq\epsilon 2^{-k-2}\varkappa_{U}(Z),

and, therefore, for any y\geq 0

 \begin{aligned}
 &\mathrm{P}\biggl\{\Psi\bigl(\xi_{\phi[z]-\phi[z^{\prime}]}\bigr)\geq\frac{6\epsilon[\varkappa_{U}(Z)+C^{*}(y,Z)]}{\pi^{2}(k+1)^{2}}\biggr\}\\
 &\qquad\leq\mathrm{P}\biggl\{\Psi\bigl(\xi_{\phi[z]-\phi[z^{\prime}]}\bigr)-U(\phi[z]-\phi[z^{\prime}])\geq\frac{6\epsilon[\varkappa_{U}(Z)+C^{*}(y,Z)]}{\pi^{2}(k+1)^{2}}-\varkappa_{U}(Z)\epsilon 2^{-k-2}\biggr\}\\
 &\qquad\leq\mathrm{P}\biggl\{\Psi\bigl(\xi_{\phi[z]-\phi[z^{\prime}]}\bigr)-U(\phi[z]-\phi[z^{\prime}])\geq\frac{9\epsilon C^{*}(y,Z)}{16(k+1)^{2}}\biggr\}.
 \end{aligned}

Here we took into account that \min_{k\geq 0}[6\pi^{-2}(k+1)^{-2}-2^{-k-2}]>0 and 9/16<(6/\pi^{2}). Putting C_{k}=\frac{9\epsilon C^{*}(y,Z)}{16(k+1)^{2}} and applying Assumption 2(i), we obtain for any z,z^{\prime}\in Z_{k+1}\times Z_{k} satisfying \mathrm{d}(z,z^{\prime})\leq\epsilon 2^{-k-2}:

 \begin{aligned}
 &\mathrm{P}\biggl\{\Psi\bigl(\xi_{\phi[z]-\phi[z^{\prime}]}\bigr)\geq\frac{6\epsilon[\varkappa_{U}(Z)+C^{*}(y,Z)]}{\pi^{2}(k+1)^{2}}\biggr\}\\
 &\qquad\leq g\biggl(\frac{C_{k}^{2}}{A^{2}(\phi[z]-\phi[z^{\prime}])+B(\phi[z]-\phi[z^{\prime}])C_{k}}\biggr)\\
 &\qquad\leq g\biggl(\frac{C_{k}^{2}}{[\Lambda_{A}(Z)\epsilon 2^{-k-2}]^{2}+[\Lambda_{B}(Z)\epsilon 2^{-k-2}]C_{k}}\biggr)\\
 &\qquad\leq g\biggl(\frac{\tilde{C}_{k}^{2}}{\Lambda_{A}^{2}(Z)+\Lambda_{B}(Z)\tilde{C}_{k}}\biggr),
 \end{aligned}

where we denoted \tilde{C}_{k}=C_{k}2^{k+2}. Taking into account that 9(k+1)^{-2}2^{k-2}\geq 1 for any k\geq 0, and by definition of C^{*}(y,Z), we obtain for any y>0 that

 \frac{\tilde{C}_{k}^{2}}{\Lambda_{A}^{2}(Z)+\Lambda_{B}(Z)\tilde{C}_{k}}\geq 9y(k+1)^{-2}2^{k-2}.
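The last few steps rely on three numerical facts about the constants: positivity of 6\pi^{-2}(k+1)^{-2}-2^{-k-2}, the comparison 9/16<6/\pi^{2}, and the bound 9(k+1)^{-2}2^{k-2}\geq 1. As a quick check (values rounded to four decimals), the first few terms are:

```latex
\begin{align*}
\frac{6}{\pi^2(k+1)^2}-2^{-k-2}
  &\approx 0.3579,\ 0.0270,\ 0.0050,\ 0.0067,\ 0.0087
  && (k=0,1,2,3,4),\\
\frac{6}{\pi^2} &\approx 0.6079 \;>\; \frac{9}{16}=0.5625, &&\\
9\,(k+1)^{-2}\,2^{k-2}
  &= \tfrac94,\ \tfrac98,\ 1,\ \tfrac98,\ \tfrac{36}{25}
  && (k=0,1,2,3,4).
\end{align*}
% For k >= 2 the ratio 2^k/(k+1)^2 is nondecreasing, because
% 2(k+1)^2/(k+2)^2 >= 1 there; hence both sequences stay above
% their value at k = 2 for all larger k.
```

Both minima are attained at k=2, so the first difference is positive for every k\geq 0 and the third quantity is at least 1 for every k\geq 0, with equality at k=2.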

Hence, for any z,z^{\prime}\in Z_{k+1}\times Z_{k} satisfying \mathrm{d}(z,z^{\prime})\leq\epsilon 2^{-k-2} one has

 \mathrm{P}\biggl\{\Psi\bigl(\xi_{\phi[z]-\phi[z^{\prime}]}\bigr)\geq\frac{6\epsilon[\varkappa_{U}(Z)+C^{*}(y,Z)]}{\pi^{2}(k+1)^{2}}\biggr\}\leq g\bigl(9y2^{k-2}(k+1)^{-2}\bigr). (76)

Noting that the right-hand side of (76) does not depend on z,z^{\prime} we get

 I_{2}\leq\sum_{k=0}^{\infty}\{N_{Z,\mathrm{d}}(\epsilon 2^{-k-1})\}^{2}g\bigl(9y2^{k-2}(k+1)^{-2}\bigr). (77)

The statement of the proposition follows now from (74), the bound on I_{1} obtained above and (77).

### 5.2 Proof of Proposition 2

Let Z_{l}, l=1,\ldots,N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8), be \mathrm{d}-balls of radius \epsilon/8 forming a minimal covering of the set {\mathbb{Z}}. For any 0\leq j\leq[\epsilon^{-1}\log_{2}(R/r)-1]_{+} [without loss of generality, we assume that \epsilon^{-1}\log_{2}(R/r) is an integer], let \delta_{j}=r2^{\epsilon j}, and put

 \tilde{{\mathbb{Z}}}_{\delta_{j+1}}=\{\zeta\in{\mathbb{Z}}\dvtx\delta_{j}<U(\phi[\zeta])\leq\delta_{j+1}\}.

Note that \tilde{{\mathbb{Z}}}_{\delta_{j}}\subseteq{\mathbb{Z}}_{\delta_{j}} for all j because \epsilon\in(0,1]; recall that {\mathbb{Z}}_{a} is defined in (18).

We have Z_{l}=\bigcup_{j=0}^{[\epsilon^{-1}\log_{2}(R/r)-1]_{+}}\{Z_{l}\cap\tilde{{\mathbb{Z}}}_{\delta_{j+1}}\} for any l=1,\ldots,N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8). Therefore, for any y>0,

 \Psi_{u_{\epsilon}}^{*}(y,Z_{l})\leq\sup_{j=0,\ldots,[\epsilon^{-1}\log_{2}(R/r)-1]_{+}}\Bigl[\sup_{\zeta\in Z_{l}\cap\tilde{{\mathbb{Z}}}_{\delta_{j+1}}}\Psi\bigl(\xi_{\phi[\zeta]}\bigr)-u_{\epsilon}C^{*}(y)\delta_{j}\Bigr]. (78)

Let 0\leq j\leq[\epsilon^{-1}\log_{2}(R/r)-1]_{+} be fixed; then using the definition of \Lambda_{A} and \Lambda_{B} [see (14) and (15)] and the fact that \tilde{{\mathbb{Z}}}_{\delta_{j+1}}\subseteq{\mathbb{Z}}_{\delta_{j+1}} we have that

 \begin{aligned}
 C^{*}(y)&\geq 1+\delta^{-1}_{j+1}\bigl[2\sqrt{y}\Lambda_{A}({\mathbb{Z}}_{\delta_{j+1}})+2y\Lambda_{B}({\mathbb{Z}}_{\delta_{j+1}})\bigr]\\
 &\geq 1+\delta^{-1}_{j}\bigl[\sqrt{y}\Lambda_{A}({\mathbb{Z}}_{\delta_{j+1}})+y\Lambda_{B}({\mathbb{Z}}_{\delta_{j+1}})\bigr]\\
 &\geq 1+\delta^{-1}_{j}\bigl[\sqrt{y}\Lambda_{A}(\tilde{{\mathbb{Z}}}_{\delta_{j+1}})+y\Lambda_{B}(\tilde{{\mathbb{Z}}}_{\delta_{j+1}})\bigr].
 \end{aligned}

Therefore

 \begin{aligned}
 C^{*}(y)\delta_{j}&\geq\delta_{j}+\bigl[\sqrt{y}\Lambda_{A}(\tilde{{\mathbb{Z}}}_{\delta_{j+1}})+y\Lambda_{B}(\tilde{{\mathbb{Z}}}_{\delta_{j+1}})\bigr]\\
 &\geq 2^{-\epsilon}\varkappa_{U}(\tilde{{\mathbb{Z}}}_{\delta_{j+1}})+C^{*}(y,\tilde{{\mathbb{Z}}}_{\delta_{j+1}}),
 \end{aligned}

since by the premise of the proposition \delta_{j}=2^{-\epsilon}\delta_{j+1}\geq 2^{-\epsilon}\varkappa_{U}({\mathbb{Z}}_{\delta_{j+1}})\geq 2^{-\epsilon}\varkappa_{U}(\tilde{{\mathbb{Z}}}_{\delta_{j+1}}). Note also that the definition of C^{*}(\cdot,\cdot) implies that C^{*}(\cdot,Z_{1})\leq C^{*}(\cdot,Z_{2}) whenever Z_{1}\subseteq Z_{2}. Thus, we have for any 0\leq j\leq[\epsilon^{-1}\log_{2}(R/r)-1]_{+} and any l=1,\ldots,N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)

 u_{\epsilon}C^{*}(y)\delta_{j}\geq(1+\epsilon)[\varkappa_{U}(Z_{l}\cap\tilde{{\mathbb{Z}}}_{\delta_{j+1}})+C^{*}(y,Z_{l}\cap\tilde{{\mathbb{Z}}}_{\delta_{j+1}})]. (79)
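The chain leading to (79) can be spelled out as follows. This sketch assumes that u_{\epsilon}=2^{\epsilon}(1+\epsilon), the value stated for Theorem 9, is also the constant used in Proposition 2, and that \varkappa_{U}(\cdot) is monotone with respect to set inclusion, as C^{*}(y,\cdot) is.

```latex
% Assumptions: u_eps = 2^eps (1 + eps); \varkappa_U and C^*(y, .) are
% monotone in the set argument.
\begin{align*}
u_\epsilon\,C^*(y)\,\delta_j
 &\ge 2^{\epsilon}(1+\epsilon)\bigl[\,2^{-\epsilon}\varkappa_U(\tilde{\mathbb{Z}}_{\delta_{j+1}})
      + C^*(y,\tilde{\mathbb{Z}}_{\delta_{j+1}})\bigr] \\
 &\ge (1+\epsilon)\bigl[\varkappa_U(\tilde{\mathbb{Z}}_{\delta_{j+1}})
      + C^*(y,\tilde{\mathbb{Z}}_{\delta_{j+1}})\bigr] \\
 &\ge (1+\epsilon)\bigl[\varkappa_U(Z_l\cap\tilde{\mathbb{Z}}_{\delta_{j+1}})
      + C^*(y,Z_l\cap\tilde{\mathbb{Z}}_{\delta_{j+1}})\bigr].
\end{align*}
% Line 1: the lower bound on C^*(y)\delta_j just derived.
% Line 2: 2^eps >= 1 and C^* >= 0.
% Line 3: monotonicity in the set argument.
% The earlier step 2/\delta_{j+1} >= 1/\delta_j uses
% \delta_{j+1} = 2^eps \delta_j <= 2 \delta_j for eps in (0,1].
```

The factor 2^{\epsilon} in u_{\epsilon} is thus exactly what compensates for replacing \delta_{j+1} by \delta_{j} within one slice.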

Taking into account (78), we obtain

 \begin{aligned}
 &\mathrm{P}\{\Psi_{u_{\epsilon}}^{*}(y,Z_{l})\geq 0\}\\
 &\qquad\leq\sum_{j=0}^{[\epsilon^{-1}\log_{2}(R/r)-1]_{+}}\mathrm{P}\Bigl\{\sup_{\zeta\in Z_{l}\cap\tilde{{\mathbb{Z}}}_{\delta_{j+1}}}\Psi\bigl(\xi_{\phi[\zeta]}\bigr)\geq(1+\epsilon)[\varkappa_{U}(Z_{l}\cap\tilde{{\mathbb{Z}}}_{\delta_{j+1}})+C^{*}(y,Z_{l}\cap\tilde{{\mathbb{Z}}}_{\delta_{j+1}})]\Bigr\}.
 \end{aligned}

Applying Proposition 1 for the sets Z_{l}\cap\tilde{{\mathbb{Z}}}_{\delta_{j+1}}, we get for any y>0

 \begin{aligned}
 \mathrm{P}\{\Psi_{u_{\epsilon}}^{*}(y,Z_{l})\geq 0\}&\leq\sum_{j=0}^{[\epsilon^{-1}\log_{2}(R/r)-1]_{+}}L^{(\epsilon)}_{g,{\mathbb{Z}}_{\delta_{j+1}}}(y)\\
 &=\sum_{j=0}^{[\epsilon^{-1}\log_{2}(R/r)-1]_{+}}L^{(\epsilon)}_{g}\bigl(y,r2^{\epsilon(j+1)}\bigr).
 \end{aligned}

It remains to note that the right-hand side of the last inequality does not depend on l; thus, we come to the first assertion of the proposition.

Now we derive the bound for the moments of \Psi^{*}_{u_{\epsilon}}(y,{\mathbb{Z}}). We have from (78) with y>0 that for any q\geq 1

 \begin{aligned}
 &\mathrm{E}\Bigl(\sup_{\zeta\in{\mathbb{Z}}}\bigl\{\Psi\bigl(\xi_{\phi[\zeta]}\bigr)-u_{\epsilon}C^{*}(y)U(\phi[\zeta])\bigr\}\Bigr)^{q}_{+}\\
 &\qquad\leq\sum_{l=1}^{N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)}\sum_{j=0}^{[\epsilon^{-1}\log_{2}(R/r)-1]_{+}}\mathrm{E}\Bigl(\sup_{\zeta\in Z_{l}\cap\tilde{{\mathbb{Z}}}_{\delta_{j+1}}}\bigl\{\Psi\bigl(\xi_{\phi[\zeta]}\bigr)-u_{\epsilon}C^{*}(y)\delta_{j}\bigr\}\Bigr)^{q}_{+}\\
 &\qquad=:\sum_{l=1}^{N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)}\sum_{j=0}^{[\epsilon^{-1}\log_{2}(R/r)-1]_{+}}E_{j}(l).
 \end{aligned} (80)

For l=1,\ldots,N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8) and 0\leq j\leq[\epsilon^{-1}\log_{2}(R/r)-1]_{+} we have

 \begin{aligned}
 E_{j}(l)&=q\int_{u_{\epsilon}C^{*}(y)\delta_{j}}^{\infty}[x-u_{\epsilon}C^{*}(y)\delta_{j}]^{q-1}\mathrm{P}\Bigl\{\sup_{\zeta\in Z_{l}\cap\tilde{{\mathbb{Z}}}_{\delta_{j+1}}}\Psi\bigl(\xi_{\phi[\zeta]}\bigr)\geq x\Bigr\}\,{d}x\\
 &=[u_{\epsilon}C^{*}(y)]^{q}\delta^{q}_{j}q\int_{1}^{\infty}(z-1)^{q-1}\mathrm{P}\Bigl\{\sup_{\zeta\in Z_{l}\cap\tilde{{\mathbb{Z}}}_{\delta_{j+1}}}\Psi\bigl(\xi_{\phi[\zeta]}\bigr)\geq zu_{\epsilon}C^{*}(y)\delta_{j}\Bigr\}\,{d}z\\
 &\leq[u_{\epsilon}C^{*}(y)]^{q}\delta^{q}_{j}q\int_{1}^{\infty}(z-1)^{q-1}\mathrm{P}\Bigl\{\sup_{\zeta\in Z_{l}\cap\tilde{{\mathbb{Z}}}_{\delta_{j+1}}}\Psi\bigl(\xi_{\phi[\zeta]}\bigr)\geq u_{\epsilon}C^{*}(yz)\delta_{j}\Bigr\}\,{d}z\\
 &\leq[u_{\epsilon}C^{*}(y)]^{q}\delta^{q}_{j}q\int_{1}^{\infty}(z-1)^{q-1}L^{(\epsilon)}_{g}\bigl(yz,r2^{\epsilon(j+1)}\bigr)\,{d}z.
 \end{aligned}

Here the third line follows from zC^{*}(y)\geq C^{*}(yz) for any z\geq 1, and the last line is a consequence of (79) and the probability bound established above.
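The first two lines of the computation of E_{j}(l) and the inequality zC^{*}(y)\geq C^{*}(yz) can be sketched as follows. The form assumed for C^{*}(y) below, with hypothetical nonnegative constants A_{0} and B_{0} not depending on y, is consistent with the lower bound on C^{*}(y) used earlier in this proof, but it is an assumption of the sketch rather than the paper's definition.

```latex
% X >= 0 a random variable, a > 0 a constant; here a = u_eps C^*(y) delta_j.
\begin{align*}
\mathrm{E}\,(X-a)_+^q
  &= q\int_a^\infty (x-a)^{q-1}\,\mathrm{P}\{X\ge x\}\,dx
   = a^q\,q\int_1^\infty (z-1)^{q-1}\,\mathrm{P}\{X\ge za\}\,dz
   \qquad (x=za).
\end{align*}
% Assumed form C^*(y) = 1 + sqrt(y) A_0 + y B_0 with A_0, B_0 >= 0
% (A_0, B_0 are hypothetical names used only here). For z >= 1:
\begin{align*}
z\,C^*(y) = z + z\sqrt{y}\,A_0 + zy\,B_0
  \;\ge\; 1 + \sqrt{zy}\,A_0 + zy\,B_0 = C^*(yz),
\end{align*}
% because z >= sqrt(z) >= 1.
```

The same argument goes through if C^{*}(y) is a supremum of expressions of this form, since the inequality holds for each of them.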

The second statement of the proposition follows now from (80) and the last display, since the right-hand side of the latter does not depend on l.

## 6 Proof of Theorem 1

### 6.1 Preliminaries

For convenience in this section, we present some well-known results that will be repeatedly used in the proofs.

#### Empirical processes

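Let \mathcal{F} be a countable set of functions f\dvtx\mathcal{X}\to{\mathbb{R}}. Suppose that \mathbb{E}f(X)=0, \|f\|_{\infty}\leq b, \forall f\in\mathcal{F} and put

 Y:=\sup_{f\in\mathcal{F}}\sum_{i=1}^{n}f(X_{i}),\qquad\sigma^{2}:=\sup_{f\in\mathcal{F}}\mathbb{E}f^{2}(X).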

###### Lemma 1

For any x\geq 0

 {\mathbb{P}}\{Y-\mathbb{E}Y\geq x\}\leq\exp\biggl\{-\frac{x^{2}}{2n\sigma^{2}+4b\mathbb{E}Y+({2}/{3})bx}\biggr\}.

The statement of the lemma is an immediate consequence of the Bennett inequality for empirical processes [see Bousquet (2002)] and the standard arguments that allow one to derive the Bernstein inequality from the Bennett inequality.

#### Inequalities for sums of independent random variables

We recall the well-known Rosenthal and Bahr–Esseen [see von Bahr and Esseen (1965)] bounds on the moments of sums of independent random variables.

###### Lemma 2

Let Y_{1},\ldots,Y_{n} be independent random variables, \mathbb{E}Y_{i}=0, i=1,\ldots,n. Then

 \begin{aligned}
 \mathbb{E}\Biggl|\sum_{i=1}^{n}Y_{i}\Biggr|^{p}&\leq[c_{1}(p)]^{p}\Biggl\{\sum_{i=1}^{n}\mathbb{E}|Y_{i}|^{p}+\Biggl(\sum_{i=1}^{n}\mathbb{E}Y_{i}^{2}\Biggr)^{p/2}\Biggr\},\qquad p>2;\\
 \mathbb{E}\Biggl|\sum_{i=1}^{n}Y_{i}\Biggr|^{p}&\leq 2\sum_{i=1}^{n}\mathbb{E}|Y_{i}|^{p},\qquad p\in[1,2),
 \end{aligned}

where c_{1}(p)=15p/\ln p.

The constant c_{1}(p)=15p/\ln p in the Rosenthal inequality is obtained by symmetrization of the inequality of Theorem 4.1 in Johnson, Schechtman and Zinn (1985).
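To get a feeling for the size of the constant, one can compare both sides of the Rosenthal inequality in a case where the left-hand side is known exactly. The sketch below is only an illustration under our own assumptions: it takes independent random signs Y_{i}=\pm 1 and p=4, for which \mathbb{E}|\sum Y_{i}|^{4}=3n^{2}-2n; the helper names are hypothetical.

```python
import math

def c1(p):
    """Constant c_1(p) = 15 p / ln p from Lemma 2 (requires p > 1)."""
    return 15.0 * p / math.log(p)

def rosenthal_rhs(p, abs_moments_p, second_moments):
    """[c_1(p)]^p { sum E|Y_i|^p + (sum E Y_i^2)^{p/2} } for p > 2."""
    return c1(p) ** p * (sum(abs_moments_p) + sum(second_moments) ** (p / 2.0))

def rademacher_fourth_moment(n):
    """Exact E|Y_1 + ... + Y_n|^4 = 3 n^2 - 2 n for independent signs Y_i = +-1."""
    return 3 * n * n - 2 * n

n = 50
lhs = rademacher_fourth_moment(n)
# For random signs, E|Y_i|^4 = E Y_i^2 = 1 for every i.
rhs = rosenthal_rhs(4, [1.0] * n, [1.0] * n)
```

In this example the right-hand side exceeds the exact moment by several orders of magnitude, which reflects that c_{1}(p) is a universal constant rather than a sharp one for any particular distribution.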

#### Norms of integral operators

The next statement presents inequalities for norms of integral operators.

###### Lemma 3

Let (\mathcal{T},\mathfrak{T},\tau) and (\mathcal{X},\mathfrak{X},\chi) be \sigma-finite spaces, w be a (\mathfrak{T}\times\mathfrak{X})-measurable function on \mathcal{T}\times\mathcal{X}, and let

 M_{p,\tau,\chi}(w):=\sup_{x\in\mathcal{X}}\|w(\cdot,x)\|_{p,\tau}\vee\sup_{t% \in\mathcal{T}}\|w(t,\cdot)\|_{p,\chi}.

If R\in{\mathbb{L}}_{p}(\mathcal{X},\chi) and \mathcal{I}_{R}(t):=\int w(t,x)R(x)\chi({d}x) then the following statements hold:

(a) For any p\in[1,\infty]

 \|\mathcal{I}_{R}\|_{p,\tau}\leq M_{1,\tau,\chi}(w)\|R\|_{p,\chi}. (82)

(b) For any 1<p<r<\infty

 \|\mathcal{I}_{R}\|_{r,\tau}\leq c_{2}(p)M_{q,\tau,\chi}(w)\|R\|_{p,\chi}, (83)

where \frac{1}{q}=1+\frac{1}{r}-\frac{1}{p}, and c_{2}(p) is a numerical constant independent of w.

The statements of the lemma can be found in Folland (1999), Theorems 6.18 and 6.36.

Note that if \chi=\nu^{\prime}:=f\nu then M_{p,\tau,\chi}(w)=M_{p}(w), \forall w [see (26)]. If \mathcal{T}=\mathcal{X}={\mathbb{R}}^{d}, \tau and \chi are the Lebesgue measures, and if w(t,x) depends on the difference t-x only, then c_{2}(p)=1, and (83) is the well-known Young inequality.

### 6.2 Proof of Theorem 1

We begin with two technical lemmas; their proofs are given in the Appendix.

###### Lemma 4

Let \mathbb{B}_{{s}/({s-1})} be the unit ball in {\mathbb{L}}_{{s}/({s-1})}(\mathcal{T},\tau), and suppose that Assumption 3 holds. Then, there exists a countable set \mathfrak{L}\subset\mathbb{B}_{{s}/({s-1})} such that

 \|\xi_{w}\|_{s,\tau}=\sup_{l\in\mathfrak{L}}\int l(t)\xi_{w}(t)\tau({d}t).

###### Lemma 5

Let \overline{w}(t,x)=w(t,x)-\mathbb{E}w(t,X); then for all p\geq 1 one has:

(a) \|\overline{w}(\cdot,x)\|_{p,\tau}\leq 2\sup_{x\in\mathcal{X}}\|w(\cdot,x)\|_{p,\tau}.

(b) M_{p}(\overline{w})\leq 2M_{p}(w).

We break the proof of Theorem 1 into several steps.

#### Step 1: Reduction to empirical process

We obtain from Lemma 4

 \displaystyle\|\xi_{w}\|_{s,\tau} \displaystyle= \displaystyle\sup_{l\in\mathfrak{L}}\int l(t)\xi_{w}(t)\tau({d}t) \displaystyle= \displaystyle\sup_{l\in\mathfrak{L}}\sum_{i=1}^{n}\int l(t)\overline{w}(t,X_{i% })\tau({d}t) \displaystyle= \displaystyle\sup_{\lambda\in\Lambda}\sum_{i=1}^{n}\lambda(X_{i}),

where

 \Lambda=\biggl{\{}\lambda\dvtx\mathcal{X}\to{\mathbb{R}}\dvtx\lambda(x)=\int l% (t)\overline{w}(t,x)\tau({d}t),l\in\mathfrak{L}\biggr{\}}.

Thus,

 \|\xi_{w}\|_{s,\tau}=\sup_{\lambda\in\Lambda}\sum_{i=1}^{n}\lambda(X_{i})=:Y (84)

and, obviously, \mathbb{E}\lambda(X)=0. The idea now is to apply Lemma 1 to the random variable Y.

#### Step 2: Some upper bounds

In order to apply Lemma 1, we need to bound from above the following quantities: (i) \mathbb{E}Y; (ii) b:=\sup_{\lambda\in\Lambda}\|\lambda\|_{\infty}; and (iii) \sigma^{2}:=\sup_{\lambda\in\Lambda}\mathbb{E}\lambda^{2}(X).

(i) Upper bound for \mathbb{E}Y. Applying the Hölder inequality, we get from (84)

 \mathbb{E}\Biggl{[}\sup_{\lambda\in\Lambda}\sum_{i=1}^{n}\lambda(X_{i})\Biggr{% ]}=\mathbb{E}\|\xi_{w}\|_{s,\tau}\leq[\mathbb{E}\|\xi_{w}\|^{s}_{s,\tau}]^{{1}% /{s}}=\biggl{[}\int\mathbb{E}|\xi_{w}(t)|^{s}\tau({d}t)\biggr{]}^{{1}/{s}}.

If s\in[1,2], then for all t\in\mathcal{T}

 \mathbb{E}|\xi_{w}(t)|^{s}\leq[\mathbb{E}|\xi_{w}(t)|^{2}]^{{s}/{2}}\leq[n% \mathbb{E}w^{2}(t,X)]^{{s}/{2}}=\biggl{[}n\int w^{2}(t,x)f(x)\nu({d}x)\biggr{]% }^{{s}/{2}}.

Thus, we have for all s\in[1,2]

 \mathbb{E}Y=\mathbb{E}\Biggl{[}\sup_{\lambda\in\Lambda}\sum_{i=1}^{n}\lambda(X% _{i})\Biggr{]}\leq\sqrt{n}\Sigma_{s}(w,f). (85)

Note that the same quantity can be bounded from above in a different way. Indeed, in view of the Bahr–Esseen inequality (the second statement of Lemma 2)

 \mathbb{E}|\xi_{w}(t)|^{s}\leq 2n\mathbb{E}|\overline{w}(t,X)|^{s}\leq 2^{1+s}n\mathbb{E}|w(t,X)|^{s}

and we obtain for all s\in[1,2]

 \mathbb{E}Y=\mathbb{E}\Biggl{[}\sup_{\lambda\in\Lambda}\sum_{i=1}^{n}\lambda(X% _{i})\Biggr{]}\leq 2^{1+1/s}n^{1/s}M_{s}(w). (86)

We get finally from (85) and (86)

 \mathbb{E}Y\leq\bigl{\{}\sqrt{n}\Sigma_{s}(w,f)\bigr{\}}\wedge\{4n^{1/s}M_{s}(% w)\}. (87)

If s=2, we obtain a bound independent of f: indeed, in this case

 \displaystyle\mathbb{E}Y \displaystyle= \displaystyle\mathbb{E}\Biggl{[}\sup_{\lambda\in\Lambda}\sum_{i=1}^{n}\lambda(% X_{i})\Biggr{]}\leq\sqrt{n}\biggl{[}\int\!\!\int w^{2}(t,x)f(x)\nu({d}x)\tau({% d}t)\biggr{]}^{{1}/{2}} \displaystyle\leq \displaystyle\sqrt{n}M_{2}(w).

If s>2, then applying the Rosenthal inequality (the first assertion of Lemma 2) to \xi_{w}(t), which is a sum of i.i.d. random variables for any t\in\mathcal{T}, we get

 [\mathbb{E}(|\xi_{w}(t)|^{s})]^{{1}/{s}}\leq c_{1}(s)[(n\mathbb{E}w^{2}(t,X))^% {{s}/{2}}+n\mathbb{E}|\overline{w}(t,X)|^{s}]^{{1}/{s}}

and, therefore,

 \begin{aligned}\mathbb{E}\Biggl{[}\sup_{\lambda\in\Lambda}\sum_{i=1}^{n}\lambda(X_{i})\Biggr{]}&\leq c_{1}(s)\biggl{\{}\sqrt{n}\biggl{[}\int\biggl{(}\int w^{2}(t,x)f(x)\nu({d}x)\biggr{)}^{{s}/{2}}\tau({d}t)\biggr{]}^{{1}/{s}}\\ &\qquad{}+2n^{1/s}\biggl{[}\int\!\!\int|w(t,x)|^{s}f(x)\nu({d}x)\tau({d}t)\biggr{]}^{{1}/{s}}\biggr{\}}.\end{aligned} (89)

To get the last inequality we have used that \mathbb{E}|\overline{w}(t,X)|^{s}\leq 2^{s}\mathbb{E}|w(t,X)|^{s}, for all s\geq 1.

It is evident that the second integral on the right-hand side of (6.2) does not exceed M_{s}(w). Moreover, since (\mathbb{E}w^{2}(t,X))^{{s}/{2}}\leq\mathbb{E}|w(t,X)|^{s}, s\geq 2, the bound \Sigma_{s}(w,f)\leq M_{s}(w) holds. We conclude that \mathbb{E}Y<\infty whenever M_{s}(w)<\infty, and

 \mathbb{E}Y=\mathbb{E}\Biggl{[}\sup_{\lambda\in\Lambda}\sum_{i=1}^{n}\lambda(X% _{i})\Biggr{]}\leq c_{1}(s)\bigl{\{}\sqrt{n}\Sigma_{s}(w,f)+2n^{1/s}M_{s}(w)% \bigr{\}}. (90)

(ii) Upper bound for b=\sup_{\lambda\in\Lambda}\|\lambda\|_{\infty}. Taking into account that l\in\mathfrak{L}\subset\mathbb{B}_{{s}/({s-1})} (Lemma 4) and applying the Hölder inequality, we get for any x\in\mathcal{X}

 |\lambda(x)|\leq\biggl{[}\int|w(t,x)-\mathbb{E}w(t,X)|^{s}\tau({d}t)\biggr{]}^% {{1}/{s}}=\|\overline{w}(\cdot,x)\|_{s,\tau}.

Therefore, in view of Lemma 5(a)

 b=\|\lambda\|_{\infty}\leq 2\sup_{x\in\mathcal{X}}\|w(\cdot,x)\|_{s,\tau}\leq 2% M_{s}(w). (91)

(iii) Upper bound on the dual variance \sigma^{2}. Since \mathbb{E}\lambda(X)=0, we have

 \displaystyle\sigma^{2} \displaystyle= \displaystyle\sup_{\lambda\in\Lambda}\int\lambda^{2}(x)f(x)\nu({d}x) \displaystyle= \displaystyle\sup_{l\in\mathfrak{L}}\int\biggl{[}\int\overline{w}(t,x)l(t)\tau% ({d}t)\biggr{]}^{2}f(x)\nu({d}x) \displaystyle\leq \displaystyle\sup_{l\in\mathbb{B}_{{s}/({s-1})}}\int\biggl{[}\int\overline{w}(% t,x)l(t)\tau({d}t)\biggr{]}^{2}f(x)\nu({d}x) \displaystyle\leq \displaystyle\sup_{l\in\mathbb{B}_{{s}/({s-1})}}\int\biggl{[}\int w(t,x)l(t)% \tau({d}t)\biggr{]}^{2}f(x)\nu({d}x).

The expression on the right-hand side is bounded differently depending on the value of s.

If s\in[1,2), then applying the Hölder inequality to the inner integral in the previous expression we obtain

 \displaystyle\sigma^{2} \displaystyle\leq \displaystyle\int\biggl{[}\int|w(t,x)|^{s}\tau({d}t)\biggr{]}^{{2}/{s}}f(x)\nu% ({d}x) \displaystyle\leq \displaystyle\sup_{x\in\mathcal{X}}\|w(\cdot,x)\|^{2}_{s,\tau}\leq M^{2}_{s}(w).

We remark also that the bound given by (6.2) remains true for all s\geq 1. This shows, in particular, that \sigma is always bounded whenever M_{s}(w)<\infty.

If s=2, then we apply inequality (82) of Lemma 3 with p=2 and \chi({d}x)=\nu^{\prime}({d}x)=f(x)\nu({d}x) to the integral operator \mathcal{I}_{l}(x)=\int w(t,x)l(t)\tau({d}t). This leads to the following bound

 \sigma^{2}\leq M^{2}_{1,\tau,\nu^{\prime}}(w). (93)

If s>2, then we apply inequality (83) of Lemma 3 with r=2, p=\frac{s}{s-1}, q=\frac{2s}{s+2} and \chi=\nu^{\prime} to the integral operator \mathcal{I}_{l}(x)=\int w(t,x)l(t)\tau({d}t). This yields

 \quad\sigma^{2}\leq c_{2}\bigl{(}s/(s-1)\bigr{)}M^{2}_{q,\tau,\nu^{\prime}}(w)% =c_{2}\bigl{(}s/(s-1)\bigr{)}M^{2}_{2s/(s+2),\tau,\nu^{\prime}}(w). (94)
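The choice of exponents in the case s>2 can be double-checked mechanically. The sketch below, written with hypothetical helper names of our own, solves the relation 1/q=1+1/r-1/p of Lemma 3(b) in exact rational arithmetic and confirms that r=2, p=s/(s-1) give q=2s/(s+2).

```python
from fractions import Fraction

def young_q(p, r):
    """Solve 1/q = 1 + 1/r - 1/p for q, with exponents kept as exact fractions."""
    inv_q = 1 + Fraction(1, 1) / Fraction(r) - Fraction(1, 1) / Fraction(p)
    if inv_q <= 0:
        raise ValueError("no finite positive q for these exponents")
    return 1 / inv_q

def q_for_s(s):
    """Exponent q used in (94): r = 2 and p = s/(s-1), which needs s > 2 so that p < r."""
    s = Fraction(s)
    if s <= 2:
        raise ValueError("this case requires s > 2")
    return young_q(p=s / (s - 1), r=2)

q_example = q_for_s(4)  # equals 2*4/(4+2) = 4/3
```

The condition 1<p<r of Lemma 3(b) reads s/(s-1)<2 here, which is exactly s>2.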

#### Step 3: Application of Lemma 1

1. Case s\in[1,2). Here we have from (87), (91) and (6.2)

 \displaystyle\mathbb{E}Y \displaystyle\leq \displaystyle\bigl{\{}\sqrt{n}\Sigma_{s}(w,f)\bigr{\}}\wedge\{4n^{1/s}M_{s}(w)% \}=:\rho_{s}(w,f), \displaystyle b \displaystyle\leq \displaystyle 2M_{s}(w),\qquad\sigma^{2}\leq M^{2}_{s}(w).

Therefore applying Lemma 1, we have for all z>0

 {\mathbb{P}}\{\|\xi_{w}\|_{s,\tau}\geq\rho_{s}(w,f)+z\}\leq\exp\biggl{\{}-\frac{z^{2}}{2M^{2}_{s}(w)[n+16n^{1/s}]+[4M_{s}(w)z/3]}\biggr{\}}, (95)

where we have used (86) in the denominator of the expression inside of the exponent.

To get the result of the theorem, we note that the following trivial upper bound follows from the triangle inequality and the statement (a) of Lemma 5:

 \|\xi_{w}\|_{s,\tau}\leq 2nM_{s}(w)\qquad\forall s\geq 1.

Thus, the probability in (6.2) is equal to zero if z>2nM_{s}(w); hence, we can replace z by 2nM_{s}(w) in the denominator of the expression on the right-hand side. This leads to the statement of the theorem for s\in[1,2).
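The truncation step can be illustrated numerically. The sketch below is an illustration under our own assumptions (hypothetical function names and sample values of n, s and M_{s}(w)): it evaluates the bound with z in the denominator and the bound with z replaced by 2nM_{s}(w), and the second dominates the first on the relevant range z\leq 2nM_{s}(w).

```python
import math

def tail_bound_s_lt_2(z, n, s, M):
    """Bound for s in [1, 2) with z kept in the denominator."""
    denom = 2.0 * M * M * (n + 16.0 * n ** (1.0 / s)) + 4.0 * M * z / 3.0
    return math.exp(-z * z / denom)

def truncated_tail_bound(z, n, s, M):
    """Same bound with z replaced by its largest relevant value 2 n M in the denominator."""
    denom = 2.0 * M * M * (n + 16.0 * n ** (1.0 / s)) + 4.0 * M * (2.0 * n * M) / 3.0
    return math.exp(-z * z / denom)

n, s, M = 100, 1.5, 0.5          # hypothetical values
z_max = 2.0 * n * M              # beyond this point the probability is zero
pairs = [(tail_bound_s_lt_2(z, n, s, M), truncated_tail_bound(z, n, s, M))
         for z in (1.0, 10.0, 50.0, z_max)]
```

The price of the replacement is a slightly weaker bound for small z, in exchange for a denominator that no longer depends on z.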

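2. Case s=2. We have from (6.2), (91) and (93)

 \mathbb{E}Y\leq\sqrt{n}M_{2}(w),\qquad b\leq 2M_{2}(w),\qquad\sigma^{2}\leq M^{2}_{1,\tau,\nu^{\prime}}(w).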

Thus, for all z>0

 \displaystyle{\mathbb{P}}\bigl{\{}\|\xi_{w}\|_{2,\tau}\geq\sqrt{n}M_{2}(w)+z% \bigr{\}} \displaystyle\qquad\leq\exp\biggl{\{}-\frac{z^{2}}{2[nM^{2}_{1,\tau,\nu^{% \prime}}(w)+4\sqrt{n}M^{2}_{2}(w)+({2}/{3})M_{2}(w)z]}\biggr{\}},

and the statement of Theorem 1 is established for s=2.

3. Case s>2. We have from (90), (91) and (94)

 \displaystyle\mathbb{E}Y \displaystyle\leq \displaystyle c_{1}(s)\bigl{[}\sqrt{n}\Sigma_{s}(w,f)+2n^{1/s}M_{s}(w)\bigr{]}, \displaystyle b \displaystyle\leq \displaystyle 2M_{s}(w);\qquad\sigma^{2}\leq c_{2}\bigl{(}s/(s-1)\bigr{)}M^{2}% _{2s/(s+2),\tau,\nu^{\prime}}(w).

Thus, for any z>0 we get

 \displaystyle{\mathbb{P}}\bigl{\{}\|\xi_{w}\|_{s,\tau}\geq c_{1}(s)\bigl{[}% \sqrt{n}\Sigma_{s}(w,f)+2n^{1/s}M_{s}(w)\bigr{]}+z\bigr{\}} \displaystyle\qquad\leq\exp\bigl{\{}-{z^{2}}\bigl{(}2c_{3}(s)\bigl{[}nM^{2}_{{% 2s}/({s+2}),\tau,\nu^{\prime}}(w)+4\sqrt{n}\Sigma_{s}(w,f)M_{s}(w) \displaystyle                                                    {}+8n^{1/s}M^% {2}_{s}(w)+\tfrac{2}{3}M_{s}(w)z\bigr{]}\bigr{)}^{-1}\bigr{\}},

where c_{3}(s) is given in (27). This completes the proof of the theorem for the case of s>2.

We conclude by establishing the inequalities in (31). In order to derive the first inequality, we apply (82) of Lemma 3 with p=s/2>1, \chi=\nu to the integral operator \mathcal{I}_{f}(t):=\int w^{2}(t,x)f(x)\nu({d}x). This yields

 \biggl{[}\int\biggl{(}\int w^{2}(t,x)f(x)\nu({d}x)\biggr{)}^{s/2}\tau({d}t)% \biggr{]}^{1/s}\leq M_{2}(w)\bigl{\|}\sqrt{f}\bigr{\|}_{s,\nu},

as claimed. The second inequality in (31) follows straightforwardly from the definition of M_{p,\tau,\nu^{\prime}} and M_{p}.

## 7 Proofs of Theorem 3 and Corollary 4

### 7.1 Proof of Theorem 3

First, we specify the constants appearing in the statement of the theorem:

 \displaystyle T_{1,\epsilon} \displaystyle:= \displaystyle\biggl{(}\frac{2^{q(\epsilon+1)}}{2^{q\epsilon}-1}\Gamma(q+1)+1% \biggr{)}N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)(2u_{\epsilon}R_{\xi})^{q}[1% \vee\log_{2}(R_{\xi}/r_{\xi})]\bigl{[}1+L^{(\epsilon)}_{\exp}\bigr{]}, \displaystyle T_{2,\epsilon} \displaystyle:= \displaystyle[c_{1}(s)+2]^{q}N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)[1\vee\log% _{2}(R_{\xi}/r_{\xi})]\bigl{[}1+L^{(\epsilon)}_{\exp}\bigr{]}.

Recall that in view of (32), any w\in\mathcal{W} is represented as w=\phi[\zeta] for some \zeta\in{\mathbb{Z}}. For every 0\leq j\leq[\log_{2}(R_{\xi}/r_{\xi})-1]_{+} [without loss of generality, we assume that \log_{2}(R_{\xi}/r_{\xi}) is an integer number], put \delta_{j}=r2^{j+1}, and define the random events

 \mathcal{A}:=\bigcap_{j=0}^{[\log_{2}(R_{\xi}/r_{\xi})-1]_{+}}\mathcal{A}_{j},% \qquad\mathcal{A}_{j}:=\Bigl{\{}\sup_{\zeta\in{\mathbb{Z}}_{\delta_{j}}}\bigl{% \|}\xi_{\phi^{2}[\zeta]}\bigr{\|}_{s/2,\tau}\leq[2(1+\epsilon)\gamma\delta_{j}% ]^{2}\Bigr{\}}.

(i) The following trivial inequality holds:

Therefore,

where \overline{\mathcal{A}}_{j} denotes the event complementary to \mathcal{A}_{j}, and {\mathbf{1}}(\mathcal{A}) is the indicator of the event \mathcal{A}. The second term on the right-hand side is bounded using Theorem 2; our current goal is to bound the first and the third terms.

Note that, if the event \mathcal{A} occurs then for every \zeta\in{\mathbb{Z}}

 U_{\xi}(\phi[\zeta],f)[1+4c_{1}(s)(1+\epsilon)\gamma]\geq\hat{U}_{\xi}(\phi[\zeta])\geq U_{\xi}(\phi[\zeta],f)[1-4c_{1}(s)(1+\epsilon)\gamma]. (97)

Indeed, in view of (3.2.1), (38) and (39) we get

 \displaystyle\hat{U}_{\xi}(\phi[\zeta]) \displaystyle\geq \displaystyle U_{\xi}(\phi[\zeta],f)-|\hat{U}_{\xi}(\phi[\zeta])-U_{\xi}(\phi[% \zeta],f)| \displaystyle= \displaystyle U_{\xi}(\phi[\zeta],f)-c_{1}(s)\sqrt{n}|\hat{\Sigma}_{s}(\phi[% \zeta])-\Sigma_{s}(\phi[\zeta],f)| \displaystyle\geq \displaystyle U_{\xi}(\phi[\zeta],f)-c_{1}(s)\sqrt{\bigl{\|}\xi_{\phi^{2}[% \zeta]}\bigr{\|}_{s/2,\tau}}.

Let \zeta\in{\mathbb{Z}} be fixed. Since {\mathbb{Z}}_{\delta_{j}}, j=0,\ldots,[\log_{2}(R_{\xi}/r_{\xi})-1]_{+}, defined in (40), form a partition of {\mathbb{Z}}, there exists j_{*} such that \zeta\in{\mathbb{Z}}_{\delta_{j_{*}}}. Because \zeta\in{\mathbb{Z}}_{\delta_{j_{*}}} implies U_{\xi}(\phi[\zeta],f)\geq\delta_{j_{*}}/2=\delta_{j_{*}-1}, we obtain from (7.1) on the event \mathcal{A} that

 \displaystyle\hat{U}_{\xi}(\phi[\zeta]) \displaystyle\geq \displaystyle U_{\xi}(\phi[\zeta],f)-2c_{1}(s)(1+\epsilon)\gamma\delta_{j_{*}} \displaystyle\geq \displaystyle U_{\xi}(\phi[\zeta],f)[1-4c_{1}(s)(1+\epsilon)\gamma].
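The slicing argument can be made concrete with a small sketch. It assumes, as in the text, slices of the form \{a/2<U\leq a\} with a=\delta_{j}=r2^{j+1}; the function names and the numerical values are hypothetical. The first helper locates the slice containing a given value of U, and the second compares the bound written with \delta_{j_{*}} against the relaxed bound written with U itself.

```python
import math

def slice_index(u, r, R):
    """Return j with delta_j / 2 < u <= delta_j, where delta_j = r * 2**(j + 1)."""
    if not (r < u <= R):
        raise ValueError("u must lie in (r, R]")
    j = max(0, math.ceil(math.log2(u / r)) - 1)
    delta = r * 2 ** (j + 1)
    # Guard against floating-point rounding of log2 at slice boundaries.
    if u > delta:
        j += 1
    elif j > 0 and u <= delta / 2:
        j -= 1
    return j

def lower_bounds(u, r, R, c, gamma, eps):
    """Bound with delta_{j*} versus the relaxed bound u * [1 - 4 c (1 + eps) gamma]."""
    delta = r * 2 ** (slice_index(u, r, R) + 1)
    direct = u - 2.0 * c * (1.0 + eps) * gamma * delta
    relaxed = u * (1.0 - 4.0 * c * (1.0 + eps) * gamma)
    return direct, relaxed

direct, relaxed = lower_bounds(u=1.5, r=1.0, R=8.0, c=1.0, gamma=0.1, eps=0.5)
```

Since \delta_{j_{*}}<2U on the slice, the direct bound is never smaller than the relaxed one, which is the step used in the display above.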

Thus, the right-hand side inequality in (7.1) is proved. Similarly, we have from (7.1) and (7.1) that

 \begin{aligned}\hat{U}_{\xi}(\phi[\zeta])&\leq U_{\xi}(\phi[\zeta],f)+|\hat{U}_{\xi}(\phi[\zeta])-U_{\xi}(\phi[\zeta],f)|\\ &\leq U_{\xi}(\phi[\zeta],f)+c_{1}(s)\sqrt{\bigl{\|}\xi_{\phi^{2}[\zeta]}\bigr{\|}_{s/2,\tau}}\\ &\leq U_{\xi}(\phi[\zeta],f)[1+4c_{1}(s)(1+\epsilon)\gamma].\end{aligned}

Thus, (7.1) is proved.

Using the right-hand side inequality in (7.1) and applying Theorem 2, we obtain

 \begin{aligned}&\mathbb{E}\Bigl{[}\sup_{\zeta\in{\mathbb{Z}}}\bigl{\{}\bigl{\|}\xi_{\phi[\zeta]}\bigr{\|}_{s,\tau}-\overline{u}_{\epsilon}(\gamma)C_{\xi}^{*}(y)\hat{U}_{\xi}(\phi[\zeta])\bigr{\}}^{q}_{+}{\mathbf{1}}(\mathcal{A})\Bigr{]}\\ &\qquad\leq\mathbb{E}\sup_{\zeta\in{\mathbb{Z}}}\bigl{\{}\bigl{\|}\xi_{\phi[\zeta]}\bigr{\|}_{s,\tau}-u_{\epsilon}C_{\xi}^{*}(y)U_{\xi}(\phi[\zeta],f)\bigr{\}}^{q}_{+}\\ &\qquad\leq\frac{2^{q(\epsilon+1)}u_{\epsilon}^{q}}{2^{q\epsilon}-1}\Gamma(q+1)N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)[R_{\xi}C_{\xi}^{*}(1)]^{q}\bigl{[}1+L_{\exp}^{(\epsilon)}\bigr{]}\exp\{-y/2\}.\end{aligned} (100)

Now we bound the probability {\mathbb{P}}\{\overline{\mathcal{A}}_{j}\}. Let Z_{l}, l=1,\ldots,N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8) be a minimal covering of {\mathbb{Z}} by balls of radius \epsilon/8 in the metric \mathrm{d}. By definition of \mathcal{A}_{j}, we have

 {\mathbb{P}}\{\overline{\mathcal{A}}_{j}\}\leq\sum_{l=1}^{N_{{\mathbb{Z}},% \mathrm{d}}(\epsilon/8)}{\mathbb{P}}\Bigl{\{}\sup_{\zeta\in Z_{l}\cap{\mathbb{% Z}}_{\delta_{j}}}\bigl{\|}\xi_{\phi^{2}[\zeta]}\bigr{\|}_{s/2,\tau}\geq[2(1+% \epsilon)\gamma\delta_{j}]^{2}\Bigr{\}}. (101)

Note that

 [2\gamma\delta_{j}]^{2}\geq\varkappa_{\tilde{U}}({\mathbb{Z}}_{\delta_{j}})+% \delta_{j}^{2}\bigl{[}\sqrt{y_{\gamma}}\lambda_{\tilde{A}}+y_{\gamma}\lambda_{% \tilde{B}}\bigr{]}\geq\varkappa_{\tilde{U}}(Z_{l}\cap{\mathbb{Z}}_{\delta_{j}}% )+\delta^{2}_{j}\bigl{[}\sqrt{y}\lambda_{\tilde{A}}+y\lambda_{\tilde{B}}\bigr{% ]};

here the first inequality follows from the condition \varkappa_{\tilde{U}}({\mathbb{Z}}_{a})=\varkappa_{\tilde{U}}(a)\leq(\gamma a)% ^{2}, \forall a\in[r_{\xi},R_{\xi}] and from definition of y_{\gamma}; the second inequality holds by the inclusion Z_{l}\cap{\mathbb{Z}}_{\delta_{j}}\subseteq{\mathbb{Z}}_{\delta_{j}} and because y\leq y_{\gamma}. Furthermore, by (41) and by the above inclusion

 \displaystyle\lambda_{\tilde{A}} \displaystyle\geq \displaystyle\delta_{j}^{-2}\Lambda_{\tilde{A}}({\mathbb{Z}}_{\delta_{j}})\geq% \delta_{j}^{-2}\Lambda_{\tilde{A}}({\mathbb{Z}}_{\delta_{j}}\cap Z_{l}), \displaystyle\lambda_{\tilde{B}} \displaystyle\geq \displaystyle\delta_{j}^{-2}\Lambda_{\tilde{B}}({\mathbb{Z}}_{\delta_{j}})\geq% \delta_{j}^{-2}\Lambda_{\tilde{B}}({\mathbb{Z}}_{\delta_{j}}\cap Z_{l}),

 \displaystyle[2\gamma\delta_{j}]^{2} \displaystyle\geq \displaystyle\varkappa_{\tilde{U}}(Z_{l}\cap{\mathbb{Z}}_{\delta_{j}})+\sqrt{y% }\Lambda_{\tilde{A}}({\mathbb{Z}}_{\delta_{j}}\cap Z_{l})+y\Lambda_{\tilde{B}}% ({\mathbb{Z}}_{\delta_{j}}\cap Z_{l}) \displaystyle= \displaystyle\varkappa_{\tilde{U}}(Z_{l}\cap{\mathbb{Z}}_{\delta_{j}})+\tilde{% C}_{*}(y,Z_{l}\cap{\mathbb{Z}}_{\delta_{j}}),

where \tilde{C}_{*}(y,\cdot):=\sqrt{y}\Lambda_{\tilde{A}}(\cdot)+y\Lambda_{\tilde{B}% }(\cdot) [cf. (16)].

Hence, applying Proposition 1, we obtain from (101) that

 \displaystyle\qquad{\mathbb{P}}\{\overline{\mathcal{A}}_{j}\} \displaystyle\leq \displaystyle\sum_{l=1}^{N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)}{\mathbb{P}}% \Bigl{\{}\sup_{\zeta\in Z_{l}\cap{\mathbb{Z}}_{\delta_{j}}}\bigl{\|}\xi_{\phi^% {2}[\zeta]}\bigr{\|}_{s/2,\tau} \displaystyle               \geq(1+\epsilon)[\varkappa_{\tilde{U}}(Z_{l}\cap{% \mathbb{Z}}_{\delta_{j}})+\tilde{C}^{*}(y,Z_{l}\cap{\mathbb{Z}}_{\delta_{j}})]% \Bigr{\}} \displaystyle\leq \displaystyle N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)\Biggl{[}\exp\{-y\}+\sum_% {k=0}^{\infty}\exp\{2\mathcal{E}_{Z,\mathrm{d}}(\epsilon 2^{-k})-9y2^{k-3}k^{-% 2}\}\Biggr{]} \displaystyle\leq \displaystyle N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)\bigl{[}1+L^{(\epsilon)}_% {\exp}\bigr{]}\exp\{-y/2\},

where we have used that y\geq 1.

Finally, combining (7.1), (7.1), the bound of Theorem 2, and (7.1) we come to the first assertion of the theorem. Here we also used that C_{\xi}^{*}(1)\leq C_{\xi}(y) because y\geq 1.

(ii) In order to prove the second statement, we note first the following nonrandom bound: since \hat{\Sigma}_{s}(w)\leq M_{s}(w) for all w\in\mathcal{W} and s>2,

 \hat{U}_{s}(w)\leq M_{s}(w)\bigl{[}c_{1}(s)\sqrt{n}+2n^{1/s}\bigr{]}\leq[c_{1}% (s)+2]\sqrt{n}M_{s}(w)\qquad\forall w\in\mathcal{W}.

Next, the left-hand side inequality in (7.1) implies that for any subset \mathcal{W}_{0}\!\subseteq\!\mathcal{W}

 \mathcal{A}\subseteq\Bigl{\{}\sup_{w\in\mathcal{W}_{0}}\hat{U}_{\xi}(w)<[1+4c_% {1}(s)(1+\epsilon)\gamma]\sup_{w\in\mathcal{W}_{0}}U_{\xi}(w,f)\Bigr{\}}=:% \mathcal{A}_{0}.

Therefore {\mathbb{P}}(\overline{\mathcal{A}}_{0})\leq{\mathbb{P}}(\overline{\mathcal{A}}) and

 \mathbb{E}\{[\hat{U}(w)]^{q}{\mathbf{1}}(\overline{\mathcal{A}}_{0})\}\leq[c_{% 1}(s)+2]^{q}\bigl{[}\sqrt{n}M_{s}(w)\bigr{]}^{q}{\mathbb{P}}(\overline{% \mathcal{A}}).

Using (7.1) with y=y_{\gamma}, and the definition of the event \mathcal{A}, we complete the proof.

### 7.2 Proof of Corollary 4

First, as in (7.1), we need to bound \breve{U}_{\xi}(w):=\max\{\hat{U}_{\xi}(w),\sqrt{n}M_{2}(w)\} from above and from below in terms of \overline{U}_{\xi}(w,f):=\max\{U_{\xi}(w,f),\sqrt{n}M_{2}(w)\}. Such bounds are easily derived from the following trivial fact: for any positive A,B, C and any \delta\in(0,1)

 A(1+\delta)\geq B\geq A(1-\delta)\quad\Rightarrow\quad[A\vee C](1+\delta)\geq[% B\vee C]\geq[A\vee C](1-\delta).
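The implication above is elementary, and a brute-force check over a small grid may serve as a sanity test. The sketch below is such a check with hypothetical grid values of our own choosing; it is not a proof, only a finite verification of the stated implication.

```python
import itertools

def sandwich_holds(A, B, C, delta):
    """Check the implication for one tuple (A, B, C, delta)."""
    premise = A * (1 + delta) >= B >= A * (1 - delta)
    if not premise:
        return True  # the implication is vacuously true
    return max(A, C) * (1 + delta) >= max(B, C) >= max(A, C) * (1 - delta)

grid = [0.1, 0.5, 1.0, 2.0, 7.5]   # hypothetical positive values for A, B, C
deltas = [0.05, 0.3, 0.9]          # hypothetical values of delta in (0, 1)
all_ok = all(sandwich_holds(A, B, C, d)
             for A, B, C in itertools.product(grid, repeat=3)
             for d in deltas)
```

The underlying reason is that B\vee C\leq[A(1+\delta)]\vee C\leq[A\vee C](1+\delta) and B\vee C\geq[A(1-\delta)]\vee[C(1-\delta)]=[A\vee C](1-\delta).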

Next, (7.1) remains valid because, by construction, U_{\xi}(w,f)\leq\overline{U}_{\xi}(w,f) and the assumptions that allow us to apply Theorem 2 are now imposed on \overline{U}_{\xi}(w,f) instead of U_{\xi}(w,f). The computations leading to (7.1) also remain unchanged if U_{\xi}(w,f) is replaced by \overline{U}_{\xi}(w,f). Note that now \lambda_{\tilde{A}} and \lambda_{\tilde{B}} are defined via \overline{U}_{\xi}(w,f).

## 8 Proofs of Theorems 4, 5

### 8.1 Proof of Theorem 4

The proof is based on an application of Theorem 2.

Put

 \displaystyle T_{3,\epsilon} \displaystyle:= \displaystyle\frac{2^{q(\epsilon+1)}u_{\epsilon}^{q}}{2^{q\epsilon}-1}\Gamma(q% +1)N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)\bigl{[}1+L^{(\epsilon)}_{\exp}\bigr% {]}[4\overline{\mathrm{w}}_{s}(1+4n^{1/2-1/s})]^{q}; \displaystyle T_{4,\epsilon} \displaystyle:= \displaystyle\frac{2^{q(\epsilon+1)}u_{\epsilon}^{q}}{2^{q\epsilon}-1}\Gamma(q% +1)N_{{\mathbb{Z}},\mathrm{d}}(\epsilon/8)\bigl{[}1+L^{(\epsilon)}_{\exp}\bigr% {]} \displaystyle{}\times\overline{\mathrm{w}}^{q}_{2}\bigl{\{}1+2\sqrt{2\mu_{*}% \mathrm{f}_{\infty}^{2}+8n^{-1/2}}+(8/3)n^{-1/2}\bigr{\}}^{q}.

We have M_{p}(w)=\|w\|_{p} for all w\in\mathcal{V} and p\geq 1, and (3.2.1) yields

 U_{\xi}(w,f)=\cases{4n^{1/s}\|w\|_{s},&\quad$s\in[1,2)$,\cr\sqrt{n}\|w\|_{2},&% \quad$s=2$.} (103)

Therefore, in view of (35)

 \quad r_{\xi}=\cases{4n^{1/s}\underline{\mathrm{w}}_{s},&\quad$s\in[1,2)$,\cr% \sqrt{n}\underline{\mathrm{w}}_{2},&\quad$s=2$,}\qquad R_{\xi}=\cases{4n^{1/s}% \overline{\mathrm{w}}_{s},&\quad$s\in[1,2)$,\cr\sqrt{n}\overline{\mathrm{w}}_{% 2},&\quad$s=2$.} (104)

It follows from (48), the Hölder inequality and the formulas for A_{\xi}^{2}(w) and B_{\xi}(w) immediately after (3.2.1) that

 \displaystyle A_{\xi}^{2}(w) \displaystyle\leq \displaystyle\cases{37n\|w\|^{2}_{s},&\quad$s\in[1,2)$,\cr\bigl{[}2\mathrm{f}_% {\infty}^{2}n\mu_{*}+8\sqrt{n}\bigr{]}\|w\|^{2}_{2},&\quad$s=2$,} (105) \displaystyle B_{\xi}(w) \displaystyle= \displaystyle\cases{0,&\quad$s\in[1,2)$,\cr\frac{4}{3}\|w\|_{2},&\quad$s=2$.}

In order to apply Theorem 2, we need to check that \varkappa_{U_{\xi}}(a)\leq a for all a\in[r_{\xi},R_{\xi}].

Let s\in[1,2); here {\mathbb{Z}}_{a}=\{\zeta\dvtx a/2<4n^{1/s}\|\phi[\zeta]\|_{s}=4n^{1/s}\|w\|_{s}\leq a\}; see (40). By (103), Assumption 3.3 and because {\mathbb{Z}}_{a}\subseteq{\mathbb{Z}}_{s}(a/4) we have

 \sup_{\zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{a}}\frac{U_{\xi}(\phi[\zeta_{1}]-% \phi[\zeta_{2}],f)}{\mathrm{d}(\zeta_{1},\zeta_{2})}\leq\sup_{\zeta_{1},\zeta_% {2}\in{\mathbb{Z}}_{s}(a/4)}\frac{4n^{1/s}\|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_% {s}}{\mathrm{d}(\zeta_{1},\zeta_{2})}\leq a.

If s=2, then {\mathbb{Z}}_{a}=\{\zeta\dvtx a/2<\sqrt{n}\|\phi[\zeta]\|_{2}=\sqrt{n}\|w\|_{2% }\leq a\}, and again by Assumption 3.3 \sup_{\zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{a}}[\sqrt{n}\|\phi[\zeta_{1}]-\phi[% \zeta_{2}]\|_{2}/\mathrm{d}(\zeta_{1},\zeta_{2})]\leq a. Thus, \varkappa_{U_{\xi}}(a)\leq a for all a\in[r_{\xi},R_{\xi}], and Theorem 2 can be applied. To this end, we should compute the quantities \Lambda_{A_{\xi}} and \Lambda_{B_{\xi}} [see (14), (15) and (20)].

For s\in[1,2), we have by (105), definition of {\mathbb{Z}}_{a} and Assumption 3.3 that

 \displaystyle\sup_{\zeta\in{\mathbb{Z}}_{a}}A_{\xi}(\phi[\zeta]) \displaystyle= \displaystyle\sup_{\zeta\in{\mathbb{Z}}_{a}}\sqrt{37n}\|\phi[\zeta]\|_{s}=% \frac{\sqrt{37}}{4}an^{1/2-1/s}, \displaystyle\sup_{\zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{a}}\frac{A_{\xi}(\phi[% \zeta_{1}]-\phi[\zeta_{2}])}{\mathrm{d}(\zeta_{1},\zeta_{2})} \displaystyle\leq \displaystyle\sup_{\zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{s}(a/4)}\frac{\sqrt{37}% n^{1/2}\|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_{s}}{\mathrm{d}(\zeta_{1},\zeta_{2})} \displaystyle\leq \displaystyle\frac{\sqrt{37}}{4}an^{1/2-1/s}.

Similarly, if s=2 then \sup_{\zeta\in{\mathbb{Z}}_{a}}A_{\xi}(\phi[\zeta])\leq a(2\mathrm{f}_{\infty}% ^{2}\mu_{*}+8n^{-1/2})^{1/2} and

 \displaystyle\sup_{\zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{a}}\frac{A_{\xi}(\phi[% \zeta_{1}]-\phi[\zeta_{2}])}{\mathrm{d}(\zeta_{1},\zeta_{2})} \displaystyle\leq \displaystyle\sup_{\zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{2}(a)}\bigl{[}2\mathrm{% f}_{\infty}^{2}n\mu_{*}+8\sqrt{n}\bigr{]}^{1/2}\frac{\|\phi[\zeta_{1}]-\phi[% \zeta_{2}]\|_{2}}{\mathrm{d}(\zeta_{1},\zeta_{2})} \displaystyle\leq \displaystyle a\biggl{(}2\mathrm{f}_{\infty}^{2}\mu_{*}+\frac{8}{\sqrt{n}}% \biggr{)}^{1/2}.

These computations and similar computations for \Lambda_{B_{\xi}} yield

 \displaystyle\Lambda_{A_{\xi}} \displaystyle\leq \displaystyle\cases{\frac{\sqrt{37}}{4}n^{1/2-1/s},&\quad$s\in[1,2)$,\cr[2% \mathrm{f}_{\infty}^{2}\mu_{*}+8n^{-1/2}]^{1/2},&\quad$s=2$,} \displaystyle\Lambda_{B_{\xi}} \displaystyle= \displaystyle\cases{0,&\quad$s\in[1,2)$,\cr\frac{4}{3}n^{-1/2},&\quad$s=2$.}

Recall that C_{\xi}^{*}(y)=1+2\sqrt{y}\Lambda_{A_{\xi}}+2y\Lambda_{B_{\xi}} [see (21)]. Therefore if for arbitrary z>0, we set

 y=\cases{\dfrac{4}{37}n^{(2/s)-1}z^{2},&\quad$s\in[1,2)$,\cr\dfrac{z^{2}}{8}[% \mathrm{f}_{\infty}^{2}\mu_{*}+4n^{-1/2}]^{-1},&\quad$s=2$,}

then we get C^{*}_{\xi}(y)=1+z if s\in[1,2) and

 C^{*}_{\xi}(y)=1+z+\frac{z^{2}}{3\sqrt{n}[\mathrm{f}_{\infty}^{2}\mu_{*}+4n^{-% 1/2}]}\leq 1+z+\frac{z^{2}}{12},

if s=2. Then the statements (i) and (ii) follow by application of the moment bound of Theorem 2. Observe that C_{\xi}^{*}(1)=1+\frac{\sqrt{37}}{2}n^{1/2-1/s} for s\in[1,2), and C_{\xi}^{*}(1)=1+2[2\mathrm{f}_{\infty}^{2}\mu_{*}+8n^{-1/2}]^{1/2}+\frac{8}{3}n^{-1/2} for s=2; R_{\xi} is given in (104). These expressions along with the moment bound of Theorem 2 lead to the formulas for T_{3,\epsilon} and T_{4,\epsilon} given in the beginning of the proof.
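The substitution of y into C_{\xi}^{*}(y)=1+2\sqrt{y}\Lambda_{A_{\xi}}+2y\Lambda_{B_{\xi}} can be verified numerically. The sketch below is an illustration under an explicit assumption: the displayed upper bounds on \Lambda_{A_{\xi}} and \Lambda_{B_{\xi}} are plugged in as if they were equalities. The function names and the sample values of z, n, \mathrm{f}_{\infty} and \mu_{*} are hypothetical.

```python
import math

def c_star(y, lam_a, lam_b):
    """C*_xi(y) = 1 + 2 sqrt(y) Lambda_A + 2 y Lambda_B."""
    return 1.0 + 2.0 * math.sqrt(y) * lam_a + 2.0 * y * lam_b

def c_star_small_s(z, n, s):
    """Case s in [1, 2): plug y = (4/37) n^{2/s - 1} z^2 into C*."""
    lam_a = math.sqrt(37.0) / 4.0 * n ** (0.5 - 1.0 / s)
    y = 4.0 / 37.0 * n ** (2.0 / s - 1.0) * z * z
    return c_star(y, lam_a, 0.0)

def c_star_s2(z, n, f_inf, mu_star):
    """Case s = 2: plug y = z^2 / (8 [f_inf^2 mu_* + 4 n^{-1/2}]) into C*."""
    d = f_inf ** 2 * mu_star + 4.0 / math.sqrt(n)
    lam_a = math.sqrt(2.0 * d)             # [2 f_inf^2 mu_* + 8 n^{-1/2}]^{1/2}
    lam_b = 4.0 / (3.0 * math.sqrt(n))     # (4/3) n^{-1/2}
    return c_star(z * z / (8.0 * d), lam_a, lam_b)

val_small = c_star_small_s(z=0.7, n=1000, s=1.5)             # expected 1 + z
val_two = c_star_s2(z=2.0, n=400, f_inf=1.0, mu_star=3.0)    # 1 + z + z^2 / (3 sqrt(n) d)
```

In the case s=2 the quadratic term is at most z^{2}/12 because \mathrm{f}_{\infty}^{2}\mu_{*}+4n^{-1/2}\geq 4n^{-1/2}.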

### 8.2 Proof of Theorem 5

First, we specify the constants appearing in the statement of the theorem. Put \alpha_{*}:=\alpha_{1}^{-1}\alpha_{2}^{-1/2} where \alpha_{1} and \alpha_{2} appear in Assumption (W2); then

Define also

 \displaystyle k_{*} \displaystyle:= \displaystyle 8\alpha_{*}^{2}c_{1}(s)[C_{s}\vee C_{s/2}\vee 1], (107) \displaystyle L^{(\epsilon)}_{*}(\beta) \displaystyle:= \displaystyle\sum_{k=1}^{\infty}\exp\{2^{1+k\beta/m}(k_{*}^{-1}\epsilon)^{-% \beta/m}-(9/16)2^{k}k^{-2}\},

and note that L^{(\epsilon)}_{*}(\beta)<\infty because \beta<m. If we set I_{\epsilon}(q):=2^{q(\epsilon+1)}[2^{q\epsilon}-1]^{-1}\Gamma(q+1)+1, then the constants T_{5,\epsilon} and T_{6,\epsilon} appearing in the statement of the theorem are given by

 \displaystyle T_{5,\epsilon} \displaystyle:= \displaystyle I_{\epsilon}(q)(2u_{\epsilon}k_{*}\overline{\mathrm{w}}_{2})^{q}% N_{{\mathbb{Z}},\mathrm{d}}([k_{*}^{-1}\epsilon/8]^{1/m}) \displaystyle{}\times\log_{2}\biggl{(}\frac{k_{*}\overline{\mathrm{w}}_{2}}{% \underline{\mathrm{w}}_{2}}\biggr{)}\bigl{[}1+L^{(\epsilon)}_{*}(\beta)\exp\{2% C_{\mathbb{Z}}(\beta)\}\bigr{]}, \displaystyle T_{6,\epsilon} \displaystyle:= \displaystyle[c_{1}(s)+2]^{q}(\alpha_{*}\overline{\mathrm{w}}_{2})^{q}N_{{% \mathbb{Z}},\mathrm{d}}([k_{*}^{-1}\epsilon/8]^{1/m}) \displaystyle{}\times\log_{2}\biggl{(}\frac{k_{*}\overline{\mathrm{w}}_{2}}{% \underline{\mathrm{w}}_{2}}\biggr{)}\bigl{[}1+L^{(\epsilon)}_{*}(\beta)\exp\{2% C_{\mathbb{Z}}(\beta)\}\bigr{]}.

The proof is based on application of Theorem 3 and Corollary 4. These results will be utilized with a distance \mathrm{d}_{*} on \mathfrak{Z} which is related to the original distance \mathrm{d}, and specified below. In order to apply Theorem 3, we need to verify its conditions and to compute the quantities \Lambda_{A_{\xi}}, \Lambda_{B_{\xi}}, \lambda_{\tilde{A}}, \lambda_{\tilde{B}} and y_{\gamma}. These computations are routine and tedious.

We break the proof into steps.

0{}^{0}. Auxiliary results. We begin with preliminary results that will be used in the subsequent proof.

###### Lemma 6

Let (46) hold and Assumptions (W2) and (W3) be satisfied; then for all w\in\mathcal{W} and 1\leq p<q\leq\infty one has

 [n^{1/q}M_{q}(w)]\leq\alpha^{-1}_{1}\alpha_{2}^{-1/p}\mu^{1/q-1/p}[n^{1/p}M_{p}(w)].

Proof.

Recall that under (46), M_{p}(w)=\|w\|_{p} for all p\geq 1. In view of Assumption (W2) for any w\in\mathcal{V}, we have

 \alpha_{1}\alpha_{2}^{1/p}\|w\|_{\infty}[\operatorname{mes}\{\operatorname{% supp}(w)\}]^{1/p}\leq\|w\|_{p}\leq\|w\|_{\infty}[\operatorname{mes}\{% \operatorname{supp}(w)\}]^{1/p}.

This inequality, together with Assumption (W3), yields

 \displaystyle n^{1/q}\|w\|_{q} \displaystyle\leq \displaystyle n^{1/q}\|w\|_{\infty}[\operatorname{mes}\{\operatorname{supp}(w)% \}]^{1/q} \displaystyle= \displaystyle\frac{n^{1/p}\|w\|_{\infty}[\operatorname{mes}\{\operatorname{% supp}(w)\}]^{1/p}}{[n\operatorname{mes}\{\operatorname{supp}(w)\}]^{1/p-1/q}} \displaystyle\leq \displaystyle\alpha^{-1}_{1}\alpha_{2}^{-1/p}\mu^{1/q-1/p}[n^{1/p}\|w\|_{p}].

Our next lemma demonstrates that there exists a real number m_{p}\in(0,1] such that (52) holds.

###### Lemma 7

Let Assumptions 3.3 and 3.3 hold; then for any p\geq 2, the inequality (52) is valid with m_{p}=2/p and C_{p}=(2\alpha_{*})^{1-2/p}\mu^{1/p-1/2}, that is,

 \sup_{b\in[\underline{\mathrm{w}}_{2},\overline{\mathrm{w}}_{2}]}b^{-1}\sup_{\zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{2}(b)}\frac{n^{1/p}\|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_{p}}{[\mathrm{d}(\zeta_{1},\zeta_{2})]^{2/p}}\leq(2\alpha_{*})^{1-2/p}\mu^{1/p-1/2}.

Proof.

We obviously have for any p>2

 \|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_{p}\leq(\|\phi[\zeta_{1}]\|_{\infty}+\|% \phi[\zeta_{2}]\|_{\infty})^{1-2/p}(\|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_{2})^{% 2/p}.

Applying Lemma 6 with q=\infty and p=2, we have that \sup_{\zeta\in{\mathbb{Z}}_{2}(b)}\|\phi[\zeta]\|_{\infty}\leq b\alpha_{*}\mu^% {-1/2} for all b\in[\underline{\mathrm{w}}_{2},\overline{\mathrm{w}}_{2}]. Then in view of Assumption 3.3

 \sup_{\zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{2}(b)}\frac{n^{1/p}\|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_{p}}{[\mathrm{d}(\zeta_{1},\zeta_{2})]^{2/p}}\leq b(2\alpha_{*})^{1-2/p}\mu^{1/p-1/2}\qquad\forall b\in[\underline{\mathrm{w}}_{2},\overline{\mathrm{w}}_{2}],

as claimed.
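The interpolation inequality used at the start of this proof, \|g_{1}-g_{2}\|_{p}\leq(\|g_{1}\|_{\infty}+\|g_{2}\|_{\infty})^{1-2/p}\|g_{1}-g_{2}\|_{2}^{2/p} for p\geq 2, is easy to test on step functions. The sketch below does so under our own simplifying assumption of a finite set of cells of equal measure; the helper names and the sample vectors are hypothetical.

```python
def lp_norm(values, p, weight=1.0):
    """L_p norm of a step function taking `values` on cells of measure `weight`."""
    if p == float("inf"):
        return max(abs(v) for v in values)
    return (weight * sum(abs(v) ** p for v in values)) ** (1.0 / p)

def interpolation_sides(g1, g2, p, weight=1.0):
    """Return both sides of the interpolation inequality for g1 - g2."""
    diff = [a - b for a, b in zip(g1, g2)]
    lhs = lp_norm(diff, p, weight)
    sup_part = lp_norm(g1, float("inf")) + lp_norm(g2, float("inf"))
    rhs = sup_part ** (1.0 - 2.0 / p) * lp_norm(diff, 2, weight) ** (2.0 / p)
    return lhs, rhs

g1 = [1.0, -2.0, 0.5, 3.0]
g2 = [0.0, 1.0, 0.5, -1.0]
lhs3, rhs3 = interpolation_sides(g1, g2, p=3, weight=0.25)
lhs2, rhs2 = interpolation_sides(g1, g2, p=2, weight=0.25)
```

For p=2 the two sides coincide, and for p>2 the sup-norm factor absorbs the excess power, which is how the exponent m_{p}=2/p arises.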

###### Lemma 8

Let Assumptions 3.3 and 3.3 hold; then for any \zeta\in{\mathbb{Z}}_{a}

 \sqrt{n}\|\phi[\zeta]\|_{p}\leq\mu_{*}^{1/p-1/2}a\qquad\forall p\in[1,2), (110)

 n^{1/p}\|\phi[\zeta]\|_{p}\leq\alpha_{*}\mu^{1/p-1/2}a\qquad\forall p>2, (111)

 \sqrt{n}\|\phi^{2}[\zeta]\|_{p}\leq\alpha_{*}\mu^{-1/2}\mu_{*}^{1/p-1/2}a^{2}\qquad\forall p\in[1,2). (112)

Proof.

By the Hölder inequality \|\phi[\zeta]\|_{p}\leq\mu_{*}^{1/p-1/2}\|\phi[\zeta]\|_{2}; then (110) holds by definition of {\mathbb{Z}}_{a}. Inequality (111) follows from Lemma 6. In order to prove (112), we write \|\phi^{2}[\zeta]\|_{p}\leq\|\phi[\zeta]\|_{\infty}\|\phi[\zeta]\|_{p}, note that by Lemma 6 \|\phi[\zeta]\|_{\infty}\leq\alpha_{*}\mu^{-1/2}\sqrt{n}\|\phi[\zeta]\|_{2} and use (111).

1{}^{0}. Notation. Now we establish some notation. Recall that U_{\xi}(w,f)=c_{1}(s)[\sqrt{n}\Sigma_{s}(w,f)+2n^{1/s}M_{s}(w)] and \overline{U}_{\xi}(w,f) is given by (3.3). It follows from the definition of \overline{U}_{\xi}(\cdot,f), (3.2.1) and (31) that \overline{U}_{\xi}(w,f)\geq\sqrt{n}\|w\|_{2} and

 \overline{U}_{\xi}(w,f)\leq c_{1}(s)\bigl{[}\mathrm{f}_{\infty}^{1/2-1/s}\sqrt% {n}\|w\|_{2}+2n^{1/s}\|w\|_{s}\bigr{]}\leq c_{1}(s)\alpha_{*}[\mathrm{f}_{% \infty}^{1/2}+2]\sqrt{n}\|w\|_{2},

where the last inequality is a consequence of Lemma 6. Therefore, we put

where \underline{\mathrm{w}}_{p} and \overline{\mathrm{w}}_{p} are defined in (49). Recall also that {\mathbb{Z}}_{a}=\{\zeta\dvtx a/2<\overline{U}_{\xi}(\phi[\zeta],f)\leq a\}. By definition of \overline{U}_{\xi}(w,f) and by the fact that M_{p}(w)=\|w\|_{p} for all p\geq 1, we have that {\mathbb{Z}}_{a}\subseteq{\mathbb{Z}}_{2}(a) for all a\in[r_{\xi},R_{\xi}]; see (50). Define the distance

 \mathrm{d}_{*}(\zeta_{1},\zeta_{2})=k_{*}\times\cases{\mathrm{d}(\zeta_{1},% \zeta_{2})\vee[\mathrm{d}(\zeta_{1},\zeta_{2})]^{m_{s}},&\quad$s\in[1,4)$,\cr% \mathrm{d}(\zeta_{1},\zeta_{2})\vee[\mathrm{d}(\zeta_{1},\zeta_{2})]^{m_{s}}% \cr\qquad\vee\,[\mathrm{d}(\zeta_{1},\zeta_{2})]^{m_{s/2}},&\quad$s>4$,} (113)

where k_{*} is given in (107). Note that \mathrm{d}_{*}(\cdot,\cdot) is indeed a distance because by definition m_{p}\leq 1 for all p\geq 2.

2{}^{0}. Verification of condition (42). It follows from the definition of \overline{U}_{\xi}(\cdot,f), (3.2.1) and (31) that

 \displaystyle\sqrt{n}\|\phi[\zeta]\|_{2} \displaystyle\leq \displaystyle\overline{U}_{\xi}(\phi[\zeta],f) \displaystyle\leq \displaystyle c_{1}(s)\bigl{[}\mathrm{f}_{\infty}^{1/2-1/s}\sqrt{n}\|\phi[% \zeta]\|_{2}+2n^{1/s}\|\phi[\zeta]\|_{s}\bigr{]}.

Therefore, by (8.2), Assumption 3.3 and (52) for any \zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{a}

 \displaystyle\overline{U}_{\xi}(\phi[\zeta_{1}]-\phi[\zeta_{2}],f) \displaystyle\qquad\leq c_{1}(s)\bigl{[}\mathrm{f}_{\infty}^{1/2-1/s}\sqrt{n}% \|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_{2}+2n^{1/s}\|\phi[\zeta_{1}]-\phi[\zeta_{% 2}]\|_{s}\bigr{]} \displaystyle\qquad\leq c_{1}(s)[\mathrm{f}_{\infty}^{1/2}+2C_{s}]a\{\mathrm{d% }(\zeta_{1},\zeta_{2})\vee[\mathrm{d}(\zeta_{1},\zeta_{2})]^{m_{s}}\}.

Thus

 \sup_{\zeta_{1},\zeta_{2}}\frac{\overline{U}_{\xi}(\phi[\zeta_{1}]-\phi[\zeta_% {2}],f)}{\mathrm{d}_{*}(\zeta_{1},\zeta_{2})}\leq a\qquad\forall a\in[r_{\xi},% R_{\xi}],

and (42) is valid, because k_{*}\geq c_{1}(s)[\mathrm{f}_{\infty}^{1/2}+2C_{s}]; see (107) and (113).

3{}^{0}. Computation of \varkappa_{\tilde{U}} and verification of (43).

We start with bounds on \sup_{\zeta\in{\mathbb{Z}}_{a}}\tilde{U}(\phi^{2}[\zeta]). Recall that

 \tilde{U}(\phi^{2}[\zeta])=\cases{4n^{2/s}M_{s/2}(\phi^{2}[\zeta]),&\quad$s\in% (2,4)$,\cr c_{1}(s/2)\bigl{[}\mathrm{f}_{\infty}^{1/2}\sqrt{n}M_{2}(\phi^{2}[% \zeta])\cr\hskip 37.13pt+\,2n^{2/s}M_{s/2}(\phi^{2}[\zeta])\bigr{]},&\quad$s% \geq 4$.} (115)

By (111), for any \zeta\in{\mathbb{Z}}_{a},

 n^{2/s}\|\phi^{2}[\zeta]\|_{s/2}=(n^{1/s}\|\phi[\zeta]\|_{s})^{2}\leq\alpha_{*}^{2}\mu^{(2/s)-1}n\|\phi[\zeta]\|_{2}^{2}\leq\alpha_{*}^{2}\mu^{(2/s)-1}a^{2}\qquad\forall s>2,

 \sqrt{n}\|\phi^{2}[\zeta]\|_{2}=(n^{1/4}\|\phi[\zeta]\|_{4})^{2}\leq\alpha_{*}^{2}\mu^{-1/2}n\|\phi[\zeta]\|_{2}^{2}\leq\alpha_{*}^{2}\mu^{-1/2}a^{2}. (117)

Substituting these bounds in the expression for \tilde{U}(\phi^{2}[\zeta]) and taking into account that \mu\geq 1 in view of (W3), we obtain for all s>2

 \quad\sup_{\zeta\in{\mathbb{Z}}_{a}}\tilde{U}(\phi^{2}[\zeta])\leq k_{1}\mu^{{% 2}/({s\wedge 4})-1}a^{2},\qquad k_{1}:=4\alpha_{*}^{2}c_{1}(s/2)[\mathrm{f}_{% \infty}^{1/2}+2]. (118)

Now we establish bounds on \tilde{U}(\phi^{2}[\zeta_{1}]-\phi^{2}[\zeta_{2}]), \zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{a}.

(a) First, we consider the case s\in(2,4). By the Hölder and triangle inequalities, we have

 \displaystyle n^{2/s}\|\phi^{2}[\zeta_{1}]-\phi^{2}[\zeta_{2}]\|_{s/2} \displaystyle\qquad\leq n^{2/s-1/2}\bigl{[}\|\phi[\zeta_{1}]\|_{2s/(4-s)}+\|% \phi[\zeta_{2}]\|_{2s/(4-s)}\bigr{]}\sqrt{n}\|\phi[\zeta_{1}]-\phi[\zeta_{2}]% \|_{2}.

Noting that 2s/(4-s)>2 and applying (111), we have

Then using Assumption 3.3 we get

This along with (115) implies that for s\in(2,4)

 \tilde{U}(\phi^{2}[\zeta_{1}]-\phi^{2}[\zeta_{2}])\leq 8\alpha_{*}\mu^{2/s-1}a% ^{2}\mathrm{d}(\zeta_{1},\zeta_{2})\qquad\forall\zeta_{1},\zeta_{2}\in{\mathbb% {Z}}_{a}. (120)
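The two displays between the Hölder step and (120) did not survive in this copy. A plausible reconstruction, offered as a sketch rather than as the authors' exact wording, applies (111) with p=2s/(4-s), for which 1/p=2/s-1/2, and then the bound \sqrt{n}\|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_{2}\leq a\mathrm{d}(\zeta_{1},\zeta_{2}) supplied by Assumption 3.3:

```latex
\begin{align*}
n^{2/s-1/2}\,\|\phi[\zeta_{i}]\|_{2s/(4-s)}
  &\leq \alpha_{*}\mu^{2/s-1}a, \qquad i=1,2, \\
n^{2/s}\,\|\phi^{2}[\zeta_{1}]-\phi^{2}[\zeta_{2}]\|_{s/2}
  &\leq 2\alpha_{*}\mu^{2/s-1}a\,\sqrt{n}\,\|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_{2} \\
  &\leq 2\alpha_{*}\mu^{2/s-1}a^{2}\,\mathrm{d}(\zeta_{1},\zeta_{2}).
\end{align*}
```

Multiplying the last line by the factor 4 from (115) gives the constant 8\alpha_{*} in (120).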

(b) Now assume that s\geq 4. We have for \zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{a}

 \displaystyle            \sqrt{n}\|\phi^{2}[\zeta_{1}]-\phi^{2}[\zeta_{2}]\|_{2} \displaystyle\leq \displaystyle\bigl{[}\|\phi[\zeta_{1}]\|_{\infty}+\|\phi[\zeta_{2}]\|_{\infty}% \bigr{]}\sqrt{n}\|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_{2} \displaystyle\leq \displaystyle 2\alpha_{*}\mu^{-1/2}a^{2}\mathrm{d}(\zeta_{1},\zeta_{2}),

where we used Assumption 3.3, and (111) with p=\infty. Furthermore, we have for all \zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{a}

 \displaystyle n^{2/s}\|\phi^{2}[\zeta_{1}]-\phi^{2}[\zeta_{2}]\|_{s/2} \displaystyle\qquad\leq\bigl{[}\|\phi[\zeta_{1}]\|_{\infty}+\|\phi[\zeta_{2}]% \|_{\infty}\bigr{]}n^{2/s}\|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_{s/2} (122) \displaystyle\qquad\leq 2C_{s/2}\alpha_{*}\mu^{-1/2}a^{2}\{\mathrm{d}(\zeta_{1% },\zeta_{2})\}^{m_{s/2}},

where we have used (111) with p=\infty and the definition of m_{p} [see (52)]. These inequalities lead to the following bound: for all s>4

 \displaystyle\tilde{U}(\phi^{2}[\zeta_{1}]-\phi^{2}[\zeta_{2}]) \displaystyle\leq \displaystyle 2\alpha_{*}c_{1}(s/2)[\mathrm{f}^{1/2}_{\infty}+2C_{s/2}] \displaystyle{}\times\mu^{-1/2}a^{2}\{\mathrm{d}(\zeta_{1},\zeta_{2})\vee[% \mathrm{d}(\zeta_{1},\zeta_{2})]^{m_{s/2}}\}.

Combining this with (120), we obtain that for all s>2

 \tilde{U}(\phi^{2}[\zeta_{1}]-\phi^{2}[\zeta_{2}])\leq k_{2}\mu^{{2}/({s\wedge 4% })-1}a^{2}\{\mathrm{d}(\zeta_{1},\zeta_{2})\vee\mathrm{d}^{m_{s/2}}(\zeta_{1},% \zeta_{2})\}, (123)

where k_{2}:=8\alpha_{*}c_{1}(s/2)[\mathrm{f}_{\infty}^{1/2}+2C_{s/2}]. Now using (118) and (123), we obtain

 \varkappa_{\tilde{U}}(a)=\sup_{\zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{a}}\frac{\tilde{U}(\phi^{2}[\zeta_{1}]-\phi^{2}[\zeta_{2}])}{\mathrm{d}_{*}(\zeta_{1},\zeta_{2})}\vee\sup_{\zeta\in{\mathbb{Z}}_{a}}\tilde{U}(\phi^{2}[\zeta])\leq\mu^{{2}/({s\wedge 4})-1}a^{2},

and the last bound holds because k_{*}\geq k_{1}\vee k_{2} [see (107)]. Thus, the condition (43) is valid with

 \gamma=\mu^{{1}/({s\wedge 4})-{1}/{2}}. (124)

Note that the condition of the theorem \mu>[64c^{2}_{1}(s)]^{({s\wedge 4})/({s\wedge 4-2})} ensures that \gamma<[4c_{1}(s)(1+\epsilon)]^{-1} for any \epsilon\in(0,1), as required in Theorem 3.
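The arithmetic behind this remark is short. The following sketch writes \sigma:=s\wedge 4, a symbol introduced here only for brevity, and uses \sigma>2.

```latex
\begin{align*}
\gamma=\mu^{1/\sigma-1/2}=\mu^{-(\sigma-2)/(2\sigma)},
\qquad
\mu>[64c_{1}^{2}(s)]^{\sigma/(\sigma-2)}
\;\Longrightarrow\;
\mu^{(\sigma-2)/(2\sigma)}>[64c_{1}^{2}(s)]^{1/2}=8c_{1}(s).
\end{align*}
```

Hence \gamma<[8c_{1}(s)]^{-1}\leq[4c_{1}(s)(1+\epsilon)]^{-1}, because 1+\epsilon<2 for \epsilon\in(0,1).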

4{}^{0}. Bounding \Lambda_{A_{\xi}} and \Lambda_{B_{\xi}}. By the formula for A_{\xi}^{2}(w) given immediately after (3.2.1), and by (110) and (111), we have for \zeta\in{\mathbb{Z}}_{a}

 \displaystyle A_{\xi}^{2}(\phi[\zeta]) \displaystyle\leq \displaystyle 2c_{1}(s)\mathrm{f}_{\infty}^{2}\bigl{[}n\|\phi[\zeta]\|_{2s/(s+% 2)}^{2}+4\sqrt{n}\|\phi[\zeta]\|_{2}\|\phi[\zeta]\|_{s}+8n^{1/s}\|\phi[\zeta]% \|_{s}^{2}\bigr{]} \displaystyle\leq \displaystyle 2c_{1}(s)\mathrm{f}_{\infty}^{2}a^{2}[\mu_{*}^{2/s}+12\alpha_{*}% ^{2}n^{-1/s}]\leq 24\alpha_{*}^{2}c_{1}(s)\mathrm{f}_{\infty}^{2}a^{2}[\mu_{*}% ^{2/s}+n^{-1/s}].

Here we have used that \mu\geq 1, \alpha_{*}\geq 1 and we write c_{1}(s) instead of c_{3}(s) in the definition of A^{2}(\cdot) because for functions w(t,x) depending on t-x only the constant c_{2}(s) equals one [see (27) and remark after Lemma 3 in Section 6]. Thus,

 \sup_{\zeta\in{\mathbb{Z}}_{a}}A_{\xi}(\phi[\zeta])\leq 5\sqrt{c_{1}(s)}\alpha% _{*}\mathrm{f}_{\infty}a\bigl{[}\mu_{*}^{1/s}+n^{-1/(2s)}\bigr{]}.
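The three terms in the bound for A_{\xi}^{2}(\phi[\zeta]) can be checked one at a time. The sketch below assumes s>2 (so that (111) applies with p=s and \mu^{1/s-1/2}\leq 1) and uses \sqrt{24}<5 together with \sqrt{x+y}\leq\sqrt{x}+\sqrt{y}.

```latex
\begin{align*}
n\|\phi[\zeta]\|_{2s/(s+2)}^{2} &\leq \mu_{*}^{2/s}a^{2}
  && \text{by (110), with } 1/p-1/2=1/s, \\
4\sqrt{n}\|\phi[\zeta]\|_{2}\,\|\phi[\zeta]\|_{s} &\leq 4\alpha_{*}n^{-1/s}a^{2}
  && \text{by (111) and } \mu\geq 1, \\
8n^{1/s}\|\phi[\zeta]\|_{s}^{2} &\leq 8\alpha_{*}^{2}n^{-1/s}a^{2}
  && \text{by (111) and } \mu\geq 1, \\
A_{\xi}(\phi[\zeta])
  &\leq \sqrt{24c_{1}(s)}\,\alpha_{*}\mathrm{f}_{\infty}a\,
        \bigl[\mu_{*}^{2/s}+n^{-1/s}\bigr]^{1/2} \\
  &\leq 5\sqrt{c_{1}(s)}\,\alpha_{*}\mathrm{f}_{\infty}a\,
        \bigl[\mu_{*}^{1/s}+n^{-1/(2s)}\bigr].
\end{align*}
```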

In order to bound A_{\xi}^{2}(\phi[\zeta_{1}]-\phi[\zeta_{2}]), we note that for all \zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{a}:

• by the Hölder inequality and by Assumption 3.3, \sqrt{n}\|\phi[\zeta_{1}]\!-\!\phi[\zeta_{2}]\|_{2s/(s+2)}\leq a\mu_{*}^{1/s}% \mathrm{d}(\zeta_{1},\zeta_{2});

• by Assumption 3.3, \sqrt{n}\|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_{2}\leq a\mathrm{d}(\zeta_{1},% \zeta_{2});

• by (52), n^{1/s}\|\phi[\zeta_{1}]-\phi[\zeta_{2}]\|_{s}\leq C_{s}a[\mathrm{d}(\zeta_{1}% ,\zeta_{2})]^{m_{s}}.

Therefore,

 \sup_{\zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{a}}\frac{A_{\xi}(\phi[\zeta_{1}]-% \phi[\zeta_{2}])}{\mathrm{d_{*}}(\zeta_{1},\zeta_{2})}\leq 5\sqrt{c_{1}(s)}% \mathrm{f}_{\infty}(C_{s}\vee 1)a\bigl{[}\mu_{*}^{1/s}+n^{-1/(2s)}\bigr{]}

and \Lambda_{A_{\xi}}\leq 5\sqrt{c_{1}(s)}\alpha_{*}\mathrm{f}_{\infty}(C_{s}\vee 1% )[\mu_{*}^{1/s}+n^{-1/(2s)}]. Similarly, since B_{\xi}(\phi[\zeta])=\frac{4}{3}c_{1}(s)\|\phi[\zeta]\|_{s}, we have by (111) that \Lambda_{B_{\xi}}\leq\frac{4}{3}c_{1}(s)(C_{s}\vee 1)\alpha_{*}n^{-1/s}. Thus, we have shown that

 \Lambda_{A_{\xi}}\leq k_{3}\bigl{[}\mu_{*}^{1/s}+n^{-1/(2s)}\bigr{]},\qquad% \Lambda_{B_{\xi}}\leq k_{3}n^{-1/s},\qquad k_{3}:=5c_{1}(s)\alpha_{*}\mathrm{f% }_{\infty}(C_{s}\vee 1).

These bounds on \Lambda_{A_{\xi}} and \Lambda_{B_{\xi}} lead to the definition of C_{\xi}^{*}(y) in (54) [see also (21)]. Note that \vartheta_{0} in (54) satisfies \vartheta_{0}=k_{3}.

5{}^{0}. Computation of \lambda_{\tilde{A}}, \lambda_{\tilde{B}} and y_{\gamma}.

(i) First, consider the case s\in(2,4). Recall that in this case \tilde{A}^{2}(\phi^{2}[\zeta])=37n\|\phi^{2}[\zeta]\|^{2}_{s/2}=37n\|\phi[% \zeta]\|_{s}^{4} and \tilde{B}(\phi^{2}[\zeta])=0. Hence, by (111)

 \sup_{\zeta\in{\mathbb{Z}}_{a}}\tilde{A}(\phi^{2}[\zeta])=\sup_{\zeta\in{% \mathbb{Z}}_{a}}\sqrt{37n}\|\phi[\zeta]\|^{2}_{s}\leq\sqrt{37}\alpha_{*}^{2}% \mu^{2/s-1}n^{1/2-2/s}a^{2}.

It follows from (119) that for any \zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{a}

 \sqrt{37n}\|\phi^{2}[\zeta_{1}]-\phi^{2}[\zeta_{2}]\|_{s/2}\leq 2\sqrt{37}% \alpha_{*}\mu^{2/s-1}n^{1/2-2/s}a^{2}\mathrm{d}(\zeta_{1},\zeta_{2}).

Combining these results, we obtain that \lambda_{\tilde{A}}\leq 2\sqrt{37}\alpha_{*}^{2}\mu^{2/s-1}n^{1/2-2/s} and \lambda_{\tilde{B}}=0 which, in turn, by (44) and (124) implies that

 y_{\gamma}=\gamma^{4}\lambda^{-2}_{\tilde{A}}\geq\bigl{(}2\sqrt{37}\alpha_{*}^% {2}\bigr{)}^{-2}n^{4/s-1}=:y_{*}.

This explains the definition of the constant \vartheta_{1} in (106).
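Taking the displayed identity y_{\gamma}=\gamma^{4}\lambda_{\tilde{A}}^{-2} at face value, the powers of \mu cancel exactly. A short check for s\in(2,4), using (124) and the bound on \lambda_{\tilde{A}} obtained above:

```latex
\begin{align*}
\gamma^{4}=\mu^{4/s-2},
\qquad
\lambda_{\tilde{A}}^{-2}
  \geq\bigl(2\sqrt{37}\alpha_{*}^{2}\bigr)^{-2}\mu^{2-4/s}n^{4/s-1},
\qquad\text{hence}\qquad
\gamma^{4}\lambda_{\tilde{A}}^{-2}
  \geq\bigl(2\sqrt{37}\alpha_{*}^{2}\bigr)^{-2}n^{4/s-1}.
\end{align*}
```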

(ii) Now let s\geq 4; here recall that

 \displaystyle\tilde{A}^{2}(\phi^{2}[\zeta]) \displaystyle= \displaystyle 2c_{1}(s/2)\mathrm{f}_{\infty}^{2}\bigl{[}n\|\phi^{2}[\zeta]\|_{% 2s/(s+4)}^{2}+4\sqrt{n}\|\phi^{2}[\zeta]\|_{2}\|\phi^{2}[\zeta]\|_{s/2} \displaystyle                                                   {}+8n^{2/s}\|% \phi^{2}[\zeta]\|_{s/2}^{2}\bigr{]}.

Observing that for \zeta\in{\mathbb{Z}}_{a}:

(a) \sqrt{n}\|\phi^{2}[\zeta]\|_{2s/(s+4)}\leq\alpha_{*}\mu_{*}^{2/s}a^{2} by (112) and \mu\geq 1;

(b) n^{1/s}\|\phi^{2}[\zeta]\|_{s/2}\leq\alpha_{*}^{2}n^{-1/s}a^{2} by (8.2);

(c) \sqrt{n}\|\phi^{2}[\zeta]\|_{2}\|\phi^{2}[\zeta]\|_{s/2}\leq\alpha_{*}^{2}\mu^{-1/2}n^{-2/s}a^{4} by (117) and (b),

we obtain

 \sup_{\zeta\in{\mathbb{Z}}_{a}}\tilde{A}(\phi^{2}[\zeta])\leq 5\sqrt{c_{1}(s/2)}\mathrm{f}_{\infty}\alpha_{*}a^{2}[\mu_{*}^{2/s}+n^{-1/s}]. (125)

Similarly, for \zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{a} we have

 \sqrt{n}\|\phi^{2}[\zeta_{1}]-\phi^{2}[\zeta_{2}]\|_{2s/(s+4)}\leq\mu_{*}^{2/s}\sqrt{n}\|\phi^{2}[\zeta_{1}]-\phi^{2}[\zeta_{2}]\|_{2}\leq 2\alpha_{*}\mu_{*}^{2/s}a^{2}\mathrm{d}(\zeta_{1},\zeta_{2}),

 n^{1/s}\|\phi^{2}[\zeta_{1}]-\phi^{2}[\zeta_{2}]\|_{s/2}\leq 2C_{s/2}\alpha_{*}n^{-1/s}a^{2}\{\mathrm{d}(\zeta_{1},\zeta_{2})\}^{m_{s/2}},

 \bigl{[}\sqrt{n}\|\phi^{2}[\zeta_{1}]-\phi^{2}[\zeta_{2}]\|_{2}\|\phi^{2}[\zeta_{1}]-\phi^{2}[\zeta_{2}]\|_{s/2}\bigr{]}^{1/2}\leq 2\sqrt{C_{s/2}}\alpha_{*}n^{-1/s}a^{2}\{\mathrm{d}(\zeta_{1},\zeta_{2})\vee[\mathrm{d}(\zeta_{1},\zeta_{2})]^{m_{s/2}}\},

where the first line follows from the Hölder inequality and (8.2); the second one follows from (8.2); and the third line follows from the two previous inequalities. This yields

 \sup_{\zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{a}}\frac{\tilde{A}(\phi^{2}[\zeta_{1% }]-\phi^{2}[\zeta_{2}])}{\mathrm{d}_{*}(\zeta_{1},\zeta_{2})}\leq 5\mathrm{f}_% {\infty}\sqrt{2c_{1}(s/2)}\alpha_{*}C_{s/2}a^{2}[\mu_{*}^{2/s}+n^{-1/s}].

Combining the last inequality with (125), we obtain

 \lambda_{\tilde{A}}\leq k_{4}[\mu_{*}^{2/s}+n^{-1/s}],\qquad k_{4}:=5\mathrm{f% }_{\infty}\sqrt{2c_{1}(s/2)}\alpha_{*}C_{s/2}.

Now in order to bound \lambda_{\tilde{B}} we recall that \tilde{B}(\phi^{2}[\zeta])=\frac{4}{3}c_{1}(s/2)\|\phi^{2}[\zeta]\|_{s/2}. Then (8.2) gives \sup_{\zeta\in{\mathbb{Z}}_{a}}\tilde{B}(\phi^{2}[\zeta])\leq\frac{4}{3}c_{1}(s/2)\alpha_{*}^{2}n^{-2/s}a^{2}. This along with (8.2) leads to

 \lambda_{\tilde{B}}\leq k_{5}n^{-2/s},\qquad k_{5}:=\tfrac{8}{3}c_{1}(s/2)% \alpha_{*}^{2}C_{s/2}.

Combining these results with (44) and taking into account that, by (124), \gamma=\mu^{-1/4}\leq 1 for s\geq 4, we have

 \mu^{-1/4}=\sqrt{y_{\gamma}}\lambda_{\tilde{A}}+y_{\gamma}\lambda_{\tilde{B}}% \leq\bigl{[}\sqrt{y_{\gamma}}+y_{\gamma}\bigr{]}(k_{4}\vee k_{5})[\mu_{*}^{2/s% }+n^{-1/s}],

and an elementary calculation shows that

 y_{\gamma}\geq\mu^{-1/2}(k_{4}\vee k_{5})^{-2}[\mu_{*}^{2/s}+n^{-1/s}]^{-2}=:y% _{*}.

This inequality yields the constant \vartheta_{2} appearing in (106).

6{}^{0}. Application of Theorem 3. In order to apply Theorem 3 with the distance \mathrm{d}_{*}(\cdot,\cdot) given in (113), we need to compute the quantity

 L^{(\epsilon)}_{\exp}=\sum_{k=1}^{\infty}\exp\{2\mathcal{E}_{{\mathbb{Z}},% \mathrm{d}_{*}}(\epsilon 2^{-k})-(9/16)2^{k}k^{-2}\}.

Note that the entropy number \mathcal{E}_{{\mathbb{Z}},\mathrm{d}_{*}}(\cdot)=\ln\{N_{{\mathbb{Z}},\mathrm{% d}_{*}}(\cdot)\} is computed with respect to the distance \mathrm{d}_{*}. Therefore, we first express the entropy \mathcal{E}_{{\mathbb{Z}},\mathrm{d}_{*}}(\cdot) in terms of the original distance \mathrm{d} and then, using Assumption (W4), we derive a bound for L^{(\epsilon)}_{\exp}.

By the definition of the distance \mathrm{d}_{*}, for all \delta\in(0,1) and \zeta_{1},\zeta_{2}\in{\mathbb{Z}},

where m:=1\wedge m_{s} if s\in(2,4) and m:=1\wedge m_{s}\wedge m_{s/2} if s\geq 4. Therefore, N_{{\mathbb{Z}},\mathrm{d}_{*}}(\delta)\leq N_{{\mathbb{Z}},\mathrm{d}}([k_{*}% ^{-1}\delta]^{1/m}). In view of Assumption (W4), this yields
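The display preceding this sentence is missing from this copy. One implication that would justify the covering-number comparison is sketched below. It assumes k_{*}\geq 1, so that k_{*}^{-1}\delta<1 for \delta\in(0,1); this assumption is not verified here.

```latex
\begin{align*}
\mathrm{d}(\zeta_{1},\zeta_{2})\leq[k_{*}^{-1}\delta]^{1/m}\leq 1
\;\Longrightarrow\;
\mathrm{d}_{*}(\zeta_{1},\zeta_{2})
\leq k_{*}[\mathrm{d}(\zeta_{1},\zeta_{2})]^{m}
\leq\delta .
\end{align*}
```

The middle inequality uses that t^{r}\leq t^{m} for t\in[0,1] whenever r\geq m, applied to each exponent appearing in (113). Consequently every \mathrm{d}-ball of radius [k_{*}^{-1}\delta]^{1/m} is contained in a \mathrm{d}_{*}-ball of radius \delta.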

 \displaystyle\sup_{\delta\in(0,1)}\{\mathcal{E}_{{\mathbb{Z}},\mathrm{d}_{*}}(% \delta)-[k_{*}^{-1}\delta]^{-\beta/m}\} \displaystyle\leq \displaystyle\sup_{\delta\in(0,1)}\{\mathcal{E}_{{\mathbb{Z}},\mathrm{d}}([k_{% *}^{-1}\delta]^{1/m})-[k_{*}^{-1}\delta]^{-\beta/m}\} \displaystyle\leq \displaystyle\sup_{x\in(0,1)}\{\mathcal{E}_{{\mathbb{Z}},\mathrm{d}}(x)-x^{-% \beta}\}=C_{{\mathbb{Z}}}(\beta).

Thus, we obtain that

 \displaystyle L^{(\epsilon)}_{\exp} \displaystyle\leq \displaystyle\exp\{2C_{{\mathbb{Z}}}(\beta)\}\sum_{k=1}^{\infty}\exp\{2^{1+k% \beta/m}(k_{*}^{-1}\epsilon)^{-\beta/m}-(9/16)2^{k}k^{-2}\} \displaystyle= \displaystyle\exp\{2C_{{\mathbb{Z}}}(\beta)\}L_{*}^{(\epsilon)}(\beta).
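The exponent inside the sum comes from evaluating the entropy bound at \delta=\epsilon 2^{-k}, which lies in (0,1) for every k\geq 1 when \epsilon\in(0,1). As a sketch of that substitution:

```latex
\begin{align*}
2\mathcal{E}_{{\mathbb{Z}},\mathrm{d}_{*}}(\epsilon 2^{-k})
\leq 2C_{{\mathbb{Z}}}(\beta)+2\bigl[k_{*}^{-1}\epsilon 2^{-k}\bigr]^{-\beta/m}
=2C_{{\mathbb{Z}}}(\beta)+2^{1+k\beta/m}(k_{*}^{-1}\epsilon)^{-\beta/m}.
\end{align*}
```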

Now the result of the theorem follows from the bounds of Theorem 3. The constants T_{5,\epsilon} and T_{6,\epsilon} given in the beginning of the proof are obtained from the expressions for T_{1,\epsilon} and T_{2,\epsilon} and bounds of Theorem 3. In particular, we used that in view of Lemma 6 \sqrt{n}\sup_{w\in\mathcal{V}_{0}}\|w\|_{s}\leq n^{({s-2})/({2s})}\alpha_{*}% \mu^{1/s-1/2}\overline{\mathrm{w}}_{2}.

## 9 Proof of Theorem 7

The proof is based on verification of conditions and application of Theorem 5. First, we establish auxiliary results that provide the basis for verification of Assumptions 3.3 and 3.3. Then, based on these results, we show that all conditions of Theorem 5 are fulfilled. This will yield the required result.

Let K and K^{\prime} be any functions satisfying Assumptions (K1) and (K2), and let h,h^{\prime} be given vectors from \mathcal{H}. Let \zeta=(K,h), \zeta^{\prime}=(K^{\prime},h^{\prime}), and recall that \phi_{1}[\zeta] is the mapping (K,h)\mapsto n^{-1}K_{h}.

Similarly, if K,Q,K^{\prime},Q^{\prime} are any functions satisfying Assumptions (K1) and (K2), and if h,h^{\prime},\mathfrak{h},\mathfrak{h}^{\prime} are vectors from \mathcal{H} then z=[(K,h),(Q,\mathfrak{h})], z^{\prime}=[(K^{\prime},h^{\prime}),(Q^{\prime},\mathfrak{h}^{\prime})], and \phi_{2}[z] is the mapping [(K,h),(Q,\mathfrak{h})]\mapsto n^{-1}(K_{h}\ast Q_{\mathfrak{h}}).

1{}^{0}. Auxiliary results. We begin with auxiliary results about properties of the mappings \phi_{1}[\zeta] and \phi_{2}[z]. The proofs of these results are given in the Appendix.

Define the function

 D(x):=e^{dx}\bigl{[}x+\tfrac{1}{2}L_{\mathcal{K}}\sqrt{d}(e^{x}-1)+\mathrm{k}_% {\infty}(e^{dx}-1)\bigr{]},\qquad x\geq 0, (126)

and put

where D^{\prime} is the first derivative of the function D.

The next lemma states that Assumption 3.3 is fulfilled for the mappings \zeta\mapsto\phi_{1}[\zeta] and z\mapsto\phi_{2}[z].

###### Lemma 9

Let Assumption 3.4 hold, and s\geq 1. If the sets {\mathbb{Z}}^{(i)}, i=1,2, are equipped with the distances \mathrm{d}^{(i)}_{\theta_{i}}(\cdot,\cdot), then Assumption 3.3 is valid for the mappings \zeta\mapsto\phi_{1}[\zeta] and z\mapsto\phi_{2}[z].

The next three statements provide a basis for verification of Assumption 3.3. For any h,h^{\prime}\in\mathcal{H}, let h\vee h^{\prime}=(h_{1}\vee h_{1}^{\prime},\ldots,h_{d}\vee h_{d}^{\prime}) and h\wedge h^{\prime}=(h_{1}\wedge h_{1}^{\prime},\ldots,h_{d}\wedge h_{d}^{% \prime}).

###### Lemma 10

Let Assumptions (K1) and (K2) hold; then for any p\in[1,\infty]

 \|\phi_{1}[\zeta]\|_{p}=n^{-1}V_{h}^{-1+1/p}\|K\|_{p}\qquad\forall\zeta\in{\mathbb{Z}}^{(1)}, (128)

 \|\phi_{1}[\zeta]-\phi_{1}[\zeta^{\prime}]\|_{p}\leq n^{-1}(V_{h\vee h^{\prime}})^{-1+1/p}D\bigl{(}\mathrm{d}^{(1)}_{1}(\zeta,\zeta^{\prime})\bigr{)}\qquad\forall\zeta,\zeta^{\prime}\in{\mathbb{Z}}^{(1)}, (129)

 \|\phi_{2}[z]-\phi_{2}[z^{\prime}]\|_{p}\leq 2n^{-1}\mathrm{k}_{\infty}[(V_{h\vee h^{\prime}})\vee(V_{\mathfrak{h}\vee\mathfrak{h}^{\prime}})]^{-1+1/p}D\bigl{(}2\mathrm{d}^{(2)}_{1}(z,z^{\prime})\bigr{)}\qquad\forall z,z^{\prime}\in{\mathbb{Z}}^{(2)}.

Observe that Lemma 10 implies that Assumption 3.2 of Section 3.2 is fulfilled for the mappings \zeta\mapsto\phi_{1}[\zeta] and z\mapsto\phi_{2}[z].

###### Lemma 11

Let w\in\mathbb{H}_{d}(1,P) with some P>0, and let \tilde{x}\in{\mathbb{R}}^{d} be a point such that w(\tilde{x})=\|w\|_{\infty}>0; then

 \biggl{\{}x\in{\mathbb{R}}^{d}\dvtx|w(x)|\geq\frac{1}{2}\|w\|_{\infty}\biggr{% \}}\supseteq\bigotimes_{i=1}^{d}\biggl{[}\tilde{x}_{i}-\frac{\|w\|_{\infty}}{2% P\sqrt{d}},\tilde{x}_{i}+\frac{\|w\|_{\infty}}{2P\sqrt{d}}\biggr{]}.

###### Lemma 12

Under Assumption 3.4 for any p\geq 1:

 (i) \|\phi_{2}[z]\|_{p}\leq 2^{d/p}\mathrm{k}^{2}_{\infty}n^{-1}(V_{h\vee\mathfrak{h}})^{-1+1/p},

 (ii) \|\phi_{2}[z]\|_{p}\geq 2^{{d(1-p)}/{p}}\mathrm{k}^{2}_{1}n^{-1}(V_{h\vee\mathfrak{h}})^{-1+1/p},

 (iii) \operatorname{mes}\{\operatorname{supp}(\phi_{2}[z])\}\geq(V_{h\vee\mathfrak{h}})\biggl{[}\frac{\mathrm{k}^{2}_{1}}{2^{d+1}\sqrt{d}L_{\mathcal{K}}\mathrm{k}_{\infty}}\biggr{]}^{d},

 (iv) \operatorname{mes}\biggl{\{}t\dvtx\phi_{2}[z](t)\geq\frac{1}{2}\|\phi_{2}[z]\|_{\infty}\biggr{\}}\geq\biggl{[}\frac{\mathrm{k}^{2}_{1}}{2^{d+2}\sqrt{d}L_{\mathcal{K}}\mathrm{k}_{\infty}}\biggr{]}^{d}\operatorname{mes}\{\operatorname{supp}(\phi_{2}[z])\}.

2{}^{0}. Verification of conditions of Theorem 5. We check Assumption 3.3 for the classes of weights \mathcal{W}^{(1)} and \mathcal{W}^{(2)} given by the parametrization \phi_{1}[\zeta] and \phi_{2}[z].

First, we note that (W1) is fulfilled both for \phi_{1}[\zeta] and \phi_{2}[z] in view of Assumption (K1). Furthermore, Assumptions (K1) and (K2) together with Lemma 11 imply (W2) for \phi_{1}[\zeta] with

while statement (iv) of Lemma 12 yields (W2) for \phi_{2}[z] with the constants

Clearly, \operatorname{mes}\{\operatorname{supp}(\phi_{1}[\zeta])\}\geq V_{h^{\min}}; hence the condition

 nV_{h^{\min}}>[64c^{2}_{1}(s)]^{({s\wedge 4})/({s\wedge 4-2})}

implies (W3) for \phi_{1}[\zeta] with \mu=nV_{h^{\min}}. It follows from the statement (iii) of Lemma 12 that Assumption (W3) holds for \phi_{2}[z] with \mu=nV_{h^{\min}} if

 nV_{h^{\min}}>\alpha_{2,2}^{-1}[64c^{2}_{1}(s)]^{({s\wedge 4})/({s\wedge 4-2})}.

Finally, a standard calculation shows that if \mathcal{E}_{\mathcal{H}}(\cdot) is the entropy number of the set \mathcal{H} measured in the distance \Delta_{\mathcal{H}} [see (56)] then for any \delta\in(0,1]

 \mathcal{E}_{\mathcal{H}}(\delta)\leq d\ln(3/\delta)+\sum_{i=1}^{d}(\ln\ln[h^{% \max}_{i}/h^{\min}_{i}])_{+}. (133)

This result together with (K3) guarantees that Assumption (W4) is fulfilled for both parametrizations.

Now we compute the quantities m_{p} and C_{p} appearing in (52). Although Lemma 7 shows that we can always set m_{p}=2/p, it turns out that under Assumption 3.4 we can put m_{p}=1 for all p\geq 2, both for \phi_{1}[\zeta] and for \phi_{2}[z]. This leads to weaker conditions on the entropy \mathcal{E}_{\mathcal{K}}(\cdot) (see the formulation of Theorem 5).

First, consider the mapping \phi_{1}[\zeta]; here following (50), we set

 {\mathbb{Z}}_{2}^{(1)}(b):=\{\zeta=(K,h)\dvtx n^{1/2}\|\phi_{1}[\zeta]\|_{2}% \leq b\}=\{\zeta=(K,h)\dvtx(nV_{h})^{-1/2}\|K\|_{2}\leq b\}

for b\in[\underline{\mathrm{w}}^{(1)}_{2},\overline{\mathrm{w}}^{(1)}_{2}] where by (49)

 \underline{\mathrm{w}}^{(1)}_{2}\geq\mathrm{k}_{1}(nV_{h^{\max}})^{-1/2},% \qquad\overline{\mathrm{w}}^{(1)}_{2}\leq\mathrm{k}_{\infty}(nV_{h^{\min}})^{-% 1/2}. (134)

By (129) of Lemma 10, we have for any p\geq 2 and \zeta_{1}=(K,h), \zeta_{2}=(K^{\prime},h^{\prime})

 n^{1/p}\|\phi_{1}[\zeta_{1}]-\phi_{1}[\zeta_{2}]\|_{p}\leq(nV_{h\vee h^{\prime% }})^{-1+1/p}D\bigl{(}\mathrm{d}_{1}^{(1)}(\zeta_{1},\zeta_{2})\bigr{)},

and if \zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{2}^{(1)}(b) are such that \mathrm{d}_{1}^{(1)}(\zeta_{1},\zeta_{2})\leq 2 then by definition of {\mathbb{Z}}_{2}^{(1)}(b)

 n^{1/p}\|\phi_{1}[\zeta_{1}]-\phi_{1}[\zeta_{2}]\|_{p}\leq[b/\mathrm{k}_{1}]^{% 2-2/p}D^{\prime}(2)\mathrm{d}_{1}^{(1)}(\zeta_{1},\zeta_{2}),

where we have used that \|K\|_{2}\geq\|K\|_{1}\geq\mathrm{k}_{1} for all K\in\mathcal{K}, D(0)=0, and D is monotone increasing. If \zeta_{1},\zeta_{2}\in{\mathbb{Z}}_{2}^{(1)}(b) and \mathrm{d}_{1}^{(1)}(\zeta_{1},\zeta_{2})>2, then by the triangle inequality, and (128) of Lemma 10

 \displaystyle n^{1/p}\|\phi_{1}[\zeta_{1}]-\phi_{1}[\zeta_{2}]\|_{p} \displaystyle\leq \displaystyle n^{1/p}\|\phi_{1}[\zeta_{1}]\|_{p}+n^{1/p}\|\phi_{1}[\zeta_{2}]% \|_{p} \displaystyle\leq \displaystyle 2\mathrm{k}_{\infty}(nV_{h})^{-1+1/p}\leq\mathrm{k}_{\infty}[b/% \mathrm{k}_{1}]^{2-2/p}\mathrm{d}_{1}^{(1)}(\zeta_{1},\zeta_{2}).
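The case \mathrm{d}_{1}^{(1)}(\zeta_{1},\zeta_{2})\leq 2 rests on two elementary facts, sketched here. The first line assumes that V_{h} is nondecreasing in each coordinate of h (as it would be for a product of bandwidths; this is an assumption, since V_{h} is not redefined in this section), and uses (nV_{h})^{-1/2}\|K\|_{2}\leq b together with \|K\|_{2}\geq\mathrm{k}_{1}.

```latex
\begin{align*}
(nV_{h\vee h^{\prime}})^{-1+1/p}
  &\leq (nV_{h})^{-1+1/p}
   =\bigl[(nV_{h})^{-1/2}\bigr]^{2-2/p}
   \leq[b/\mathrm{k}_{1}]^{2-2/p}, \\
D(x)&=\int_{0}^{x}D^{\prime}(t)\,dt\leq D^{\prime}(2)\,x,
   \qquad x\in[0,2].
\end{align*}
```

The second line uses D(0)=0 and the fact that D^{\prime} is nondecreasing: by (126), D is built from products and sums of nonnegative, increasing, convex functions whenever L_{\mathcal{K}}\geq 0 and \mathrm{k}_{\infty}\geq 0, so D is convex on [0,\infty). In the case \mathrm{d}_{1}^{(1)}(\zeta_{1},\zeta_{2})>2, the final step of the display is simply 2\leq\mathrm{d}_{1}^{(1)}(\zeta_{1},\zeta_{2}).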

These inequalities show that if {\mathbb{Z}}^{(1)} is equipped with the distance \mathrm{d}^{(1)}_{\theta_{1}}(\cdot,\cdot) [see (127) for definition of \theta_{1}] then (52) holds with

 m_{p}=1,\qquad C_{p}=\theta_{1}^{-1}[\mathrm{k}_{\infty}/\mathrm{k}_{1}]^{2-2/% p}D^{\prime}(2)\leq 1, (135)

because nV_{h^{\min}}\geq 1 (which implies b\leq\mathrm{k}_{\infty}).

Now consider the mapping \phi_{2}[z]; following (50) we have here

 {\mathbb{Z}}_{2}^{(2)}(b):=\{z=[(K,h),(Q,\mathfrak{h})]\dvtx n^{1/2}\|\phi_{2}% [z]\|_{2}\leq b\},\qquad b\in\bigl{[}\underline{\mathrm{w}}_{2}^{(2)},% \overline{\mathrm{w}}_{2}^{(2)}\bigr{]},

where by the statements (i) and (ii) of Lemma 12

 2^{-d/2}\mathrm{k}^{2}_{1}(nV_{h^{\max}})^{-1/2}\leq\underline{\mathrm{w}}^{(2% )}_{2},\qquad\overline{\mathrm{w}}^{(2)}_{2}\leq 2^{d/2}\mathrm{k}^{2}_{\infty% }(nV_{h^{\min}})^{-1/2}. (136)

Note that if z=[(K,h),(Q,\mathfrak{h})]\in{\mathbb{Z}}^{(2)}_{2}(b) then by statement (ii) of Lemma 12 we have (nV_{h\vee\mathfrak{h}})^{-1}\leq 2^{d}\mathrm{k}^{-4}_{1}b^{2}. By this fact and by the last inequality of Lemma 10, we have for z_{1}=[(K,h),(Q,\mathfrak{h})],z_{2}=[(K^{\prime},h^{\prime}),(Q^{\prime},\mathfrak{h}^{\prime})]\in{\mathbb{Z}}_{2}^{(2)} such that \mathrm{d}_{1}^{(2)}(z_{1},z_{2})\leq 2

 \displaystyle n^{1/p}\|\phi_{2}[z_{1}]-\phi_{2}[z_{2}]\|_{p} \displaystyle\leq \displaystyle 2\mathrm{k}_{\infty}[(nV_{h\vee h^{\prime}})\vee(nV_{\mathfrak{h% }\vee\mathfrak{h}^{\prime}})]^{-1+1/p}D\bigl{(}2\mathrm{d}^{(2)}_{1}(z_{1},z_{% 2})\bigr{)} \displaystyle\leq \displaystyle 2^{d+2-d/p}\mathrm{k}_{\infty}[b/\mathrm{k}_{1}^{2}]^{2-2/p}D^{% \prime}(4)\mathrm{d}_{1}^{(2)}(z_{1},z_{2}).

If \mathrm{d}_{1}^{(2)}(z_{1},z_{2})>2, then using the triangle inequality and Lemma 12(i) we have

 n^{1/p}\|\phi_{2}[z_{1}]-\phi_{2}[z_{2}]\|_{p}\leq 2^{d+1}\mathrm{k}_{\infty}^% {2}[b/\mathrm{k}_{1}^{2}]^{2-2/p}\leq 2^{d}\mathrm{k}_{\infty}^{2}[b/\mathrm{k% }_{1}^{2}]^{2-2/p}\mathrm{d}_{1}^{(2)}(z_{1},z_{2}).
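Both displays for \phi_{2} rely on the bound (nV_{h\vee\mathfrak{h}})^{-1}\leq 2^{d}\mathrm{k}_{1}^{-4}b^{2}. The sketch below shows how it follows from Lemma 12(ii) with p=2, and how Lemma 12(i) then gives the constant in the case \mathrm{d}_{1}^{(2)}(z_{1},z_{2})>2.

```latex
\begin{align*}
b\geq n^{1/2}\|\phi_{2}[z]\|_{2}
  &\geq 2^{-d/2}\mathrm{k}_{1}^{2}(nV_{h\vee\mathfrak{h}})^{-1/2}
  \;\Longrightarrow\;
  (nV_{h\vee\mathfrak{h}})^{-1}\leq 2^{d}\mathrm{k}_{1}^{-4}b^{2}, \\
n^{1/p}\|\phi_{2}[z]\|_{p}
  &\leq 2^{d/p}\mathrm{k}_{\infty}^{2}(nV_{h\vee\mathfrak{h}})^{-1+1/p}
  \leq 2^{d/p}\mathrm{k}_{\infty}^{2}
       \bigl[2^{d}\mathrm{k}_{1}^{-4}b^{2}\bigr]^{1-1/p}
  =2^{d}\mathrm{k}_{\infty}^{2}[b/\mathrm{k}_{1}^{2}]^{2-2/p}.
\end{align*}
```

Applying the second line to z_{1} and z_{2} and using the triangle inequality produces the factor 2^{d+1}; since \mathrm{d}_{1}^{(2)}(z_{1},z_{2})>2, this is at most 2^{d}\mathrm{d}_{1}^{(2)}(z_{1},z_{2}).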

Combining these inequalities, we observe that if {\mathbb{Z}}^{(2)} is equipped with the distance \mathrm{d}_{\theta_{2}}^{(2)}(\cdot,\cdot) [see (127)] then (52) holds with

 m_{p}=1,\qquad C_{p}=\theta_{2}^{-1}2^{2d+2-{3d}/{p}}\mathrm{k}_{\infty}^{2}[% \mathrm{k}_{\infty}^{2}/\mathrm{k}_{1}]^{2-2/p}D^{\prime}(4)\leq 1. (137)

We have used that b\leq 2^{d/2}\mathrm{k}_{\infty}^{2} because nV_{h^{\min}}\geq 1. Thus (135) and (137) show that m=1 and the condition \beta<m of Theorem 5 holds if in Assumption (K3) \beta_{\mathcal{K}}<1.

3{}^{0}. Application of Theorem 5. First, note that \vartheta_{0}^{(i)}, i=1,2, defined in (65) satisfy

where \alpha_{2,i}, i=1,2, are given in (131) and (132). This is in accordance with the definition of the constant \vartheta_{0} in (54) for the parametrizations \phi_{1}[\zeta] and \phi_{2}[z]. Then the definition of C_{\xi,i}^{*}(y) in (68) corresponds to (54). Following (106), we put

Then the formula for y_{*}^{(i)} appearing in the statement of the theorem is a version of (55).

Now we need to specify the constants T_{5,\epsilon} and T_{6,\epsilon}; see (8.2), (8.2).

Following (107), we set for i=1,2

 k_{*,i}:=8c_{1}(s)\alpha_{*,i}^{2},\qquad L_{*,i}^{(\epsilon)}(\beta):=\sum_{k% =1}^{\infty}\exp\{2^{1+k\beta}(k_{*,i}^{-1}\epsilon)^{-\beta}-(9/16)2^{k}k^{-2% }\}.

In view of (133) and Assumption (K3), we obtain for any \beta\in(\beta_{\mathcal{K}},1) that

 \displaystyle C_{{\mathbb{Z}}^{(1)}}(\beta) \displaystyle= \displaystyle\sup_{\delta\in(0,1)}\bigl{\{}\mathcal{E}_{{\mathbb{Z}}^{(1)},% \mathrm{d}_{\theta_{1}}^{(1)}}(\delta)-\delta^{-\beta}\bigr{\}} \displaystyle\leq \displaystyle C_{\mathcal{K}}+C_{\beta,d}+\sum_{i=1}^{d}(\ln\ln[h^{\max}_{i}/h% ^{\min}_{i}])_{+}, \displaystyle C_{{\mathbb{Z}}^{(2)}}(\beta) \displaystyle= \displaystyle\sup_{\delta\in(0,1)}\bigl{\{}\mathcal{E}_{{\mathbb{Z}}^{(2)},% \mathrm{d}_{\theta_{2}}^{(2)}}(\delta)-\delta^{-\beta}\bigr{\}} \displaystyle\leq \displaystyle 2C_{\mathcal{K}}+2C_{\beta,d}+2\sum_{i=1}^{d}(\ln\ln[h^{\max}_{i% }/h^{\min}_{i}])_{+},

where we have taken into account that \theta_{1}\geq 1, \theta_{2}\geq 1 and denoted

Therefore for i=1,2

 L_{*,i}^{(\epsilon)}(\beta)\exp\{2C_{{\mathbb{Z}}^{(i)}}(\beta)\}\leq[1+A_{% \mathcal{H}}]^{i}\exp\{2iC_{\mathcal{K}}\}\inf_{\beta\in(\beta_{\mathcal{K}},1% )}\bigl{[}L_{*,i}^{(\epsilon)}(\beta)\exp\{2iC_{\beta,d}\}\bigr{]},

and, by Assumption (K3), (133) and (66)

 N_{{\mathbb{Z}}^{(i)},\mathrm{d}_{\theta_{i}}^{(i)}}(k_{*,i}^{-1}\epsilon/8)% \leq[1+A_{\mathcal{H}}]^{i}[24k_{*,i}\theta_{i}/\epsilon]^{di}\exp\biggl{\{}i% \biggl{(}\frac{8k_{*,i}\theta_{i}}{\epsilon}\biggr{)}^{\beta_{\mathcal{K}}}% \biggr{\}}\exp\{iC_{\mathcal{K}}\}.

Finally, substituting these bounds in (8.2) and using (134), (136) and (66) we have that

 T^{(i)}_{5,\epsilon}\leq(1+A_{\mathcal{H}})^{2i}(1+B_{\mathcal{H}})\tilde{T}_{% 1}^{(i)},

where

 \displaystyle\tilde{T}_{1,\epsilon}^{(i)} \displaystyle:= \displaystyle I_{\epsilon}(q)(2^{1+d/2}u_{\epsilon}k_{*,i}\mathrm{k}_{\infty}^% {2})^{q}[24k_{*,i}\theta_{i}\epsilon^{-1}]^{di}\exp\biggl{\{}i\biggl{(}\frac{8% k_{*,i}\theta_{i}}{\epsilon}\biggr{)}^{\beta_{\mathcal{K}}}\biggr{\}}\exp\{3iC% _{\mathcal{K}}\} \displaystyle{}\times\log_{2}\biggl{(}\frac{2^{d}\mathrm{k}_{\infty}^{2}k_{*,i% }}{\mathrm{k}_{1}^{2}}\biggr{)}\Bigl{\{}1+\inf_{\beta\in(\beta_{\mathcal{K}},1% )}\bigl{[}L_{*,i}^{(\epsilon)}(\beta)\exp\{2iC_{\beta,d}\}\bigr{]}\Bigr{\}},% \qquad i=1,2.

This leads to the first statement of the theorem. The second statement of the theorem follows by substitution of the above bounds in (8.2), which gives T_{6,\epsilon}^{(i)}\leq(1+A_{\mathcal{H}})^{2i}(1+B_{\mathcal{H}})\tilde{T}_{2}^{(i)}, where

 \displaystyle\tilde{T}_{2,\epsilon}^{(i)} \displaystyle:= \displaystyle[c_{1}(s)+2]^{q}(2^{d/2}\alpha_{*,i}\mathrm{k}_{\infty}^{2})^{q}[% 24k_{*,i}\theta_{i}\epsilon^{-1}]^{di}\exp\biggl{\{}i\biggl{(}\frac{8k_{*,i}% \theta_{i}}{\epsilon}\biggr{)}^{\beta_{\mathcal{K}}}\biggr{\}}\exp\{3iC_{% \mathcal{K}}\} \displaystyle{}\times\log_{2}\biggl{(}\frac{2^{d}\mathrm{k}_{\infty}^{2}k_{*,i% }}{\mathrm{k}_{1}^{2}}\biggr{)}\Bigl{\{}1+\inf_{\beta\in(\beta_{\mathcal{K}},1% )}\bigl{[}L_{*,i}^{(\epsilon)}(\beta)\exp\{2iC_{\beta,d}\}\bigr{]}\Bigr{\}},% \qquad i=1,2.

## 10 Proofs of Theorems 8 and 9

### 10.1 Proof of Theorem 8

Let X^{\prime}=(X,\varepsilon), and let X^{\prime}_{i}, i=1,\ldots,n be independent copies of X^{\prime}. For any l>0, x^{\prime}=(x,u)\in\mathcal{X}\times{\mathbb{R}} and t\in\mathcal{T} define the function

 w^{(l)}(t,x^{\prime})=w(t,x)u{\mathbf{1}}_{[-l,l]}(u).

With this notation, we note that on the event \{{\max_{i=1,\ldots,n}}|\varepsilon_{i}|\leq l\}

 \eta_{w}(t)=\sum_{i=1}^{n}w(t,X_{i})\varepsilon_{i}=\sum_{i=1}^{n}w^{(l)}(t,X_% {i}^{\prime})=\xi_{w^{(l)}}(t),

and the last equality holds because the distribution of \varepsilon is symmetric, so that \mathbb{E}w^{(l)}(t,X^{\prime})=0 for all t\in\mathcal{T} and l>0. Therefore, for any z>0,

 {\mathbb{P}}\{\|\eta_{w}\|_{s,\tau}\geq z\}\leq{\mathbb{P}}\bigl{\{}\bigl{\|}% \xi_{w^{(l)}}\bigr{\|}_{s,\tau}\geq z\bigr{\}}+n{\mathbb{P}}\{|\varepsilon|>l\}.
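This inequality is a truncation argument followed by a union bound. A sketch of the two steps, using that \eta_{w}=\xi_{w^{(l)}} on the event \{\max_{i}|\varepsilon_{i}|\leq l\} and that the \varepsilon_{i} are distributed as \varepsilon:

```latex
\begin{align*}
{\mathbb{P}}\{\|\eta_{w}\|_{s,\tau}\geq z\}
 &\leq {\mathbb{P}}\Bigl\{\|\eta_{w}\|_{s,\tau}\geq z,\
        \max_{i=1,\ldots,n}|\varepsilon_{i}|\leq l\Bigr\}
   +{\mathbb{P}}\Bigl\{\max_{i=1,\ldots,n}|\varepsilon_{i}|>l\Bigr\} \\
 &\leq {\mathbb{P}}\bigl\{\|\xi_{w^{(l)}}\|_{s,\tau}\geq z\bigr\}
   +\sum_{i=1}^{n}{\mathbb{P}}\{|\varepsilon_{i}|>l\} \\
 &={\mathbb{P}}\bigl\{\|\xi_{w^{(l)}}\|_{s,\tau}\geq z\bigr\}
   +n{\mathbb{P}}\{|\varepsilon|>l\}.
\end{align*}
```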

If Assumption (E1) is fulfilled, then for any z>0

 {\mathbb{P}}\{\|\eta_{w}\|_{s,\tau}\geq z\}\leq{\mathbb{P}}\bigl{\{}\bigl{\|}% \xi_{w^{(l)}}\bigr{\|}_{s,\tau}\geq z\bigr{\}}+nv\exp\{-bl^{\alpha}\}. (138)

If Assumption (E2) is fulfilled, then for any z>0

 {\mathbb{P}}\{\|\eta_{w}\|_{s,\tau}\geq z\}\leq{\mathbb{P}}\bigl{\{}\bigl{\|}% \xi_{w^{(l)}}\bigr{\|}_{s,\tau}\geq z\bigr{\}}+nPl^{-p}. (139)

In order to bound the first term on the right-hand side of (138) and (139), we repeat the steps in the proof of Theorem 1 with w replaced by w^{(l)} and optimize with respect to the truncation level l.

For any z>0, we define

 \Upsilon_{s}(w,f,z)=\frac{z^{2}}{({1}/{3})\varpi_{s}^{2}(w,f)+({4}/{3})c_{*}(s% )M_{s}(w)z},

where c_{*}(s) is given in (27).

First, consider the case s\geq 2. Using the same reasoning as in the proof of Theorem 1, we have the following upper bound: for all z>0

 {\mathbb{P}}\bigl{\{}\bigl{\|}\xi_{w^{(l)}}\bigr{\|}_{s,\tau}\geq\varrho_{s}(w% ,f)+z\bigr{\}}\leq\exp\{-[1\vee l]^{-1}\Upsilon_{s}(w,f,z)\}. (140)

Under Assumption (E1), if we set

 l=\cases{[b^{-1}\Upsilon_{s}(w,f,z)]^{{1}/{\alpha}},&\quad$b^{-1}\Upsilon_{s}(% w,f,z)<1$,\cr[b^{-1}\Upsilon_{s}(w,f,z)]^{{1}/({1+\alpha})},&\quad$b^{-1}% \Upsilon_{s}(w,f,z)\geq 1$,}

then it follows from (138) and (140) that

 {\mathbb{P}}\{\|\eta_{w}\|_{s,\tau}\geq\varrho_{s}(w,f)+z\}\leq G^{(1)}(% \Upsilon_{s}(w,f,z)).

Thus, the first statement of the theorem is proved if s\geq 2.

If Assumption (E2) is fulfilled then we choose

 l=\frac{\Upsilon_{s}(w,f,z)}{p\ln(1+p^{-1}\Upsilon_{s}(w,f,z))}

and note that l\geq 1 for any value of \Upsilon_{s}(w,f,z). Then (139) and (140) imply that

 \displaystyle{\mathbb{P}}\{\|\eta_{w}\|_{s,\tau}\geq\varrho_{s}(w,f)+z\} \displaystyle\leq \displaystyle\biggl{[}\frac{1}{(1+p^{-1}\Upsilon_{s}(w,f,z))}\biggr{]}^{p} \displaystyle{}+nP\biggl{[}\frac{p\ln(1+p^{-1}\Upsilon_{s}(w,f,z))}{\Upsilon_{% s}(w,f,z)}\biggr{]}^{p}.

Using the elementary inequality (1+u)^{-1}\leq u^{-1}\ln(1+u), u>0, we get

 {\mathbb{P}}\{\|\eta_{w}\|_{s,\tau}\geq\varrho_{s}(w,f)+z\}\leq[1+nP]\biggl{[}% \frac{p\ln(1+p^{-1}\Upsilon_{s}(w,f,z))}{\Upsilon_{s}(w,f,z)}\biggr{]}^{p}

and, therefore, the second statement of the theorem is proved for the case s\geq 2.
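The choice of l under Assumption (E2) can be checked directly. In the sketch below, u:=p^{-1}\Upsilon_{s}(w,f,z)>0 is shorthand introduced here only for brevity, so that l=u/\ln(1+u).

```latex
\begin{align*}
l=\frac{u}{\ln(1+u)}\geq 1
  &\quad\text{since } \ln(1+u)\leq u, \\
\exp\{-l^{-1}\Upsilon_{s}(w,f,z)\}
  &=\exp\{-p\ln(1+u)\}=(1+u)^{-p}, \\
(1+u)^{-p}
  &\leq\bigl[u^{-1}\ln(1+u)\bigr]^{p}=l^{-p}
  \quad\text{since } \ln(1+u)\geq\frac{u}{1+u}.
\end{align*}
```

Because l\geq 1, the factor [1\vee l]^{-1} in (140) equals l^{-1}, so the two terms in the bound are (1+u)^{-p}\leq l^{-p} and nPl^{-p}, and their sum is at most [1+nP]l^{-p}.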

If s<2, then we have similarly to (140) that for all z>0

 {\mathbb{P}}\bigl{\{}\bigl{\|}\xi_{w^{(l)}}\bigr{\|}_{s,\tau}\geq\varrho_{s}(w% ,f)+z\bigr{\}}\leq\exp\{-[1\vee l]^{-2}\Upsilon_{s}(w,f,z)\}.

The same computations as in the case s\geq 2 lead to the statement of the theorem when s<2.

### 10.2 Proof of Theorem 9

Put

 \displaystyle L^{(\epsilon)}_{\alpha,b} \displaystyle:= \displaystyle\sum_{k=1}^{\infty}\exp\{\epsilon^{-\beta}2^{\beta k+1}\}\sqrt{g_% {\alpha,b}(9\cdot 2^{k-3}k^{-2})}, \displaystyle J^{(\epsilon)}_{\alpha,b} \displaystyle:= \displaystyle q\int_{1}^{\infty}(x-1)^{q-1}[g_{\alpha,b}(x)]^{{1}/{4}}\,{d}x, \displaystyle T_{n,\epsilon} \displaystyle:= \displaystyle(1+nv)[2^{2\epsilon}(1+\epsilon)\mathrm{a}\overline{\mathrm{w}}_{% 2}]^{q}[2^{q\epsilon}-1]^{-1}\exp\{C_{\mathbb{Z}}(\beta)+(8/\epsilon)^{\beta}\} \displaystyle{}\times\bigl{(}1+\exp\{2C_{\mathbb{Z}}(\beta)\}L^{(\epsilon)}_{% \alpha,b}\bigr{)}J^{(\epsilon)}_{\alpha,b}.

We note that L^{(\epsilon)}_{\alpha,b}<\infty since \beta<\alpha/(2+\alpha) if s<2, and \beta<\alpha/(1+\alpha) if s\geq 2. Note also that the quantity J^{(\epsilon)}_{g}(\cdot) in the second inequality of Corollary 1 admits the following bound if g=G_{1}:

If for any \zeta\in{\mathbb{Z}}, we let

 \begin{aligned}U_{\eta}(\phi[\zeta])&=\mathrm{a}\sqrt{n}\|\phi[\zeta]\|_{2},\qquad A_{\eta}(\phi[\zeta])=\mathrm{b}_{n}\sqrt{n}\|\phi[\zeta]\|_{2},\\
 B_{\eta}(\phi[\zeta])&=\mathrm{c}_{n}\sqrt{n}\|\phi[\zeta]\|_{2},\end{aligned}

then we have for f\in\mathcal{F}

 \begin{aligned}\varrho_{s}(\phi[\zeta],f)&\leq U_{\eta}(\phi[\zeta]),\qquad\tfrac{1}{3}\varpi_{s}^{2}(\phi[\zeta],f)\leq A^{2}_{\eta}(\phi[\zeta]),\\
 \tfrac{4}{3}c_{*}(s)M_{s}(\phi[\zeta])&\leq B_{\eta}(\phi[\zeta]).\end{aligned}

Thus, in view of Theorem 8, Assumption 2 holds with U=U_{\eta}, A=A_{\eta}, B=B_{\eta} and g=G^{(1)}. Then standard computations show that \Lambda_{A_{\eta}}=\mathrm{b}_{n} and \Lambda_{B_{\eta}}=\mathrm{c}_{n}. The assertion of the theorem follows now from Corollary 1.

## Appendix

### Proof of Lemma 4

Let

 \mathcal{X}^{(n)}=\underbrace{\mathcal{X}\times\cdots\times\mathcal{X}}_{n\mbox{-}\mathrm{times}},\qquad\overline{\mathcal{X}}{}^{(n)}=\underbrace{\overline{\mathcal{X}}\times\cdots\times\overline{\mathcal{X}}}_{n\mbox{-}\mathrm{times}}.

Obviously, \overline{\mathcal{X}}{}^{(n)} is a countable dense subset of \mathcal{X}^{(n)}. For any x^{(n)}\in\mathcal{X}^{(n)} and t\in\mathcal{T}, put

 \xi\bigl{(}t,x^{(n)}\bigr{)}=\sum_{i=1}^{n}[w(t,x_{i})-\mathbb{E}w(t,X)],

and let

 \mathfrak{L}=\biggl\{l_{\overline{x}^{(n)}}\colon\mathcal{T}\to{\mathbb{R}}\colon l_{\overline{x}^{(n)}}(t)=\frac{|\xi(t,\overline{x}^{(n)})|^{s-1}\operatorname{sign}[\xi(t,\overline{x}^{(n)})]}{\|\xi(\cdot,\overline{x}^{(n)})\|_{s,\tau}^{s-1}},\ \overline{x}^{(n)}\in\overline{\mathcal{X}}{}^{(n)}\biggr\}.

Note that \mathfrak{L} is countable and \mathfrak{L}\subset\mathbb{B}_{{s}/({s-1})} since, obviously

Note that \xi_{w}(\cdot)=\xi(\cdot,X^{(n)}),X^{(n)}=(X_{1},\ldots,X_{n}), and therefore, in order to prove the assertion of the lemma it is sufficient to show that

 \bigl\|\xi\bigl(\cdot,x^{(n)}\bigr)\bigr\|_{s,\tau}=\sup_{l\in\mathfrak{L}}\int l(t)\xi\bigl(t,x^{(n)}\bigr)\tau({d}t)\qquad\forall x^{(n)}\in\mathcal{X}^{(n)}. (1)

First, let us note that Assumption 3 implies that for every \varepsilon>0 and every x^{(n)}\in\mathcal{X}^{(n)} there exists \overline{x}^{(n)}\in\overline{\mathcal{X}}{}^{(n)} such that

 \bigl\|\xi\bigl(\cdot,x^{(n)}\bigr)-\xi\bigl(\cdot,\overline{x}^{(n)}\bigr)\bigr\|_{s,\tau}\leq\varepsilon. (2)

Taking into account that \mathfrak{L}\subset\mathbb{B}_{{s}/({s-1})} and using the Hölder inequality, we obtain from (2) that

 \biggl|\sup_{l\in\mathfrak{L}}\int l(t)\xi\bigl(t,x^{(n)}\bigr)\tau({d}t)-\sup_{l\in\mathfrak{L}}\int l(t)\xi\bigl(t,\overline{x}^{(n)}\bigr)\tau({d}t)\biggr|\leq\varepsilon. (3)

Obviously,

 \bigl\|\xi\bigl(\cdot,\overline{x}^{(n)}\bigr)\bigr\|_{s,\tau}=\int l_{\overline{x}^{(n)}}(t)\xi\bigl(t,\overline{x}^{(n)}\bigr)\tau({d}t).

By the duality argument, this implies that

 \bigl\|\xi\bigl(\cdot,\overline{x}^{(n)}\bigr)\bigr\|_{s,\tau}=\sup_{l\in\mathfrak{L}}\int l(t)\xi\bigl(t,\overline{x}^{(n)}\bigr)\tau({d}t). (4)

Using the triangle inequality, we obtain from (2), (3) and (4) that for every \varepsilon>0 and every x^{(n)}\in\mathcal{X}^{(n)}

 \biggl|\bigl\|\xi\bigl(\cdot,x^{(n)}\bigr)\bigr\|_{s,\tau}-\sup_{l\in\mathfrak{L}}\int l(t)\xi\bigl(t,x^{(n)}\bigr)\tau({d}t)\biggr|\leq 2\varepsilon,

which completes the proof of (1) because \varepsilon>0 can be chosen arbitrarily small.
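
The role of the functions l_{\overline{x}^{(n)}} can be illustrated on a finite set \mathcal{T} with a discrete measure \tau. The Python sketch below is a toy check and not part of the proof: it assumes s>1 and a nonzero \xi, and the names ls_norm, norming_functional and pairing, as well as the sample values, are hypothetical. It verifies that l attains the norm, that l has unit {\mathbb{L}}_{s/(s-1)}-norm, and that no other element of the unit ball gives a larger pairing (Hölder inequality).

```python
import math


def ls_norm(xi, tau, s):
    """L_s(tau)-norm of xi for a discrete measure with weights tau[i] > 0."""
    return sum(w * abs(v) ** s for v, w in zip(xi, tau)) ** (1.0 / s)


def norming_functional(xi, tau, s):
    """l(t) = |xi(t)|^{s-1} sign(xi(t)) / ||xi||^{s-1}; assumes s > 1 and xi != 0."""
    norm = ls_norm(xi, tau, s)
    return [math.copysign(abs(v) ** (s - 1.0), v) / norm ** (s - 1.0) for v in xi]


def pairing(l, xi, tau):
    """Discrete analogue of the integral of l(t) * xi(t) with respect to tau."""
    return sum(w * a * b for a, b, w in zip(l, xi, tau))


# Hypothetical values of xi(t) and of the weights tau({t}) on a four-point set.
xi = [1.5, -2.0, 0.0, 0.25]
tau = [0.2, 0.5, 1.0, 0.3]
s = 3.0
l = norming_functional(xi, tau, s)
print(ls_norm(xi, tau, s), pairing(l, xi, tau), ls_norm(l, tau, s / (s - 1.0)))
```

The first two printed numbers should agree up to rounding, and the third should be 1 up to rounding.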

### Proof of Lemma 5

First, note that for any p\geq 1 and x\in\mathcal{X}

 \begin{aligned}\|\overline{w}(\cdot,x)\|_{p,\tau}&\leq 2^{1-1/p}\biggl[\int|w(t,x)|^{p}\tau({d}t)+\int\mathbb{E}|w(t,X)|^{p}\tau({d}t)\biggr]^{1/p}\\
 &\leq 2\sup_{x\in\mathcal{X}}\|w(\cdot,x)\|_{p,\tau}.\end{aligned}

Here, we have used the triangle inequality. Next, for any p\geq 1 and t\in\mathcal{T},

Here, we used that \mathbb{E}|\eta-\mathbb{E}\eta|^{p}\leq 2^{p}\mathbb{E}|\eta|^{p}. Combining both inequalities, we have

 M_{p}(\overline{w})\leq 2M_{p}(w),

and the second statement of the lemma is proved.
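
The moment inequality \mathbb{E}|\eta-\mathbb{E}\eta|^{p}\leq 2^{p}\mathbb{E}|\eta|^{p}, p\geq 1, used above follows from the Minkowski inequality and can be checked on any empirical distribution. The Python sketch below is an illustration only; the sample, the seed and the function name are hypothetical.

```python
import random


def centered_and_raw_moments(sample, p):
    """Return (E|eta - E eta|^p, E|eta|^p) under the empirical distribution of sample."""
    n = len(sample)
    mean = sum(sample) / n
    centered = sum(abs(x - mean) ** p for x in sample) / n
    raw = sum(abs(x) ** p for x in sample) / n
    return centered, raw


rng = random.Random(0)  # fixed seed, chosen arbitrarily
sample = [rng.gauss(1.0, 2.0) for _ in range(1000)]
for p in (1.0, 1.5, 2.0, 4.0):
    centered, raw = centered_and_raw_moments(sample, p)
    print(p, centered, 2.0 ** p * raw)
```

In each row the second number should not exceed the third.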

### Proof of Lemma 9

1^{0}. First, we establish the statement of the lemma for the mapping \zeta\mapsto\phi_{1}[\zeta]. For any s\geq 1 let \mathfrak{s}:=s\wedge 2. Following (49) and (50) and in view of (128), we have

 \begin{aligned}\underline{\mathrm{w}}^{(1)}_{\mathfrak{s}}&\geq\mathrm{k}_{1}(nV_{h^{\max}})^{1/\mathfrak{s}-1},\qquad\overline{\mathrm{w}}^{(1)}_{\mathfrak{s}}\leq\mathrm{k}_{\infty}(nV_{h^{\min}})^{1/\mathfrak{s}-1},\\
 {\mathbb{Z}}^{(1)}_{\mathfrak{s}}(b)&:=\bigl\{\zeta=(K,h)\in{\mathbb{Z}}^{(1)}\colon(nV_{h})^{1/\mathfrak{s}-1}\|K\|_{\mathfrak{s}}\leq b\bigr\},\qquad b\in\bigl[\underline{\mathrm{w}}^{(1)}_{\mathfrak{s}},\overline{\mathrm{w}}^{(1)}_{\mathfrak{s}}\bigr].\end{aligned}

We note that if \zeta=(K,h)\in{\mathbb{Z}}^{(1)}_{\mathfrak{s}}(b) then

 (nV_{h})^{1/\mathfrak{s}-1}\leq\mathrm{k}^{-1}_{1}b. (5)

Let \zeta_{1},\zeta_{2}\in{\mathbb{Z}}^{(1)}_{\mathfrak{s}}(b) be such that \mathrm{d}^{(1)}_{1}(\zeta_{1},\zeta_{2})\leq 2. Applying (129) with p=\mathfrak{s} and using (5), we get

 n^{1/\mathfrak{s}}\|\phi_{1}[\zeta_{1}]-\phi_{1}[\zeta_{2}]\|_{\mathfrak{s}}\leq\mathrm{k}^{-1}_{1}bD^{\prime}(2)\mathrm{d}^{(1)}_{1}(\zeta_{1},\zeta_{2})=b\mathrm{d}^{(1)}_{\theta_{1}}(\zeta_{1},\zeta_{2}). (6)

Here we have taken into account that D^{\prime}(2)={\sup_{x\in[0,2]}}|D^{\prime}(x)|, where the function D(\cdot) is given in (126). If \zeta_{1},\zeta_{2}\in{\mathbb{Z}}^{(1)}_{\mathfrak{s}}(b) are such that \mathrm{d}^{(1)}_{1}(\zeta_{1},\zeta_{2})>2, then by the triangle inequality

 n^{1/\mathfrak{s}}\|\phi_{1}[\zeta_{1}]-\phi_{1}[\zeta_{2}]\|_{\mathfrak{s}}\leq 2b\leq b\mathrm{d}^{(1)}_{1}(\zeta_{1},\zeta_{2})\leq b\mathrm{d}^{(1)}_{\theta_{1}}(\zeta_{1},\zeta_{2}). (7)

Thus, (6) and (7) imply that Assumption 3.3 holds if {\mathbb{Z}}^{(1)} is equipped with the distance \mathrm{d}^{(1)}_{\theta_{1}}, where we recall that \theta_{1}=\mathrm{k}_{\infty}\mathrm{k}^{-1}_{1}D^{\prime}(2)\geq 1 [see (127)].

2^{0}. Now we prove the statement of the lemma for the mapping z\mapsto\phi_{2}[z]. By the statements (i) and (ii) of Lemma 12 applied with p=\mathfrak{s} we have

 2^{d(1-\mathfrak{s})/\mathfrak{s}}\mathrm{k}^{2}_{1}(nV_{h^{\max}})^{1/\mathfrak{s}-1}\leq\underline{\mathrm{w}}^{(2)}_{\mathfrak{s}},\qquad\overline{\mathrm{w}}^{(2)}_{\mathfrak{s}}\leq 2^{d/\mathfrak{s}}\mathrm{k}^{2}_{\infty}(nV_{h^{\min}})^{1/\mathfrak{s}-1}.

Recall that

 {\mathbb{Z}}^{(2)}_{\mathfrak{s}}(b):=\bigl\{z=[(K,h),(Q,\mathfrak{h})]\in{\mathbb{Z}}^{(2)}\colon n^{1/\mathfrak{s}}\|\phi_{2}[z]\|_{\mathfrak{s}}\leq b\bigr\},\qquad b\in\bigl[\underline{\mathrm{w}}^{(2)}_{\mathfrak{s}},\overline{\mathrm{w}}^{(2)}_{\mathfrak{s}}\bigr].

If z=[(K,h),(Q,\mathfrak{h})]\in{\mathbb{Z}}^{(2)}_{\mathfrak{s}}(b) then by the statement (ii) of Lemma 12

 (nV_{h\vee\mathfrak{h}})^{1/\mathfrak{s}-1}\leq 2^{d(\mathfrak{s}-1)/\mathfrak{s}}\mathrm{k}^{-2}_{1}b\leq 2^{d/2}\mathrm{k}^{-2}_{1}b. (8)

Let z_{1},z_{2}\in{\mathbb{Z}}^{(2)}_{\mathfrak{s}}(b) be such that \mathrm{d}^{(2)}_{1}(z_{1},z_{2})\leq 2. Applying (10) with p=\mathfrak{s} and using (8), we obtain

 n^{1/\mathfrak{s}}\|\phi_{2}[z_{1}]-\phi_{2}[z_{2}]\|_{\mathfrak{s}}\leq b2^{2+d/2}\mathrm{k}_{\infty}\mathrm{k}^{-2}_{1}D^{\prime}(4)\mathrm{d}^{(2)}_{1}(z_{1},z_{2}). (9)

If z_{1},z_{2}\in{\mathbb{Z}}^{(2)}_{\mathfrak{s}}(b) are such that \mathrm{d}^{(2)}_{1}(z_{1},z_{2})>2, then we have by the triangle inequality

 n^{1/\mathfrak{s}}\|\phi_{2}[z_{1}]-\phi_{2}[z_{2}]\|_{\mathfrak{s}}\leq 2b\leq b\mathrm{d}^{(2)}_{1}(z_{1},z_{2}). (10)

Thus, (9) and (10) imply that Assumption 3.3 is valid provided that {\mathbb{Z}}^{(2)} is equipped with the distance \mathrm{d}^{(2)}_{\theta_{2}}(\cdot,\cdot), where \theta_{2}=2^{2d+2}\mathrm{k}_{\infty}^{4}\mathrm{k}_{1}^{-2}D^{\prime}(4)\geq 2^{(d+4)/2}\mathrm{k}_{\infty}\mathrm{k}^{-2}_{1}D^{\prime}(4)\geq 1 [see (127)].

### Proof of Lemma 10

1^{0}. Inequality (128) is immediate. We start with the proof of (129).

Since the required bound is symmetric in h and h^{\prime}, without loss of generality we will assume that V_{h}\geq V_{h^{\prime}}. By the triangle inequality in view of Assumption (K1), we get

 \begin{aligned}\|K_{h}-K^{\prime}_{h^{\prime}}\|_{p}&\leq\|K_{h}-K^{\prime}_{h}\|_{p}+\|K^{\prime}_{h}-K^{\prime}_{h^{\prime}}\|_{p}\\
 &\leq V_{h}^{-1+1/p}\|K-K^{\prime}\|_{p}+\|K^{\prime}_{h}-K^{\prime}_{h^{\prime}}\|_{p}\\
 &\leq V_{h}^{-1+1/p}\biggl[\|K-K^{\prime}\|_{\infty}+\mathrm{k}_{\infty}\biggl(\frac{V_{h}}{V_{h^{\prime}}}-1\biggr)\biggr]\\
 &\quad{}+V^{-1}_{h^{\prime}}\|K^{\prime}(\cdot/h)-K^{\prime}(\cdot/h^{\prime})\|_{p}\\
 &\leq(V_{h\vee h^{\prime}})^{-1+1/p}\biggl[\frac{V_{h\vee h^{\prime}}}{V_{h\wedge h^{\prime}}}\biggr]\biggl[\|K-K^{\prime}\|_{\infty}+\mathrm{k}_{\infty}\biggl(\frac{V_{h\vee h^{\prime}}}{V_{h\wedge h^{\prime}}}-1\biggr)\biggr]\\
 &\quad{}+(V_{h\vee h^{\prime}})^{-1+1/p}\biggl[\frac{V_{h\vee h^{\prime}}}{V_{h\wedge h^{\prime}}}\biggr]\|K^{\prime}(\cdot[h\vee h^{\prime}]/h)-K^{\prime}(\cdot[h\vee h^{\prime}]/h^{\prime})\|_{p},\end{aligned}

where h\wedge h^{\prime}=(h_{1}\wedge h_{1}^{\prime},\ldots,h_{d}\wedge h_{d}^{\prime}). The second term of the last inequality is obtained using the evident change-of-variables t\mapsto t/[h\vee h^{\prime}] (the division is understood in the coordinate-wise sense).

Note that all coordinates of the vectors [h\vee h^{\prime}]/h and [h\vee h^{\prime}]/h^{\prime} are greater than or equal to 1. Therefore, in view of Assumption (K1) the integration (or supremum if p=\infty) over the whole {\mathbb{R}}^{d} in \|K^{\prime}(\cdot[h\vee h^{\prime}]/h)-K^{\prime}(\cdot[h\vee h^{\prime}]/h^{\prime})\|_{p} can be replaced by the integration (supremum) over the support of K^{\prime}. Together with Assumption (K1), this yields

 \begin{aligned}&\|K^{\prime}(\cdot[h\vee h^{\prime}]/h)-K^{\prime}(\cdot[h\vee h^{\prime}]/h^{\prime})\|_{p}\\
 &\qquad\leq L_{\mathcal{K}}\sqrt{\frac{1}{4}\sum_{j=1}^{d}\biggl[\frac{h_{j}\vee h_{j}^{\prime}}{h_{j}\wedge h_{j}^{\prime}}-1\biggr]^{2}}\leq 2^{-1}L_{\mathcal{K}}\sqrt{d}\bigl(\exp\{\Delta_{\mathcal{H}}(h,h^{\prime})\}-1\bigr).\end{aligned} (12)

Noting that V_{h\vee h^{\prime}}/V_{h\wedge h^{\prime}}\leq\exp\{d\Delta_{\mathcal{H}}(h,h^{\prime})\}, we obtain from the preceding chain of inequalities and (12) that

Then (129) follows from the last inequality and the monotonicity of the function D(\cdot).
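
The two bandwidth inequalities used in this step, V_{h\vee h^{\prime}}/V_{h\wedge h^{\prime}}\leq\exp\{d\Delta_{\mathcal{H}}(h,h^{\prime})\} and the second inequality in (12), can be checked numerically. The definitions of V_{h} and \Delta_{\mathcal{H}} are not restated in this section, so the Python sketch below assumes V_{h}=\prod_{j}h_{j} and \Delta_{\mathcal{H}}(h,h^{\prime})=\max_{j}\ln[(h_{j}\vee h_{j}^{\prime})/(h_{j}\wedge h_{j}^{\prime})]; these assumptions, the function names and the sample bandwidths are illustrative only.

```python
import math


def volume(h):
    """V_h, assumed here to be the product of the coordinates of h."""
    return math.prod(h)


def delta_H(h, h_prime):
    """Assumed form of Delta_H: the largest coordinate-wise log-ratio."""
    return max(math.log(max(a, b) / min(a, b)) for a, b in zip(h, h_prime))


def volume_ratio(h, h_prime):
    """V_{h v h'} / V_{h ^ h'} with coordinate-wise maximum and minimum."""
    upper = [max(a, b) for a, b in zip(h, h_prime)]
    lower = [min(a, b) for a, b in zip(h, h_prime)]
    return volume(upper) / volume(lower)


def ratio_distance(h, h_prime):
    """sqrt((1/4) * sum_j [(h_j v h'_j)/(h_j ^ h'_j) - 1]^2), the middle term of (12)."""
    total = sum((max(a, b) / min(a, b) - 1.0) ** 2 for a, b in zip(h, h_prime))
    return math.sqrt(0.25 * total)


# Hypothetical bandwidth vectors in dimension d = 3.
h, h_prime = [0.5, 0.2, 1.0], [0.4, 0.3, 2.0]
d = len(h)
delta = delta_H(h, h_prime)
print(volume_ratio(h, h_prime), math.exp(d * delta))
print(ratio_distance(h, h_prime), 0.5 * math.sqrt(d) * (math.exp(delta) - 1.0))
```

On each printed line the first number should not exceed the second.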

2^{0}. Now we turn to the proof of (10). Recall that z=[(K,h),(Q,\mathfrak{h})] and z^{\prime}=[(K^{\prime},h^{\prime}),(Q^{\prime},\mathfrak{h}^{\prime})]. For brevity, we also write \zeta_{K}=(K,h) and \zeta_{Q}=(Q,\mathfrak{h}) with evident changes in notation for \zeta^{\prime}_{K} and \zeta^{\prime}_{Q}.

By the triangle inequality, we have

 \begin{aligned}\|K_{h}*Q_{\mathfrak{h}}-K^{\prime}_{h^{\prime}}*Q^{\prime}_{\mathfrak{h}^{\prime}}\|_{p}&\leq\|K_{h}*Q_{\mathfrak{h}}-K_{h}*Q^{\prime}_{\mathfrak{h}^{\prime}}\|_{p}\\
 &\quad{}+\|K_{h}*Q^{\prime}_{\mathfrak{h}^{\prime}}-K^{\prime}_{h^{\prime}}*Q^{\prime}_{\mathfrak{h}^{\prime}}\|_{p}.\end{aligned}

Using the Young inequality (the first statement of Lemma 3), Assumption (K1) and the final inequality of step 1^{0}, we obtain

 \begin{aligned}\|K_{h}*Q_{\mathfrak{h}}-K_{h}*Q^{\prime}_{\mathfrak{h}^{\prime}}\|_{p}&\leq\|K_{h}\|_{1}\|Q_{\mathfrak{h}}-Q^{\prime}_{\mathfrak{h}^{\prime}}\|_{p}\\
 &\leq\mathrm{k}_{\infty}(V_{\mathfrak{h}\vee\mathfrak{h}^{\prime}})^{-1+1/p}D\bigl(\mathrm{d}_{1}^{(1)}(\zeta_{Q},\zeta^{\prime}_{Q})\bigr).\end{aligned}

On the other hand, applying the Young inequality and the same inequality with p=1, we have

 \begin{aligned}\|K_{h}*Q_{\mathfrak{h}}-K_{h}*Q^{\prime}_{\mathfrak{h}^{\prime}}\|_{p}&\leq\|K_{h}\|_{p}\|Q_{\mathfrak{h}}-Q^{\prime}_{\mathfrak{h}^{\prime}}\|_{1}\leq\mathrm{k}_{\infty}V_{h}^{-1+1/p}D\bigl(\mathrm{d}_{1}^{(1)}(\zeta_{Q},\zeta^{\prime}_{Q})\bigr)\\
 &\leq\mathrm{k}_{\infty}(V_{h\vee h^{\prime}})^{-1+1/p}\exp\{d\Delta_{\mathcal{H}}(h,h^{\prime})\}D\bigl(\mathrm{d}_{1}^{(1)}(\zeta_{Q},\zeta_{Q}^{\prime})\bigr)\\
 &\leq\mathrm{k}_{\infty}(V_{h\vee h^{\prime}})^{-1+1/p}D\bigl(2\mathrm{d}_{1}^{(2)}(z,z^{\prime})\bigr),\end{aligned}

where we have used the definition of \Delta_{\mathcal{H}}(\cdot,\cdot) and monotonicity of the function D(\cdot). Combining the last two inequalities, we have

 \|K_{h}*Q_{\mathfrak{h}}-K_{h}*Q^{\prime}_{\mathfrak{h}^{\prime}}\|_{p}\leq\mathrm{k}_{\infty}[(V_{h\vee h^{\prime}})\vee(V_{\mathfrak{h}\vee\mathfrak{h}^{\prime}})]^{-1+1/p}D\bigl(2\mathrm{d}^{(2)}_{1}(z,z^{\prime})\bigr).

Repeating the previous computations, we obtain the same bound for the second term on the right-hand side of the triangle-inequality decomposition above, namely,

 \|K_{h}*Q^{\prime}_{\mathfrak{h}^{\prime}}-K^{\prime}_{h^{\prime}}*Q^{\prime}_{\mathfrak{h}^{\prime}}\|_{p}\leq\mathrm{k}_{\infty}[(V_{h\vee h^{\prime}})\vee(V_{\mathfrak{h}\vee\mathfrak{h}^{\prime}})]^{-1+1/p}D\bigl(2\mathrm{d}^{(2)}_{1}(z,z^{\prime})\bigr).

Thus, we finally get

 \|K_{h}*Q_{\mathfrak{h}}-K^{\prime}_{h^{\prime}}*Q^{\prime}_{\mathfrak{h}^{\prime}}\|_{p}\leq 2\mathrm{k}_{\infty}[(V_{h\vee h^{\prime}})\vee(V_{\mathfrak{h}\vee\mathfrak{h}^{\prime}})]^{-1+1/p}D\bigl(2\mathrm{d}^{(2)}_{1}(z,z^{\prime})\bigr),

as claimed.
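
The Young inequality is used above in two ways: with the {\mathbb{L}}_{1}-norm on the first factor and with the {\mathbb{L}}_{1}-norm on the second factor. The Python sketch below illustrates both forms in the counting-measure analogue, that is, for finite sequences instead of functions on {\mathbb{R}}^{d}; this substitution, the function names and the random test sequences are assumptions made only for illustration.

```python
import math
import random


def convolve(f, g):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def lp_norm(seq, p):
    """l_p-norm of a finite sequence; p may be math.inf."""
    if p == math.inf:
        return max(abs(x) for x in seq)
    return sum(abs(x) ** p for x in seq) ** (1.0 / p)


rng = random.Random(0)  # fixed seed, chosen arbitrarily
f = [rng.uniform(-1.0, 1.0) for _ in range(5)]
g = [rng.uniform(-1.0, 1.0) for _ in range(7)]
for p in (1.0, 1.5, 2.0, 4.0, math.inf):
    lhs = lp_norm(convolve(f, g), p)
    # ||f*g||_p <= ||f||_1 ||g||_p  and  ||f*g||_p <= ||f||_p ||g||_1
    print(p, lhs, lp_norm(f, 1.0) * lp_norm(g, p), lp_norm(f, p) * lp_norm(g, 1.0))
```

In each row the second number should not exceed either of the last two.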

### Proof of Lemma 11

If w\in\mathbb{H}_{d}(1,P), then for any

 x\in\bigotimes_{i=1}^{d}\biggl[\tilde{x}_{i}-\frac{\|w\|_{\infty}}{2P\sqrt{d}},\tilde{x}_{i}+\frac{\|w\|_{\infty}}{2P\sqrt{d}}\biggr]

we have by the triangle inequality

 |w(x)|\geq|w(\tilde{x})|-|w(x)-w(\tilde{x})|\geq\|w\|_{\infty}-P|x-\tilde{x}|\geq\tfrac{1}{2}\|w\|_{\infty}.

This completes the proof.
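
The argument can be illustrated on a concrete Lipschitz function. The Python sketch below uses a hypothetical "tent" function w(x)=(\|w\|_{\infty}-P|x-\tilde{x}|)_{+}, which has Lipschitz constant P with respect to the Euclidean norm and attains its supremum at \tilde{x}; the function names and the parameter values are assumptions made only for illustration. It samples the cube from the proof and records the smallest value of w found there, including a corner of the cube, where |x-\tilde{x}|=\|w\|_{\infty}/(2P).

```python
import math
import random


def tent(x, center, peak, lipschitz):
    """w(x) = max(0, peak - lipschitz * |x - center|), with |.| the Euclidean norm."""
    dist = math.sqrt(sum((a - c) ** 2 for a, c in zip(x, center)))
    return max(0.0, peak - lipschitz * dist)


def cube_half_side(peak, lipschitz, d):
    """Half side length ||w||_inf / (2 P sqrt(d)) of the cube in the proof."""
    return peak / (2.0 * lipschitz * math.sqrt(d))


def min_over_cube(center, peak, lipschitz, trials=2000, seed=0):
    """Smallest value of w over random points of the cube and over one corner."""
    rng = random.Random(seed)
    r = cube_half_side(peak, lipschitz, len(center))
    worst = peak
    for _ in range(trials):
        x = [c + rng.uniform(-r, r) for c in center]
        worst = min(worst, tent(x, center, peak, lipschitz))
    corner = [c + r for c in center]
    return min(worst, tent(corner, center, peak, lipschitz))


# Hypothetical parameters: d = 3, ||w||_inf = 1.7, P = 4.
print(min_over_cube([0.3, -1.0, 2.0], peak=1.7, lipschitz=4.0), 1.7 / 2.0)
```

The first printed number should be no smaller than the second, up to rounding.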

### Proof of Lemma 12

Recall that

 \phi_{2}[z](t)=(K_{h}*Q_{\mathfrak{h}})(t)=\int K_{h}(t-y)Q_{\mathfrak{h}}(y)\,{d}y,\qquad t\in{\mathbb{R}}^{d}.

1^{0}. Let \mathcal{J} denote the set of indices j\in\{1,\ldots,d\} such that h_{j}\leq\mathfrak{h}_{j}:

 \mathcal{J}:=\{j\in\{1,\ldots,d\}\colon h_{j}\leq\mathfrak{h}_{j}\}.

Given two arbitrary vectors u,v\in{\mathbb{R}}^{d}, let \Delta[u,v] and \delta[u,v] denote the vectors in {\mathbb{R}}^{d} with the coordinates

 \Delta_{j}[u,v]=\begin{cases}u_{j},&\quad j\in\mathcal{J},\\ v_{j},&\quad j\notin\mathcal{J},\end{cases}\qquad\delta_{j}[u,v]=\begin{cases}u_{j},&\quad j\notin\mathcal{J},\\ v_{j},&\quad j\in\mathcal{J}.\end{cases}

With this notation, we can write

 (K_{h}*Q_{\mathfrak{h}})(t)=\frac{1}{V_{h}V_{\mathfrak{h}}}\int K\biggl(\Delta\biggl[\frac{t-v}{h},\frac{v}{h}\biggr]\biggr)Q\biggl(\delta\biggl[\frac{t-v}{\mathfrak{h}},\frac{v}{\mathfrak{h}}\biggr]\biggr)\,{d}v,\qquad t\in{\mathbb{R}}^{d}.

Then changing the variables v\mapsto u=(t-v)/(h\wedge\mathfrak{h}) and setting for brevity \eta=(h\wedge\mathfrak{h})/(h\vee\mathfrak{h}) (as usual, all operations are understood in the coordinate-wise sense), we come to the formula

 \begin{aligned}(K_{h}*Q_{\mathfrak{h}})(t)&=\frac{V_{h\wedge\mathfrak{h}}}{V_{h}V_{\mathfrak{h}}}\int K\bigl(\Delta[u,t/(h\vee\mathfrak{h})-\eta u]\bigr)Q\bigl(\delta[t/(h\vee\mathfrak{h})-\eta u,u]\bigr)\,{d}u\\
 &=\frac{1}{V_{h\vee\mathfrak{h}}}F\biggl(\frac{t}{h\vee\mathfrak{h}}\biggr),\end{aligned} (15)

where we have denoted

 F(t):=\int K(\Delta[u,t-\eta u])Q(\delta[t-\eta u,u])\,{d}u,\qquad t\in{\mathbb{R}}^{d}. (16)

Now we note some properties of the function F that will be useful in the sequel. First, Assumption (K1) implies that the integration over {\mathbb{R}}^{d} in (16) can be replaced by the integration over [-1/2,1/2]^{d}. Indeed, if at least one of the coordinates of u lies outside the interval [-1/2,1/2] then, in view of (K1), one of the functions K or Q vanishes. This fact along with Assumption (K2) and (16) imply that \|F\|_{\infty}\leq\mathrm{k}^{2}_{\infty}; in addition,

 \operatorname{supp}(F)\subseteq[-1,1]^{d}. (17)

Taking into account these facts and using (15), we obtain

 \|K_{h}*Q_{\mathfrak{h}}\|_{p}\leq(V_{h\vee\mathfrak{h}})^{-1+1/p}\|F\|_{p}\leq 2^{d/p}\mathrm{k}^{2}_{\infty}(V_{h\vee\mathfrak{h}})^{-1+1/p},

and the statement (i) of the lemma is proved.

To get the assertion (ii) of the lemma, we note that

 \begin{aligned}\biggl|\int F(t)\,{d}t\biggr|&=\biggl|\int\!\!\int K(\Delta[u,t-\eta u])Q(\delta[t-\eta u,u])\,{d}u\,{d}t\biggr|\\
 &=\biggl|\int K(x)\,{d}x\biggr|\biggl|\int Q(x)\,{d}x\biggr|\geq\mathrm{k}_{1}^{2}.\end{aligned}

The second equality follows from the fact that functions K and Q are integrated over t and over u over disjoint sets of components; and the last inequality is a consequence of (K2). Therefore, invoking (17) we have

 \begin{aligned}\|K_{h}*Q_{\mathfrak{h}}\|_{p}&=(V_{h\vee\mathfrak{h}})^{-1+1/p}\|F\|_{p}\geq(2^{d}V_{h\vee\mathfrak{h}})^{-1+1/p}\|F\|_{1}\\
 &\geq 2^{d(1-p)/p}\mathrm{k}^{2}_{1}(V_{h\vee\mathfrak{h}})^{-1+1/p},\end{aligned}

as claimed in the statement (ii) of the lemma.
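
The bounds in the statements (i) and (ii) can be illustrated numerically in dimension d=1. The Python sketch below is a toy check under several assumptions that are not taken from the paper: both K and Q are the triangular kernel 2(1-2|t|)_{+} supported on [-1/2,1/2]; Assumption (K2) is read as \|K\|_{\infty}\leq\mathrm{k}_{\infty} and |\int K|\geq\mathrm{k}_{1}, so that \mathrm{k}_{\infty}=2 and \mathrm{k}_{1}=1; and the displayed bounds are read at p=\infty, giving 2^{-1}\mathrm{k}_{1}^{2}/V_{h\vee\mathfrak{h}}\leq\|K_{h}*Q_{\mathfrak{h}}\|_{\infty}\leq\mathrm{k}_{\infty}^{2}/V_{h\vee\mathfrak{h}}. The convolution is approximated by a midpoint Riemann sum, and all function names are hypothetical.

```python
def tri_kernel(t):
    """Triangular kernel on [-1/2, 1/2]: integral 1, supremum 2."""
    return max(0.0, 2.0 * (1.0 - 2.0 * abs(t)))


def scaled(kernel, h):
    """K_h(t) = K(t / h) / h in dimension one."""
    return lambda t: kernel(t / h) / h


def convolve_at(f, g, t, lo, hi, steps=1000):
    """Midpoint-rule approximation of the integral of f(t - y) g(y) over [lo, hi]."""
    dy = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * dy
        total += f(t - y) * g(y)
    return total * dy


def sup_norm_of_convolution(h, hh, grid=101):
    """Approximate sup norm of K_h * Q_hh on the interval [-(h v hh), h v hh]."""
    K_h, Q_hh = scaled(tri_kernel, h), scaled(tri_kernel, hh)
    radius = max(h, hh)
    ts = [-radius + 2.0 * radius * i / (grid - 1) for i in range(grid)]
    # Q_hh vanishes outside [-hh/2, hh/2], so the integral is restricted to it.
    return max(abs(convolve_at(K_h, Q_hh, t, -hh / 2.0, hh / 2.0)) for t in ts)


h, hh = 0.3, 0.7  # hypothetical bandwidths
V = max(h, hh)
print(0.5 / V, sup_norm_of_convolution(h, hh), 4.0 / V)
```

The middle number should lie between the other two. The check is restricted to the support bound (17) and the sup norm; it does not cover the statements (iii) and (iv).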

2^{0}. Now we turn to the proof of the statements (iii) and (iv) of the lemma. The idea in the proof of these statements is to show that F satisfies the Lipschitz condition and then to apply Lemma 11.

By (16) for any x,y\in{\mathbb{R}}^{d}, we have

 \begin{aligned}|F(x)-F(y)|&\leq\mathrm{k}_{\infty}\sup_{u\in[-1/2,1/2]^{d}}|K(\Delta[u,x-\eta u])-K(\Delta[u,y-\eta u])|\\
 &\quad{}+\mathrm{k}_{\infty}\sup_{u\in[-1/2,1/2]^{d}}|Q(\delta[x-\eta u,u])-Q(\delta[y-\eta u,u])|\\
 &\leq L_{\mathcal{K}}\mathrm{k}_{\infty}\Biggl\{\sqrt{\sum_{j\notin\mathcal{J}}(x_{j}-y_{j})^{2}}+\sqrt{\sum_{j\in\mathcal{J}}(x_{j}-y_{j})^{2}}\Biggr\}\\
 &\leq 2L_{\mathcal{K}}\mathrm{k}_{\infty}|x-y|.\end{aligned}

The obtained inequality means that F\in\mathbb{H}_{d}(1,P) with P=2L_{\mathcal{K}}\mathrm{k}_{\infty}; moreover, (17) implies that

 \|F\|_{\infty}\geq 2^{-d}\mathrm{k}^{2}_{1}. (19)

Applying Lemma 11 and using the Lipschitz property just established, we obtain

 \biggl\{x\in{\mathbb{R}}^{d}\colon F(x)\geq\frac{1}{2}\|F\|_{\infty}\biggr\}\supseteq\bigotimes_{i=1}^{d}\biggl[\tilde{x}_{i}-\frac{\|F\|_{\infty}}{2P\sqrt{d}},\tilde{x}_{i}+\frac{\|F\|_{\infty}}{2P\sqrt{d}}\biggr],

where, recall, F(\tilde{x})=\|F\|_{\infty}. Using (19), we deduce from (15) that

 \begin{aligned}&\biggl\{x\colon(K_{h}*Q_{\mathfrak{h}})(x)\geq\frac{1}{2}\|K_{h}*Q_{\mathfrak{h}}\|_{\infty}\biggr\}\\
 &\qquad\supseteq\bigotimes_{i=1}^{d}\biggl[\tilde{x}_{i}(h\vee\mathfrak{h})_{i}-\frac{\mathrm{k}^{2}_{1}(h\vee\mathfrak{h})_{i}}{2^{d+1}P\sqrt{d}},\tilde{x}_{i}(h\vee\mathfrak{h})_{i}+\frac{\mathrm{k}^{2}_{1}(h\vee\mathfrak{h})_{i}}{2^{d+1}P\sqrt{d}}\biggr],\end{aligned}

which implies that

 \operatorname{mes}\biggl\{x\colon(K_{h}*Q_{\mathfrak{h}})(x)\geq\frac{1}{2}\|K_{h}*Q_{\mathfrak{h}}\|_{\infty}\biggr\}\geq V_{h\vee\mathfrak{h}}\biggl[\frac{\mathrm{k}^{2}_{1}}{2^{d+1}\sqrt{d}L_{\mathcal{K}}\mathrm{k}_{\infty}}\biggr]^{d}. (20)

Then the statement (iii) of the lemma follows because

 \operatorname{mes}\{\operatorname{supp}(K_{h}*Q_{\mathfrak{h}})\}\geq\operatorname{mes}\bigl\{x\colon(K_{h}*Q_{\mathfrak{h}})(x)\geq\tfrac{1}{2}\|K_{h}*Q_{\mathfrak{h}}\|_{\infty}\bigr\}.

It remains to note that (17) implies that \operatorname{mes}\{\operatorname{supp}(K_{h}*Q_{\mathfrak{h}})\}\leq 2^{d}V_{h\vee\mathfrak{h}}. Therefore by (20),

 \begin{aligned}&\operatorname{mes}\biggl\{x\colon(K_{h}*Q_{\mathfrak{h}})(x)\geq\frac{1}{2}\|K_{h}*Q_{\mathfrak{h}}\|_{\infty}\biggr\}\\
 &\qquad\geq\biggl[\frac{\mathrm{k}^{2}_{1}}{2^{d+2}\sqrt{d}L_{\mathcal{K}}\mathrm{k}_{\infty}}\biggr]^{d}\operatorname{mes}\{\operatorname{supp}(K_{h}*Q_{\mathfrak{h}})\}.\end{aligned}

This completes the proof of the lemma.

## Acknowledgments

We thank two anonymous referees for useful comments that led to significant improvements in the presentation.
