On the Accuracy of Fixed Sampled and Fixed Width Confidence Intervals Based on the Vertically Weighted Average

# On the Accuracy of Fixed Sampled and Fixed Width Confidence Intervals Based on the Vertically Weighted Average

\fnmsAnsgar \snmSteland

Vertically weighted averages perform a bilateral filtering of data, in order to preserve fine details of the underlying signal, especially discontinuities such as jumps (in dimension one) or edges (in dimension two). In homogeneous regions of the domain the procedure smoothes the data by averaging nearby data points to reduce the noise, whereas in inhomogenous regions the neighboring points are only taken into account when their value is close to the current one. This results in a denoised reconstruction or estimate of the true signal without blurring finer details.

This paper addresses the lack of results about the construction and evaluation of confidence intervals based on the vertically weighted average, which is required for a proper statistical evaluation of its estimation accuracy. Based on recent results we discuss and investigate in greater detail fixed sample as well as fixed width (conditional) confidence intervals constructed from this estimator. The fixed width approach allows to specify explicitly the estimator’s accuracy and determines a random sample size to ensure the required coverage probability. This also fixes to some extent the inherent property of the vertically weighted average that its variability is higher in low-density regions than in high-density regions. To estimate the variances required to construct the procedures, we rely on resampling techniques, especially the bootstrap and the jackknife.

Extensive Monte Carlo simulations show that, in general, the proposed confidence intervals are highly reliable in terms of their coverage probabilities for a wide range of parameter settings. The performance can be further increased by the bootstrap.
MSC: 62L10, 62G09, 62G15

\@mathmargin

=42minus42

Keywords:Bilateral filter, Bootstrap, Jackknife, Jump-preserving estimation, Two stage confidence interval, Signal estimation

.

\@afterheading

## 1 Introduction

The analysis of noisy data, for instance obtained from sensors monitoring a signal, is a well studied but still challenging problem. This especially applies in applications such as brain imaging based on magnetic resonance imaging, where increasing the precision of the physical measurement system, which is given by squids making use of quantum effects to record weak magnetic fields, is very expensive. In other applications the preservation of discontinuities in terms of their heights and/or their locations matters. To reduce the noise one may apply a denoising procedure such as a low pass filter, but this often leads to a blurred signal corresponding to a substantial loss of information. Therefore, if discontinuities such as jumps are expected in the signal, one should apply jump-preserving procedures, which are often called edge-preserving in the literature due to their importance when processing two-dimensional image data. Various approaches have been studied in the literature such as wavelets as studied by [1], point-wise adaptive approaches proposed by [8], kernel smoothing, see [20] and [10] for its application to image analysis, or jump-preserving regression, [9], to mention just a few works.

A large amount of smoothing statistics aiming at denoising can be (approximately) written as weighted averages of the observations , , attaining values in the range , so that the definition of the weights attached to the data points , , of the corresponding scatterplot matters. In a univariate setting as studied here, the domain space is usually equipped with the distance for ; in higher dimensions the Euclidean distance is the common choice. In order to preserve discontinuities, methods such as the bilateral filter, see [18], use weights which depend on both the distance in the domain space (horizontal weighting) and the range space (vertical weighting), whereas, for example, local polynomial estimators use weights only depending on the distance in the domain space.

In this article, we focus on a specification of the vertical weighting approach where the weight attached to each observation is a function of its difference to the observation of interest, resulting in a vertically weighted average. This approach has been extended to vertically weighted regressions for image analysis, [13], and jump-preserving monitoring procedures, see e.g. [6], [14] and [15]. A related clipping median statistic has been investigated in [16] for a general mixture model. In the present work, we confine our discussion to the univariate setting and study the problem how to evaluate the accuracy of the vertically weighted average in terms of confidence intervals.

The computational costs of the vertically weighted average are of the order , if denotes the sample size, which makes it attractive for realtime applications. Hence it is much faster to compute than other popular methods. For example, the computational costs of the fastest algorithm to calculate a -penalized -estimator are of the order , see [2]. This of particular importance for large-scale applications to big data, e.g. large databases , say, of image data, where the procedure is applied thousands or even millions of times, first to apply to denoise the images and, second, to assess by resampling methods the precision of the resulting denoised reconstruction, e.g. in terms of a confidence interval. The version of the vertically weighted average studied in this paper has the advantage that it can be even implemented in hardware, such that it is well suited for real-time applications.

Basically, there are two approaches to construct a confidence interval. The more common approach is the fixed (but large) sample confidence interval, whose width is random, such that the precision indicated by the interval is random and cannot be specified. In order to set up a fixed width confidence interval, one needs to rely on a random sample size, which is determined in such a way that the confidence interval attains the preassigned coverage probability, asymptotically. We investigate a two stage approach. The first stage sample (small) is used to estimate the dispersion of the estimator on which the interval is based. Using that estimator one estimates the optimal sample size, which is therefore random. There is a rich literature on such two stage approaches to construct fixed width intervals, we refer to [4] and the references given there.

To estimate the variability of the vertically weighted average, we propose to rely on resampling techniques, since they are easy to apply and typically lead to convincing results. We study the jackknife variance estimator as a versatile and fast method, whose consistency for the vertically weighted average has been shown in [17], and the bootstrap as a general tool.

The question arises how the proposed confidence intervals work in practice. To address this issue, extensive simulations have been conducted to study the accuracy in terms of the coverage probablity. It turns out that the accuracy is, in general, very good, but the coverage may be lower than nominal in the tails of the distribution for the conditional approach.

The organisation of the paper is as follows: Section 2 introduces the vertically weighted average, establishes some properties justifying its interpretation as a signal estimator and discusses the fixed sample intervals. Section 3 provides some further background on fixed width confidence intervals and introduces the proposed procedure in detail. The simulation studies are presented in Section 4.

## 2 The vertically weighted average

Let us assume for a moment that interest focuses on , its mean and the relationship of to the means , of the remaining sample of size . To keep the presentation simple, we focus on this one-dimensional problem formulation and notice that it also covers a simplified version of the bilateral filter as studied in image processing: Here , , are the gray values of pixels of a connected (usually rectangular) area of the image, where is the gray value of the center and represent the gray values of the neighboring pixels. However, the spatial distance of the pixels to the center and other issues which are of some importance in image processing are not taken into account and should be addressed by future research.

Let be independent real–valued observations following the model

 Yi=mi+ϵi,i=1,…,n, (2.1)

where are i.i.d. random variables with common distribution function , denoted by i.i.d(), symmetrically distributed around and having finite second moment. , , specifying the underlying true signal as , . It is common to assume that the observed data are obtained by sampling equidistantly at time instants , , an underlying function , where the sampling period (either constant or given by, say, ), and the domain is (if is fixed) or (for ), see e.g. [7]. In this case is regarded as the true signal. The methods and results discussed in the present paper, however, do not need to assume this sampling model but are valid for the more general model (2.1).

Focusing on a neighborhood of the current observation , we want to dampen the noise by some averaging procedure in such a way that large differences in the means, , are preserved. This can be achieved by averaging those data points whose values are close to the observation of interest, . Therefore, the weights used to define a weighted mean should depend on the differences between the observations and . It is natural to evaluate the difference by means of a nonnegative and symmetric kernel function which is assumed to satisfy the conditions

 E(k(ϵ1))<∞andE(∥ϵ1∥2kr(ϵ1))<∞, r=1,2.

The vertically weighted average is now defined as

 ˆμn=ˆμn(Yn)=∑i≤mYik(Yi−Yn) / ∑i≤mk(Yi−Yn),m=n−1,

Typical kernels used in applications are the Gaussian kernel, i.e. the density of the distribution, or kernels with support such as the uniform kernel . Further, often one incorporates a scale parameter and considers the choice

 k(z)=˜k(z/σ)

for some generic kernel . If is the uniform kernel, only those observations are averaged which are close to in the sense that . Figure 1 illustrates the denoising effect and the jump–preserving property of the approach.

In what follows, it will be sometimes convenient to write . In the same vain, the corresponding vertically weighted average associated to is given by

 ˆμn(Yi)=ˆμn(Yi;Y1,…,Yi−1,Yi+1,…,Yn),

for . Then is regarded as the denoised reconstructed signal.

The statistic is related to the vertically weighted mean squared functional

 τ(x;i):=E(|Yi−x|2k(ϵ1)),x∈R,

where . As shown in [5], see also [12], the functional is minimized by the true mean and is a fix point of the nonlinear equation, i.e. a solution of the associated fix point equation

 x=E(Yik(Yi−x))Ek(Yi−x).

For extensions to Hilbert–valued random elements and further discussion see [17]. The statistic is obtained by replacing the expectations by their empirical sample analogs based on the sample and substituting the true mean, , by the current observation .

The question arises how one may evaluate the accuracy of this statistic. A common approach is to consider confidence intervals for the underlying parameter when the sample is homogeneous, i.e. if the null hypothesis of a constant signal,

 H0:m1=mi, for all i≥2, (2.2)

holds true. This requires appropriate large sample asymptotics. But the statistic is not a member of standard classes of statistics, which hinders the application of known limit theorems to construct procedures for statistical inference, and therefore requires a special treatment. In [17] general invariance principles have been established which imply the following central limit theorem: If are i.i.d. with existing fourth moment, then

 √n[ˆμn(Yn)−θ(Yn)]/σξ(Yn)\lx@stackreld→N(0,1), (2.3)

as , where with

 ξi(y)=k(Yi−y)Yi−μ(y)ν(y)+θ(y)k(Yi−y)−ν(y)ν(y),i=1,…,n,

and

 θ(y)=μ(y)ν(y),withμ(y)=E(k(Y−y)Y)\ % and\ ν(y)=E(k(Y−y))>0.

for . The CLT (2.3) holds true under the conditional law given and therefore also unconditionally.

What is the relation between the centering term in (2.3), i.e. , and the true underlying signal, i.e. the constants in model (2.1)? The following theorem, which is related to the results in [5] and [12], shows that the centering term vanishes in median and expectation, respectively, for i.i.d. samples.

###### Theorem 2.1.

Assume that are i.i.d. with and , . If is a symmetric kernel, then

 Med\,θ(Y1)=m

and if exists.

###### Proof.

The result can be shown using equal-in-distribution arguments. If and are two random variables or random vectors of the same dimension with distributions and , we write , if , i.e. if for all measurable sets . If , then for any measurable mapping which does not depend on . Further, in what follows, we write when the expectation is taken with respect to (the distribution of) . Then, e.g., , if and are independent. We shall apply those basic results to the representation

 θ(Y1)=EY2[k(Y2−Y1)Y2]EY2k(Y2−Y1),

where one has to take into account that numerator and denominator are depend. First notice that

 Med\,θ(Y1) =MedY1(EY2[k(Y2−Y1)Y2]EY2k(Y2−Y1)) =Medϵ1(Eϵ2[k(ϵ2−ϵ1)(ϵ2+m)]Eϵ2k(ϵ2−ϵ1)) =Medε1(m+Eϵ2[k(ϵ2−ϵ1)ϵ2]Eϵ2k(ϵ2−ϵ1)) =m+Medϵ1(g(ϵ1)h(ϵ1))

where

 g(ϵ1) =Eϵ2[k(ϵ2−ϵ1)ϵ2], h(ϵ1) =Eϵ2k(ϵ2−ϵ1).

We show that

 S(ϵ1):=(g(ϵ1),h(ϵ1))\lx@stackreld=(−g(ϵ1),h(ϵ1)), (2.4)

which implies

 Medϵ1(g(ϵ1)h(ϵ1))=−Medϵ1(g(ϵ1)h(ϵ1)),

thus showing . The corresponding property for the expectation follows now easily. To verify (2.4) observe that

 S(ϵ1) =(Eϵ2[k(ϵ1−ϵ2)ϵ2],Eϵ2k(ϵ1−ϵ2)) =(−Eϵ2[k(ϵ1+ϵ2)ϵ2],Eϵ2k(ϵ1+ϵ2))

since and . Next, using , we obtain

 S(ϵ1)\lx@stackreld=˜S(ϵ1)=(−Eϵ2[k(ϵ1−ϵ2)ϵ2],Eϵ2k(ϵ1−ϵ2)).

But implies

 ˜S(ϵ1)=(−Eϵ2[k(ϵ2−ϵ1)ϵ2],Eϵ2k(ϵ2−ϵ1)),

which completes the proof. Q.E.D.

To the best of the author’s knowledge, the following result also does not appear in the literature.

###### Theorem 2.2.

Let be i.i.d. following the model , , for some with symmetric error terms . If is symmetric, then

 Med(ˆμn(Yi))=m

for .

###### Proof.

It suffices to show the result for . Notice that

 ˆμn(Yn)=m+∑i≤mk(ϵi−ϵn)ϵi∑i≤mk(ϵi−ϵn).

By symmetry and independence we have . Hence we obtain

 (∑i≤mk(ϵi−ϵn)ϵi,∑i≤mk(ϵi−ϵn))\lx@stackreld=(−∑i≤mk(ϵi−ϵn)ϵi,∑i≤mk(ϵi−ϵn))

Therefore

 ∑i≤mk(ϵi−ϵn)ϵi∑i≤mk(ϵi−ϵn)\lx@stackreld=−∑i≤mk(ϵi−ϵn)ϵi∑i≤mk(ϵi−ϵn).

But this implies . Q.E.D.

###### Remark 2.1

Note that the existence of and depends on the kernel and the error distribution. But it can be easily guaranteed by adding a positive but arbitrarily small constant to the kernel .

Although the validity of the central limit theorem greatly eases the interpretation of an estimator and its variation, the above formulas are cumbersome to construct practical procedures. A simple and effective approach to estimate the variance of an estimator is Efron’s jackknife, which dates back to the works of [11] and [19]. For the vertically weighted average it is given by

 ˆσ2n(Yn)=m−1mm∑i=1(ˆμn,−i(Yn)−¯¯¯ˆμn(Yn))2,

where

 ˆμn,−i(Yn)=m−1∑j=1j≠ik(Yj−Yn)Yj / m−1∑j=1j≠ik(Yj−Yn),

are the leave–one–out estimates, , and . The jackknife variance estimator is consistent if (2.2) holds and the r.v.s have a finite fourth moment, see [17].

Having a consistent estimator for at our disposal, one may calculate fixed sample asymptotic confidence intervals for , namely

 [ˆμn(Yn)−Φ−1(1−α/2)ˆσn(Yn),ˆμn(Yn)+Φ−1(1−α/2)ˆσn(Yn)], (2.5)

for , in order to asses the estimator’s precision given . In (2.5) and in what follows, , , is the distribution function of the standard normal distribution and its quantile function. Below we shall discuss how one can determine the sample size to obtain uniform accuracy in terms of the width of the interval, for any . The coverage probability of those confidence intervals are investigated in Section 4.

The unconditional asymptotic normality of under the null hypothesis, which follows from the conditional central limit theorem, combined with Lemma 2.1 suggests that

 ˆμnˆsd(ˆμn)\lx@stackreld→N(0,1),

as , for any consistent estimator of . Hence, we shall investigate in Section 4 in greater detail the coverage probabilities of the bootstrap confidence interval

 [ˆμn−Φ−1(1−α/2)ˆsd(ˆμn),ˆμn+Φ−1(1−α/2)ˆsd(ˆμn)] (2.6)

where is the bootstrap variance of estimated from replications , ,

 ˆμ∗n(b)=∑i≤mY∗k(b)k(Y∗i(b)−Y∗n(b)) / ∑i≤mk(Y∗i(b)−Y∗n(b)),

where are with , .

## 3 Fixed–width confidence intervals

By construction and due to the very basic idea of the vertical weighting approach, the (conditional) variance of the estimator strongly depends on the value . This can be easily seen if for some fixed . Then is the sample mean of all with . The effective sample size is random and typically large, if is located in the center of the distribution, but it will be small if is in the tail.

The following approach to construct a fixed width confidence interval for the vertically weighted average, introduced in [17], overcomes that drawback and allows to determine a sample size that leads to a simple and sound interpretation, namely that the resulting estimator has a specified precision. The basic idea is to determine the sample size in such a way that, for a preassigned accuracy , the two–sided fixed width confidence interval

 In(d)=[ˆμn(Yn)−d,ˆμn(Yn)+d]

has conditional asymptotic coverage , given, i.e.

 P(ˆμn(Yn)−d,ˆμn(Yn)+d]∋θ(Yn)|Yn)=1−α+o(1), (3.1)

as , a.s. If the conditional asymptotic distribution of is normal and were completely known, one could determine the asymptotically optimal sample size required to ensure (3.1). But this fails in practice, since the asymptotic variance is unknown. In a two-stage procedure one starts with a deterministic initial sample size , depending on the precision parameter and the confidence level , and uses that initial sample to estimate the asymptotically optimal sample size required to achieve a fixed width confidence interval of length with asymptotic coverage . As the sample size is estimated using the first–stage sample, the final sample size is random.

The two-stage procedure studied in [17] adopts the two-stage procedure from [3] and works as follows: Let us fix the current observation and denote it by , although the sample size is not yet determined. The current observation will be always the last one and the additional observations then correspond to the first data points in the sample. We shall first determine an initial sample size for the first stage and set up the sample using observations in addition to the current one. Then the final sample size (or equivalently ) will be determined and, in the same way, a sample of size will be set up with the current observation put at the end of the sample. This is necessary because the formula for the vertically weighted average regards the last observation as the current one. At the first stage, one draws a first-stage (initial) sample of size given by

 n0=n0(d)=max{⌊Φ−1(1−α/2)/d⌋,3}. (3.2)

At the second stage calculate the random final sample size

 N=N(d)=max{n0,⌊˜σ2n0(YN)Φ−1(1−α/2)2/d2+2⌋}, (3.3)

where

 ˜σ2n0(YN)=(n0−1)ˆσ2n0(YN)=(n0−1)n0−1∑i=1(ˆμn0,−i(YN)−¯¯¯ˆμn0(YN))2 (3.4)

with is the jackknife estimator of the asymptotic variance of the vertically weighted average calculated from the first-stage initial sample of size (augmented by )); notice that we have leave-one-out estimates ( is fixed) leading to the formula (3.4). This means, if , we sample additional observations, where , to obtain the final sample .

For theoretical investigations interest focuses on the behavior of the proposed procedure when the precision parameter , and thus the length of the confidence interval, tends to . Notice that, by (3.2), this implies , which in turn ensures that , see (3.3). Since as well as tend to , as , comparisons are based on their ratio, in order to establish the statistical properties of the sequence , of fixed-width confidence intervals. In [17] it has been shown that the resulting confidence interval has the following properties, when (2.2) holds:

• has asymptotic coverage , as .

• is consistent for the asymptotically optimal fixed sample interval using the asymptotic optimal sample size , i.e. , as .

• is first-order asymptotic efficient in the sense of Chow and Robbins, i.e.

 E(N|YN)/nopt→1,

as .

Our simulations reported below show that the coverage probability of the two–stage fixed–width confidence interval is very good. Its construction is based on a central limit theorem, which suggests to use a resampling approach such as the nonparametric bootstrap, in order to improve the approximation. More specifically, a closer look at the proof behind the validity of the two–stage procedure discussed above reveals that it is based on the central limit theorem

 P(√MˆμN(YN)−θ(YN)˜σn0(YN)≤z)→Φ(z), (3.5)

as . The sample size is then determined such that

 Φ(√Md˜σn0(YN))\lx@stackrel!=1−α/2, (3.6)

leading to the formula for given in the previous section. (3.5) and (3.6) suggest to bootstrap the distribution of the statistic to improve upon the central limit theorem. This is achieved by substituting the bootstrap distribution estimator for the distribution function in (3.6).

The nonparametric simulation-based bootstrap, which estimates the bootstrap distribution by a simulation based on replications, was applied to the problem of interest as follows: Draw independent resamples of size , the bootstrap sample size, from ,

 Y∗1(b),…,Y∗m∗(b)\lx@stackreli.i.d.∼ˆFn0,b=1,…,B,

where , , is the empirical distribution function of the initial sample. Alternatively, one can use the smooth bootstrap which convolves with a Gaussian law, . Here one may use the cross validated bandwidth selector for the kernel density estimator or the asymptotically optimal choice, , for Gaussian data, where is the sample variance. For simplicity of presentation, let us denote the bootstrap observations by , whatever kind of bootstrap has been used. Then form the bootstrap samples

 (Y∗1(b),…,Y∗m∗(b),YN),b=1,…,B,

where is again the current observation, and calculate

 ˆtn∗,b=√m∗ˆμn∗,b−ˆμn0˜σn0,b=1,…,B, (3.7)

where and

 ˆμn∗,b=ˆμn∗(Y∗1(b),…,Y∗m∗(b),YN)=∑i≤m∗Y∗i(b)k(Y∗i(b)−YN) / ∑i≤mk(Y∗i(b)−YN),

for . Lastly, estimate the –quantile of the distribution of by the corresponding order statistic , where denotes the order statistic of the bootstrap replicates (3.7). This leads to the bootstrap final sample size

 N∗=N∗(d)=max{n0,⌊˜σ2n0(YN)2(ˆμ∗m∗,(B(1−α/2)))2/d2+2⌋}. (3.8)

To conduct this bootstrap procedure, one needs to select the bootstrap sample size resp. . In the simulation study was used.

## 4 Simulations

The simulations aim at studying the dispersion of the vertically weighted average and, especially, the performance of the proposed methods in terms of the coverage probability. Let us start with a first experiment to examine how the dispersion of the vertically weighted average depends on the location of the current observation . Figure 2 depicts the standard deviation given , , for the sample size when using a Gaussian kernel with standard deviation . The interval was discretized using a step size of and each resulting case was simulated using repetitions. It can be seen that the dispersion is substantially larger in the tails than in the center of the distribution.

Those simulations as well as all studies presented in the following subsections are based on observations following a standard normal distribution, as the focus of the simulations is to investigate the performance for the three different confidence intervals (classical fixed sample, fixed width with CLT asymptotics and fixed width with bootstrap) when varying the method parameter , the confidence level and the sample size (for fixed sample intervals) and the precision parameter (for the fixed width intervals), respectively.

### 4.1 Accuracy of conditional fixed sample confidence intervals

Let us start our investigation by studying the coverage probability of the proposed asymptotic fixed sample confidence intervals (2.5). The parameter has levels and the coverage probability was simulated across the distribution of by conditioning on for . Additionally, the coverage probability of the confidence interval as used in practice (real CI), i.e. for (randomly drawn) was considered. The sample sizes under investigation are . Each case was simulated using runs. The required true value was simulated based on a random sample of size .

The results for a confidence level of are shown in Table I, for -values between and . It can be seen that the coverage probabilities are fairly good in the center of the distribution, but decrease in the tails, especially for small sample sizes such as . This results in a lower than nominal coverage of the real CI.

### 4.2 Accuracy of unconditional fixed sample confidence intervals

The accuracy of the unconditional fixed sample confidence interval (2.6), in terms of its coverage probability, was investigated for confidence levels between and . The sample size was selected with levels and . The levels of the bandwidth parameter for the Gaussian kernel were chosen from to .

Table II provides the simulated coverages for a nonparametric bootstrap variance estimator based on replications, for nominal confidence levels between and . Each entry is based on runs. It can be seen that the proposed confidence interval is surprisingly accurate even for high confidence levels.

The accuracy can be improved further by using a larger number of bootstrap replications. For comparison, Table III provides the results when replicates are used to estimate the variance.