
# Discretized normal approximation by Stein’s method

Xiao Fang. Department of Statistics and Applied Probability, National University of Singapore, 6 Science Drive 2, Singapore 117546, Republic of Singapore. E-mail: stafx@nus.edu.sg
Received January 2012; revised February 2013.
###### Abstract

We prove a general theorem to bound the total variation distance between the distribution of an integer valued random variable of interest and an appropriate discretized normal distribution. We apply the theorem to 2-runs in a sequence of i.i.d. Bernoulli random variables, the number of vertices with a given degree in the Erdös–Rényi random graph, and the uniform multinomial occupancy model.

Bernoulli 20(3), 2014, 1404–1431. DOI: 10.3150/13-BEJ527.

Running title: Discretized normal approximation.

Keywords: discretized normal approximation; exchangeable pairs; local dependence; size biasing; Stein coupling; Stein’s method

## 1 Introduction and the main result

Let S be a sum of independent random variables. The Berry–Esseen theorem gives a bound on the Kolmogorov distance between the distribution of S and the normal distribution with the same mean and variance as S.

###### Theorem 1.1 (Berry Be41, Esseen Es42)

Assume S = X1 + ⋯ + Xn, where X1, …, Xn are independent random variables with EXi = μi, Var(Xi) = σi², and E|Xi − μi|³ = γi < ∞. Let μ = ES, σ² = Var(S) and γ = ∑_{i=1}^n γi. Then,

 dK(L(S), N(μ, σ²)) ≤ cγ/σ³,  (1)

where c is an absolute constant and the Kolmogorov distance dK is defined as

 dK(L(X), L(Y)) = sup_{z∈R} |P(X ≤ z) − P(Y ≤ z)|.

From (1), if γ = O(n) and σ² is of order n, then

 dK(L(S), N(μ, σ²)) → 0  as n → ∞.  (2)

A stronger distance, the total variation distance between two distributions, is defined as

 dTV(L(X), L(Y)) = sup_{A⊂R} |P(X ∈ A) − P(Y ∈ A)|.  (3)

If S is integer valued, the convergence in (2) is no longer valid under the total variation distance because

 dTV(L(S), N(μ, σ²)) = 1  for all n ≥ 1.  (4)

Equation (4) follows by taking A to be the set of integers in the definition of total variation distance. Therefore, we need to find limiting distributions other than N(μ, σ²) if small total variation distance is desired. Several alternatives have been studied, e.g., the translated Poisson distribution Ro05, Ro07, the shifted binomial distribution Ro08 and a new family of discrete distributions GoXi06. A more natural limiting distribution, the discretized normal distribution Nd(μ, σ²), is defined to be supported on the set of integers and to have probability mass function at any integer z given by

 P(z − 1/2 < Zμ,σ² ≤ z + 1/2),  (5)

where Zμ,σ² is a Gaussian random variable with mean μ and variance σ².
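For concreteness, the probability mass function in (5) can be evaluated numerically from the standard normal distribution function. The following is a minimal sketch; the function names are ours, not from the paper:

```python
from math import erf, sqrt

def Phi(x):
    # standard normal distribution function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def nd_pmf(z, mu, sigma):
    # pmf of Nd(mu, sigma^2) at the integer z, following (5):
    # P(z - 1/2 < Z <= z + 1/2) for Z ~ N(mu, sigma^2)
    return Phi((z + 0.5 - mu) / sigma) - Phi((z - 0.5 - mu) / sigma)

# the intervals (z - 1/2, z + 1/2] tile the real line, so the pmf
# sums to 1 up to negligible tail mass
total = sum(nd_pmf(z, 3.0, 2.0) for z in range(-40, 50))
```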

Using Stein’s method, Chen and Leong ChLe10 (see also Theorem 7.4 of ChGoSh11) proved a bound on dTV(L(S), Nd(μ, σ²)) for sums of independent integer valued random variables. Stein’s method was introduced by Stein St72, and has become an important approach to proving distributional approximations because of its power in handling dependence among random variables. We refer to BaCh05 for an introduction to Stein’s method.

Chen and Leong ChLe10 used the zero-bias coupling approach in Stein’s method to obtain their result. In this paper, we develop a different approach in Stein’s method for discretized normal approximation. Our approach not only recovers the result of Chen and Leong ChLe10, but also works for general integer valued random variables. We work under the framework of Stein couplings, a concept introduced by Chen and Röllin ChRo10 under which normal approximation results can be proved.

###### Definition

Let S be a random variable with mean μ. We say a triple (S, S′, G) of square-integrable random variables is a Stein coupling if

 E{Gf(S′)−Gf(S)}=E(S−μ)f(S) (6)

for all f such that the above expectations exist. The above definition is adapted from ChRo10 and includes many of the coupling structures employed in Stein’s method such as local dependence, exchangeable pairs, and size biasing. These coupling structures are discussed in Section 2. Under the framework of Stein couplings, we obtain the following theorem.

###### Theorem 1.2

Let S be an integer valued random variable with mean μ and finite variance σ². Suppose we can construct a Stein coupling (S, S′, G). Then, with D = S′ − S,

 dTV(L(S), Nd(μ, σ²)) ≤ (2/σ²)·√Var(E(GD|S)) + √(π/8)·E|GD²|/σ³ + √(EG²D⁴)/σ³
  + (1/(2σ²))·E[(|GD²| + |GD|)·dTV(L(S|F), L(S+1|F))],  (7)

where F is a σ-field such that σ(G, D) ⊆ F, where σ(·) denotes the σ-field generated by a random variable (or a collection of random variables).

###### Remark

The discretization defined in (5) involves no loss of generality. For example, one may define another discretized normal distribution Ñd(μ, σ²) with probability mass function at any integer z given by

 P(z ≤ Zμ,σ² < z + 1).

Then,

 dTV(Nd(μ, σ²), Ñd(μ, σ²)) = dTV(Nd(μ, σ²), Nd(μ − 1/2, σ²)) ≤ dTV(N(μ, σ²), N(μ − 1/2, σ²)) ≤ c/σ,

where c is an absolute constant. It can be seen from the proof of Theorem 1.2 that the bound (7) will only differ by a constant factor if one changes the limiting distribution from Nd(μ, σ²) to Ñd(μ, σ²).

###### Remark

The first three terms in the bound (7) are comparable to those appearing in the upper bounds of the Kolmogorov or Wasserstein distance for normal approximations (see, e.g., Corollary 2.2 of ChRo10). The last term in the bound (7) arises because we are working in the total variation distance. It is easy to see that such a term must appear by considering the case when S has support restricted to the even integers. Also, in bounding this term, we choose an appropriate σ-field F so that dTV(L(S|F), L(S+1|F)) is relatively easy to bound, yet of the same order as dTV(L(S), L(S+1)).

Röllin and Ross RoRo12 provided a general method of bounding dTV(L(V), L(V+1)) for a given integer valued random variable V. It is our main tool for bounding the last term in the bound (7).

###### Lemma 1.3 (Röllin and Ross RoRo12)

For a given integer valued random variable V, if we can construct an exchangeable pair (V, V′) (i.e., L(V, V′) = L(V′, V)) such that P(|V − V′| ≤ 1) = 1, then

 dTV(L(V), L(V+1)) ≤ [√Var(E(I(V − V′ = 1)|V)) + √Var(E(I(V − V′ = −1)|V))] / P(V − V′ = 1).  (8)
###### Remark

To apply Lemma 1.3, we need to construct exchangeable pairs such that the bound in (8) is small. A useful way to construct such exchangeable pairs when V is a function of independent random variables is as follows. Suppose V = g(X1, …, Xn) where X1, …, Xn are independent. Let I be an independent uniform random index from {1, …, n}. Given I = i, let X′i be an independent copy of Xi. Define V′ = g(X1, …, Xi−1, X′i, Xi+1, …, Xn). Then (V, V′) is an exchangeable pair. We will use this construction in all the applications considered in this paper.
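The exchangeability of the coordinate-resampling construction above can be checked by exact enumeration in a small case. The following sketch is ours; the function g is an arbitrary illustrative choice, not one from the paper:

```python
from itertools import product

def g(x):
    # an arbitrary integer valued function of the coordinates
    return x[0] + 2 * x[1] * x[2]

p, n = 0.3, 3
joint = {}  # exact joint pmf of (V, V') = (g(X), g(X with X_I resampled))
for x in product((0, 1), repeat=n):
    px = 1.0
    for xi in x:
        px *= p if xi else 1 - p
    for i in range(n):                # I uniform on the n coordinates
        for v in (0, 1):              # X'_i, an independent copy of X_i
            q = px * (1.0 / n) * (p if v else 1 - p)
            y = x[:i] + (v,) + x[i + 1:]
            key = (g(x), g(y))
            joint[key] = joint.get(key, 0.0) + q

# exchangeability: P(V = a, V' = b) = P(V = b, V' = a)
symmetric = all(abs(w - joint.get((b, a), 0.0)) < 1e-12
                for (a, b), w in joint.items())
```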

The remainder of the paper is organized as follows. In Section 2, we show the utility of Theorem 1.2 by adapting it to local dependence, exchangeable pairs, and size biasing, and by bounding the total variation distance in discretized normal approximations for 2-runs in a sequence of i.i.d. Bernoulli random variables, the number of vertices with a given degree in the Erdös–Rényi random graph, and the uniform multinomial occupancy model. In Section 3, we give the proof of Theorem 1.2.

## 2 Applications

In this section, we apply Theorem 1.2 to prove discretized normal approximation results for integer valued random variables with different dependence structures including local dependence, exchangeable pairs, and size biasing.

### 2.1 Local dependence

A typical setting of local dependence is as follows. Let S = ∑_{i=1}^n Xi be a sum of integer valued random variables with EXi = μi, μ = ES and σ² = Var(S). Suppose for each i ∈ {1, …, n}, there exist neighborhoods Ai ⊆ Bi ⊆ {1, …, n} such that Xi is independent of {Xj: j ∉ Ai}, and {Xj: j ∈ Ai} is independent of {Xj: j ∉ Bi}. It can be verified as in Section 3.2 of ChRo10 that

 (S,S′,G)=(S,S−∑j∈AI(Xj−μj),−n(XI−μI))

is a Stein coupling, where I is a uniform random index from {1, …, n}, independent of {X1, …, Xn}. Theorem 1.2 has the following corollary for local dependence.
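In the simplest instance of this setting, independent summands with Ai = {i}, the coupling identity (6) can be verified exactly by enumeration. The sketch below is ours; the Bernoulli parameters and the test function f are illustrative choices:

```python
from itertools import product

# S = X_1 + ... + X_n with independent Bernoulli(p_i) summands,
# A_i = {i}: S' = S - (X_I - mu_I), G = -n(X_I - mu_I)
ps = [0.2, 0.5, 0.7]
n = len(ps)
mu = sum(ps)

def f(x):
    return x * x  # (6) should hold for any f with finite expectations

lhs = rhs = 0.0   # lhs = E{G f(S') - G f(S)}, rhs = E(S - mu) f(S)
for x in product((0, 1), repeat=n):
    px = 1.0
    for xi, p in zip(x, ps):
        px *= p if xi else 1 - p
    s = sum(x)
    rhs += px * (s - mu) * f(s)
    for i in range(n):                       # I uniform on {1, ..., n}
        G = -n * (x[i] - ps[i])
        s_prime = s - (x[i] - ps[i])
        lhs += (px / n) * G * (f(s_prime) - f(s))
```

The identity holds here because Xi − μi is independent of S − (Xi − μi), so the extra term E(Xi − μi)f(S − Xi + μi) vanishes.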

###### Corollary 2.1

Under the above setting, assume that for every i, |{i′: XAi′ is not independent of XAi}| ≤ θ, where XA = {Xj: j ∈ A} and |·| denotes cardinality. Let

 ξi = (Xi − μi)/σ,  ηi = ∑_{j∈Ai} ξj.

Then,

 dTV(L(S),Nd(μ,σ2)) ≤2 ⎷θn∑i=1Eξ2iη2i+√π8n∑i=1E∣∣ξiη2i∣∣+ ⎷nn∑i=1Eξ2iη4i (9) +12n∑i=1E[(σ∣∣ξiη2i∣∣+|ξiηi|)dTV(L(S|Fi),L(S+1|Fi))],

where Fi is a σ-field such that σ({Xj: j ∈ Ai}) ⊆ Fi.

Proof.

Let I be a uniform random index from {1, …, n}, independent of {X1, …, Xn}. Let S′ = S − ∑_{j∈AI}(Xj − μj), G = −n(XI − μI), and let D = S′ − S. We bound the right-hand side of (7) as follows. From the definition of the neighborhoods {Ai}, the inequality ab ≤ (a² + b²)/2 and the bound |{i′: XAi′ is not independent of XAi}| ≤ θ, we have

 Var(E(GD|S)) ≤ Var(E(GD|{X1, …, Xn}))
  = Var(∑_{i=1}^n (Xi − μi)∑_{j∈Ai}(Xj − μj))
  ≤ ∑_{i,i′: XAi, XAi′ not independent} Cov((Xi − μi)∑_{j∈Ai}(Xj − μj), (Xi′ − μi′)∑_{j′∈Ai′}(Xj′ − μj′))
  ≤ ∑_{i,i′: XAi, XAi′ not independent} {E[(Xi − μi)∑_{j∈Ai}(Xj − μj)]²/2 + E[(Xi′ − μi′)∑_{j′∈Ai′}(Xj′ − μj′)]²/2}
  ≤ θ∑_{i=1}^n E[(Xi − μi)∑_{j∈Ai}(Xj − μj)]²
  = σ⁴θ∑_{i=1}^n Eξi²ηi².

Moreover,

 E|GD| = σ²∑_{i=1}^n E|ξiηi|,  E|GD²| = σ³∑_{i=1}^n E|ξiηi²|,  EG²D⁴ = nσ⁶∑_{i=1}^n Eξi²ηi⁴.

The corollary is proved by applying the above bounds in (7) with F = FI.

We remark that in the case that S is a sum of independent integer valued random variables, a modification of the arguments for the intermediate terms in the proof of Theorem 1.2 yields a result similar to Theorem 7.4 of ChGoSh11.

#### 2.1.1 2-runs

We provide a concrete example of local dependence here. Let ξ1, …, ξn be independent and identically distributed Bernoulli variables with P(ξi = 1) = 1 − P(ξi = 0) = p, where 0 < p < 1. Suppose n ≥ 7. Let Xi = ξiξi+1 and S = ∑_{i=1}^n Xi. Here and in the rest of this example, indices outside {1, …, n} are understood as one plus their residues mod n. We can apply Corollary 2.1 with Ai = {i − 1, i, i + 1}, so that XAi is a function of (ξi−2, …, ξi+3) and we may take θ = 7. The mean and variance of S can be calculated as

 μ = ES = np²,  σ² = Var(S) = n(p² + 2p³ − 3p⁴).  (10)
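The moment formulas in (10) can be checked by exact enumeration of a small circular sequence; the parameters below are our own illustrative choices:

```python
from itertools import product

n, p = 6, 0.3  # small circular case, exact enumeration over all 2^6 sequences
mean = second = 0.0
for xi in product((0, 1), repeat=n):
    w = 1.0
    for b in xi:
        w *= p if b else 1 - p
    # circular 2-runs count: S = sum_i xi_i * xi_{i+1}, indices mod n
    s = sum(xi[i] * xi[(i + 1) % n] for i in range(n))
    mean += w * s
    second += w * s * s
var = second - mean ** 2

mu_formula = n * p ** 2
sigma2_formula = n * (p ** 2 + 2 * p ** 3 - 3 * p ** 4)
```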

Applying (9) with Fi = σ(ξi−2, …, ξi+3), along with straightforward upper bounds on the moments of ξi and ηi, we have

 dTV(L(S), Nd(μ, σ²)) ≤ c′p/√n + c″p·sup_{a,b∈{0,1}} dTV(L(Va,b), L(Va,b + 1)),

where c′p, c″p are constants depending on p and, with m = n − 4, ζ1, …, ζm i.i.d. Bernoulli(p) variables and a, b ∈ {0, 1} given,

 Va,b = aζ1 + ∑_{j=2}^m ζj−1ζj + bζm.

Regarding dTV(L(Va,b), L(Va,b + 1)), we define V′a,b by replacing ζJ in Va,b with ζ′J, where J is uniformly chosen from {1, …, m}, independent of {ζ1, …, ζm}, and, given J, ζ′J is an independent copy of ζJ. From the remark following Lemma 1.3, (Va,b, V′a,b) is an exchangeable pair. Since, given J = i and {ζ1, …, ζm}, the event {Va,b − V′a,b = 1} happens exactly when ζi = 1, ζ′i = 0 and the two terms neighboring ζi (with ζ0 understood as a and ζm+1 as b) sum to one, we have

 E(I(Va,b − V′a,b = 1)|{ζ1, …, ζm}) = ((1 − p)/m)·[I(a + ζ2 = 1, ζ1 = 1) + I(b + ζm−1 = 1, ζm = 1)
  + ∑_{i=2}^{m−1} I(ζi−1 + ζi+1 = 1, ζi = 1)].  (11)

Taking expectations on both sides of (11) and lower bounding the right-hand side by the last term lead to

 P(Va,b − V′a,b = 1) ≥ (2(n − 6)/(n − 4))·p²(1 − p)².

In calculating the variance of the right-hand side of (11), we use the fact that each indicator is correlated with at most two other indicators. Therefore,

 √Var(E(I(Va,b − V′a,b = 1)|Va,b)) ≤ √Var(E(I(Va,b − V′a,b = 1)|{ζ1, …, ζm})) ≤ ((1 − p)/(n − 4))·√(3(n − 4)).

Similarly,

 √Var(E(I(Va,b − V′a,b = −1)|Va,b)) ≤ (p/(n − 4))·√(3(n − 4)).

Applying Lemma 1.3, we have

 dTV(L(Va,b), L(Va,b + 1)) ≤ √(3(n − 4))/(2(n − 6)p²(1 − p)²).

Therefore, we have proved the following proposition.

###### Proposition

For n ≥ 7, let ξ1, …, ξn be independent and identically distributed Bernoulli variables with P(ξi = 1) = 1 − P(ξi = 0) = p, where 0 < p < 1. Let Xi = ξiξi+1 (with indices understood mod n as above) and S = ∑_{i=1}^n Xi. We have

 dTV(L(S), Nd(μ, σ²)) ≤ cp/√n,  (12)

where μ and σ² are defined as in (10) and cp is a constant depending on p.

We remark that the above argument also applies to k-runs for k ≥ 3 with straightforward modifications, for example, enlarging the neighborhoods Ai and the σ-fields Fi, changing the definition of Va,b, etc.

Total variation approximation for 2-runs was studied by Barbour and Xia BaXi99 and Röllin Ro05 using the translated Poisson approximation. Barbour and Xia BaXi99 assumed some extra conditions on p to obtain a bound on the total variation distance between L(S) and a translated Poisson distribution. Although the result in Ro05 is of the same order as the bound in (12) in terms of n and applies for all 0 < p < 1, the approach used there was different from ours.

### 2.2 Exchangeable pairs

A systematic introduction to the exchangeable pair approach can be found in Stein St86. The basic setting is as follows. Let (S, S′) be an exchangeable pair (i.e., L(S, S′) = L(S′, S)) of integer valued random variables with ES = μ, Var(S) = σ². Suppose we have the approximate linearity condition

 E(S−S′|S)=λ(S−μ)+σE(R|S), (13)

for a positive number λ and a random variable R. A simple modification of Theorem 1.2 yields the following corollary for exchangeable pairs.

###### Corollary 2.2

Let (S, S′) be an exchangeable pair of integer valued random variables satisfying (13). Let μ = ES and σ² = Var(S). We have

 dTV(L(S), Nd(μ, σ²)) ≤ (√(π/2) + 2)·√(ER²)/λ + √Var(E((S′ − S)²|S))/(λσ²)
  + √(π/8)·E|S′ − S|³/(2λσ³) + √(E|S′ − S|⁶)/(2λσ³)
  + (1/(4λσ²))·E[(|S′ − S|³ + (S′ − S)²)·dTV(L(S|F), L(S+1|F))],  (14)

where F is a σ-field such that σ(S, S′) ⊆ F.

Proof.

We follow the proof of Theorem 1.2 with minor modifications. Let G = (S′ − S)/(2λ) and D = S′ − S. From the exchangeability of (S, S′),

 EG(f(S′)+f(S))=0.

By (13) and the above equality,

 E(S − μ)f(S) = E{Gf(S′) − Gf(S)} − (σ/λ)Ef(S)R.

Therefore, the Stein identity used in the proof of Theorem 1.2 has an extra term (σ/λ)Ef(S)R, which is bounded by (√(π/2) + 2)·√(ER²)/λ using the bounds on f from that proof. Moreover, from the exchangeability of (S, S′) and (13),

 EGD = (1/(2λ))E(S′ − S)² = (1/λ)E(S − S′)S = (1/λ)E(S − S′)(S − μ) = σ² + (σ/λ)E((S − μ)R).

Corollary 2.2 follows from Theorem 1.2 and the above arguments. A special case worth mentioning is when the exchangeable pair satisfies P(|S′ − S| ≤ 1) = 1. Examples of such exchangeable pairs include the binary expansion of a random integer [Diaconis Di77] and the anti-voter model [Rinott and Rotar RiRo97]. The following result shows that under this special assumption, bounding the total variation distance requires no more effort than bounding the Kolmogorov distance.

###### Corollary 2.3

Let (S, S′) be an exchangeable pair of integer valued random variables satisfying the approximate linearity condition (13). In addition, suppose P(|S′ − S| ≤ 1) = 1. Then we have

 dTV(L(S), Nd(μ, σ²)) ≤ (√(π/2) + 2)·√(ER²)/λ + √Var(E((S′ − S)²|S))/(λσ²) + (√(π/8) + 1)/(2λσ³),  (15)

where μ and σ² are the mean and variance of S.

Proof.

Let G = (S′ − S)/(2λ) and D = S′ − S. Then, for the function h appearing in the proof of Theorem 1.2,

 EG∫0^D(h(S + t) − h(S))dt
  = (1/(2λ))E(S′ − S)∫0^{S′−S}(h(S + t) − h(S))dt
  = (1/(2λ))E[∫0^1(h(S + t) − h(S))dt·I(S′ − S = 1) + ∫0^1(h(S − t) − h(S))dt·I(S′ − S = −1)]  (16)
  = (1/(4λ))E[(h(S + 1) − h(S))I(S′ − S = 1) + (h(S − 1) − h(S))I(S′ − S = −1)]
  = (1/(4λ))E[(h(S′) − h(S))I(S′ − S = 1) − (h(S) − h(S′))I(S − S′ = 1)]
  = 0.

We used the exchangeability of (S, S′) in the last equality. From (16), the corresponding term in the upper bound in the proof of Theorem 1.2 can be replaced by zero. Therefore, the bound on dTV(L(S), Nd(μ, σ²)) can be deduced as in Corollary 2.2, except that we do not have the last term on the right-hand side of (14).

###### Remark

Under the condition of Corollary 2.3, Röllin Ro07 obtained a bound on the total variation distance between L(S) and a translated Poisson distribution. His result, together with the triangle inequality and easy bounds on the total variation distance between the translated Poisson distribution and the discretized normal distribution, yields a bound similar to (15).

### 2.3 Size biasing

Size biasing was first introduced in the context of Stein’s method by Goldstein and Rinott GoRi96. For S a nonnegative integer valued random variable with mean μ, we say Ss has the S-size biased distribution if

 ESf(S) = μEf(Ss)

for all f such that the above expectations exist. If in addition Ss is defined on the same probability space as S, then

 (S,S′,G)=(S,Ss,μ) (17)

is a Stein coupling. Theorem 1.2 has the following corollary for size biasing, which follows easily from (17).
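For a sum of independent Bernoulli variables, a standard size bias coupling (choose an index I with probability pi/μ and force XI = 1; constructions of this kind appear in GoRi96) can be checked against the defining identity by exact enumeration. The parameters and test function below are our own illustrative choices:

```python
from itertools import product

ps = [0.2, 0.5, 0.7]   # S = sum of independent Bernoulli(p_i)
mu = sum(ps)

def f(s):
    return 3 ** s       # an arbitrary test function

lhs = rhs = 0.0         # lhs = E S f(S), rhs = mu * E f(S^s)
for x in product((0, 1), repeat=len(ps)):
    w = 1.0
    for xi, p in zip(x, ps):
        w *= p if xi else 1 - p
    s = sum(x)
    lhs += w * s * f(s)
    for i, p in enumerate(ps):
        # pick index i with probability p_i / mu and force X_i = 1
        s_biased = s - x[i] + 1
        rhs += w * mu * (p / mu) * f(s_biased)
```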

###### Corollary 2.4

Let S be a nonnegative integer valued random variable with mean μ and variance σ². Let Ss be defined on the same probability space and have the S-size biased distribution. Then

 dTV(L(S), Nd(μ, σ²)) ≤ (2μ/σ²)·√Var(E(Ss − S|S)) + √(π/8)·(μ/σ³)·E|Ss − S|² + (μ/σ³)·√(E|Ss − S|⁴)
  + (μ/(2σ²))·E[(|Ss − S|² + |Ss − S|)·dTV(L(S|F), L(S+1|F))],  (18)

where F is a σ-field such that σ(Ss − S) ⊆ F.

Next, we apply Corollary 2.4 to bound the total variation distance in discretized normal approximations for the number of vertices with a given degree in the Erdös–Rényi random graph and for the uniform multinomial occupancy model. These two models were recently studied by Goldstein Go12 and Bartroff and Goldstein BaGo12, respectively. They obtained bounds of the same order for the Kolmogorov distance using the inductive size bias coupling technique introduced by Goldstein Go12.

#### 2.3.1 Number of vertices with a given degree in the Erdös–Rényi random graph

Let K(n, pn) be an Erdös–Rényi random graph with vertex set {1, …, n} and edge probability pn. Let Sn be the number of vertices with a given degree d in K(n, pn). The asymptotic normality of Sn was proved in BaKaRu89 under appropriate conditions on pn. Under the condition

 there exist 0 < θ′ ≤ θn ≤ θ″ < ∞ and n0 > 0 such that pn = θn/(n − 1) for all n ≥ n0,  (19)

Goldstein Go12 proved a bound on the Kolmogorov distance between the distribution of Sn and N(μn, σn²),

 dK(L(Sn), N(μn, σn²)) ≤ cd/√n,

where μn and σn² are the mean and variance of Sn, respectively. Here and in the rest of this example, let cd denote positive constants which may depend on d and may differ from line to line. In the following proposition, we prove a bound on the total variation distance between the distribution of Sn and Nd(μn, σn²).

###### Proposition

Let K(n, pn), n ≥ 1, be a sequence of Erdös–Rényi random graphs satisfying (19). Let Sn be the number of vertices with a given degree d in K(n, pn). We have

 dTV(L(Sn),Nd(μn,σ2n))≤cd/√n. (20)
Proof.

Since the total variation distance is always bounded by 1, for n < n0, (20) holds true by choosing cd large enough. Therefore, we assume n ≥ n0 in the rest of the proof.

In Go12, it was proved that under condition (19),

 n/cd ≤ μn ≤ cd·n,  n/cd ≤ σn² ≤ cd·n.  (21)

Let deg(i) denote the degree of vertex i. Then Sn can be expressed as

 Sn = ∑_{i=1}^n I(deg(i) = d).
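Since deg(i) is Binomial(n − 1, p) for each fixed vertex, ESn = n·P(Binomial(n − 1, p) = d). As a sanity check (with our own small parameters), this can be confirmed by enumerating all graphs on four vertices:

```python
from itertools import combinations, product
from math import comb

n, p, d = 4, 0.3, 2
pairs = list(combinations(range(n), 2))

mean_S = 0.0  # exact E S_n by enumerating all 2^6 graphs on 4 vertices
for edges in product((0, 1), repeat=len(pairs)):
    w = 1.0
    for e in edges:
        w *= p if e else 1 - p
    deg = [0] * n
    for (u, v), e in zip(pairs, edges):
        if e:
            deg[u] += 1
            deg[v] += 1
    mean_S += w * sum(1 for i in range(n) if deg[i] == d)

expected = n * comb(n - 1, d) * p ** d * (1 - p) ** (n - 1 - d)
```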

Following the construction of the size bias coupling in Goldstein and Rinott GoRi96, let I be uniformly chosen from {1, …, n} and independent of K(n, pn). If deg(I) = d, then we define Ks(n, pn), the size biased graph, to be the same as K(n, pn). If deg(I) > d, then we obtain Ks(n, pn) from K(n, pn) by removing deg(I) − d edges chosen uniformly at random from the edges that connect to I in K(n, pn). If deg(I) < d, then we obtain Ks(n, pn) from K(n, pn) by connecting I to d − deg(I) vertices chosen uniformly at random from those not connected to I in K(n, pn). Let Sns be the number of vertices with degree d in the graph Ks(n, pn). It was proved in GoRi96 that Sns has the Sn-size biased distribution and

 Var(E(Ssn−Sn|Sn))≤cd/n. (22)

From the construction of Ks(n, pn), at most |deg(I) − d| + 1 vertices have different degrees in K(n, pn) and Ks(n, pn). Therefore,

 |Sns − Sn| ≤ |deg(I) − d| + 1.  (23)

Given I, deg(I) has the Binomial(n − 1, pn) distribution. This, together with (19), implies that for any fixed positive integer k,

 E[deg(I)]^k ≤ cd.  (24)

From (23) and (24),

 E|Sns − Sn|^k ≤ cd,  k ≤ 4.  (25)

Applying (21), (22) and (25) in (18), the proof will be complete once we show that

 E[(|Sns − Sn|² + |Sns − Sn|)·dTV(L(Sn|F), L(Sn + 1|F))] ≤ cd/√n  (26)

for a σ-field F such that σ(Sns − Sn) ⊆ F. For a given I, define AI = {I} ∪ {v: eIv = 1 or esIv = 1} and BI = {v ∉ AI: euv = 1 for some u ∈ AI}, where euv (resp. esuv) is the indicator that there is an edge connecting u and v in K(n, pn) (resp. Ks(n, pn)). Let

 F = σ(I, AI, BI, {euv: u ∈ AI, v ∈ AI ∪ BI}, {esIv: v ∈ AI}).  (27)

From the construction of F, we have σ(Sns − Sn) ⊆ F. Let |·| denote cardinality when the argument is a set. From (23), (24) and |AI| ≤ max(deg(I), d) + 1,

 E(|Sns − Sn|² + |Sns − Sn|)·I(|AI| > √n) ≤ (2/√n)·E(max(deg(I), d) + 1)³ ≤ cd/√n.

Similarly,

 E(|Sns − Sn|² + |Sns − Sn|)·I(|BI| > √n) ≤ 2E|AI|²|BI|/√n ≤ (2/√n)·E[|AI|²·E(|BI| | I, AI)] ≤ (cd/√n)·E|AI|³ ≤ cd/√n,

where we used E(|BI| | I, AI) ≤ cd|AI|, which follows from the fact that the expected degree of a given vertex is bounded by a constant under condition (19). Therefore, to prove (26), we only need to prove

 E[(|Sns − Sn|² + |Sns − Sn|)·I(|AI|, |BI| ≤ √n)·dTV(L(Sn|F), L(Sn + 1|F))] ≤ cd/√n,  (28)

where F was defined in (27). Given I with