
# Long runs under point conditioning. The real case.

## Abstract

This paper presents a sharp approximation of the density of long runs of a random walk conditioned on its end value, or on an average of a function of its summands, as their number tends to infinity. The conditioning event is of moderate or large deviation type. The result extends the Gibbs conditional principle in the sense that it provides a description of the distribution of the random walk on long subsequences. An algorithm for the simulation of such long runs is presented, together with an algorithm determining their maximal length for which the approximation is valid up to a prescribed accuracy.

## 1 Introduction and notation

### 1.1 Context and scope

This paper explores the asymptotic distribution of a random walk conditioned on its final value as the number of summands increases. Denote $X_1,\dots,X_n$ a set of $n$ independent copies of a real random variable $X$ with density $p$ on $\mathbb{R}$, and $S_1^n := X_1+\dots+X_n$. We consider approximations of the density of the vector $X_1^k := (X_1,\dots,X_k)$ on $\mathbb{R}^k$ when $S_1^n = na_n$, where $a_n$ is either fixed (different from $\mathbb{E}X$) or tends slowly to $\mathbb{E}X$, and $k = k_n$ is an integer sequence such that

 $0 \le \limsup_{n\to\infty} k/n \le 1$ (1)

together with

 $\lim_{n\to\infty} (n-k) = \infty.$ (2)

Therefore we may consider the asymptotic behavior of the density of the trajectory of the random walk on long runs. For the sake of applications we also address the case when $S_1^n$ is substituted by $U_1^n := f(X_1)+\dots+f(X_n)$ for some real valued measurable function $f$; the corresponding conditioning event is described in Section 3.

The interest in this question stems from various sources. When $a_n = a$ is fixed (typically with $a \neq \mathbb{E}X$) this is a version of the Gibbs Conditional Principle, which has been studied extensively for fixed $k$, therefore under a large deviation condition. Diaconis and Freedman have considered this issue, also in the case when $k$ grows with $n$, in connection with de Finetti's Theorem for exchangeable finite sequences. Their interest was related to the approximation of the density of $X_1^k$ by the product density of the summands, therefore to the permanence of the independence of the $X_i$'s under conditioning. Their result is in the spirit of van Campenhout and Cover, and is to be paralleled with Csiszár's asymptotic conditional independence result, when the conditioning event is $\left(S_1^n \geq na\right)$ with $a$ fixed and positive. In the same vein and under the same large deviation condition, Dembo and Zeitouni considered similar problems. This question is also of importance in Statistical Physics: numerous papers pertaining to structural properties of polymers deal with this issue, and we refer to the corresponding literature for a description of those problems and related results. In the moderate deviation case, Ermakov also considered a similar problem. Although out of the scope of the present paper, the result which is presented here is a cornerstone in the development of fast Importance Sampling procedures for rare event simulation; a first attempt in this direction has been made by the authors. In Statistics, estimators have the same weak behavior as the empirical mean of their influence functions taken at the sampling points, in the moderate deviation zone; simulating samples under a given value of the estimator leads to improved test procedures.

We exhibit the change in the dependence structure of the $X_i$'s under the conditioning, and provide an explicit and constructive solution to the approximation scheme. The approximating density is obtained through an adaptive change in the classical tilting argument, combined with an adaptive change in the variance. Our result also improves on existing ones, since it provides a sharp approximation of the conditional density. The present result is optimal in the sense that it coincides with the exact conditional density in the Gaussian case.

The crucial aspect of our result is the following. The approximation of the density of $X_1^k$ is not performed on the sequence of entire spaces $\mathbb{R}^k$ but merely on a sequence of subsets of $\mathbb{R}^k$ which bear the trajectories of the conditioned random walk with probability going to $1$ as $n$ tends to infinity; therefore the approximation is performed on typical paths. The reason which led us to consider approximation in this peculiar sense is twofold. First, approximation on typical paths is what is in fact needed for the applications of the present results in the field of simulation and of rare event analysis; second, it avoids a number of technical conditions which are necessary in order to get an approximation on all of $\mathbb{R}^k$, and which are indeed central in the above mentioned works; those conditions pertain to the regularity of the characteristic function of the underlying density, in order to get a good approximation in remote regions of $\mathbb{R}^k$. Since the approximation is handled on paths generated under the conditional density, much is known on the region of $\mathbb{R}^k$ which is reached with large probability by the conditioned random walk, through the analysis of the large values of the summands.

For the sake of numerical applications we provide explicit algorithms for the generation of such random walks, together with a number of comments on the practical implementation. Also, an explicit rule for the maximal value of $k$ compatible with a given accuracy of the approximating scheme is presented, and numerical simulation supports this rule; an algorithm for its calculation is presented.

### 1.2 Notation and hypotheses

In the context of the point conditioning

 $E_n := \left(S_1^n = n\left(a_n\sqrt{\operatorname{Var}X} + \mathbb{E}X\right)\right)$

the hypotheses are as below. The case when $S_1^n$ is substituted by $U_1^n := f(X_1)+\dots+f(X_n)$ is postponed to Section 3, together with the relevant hypotheses and notation.

We assume that $X$ satisfies the Cramér condition, i.e. that $X$ has a finite moment generating function $\Phi(t) := \mathbb{E}\exp(tX)$ in a non-void neighborhood of $0$. Denote

 $m(t) := \dfrac{d}{dt}\log\Phi(t)$

and

 $s^2(t) := \dfrac{d}{dt}m(t).$

The values $m(t)$ and $s^2(t)$ are the expectation and the variance of the tilted density

 $\pi^{\alpha}(x) := \dfrac{\exp(tx)}{\Phi(t)}\,p(x)$ (3)

where $t$ is the only solution of the equation $m(t) = \alpha$ when $\alpha$ belongs to the support of $P_X$; see Barndorff-Nielsen for details. Denote $\Pi^{\alpha}$ the probability measure with density $\pi^{\alpha}$.
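As a concrete illustration (ours, not the paper's), the tilted density (3) can be computed numerically once the equation $m(t) = \alpha$ is solved; since $m$ is increasing ($m' = s^2 > 0$), bisection on a bracketing interval is a safe choice. A minimal Python sketch for an Exponential(1) base density, where $\Phi(t) = 1/(1-t)$ for $t < 1$:

```python
import math

def solve_tilt(m, a, lo, hi, tol=1e-12):
    """Solve m(t) = a by bisection; m is increasing since m'(t) = s^2(t) > 0."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if m(mid) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Exponential(1) base density: Phi(t) = 1/(1 - t) for t < 1, so m(t) = 1/(1 - t).
m = lambda t: 1.0 / (1.0 - t)
t_a = solve_tilt(m, a=2.0, lo=0.0, hi=0.99)   # exact root: t = 1 - 1/a = 0.5

def tilted_density(x, t):
    """pi^alpha(x) = exp(t x) p(x) / Phi(t); here an Exponential(1 - t) density."""
    return math.exp(t * x) * math.exp(-x) * (1.0 - t)
```

Here the tilted density is again exponential, with mean $1/(1-t) = a$, as expected.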

We also assume that the characteristic function of $X$ is in $L^r$ for some $r \geq 1$, a condition which is necessary for the Edgeworth expansions to be performed.

The probability measure of the random vector $X_1^n$ on $\mathbb{R}^n$ conditioned upon $E_n$ is denoted $P_n$. We also denote $P_n$ the corresponding distribution of $X_1^k$ conditioned upon $E_n$; the vector $X_1^k$ then has a density with respect to the Lebesgue measure on $\mathbb{R}^k$ for $k < n$, which will be denoted $p_n$. This notation might seem ambiguous, but it recalls that the conditioned distribution pertains to the value of $n$, from which the density of $X_1^k$ is obtained. For a generic r.v. $Z$ with density $p$, we denote $p(Z = z)$ the value of $p$ at point $z$.

This paper is organized as follows. Section 2 presents the approximation scheme for the conditional density of $X_1^k$ under the point conditioning sequence $E_n$. In Section 3 it is extended to the case when the conditioning family of events involves the mean of $f(X_1),\dots,f(X_n)$. The value of $k$ for which this approximation is fair is discussed, and an algorithm for the implementation of this rule is proposed. Section 4 presents an algorithm for the simulation of random variables under the approximating scheme. We have kept the main steps of the proofs in the core of the paper; some of the technicalities are left to the Appendix.

## 2 Random walks conditioned on their sum

We introduce a positive sequence $\epsilon_n$ which satisfies

 $\lim_{n\to\infty}\epsilon_n\sqrt{n-k} = \infty$ (E1)
 $\lim_{n\to\infty}\epsilon_n(\log n)^2 = 0.$ (E2)

It will be shown that $\epsilon_n(\log n)^2$ is the rate of accuracy of the approximating scheme.

We denote $a$ the generic term of the bounded sequence $(a_n)_{n\geq 1}$, which we assume positive without loss of generality. The event $E_n$ is of moderate or large deviation type, since we assume that

 $\lim_{n\to\infty}\dfrac{a_n^2}{\epsilon_n(\log n)^2} = \infty.$ (A)

The case when $a_n$ does not depend on $n$ satisfies (A) for any sequence $\epsilon_n$ under (E1,2). Conditions (A) and (E1,2) jointly imply that $a_n$ cannot satisfy $\lim_{n\to\infty}\sqrt{n}\,a_n = c$ for some fixed $c$; the Central Limit zone is not covered by our result. In order that there exists a sequence $\epsilon_n$ such that the approximation of $p_n$ holds with rate $\epsilon_n(\log n)^2$, a sufficient condition on $a_n$ is

 $\lim_{n\to\infty}\dfrac{\sqrt{n}\,a_n^2}{(\log n)^2} = \infty$ (4)

which covers both the moderate and the large deviation cases.

Under these assumptions $k$ can be fixed or can grow together with $n$, with the restriction that $n-k$ should tend to infinity; when $k$ is fixed this rate is governed through (E1) (or, reciprocally, given $\epsilon_n$, the growth of $k$ is governed by (E1)), independently of $a_n$. In the moderate deviation case, for a given sequence $a_n$ close to $0$, $\epsilon_n$ has rapid decrease, which in turn forces $n-k$ to grow rapidly.

In this section we assume that $X$ has expectation $0$ and variance $1$. For clearness, the dependence on $n$ of all quantities involved in the coming development is omitted in the notation.

### 2.1 Approximation of the density of the runs

Let $a$ denote the current term of a sequence satisfying (A). Define a density $g_a(y_1^k)$ on $\mathbb{R}^k$ as follows. Set

 $g_0(y_1\mid y_0) := \pi^{a}(y_1)$

with $y_0$ arbitrary, and for $1 \leq i \leq k-1$ define $g_i(y_{i+1}\mid y_1^i)$ recursively.

Set $t_i$ the unique solution of the equation

 $m_i := m(t_i) = \dfrac{n}{n-i}\left(a - \dfrac{s_1^i}{n}\right)$ (5)

where $s_1^i := y_1+\dots+y_i$. The tilted adaptive family of densities $\pi^{m_i}$ is the basic ingredient of the derivation of the approximating scheme. Let

 $s_i^2 := \dfrac{d^2}{dt^2}\left(\log \mathbb{E}_{\pi^{m_i}}\exp(tX)\right)(0)$

and

 $\mu_{i,j} := \dfrac{d^j}{dt^j}\left(\log \mathbb{E}_{\pi^{m_i}}\exp(tX)\right)(0), \quad j=3,4$

which are the second, third and fourth centered moments of $\pi^{m_i}$. Let

 $g_i(y_{i+1}\mid y_1^i) = C_i\,p(y_{i+1})\,\mathfrak{n}\left(a+\alpha\beta,\,\alpha,\,y_{i+1}\right)$ (6)

where $\mathfrak{n}(\mu,\sigma^2,x)$ is the normal density with mean $\mu$ and variance $\sigma^2$ evaluated at $x$. Here

 $\alpha = s_i^2\,(n-i-1)$ (7)
 $\beta = t_i + \dfrac{\mu_{i,3}}{2\,s_i^4\,(n-i-1)}$ (8)

and $C_i$ is a normalizing constant.

Define

 $g_a(y_1^k) := \prod_{i=0}^{k-1} g_i(y_{i+1}\mid y_1^i).$ (9)

We then have

###### Theorem 1

Assume that (E1,2) holds together with (A). Let $Y_1^k$ be a sample with distribution $P_n$. Then

 $p_n(Y_1^k) := p\left(X_1^k = Y_1^k \,\middle|\, S_1^n = na\right) = g_a(Y_1^k)\left(1 + o_{P_n}\!\left(\epsilon_n(\log n)^2\right)\right).$ (10)

Proof. The proof uses Bayes formula to write $p_n(Y_1^k)$ as a product of conditional densities of the individual terms of the trajectory, evaluated at $Y_1^k$. Each term of this product is approximated through an Edgeworth expansion, which together with the properties of $Y_1^k$ under $P_n$ concludes the proof. This proof is rather long and we have deferred its technical steps to the Appendix.

Denote $\Sigma_1^0 := 0$ and $\Sigma_1^i := Y_1+\dots+Y_i$. It holds

 $p\left(X_1^k = Y_1^k \,\middle|\, S_1^n = na\right) = p\left(X_1 = Y_1 \,\middle|\, S_1^n = na\right)\prod_{i=1}^{k-1} p\left(X_{i+1} = Y_{i+1} \,\middle|\, X_1^i = Y_1^i,\, S_1^n = na\right) = \prod_{i=0}^{k-1} p\left(X_{i+1} = Y_{i+1} \,\middle|\, S_{i+1}^n = na - \Sigma_1^i\right)$ (11)

using the independence of the r.v's $X_1,\dots,X_n$.

We make use of the following property, which states the invariance of conditional densities under tilting: for $1 \leq i \leq j \leq n$, for all $u$ in the range of $S_i^j$ and for all $s$,

 $p\left(S_i^j = u \,\middle|\, S_1^n = s\right) = \pi^{a}\left(S_i^j = u \,\middle|\, S_1^n = s\right).$ (12)

Define $t_i$ through

 $m(t_i) = \dfrac{n}{n-i}\left(a - \dfrac{\Sigma_1^i}{n}\right)$

a function of the past r.v's $Y_1^i$, and set $m_i := m(t_i)$ and $s_i^2 := s^2(t_i)$. By (12)

 $p\left(X_{i+1}=Y_{i+1} \,\middle|\, S_{i+1}^n = na-\Sigma_1^i\right) = \pi^{m_i}\left(X_{i+1}=Y_{i+1} \,\middle|\, S_{i+1}^n = na-\Sigma_1^i\right) = \pi^{m_i}(X_{i+1}=Y_{i+1})\,\dfrac{\pi^{m_i}\left(S_{i+2}^n = na-\Sigma_1^{i+1}\right)}{\pi^{m_i}\left(S_{i+1}^n = na-\Sigma_1^i\right)}$

where we used the independence of the $X_i$'s under $\pi^{m_i}$. A precise evaluation of the dominating terms in this latter expression is needed in order to handle the product (11).

Under the sequence of densities $\pi^{m_i}$, the i.i.d. r.v's $X_{i+1},\dots,X_n$ define a triangular array which satisfies a local central limit theorem and an Edgeworth expansion. Under $\pi^{m_i}$, $X_{i+1}$ has expectation $m_i$ and variance $s_i^2$. Center and normalize both the numerator and the denominator in the fraction which appears in the last display. Denote $\overline{\pi}_{n-i-1}$ the density of the normalized sum of $n-i-1$ terms when the summands are i.i.d. with common density $\pi^{m_i}$. Accordingly $\overline{\pi}_{n-i}$ is the density of the normalized sum of $n-i$ terms under i.i.d. $\pi^{m_i}$ sampling. Hence, evaluating both densities and their normal approximations at the corresponding points,

 $p\left(X_{i+1}=Y_{i+1} \,\middle|\, S_{i+1}^n = na-\Sigma_1^i\right) = \dfrac{\sqrt{n-i}}{\sqrt{n-i-1}}\,\pi^{m_i}(X_{i+1}=Y_{i+1})\,\dfrac{\overline{\pi}_{n-i-1}\!\left((m_i - Y_{i+1})/(s_i\sqrt{n-i-1})\right)}{\overline{\pi}_{n-i}(0)} =: \dfrac{\sqrt{n-i}}{\sqrt{n-i-1}}\,\pi^{m_i}(X_{i+1}=Y_{i+1})\,\dfrac{N_i}{D_i}.$ (13)

The sequence of densities $\overline{\pi}_{n-i}$ converges pointwise to the standard normal density under (E1), which implies that $n-k$ tends to infinity, and an Edgeworth expansion to the order 5 is performed for the numerator and the denominator. The main arguments used in order to obtain the order of magnitude of the involved quantities are (i) a maximal inequality which controls the magnitude of $m_i$ for all $i$ between $0$ and $k-1$ (Lemma 13), and (ii) the order of the maximum of the $Y_i$'s (Lemma 14). As proved in the Appendix, under (A)

 $N_i = \phi\!\left(-Y_{i+1}/(s_i\sqrt{n-i-1})\right)\cdot A\cdot B + O_{P_n}\!\left(\dfrac{1}{(n-i-1)^{3/2}}\right)$ (14)

where $\phi$ is the standard normal density, and

 $A := 1 + \dfrac{a\,Y_{i+1}}{s_i^2(n-i-1)} - \dfrac{a^2}{2\,s_i^2(n-i-1)} + \dfrac{o_{P_n}(\epsilon_n\log n)}{n-i-1}$ (15)

and

 $B := 1 - \dfrac{\mu_{i,3}}{2\,s_i^4(n-i-1)}\left(a - Y_{i+1}\right) - \dfrac{\mu_{i,4} - 3\,s_i^4}{8\,s_i^4(n-i-1)} - \dfrac{15\,\mu_{i,3}^2}{72\,s_i^6(n-i-1)} + \dfrac{O_{P_n}\!\left((\log n)^2\right)}{(n-i-1)^2}.$ (16)

The $O_{P_n}$ term in (14) is uniform upon $0 \leq i \leq k-1$. Turn back to (13) and perform the same Edgeworth expansion in the denominator, which writes

 $D_i = \phi(0)\left(1 - \dfrac{\mu_{i,4} - 3\,s_i^4}{8\,s_i^4(n-i)} - \dfrac{15\,\mu_{i,3}^2}{72\,s_i^6(n-i)}\right) + O_{P_n}\!\left(\dfrac{1}{(n-i)^{3/2}}\right).$ (17)

The terms in $\epsilon_n(\log n)^2$ follow from an expansion of the ratio of the two expressions (14) and (17) above. The Gaussian contribution is explicit in (14), while the term in $\mu_{i,3}$ is the dominant term in $B$. Turning to (13) and comparing with (10), it appears that the normalizing factor $C_i$ in $g_i$ compensates the factor $\sqrt{n-i}/\sqrt{n-i-1}$. Further, the product of the remaining terms in the approximations (14) and (17) builds the approximation rate, as claimed. Details are deferred to the Appendix. This yields

 $p\left(X_1^k = Y_1^k \,\middle|\, S_1^n = na\right) = \left(1 + o_{P_n}\!\left(\epsilon_n(\log n)^2\right)\right)\prod_{i=0}^{k-1} g_i(Y_{i+1}\mid Y_1^i)$

which closes the proof of the Theorem.

###### Remark 2

When the $X_i$'s are i.i.d. with a standard normal density, the result in the above approximation Theorem holds exactly, for all $k$ in $\{1,\dots,n-1\}$. This extends to the case when they have an infinitely divisible distribution. However, formula (10) holds true without the error term only in the Gaussian case. Similar exact formulas can be obtained for infinitely divisible distributions using (11), making no use of tilting. Such formulas are used to produce Tables 1 and 2, in order to assess the validity of the selection rule for $k$ in the exponential case.
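The Gaussian case of this remark can be checked directly: for i.i.d. $N(0,1)$ summands, the law of $X_1^n$ given $S_1^n = na$ is an explicit Gaussian, obtained by recentering an unconditioned sample onto the constraint hyperplane. A small sketch (an illustration of the remark, not the paper's simulation algorithm of Section 4):

```python
import random

def gaussian_bridge(n, a, seed=1):
    """Exact draw of (X_1, ..., X_n) given S_1^n = na for i.i.d. N(0,1) summands.

    Shifting each coordinate by a - S/n projects the sample onto the hyperplane
    S = na; for Gaussian summands this recentred vector has exactly the
    conditional law (mean a, covariance I - J/n).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    shift = a - sum(x) / n
    return [xi + shift for xi in x]

run = gaussian_bridge(200, 0.8)   # the constraint sum(run) = 200 * 0.8 holds exactly
```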

###### Remark 3

The density in (6) is a slight modification of $\pi^{m_i}$. The modification from $\pi^{m_i}$ to $g_i$ is a small shift in the location parameter, depending both on $a$ and on the skewness of $p$, and a change in the variance: large values of $y_{i+1}$ have smaller weight for large $i$, so that the distribution of $X_{i+1}$ tends to concentrate around $m_i$ as $i$ approaches $k$.
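To make the recursion (5)–(9) concrete, the following sketch (ours, not the paper's Section 4 algorithm) samples $Y_1^k$ under $g_a$ in the special case of a standard normal base density, where $m(t) = t$, $s_i^2 = 1$ and $\mu_{i,3} = 0$, so that each factor (6) is a normalised product of two Gaussian densities and hence again Gaussian:

```python
import math
import random

def sample_g_a(n, k, a, seed=0):
    """Draw Y_1^k under the approximating density g_a of (9) when p = N(0,1)."""
    rng = random.Random(seed)
    y, s = [], 0.0                       # s = s_1^i, the running sum of the path
    for i in range(k):
        m_i = (n * a - s) / (n - i)      # equation (5); here t_i = m_i
        alpha = n - i - 1                # equation (7) with s_i^2 = 1
        mu0 = a + alpha * m_i            # centre of the normal factor in (6)
        # product of the N(0,1) and N(mu0, alpha) densities in y_{i+1}:
        var = alpha / (alpha + 1.0)      # precision 1 + 1/alpha
        mean = mu0 / (alpha + 1.0)
        y.append(rng.gauss(mean, math.sqrt(var)))
        s += y[-1]
    return y

path = sample_g_a(n=1000, k=500, a=0.5)
```

For a general base density $p$, sampling from (6) requires an extra accept–reject step; the Gaussian case is the one where the normalizing constant $C_i$ is available in closed form.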

###### Remark 4

In the previous Theorem, as in Lemma 14, we use an Edgeworth expansion for the density of the normalized sum of the $n$-th row of some triangular array of row-wise independent r.v's with common density. Consider i.i.d. r.v's with common density $\pi^{m}$, where $m$ may depend on $n$ but remains bounded. The Edgeworth expansion pertaining to the normalized density of their sum under $\pi^{m}$ can be derived following closely the classical proof, pp. 532 and following, substituting the cumulants of $p$ by those of $\pi^{m}$. Denote $\varphi_m$ the characteristic function of $\pi^{m}$. Clearly, for any $\delta > 0$ there exists $q_\delta < 1$ such that $|\varphi_m(t)| < q_\delta$ for $|t| > \delta$, and since $m$ is bounded, $q_\delta$ can be chosen independently of $m$. Therefore the inequality (2.5), p. 533, holds. With the corresponding notation, (2.6) holds with the characteristic function replaced by $\varphi_m$, and (2.9) holds, which completes the proof of the Edgeworth expansion in the simple case. The proof goes in the same way for higher order expansions.

### 2.2 Sampling under the approximation

Applications of Theorem 1 in Importance Sampling procedures and in Statistics require a reverse result. So assume that $Y_1^k$ is a random vector generated under the distribution $G_a$ with density $g_a$. Can we state that $g_a(Y_1^k)$ is a good approximation for $p_n(Y_1^k)$? This holds true. We state a simple Lemma in this direction.

Let $R_n$ and $S_n$ denote two p.m.'s on $\mathbb{R}^n$ with respective densities $r_n$ and $s_n$.

###### Lemma 5

Suppose that, for some sequence $\varepsilon_n$ which tends to $0$ as $n$ tends to infinity,

 $r_n(Y_1^n) = s_n(Y_1^n)\left(1 + o_{R_n}(\varepsilon_n)\right)$ (18)

as $n$ tends to $\infty$. Then

 $s_n(Y_1^n) = r_n(Y_1^n)\left(1 + o_{S_n}(\varepsilon_n)\right).$ (19)

Proof. Denote

 $A_{n,\varepsilon_n} := \left\{y_1^n : (1-\varepsilon_n)\,s_n(y_1^n) \leq r_n(y_1^n) \leq s_n(y_1^n)(1+\varepsilon_n)\right\}.$

It holds, for all positive $\delta$,

 $\lim_{n\to\infty} R_n\left(A_{n,\delta\varepsilon_n}\right) = 1.$

Write

 $R_n\left(A_{n,\delta\varepsilon_n}\right) = \int \mathbb{1}_{A_{n,\delta\varepsilon_n}}(y_1^n)\,\dfrac{r_n(y_1^n)}{s_n(y_1^n)}\,s_n(y_1^n)\,dy_1^n.$

Since

 $R_n\left(A_{n,\delta\varepsilon_n}\right) \leq (1+\delta\varepsilon_n)\,S_n\left(A_{n,\delta\varepsilon_n}\right)$

it follows that

 $\lim_{n\to\infty} S_n\left(A_{n,\delta\varepsilon_n}\right) = 1,$

which proves the claim.

As a direct by-product of Theorem 1 and Lemma 5 we obtain

###### Theorem 6

Assume (A), (E1,2). Then, when $Y_1^k$ is generated under the distribution $G_a$, it holds

 $p_n(Y_1^k) = g_a(Y_1^k)\left(1 + o_{G_a}\!\left(\epsilon_n(\log n)^2\right)\right)$

with $p_n$ defined in (10).

## 3 Random walks conditioned by the mean of a function of their summands

This section extends the above results to the case when the conditioning event writes

 $U_1^n := f(X_1)+\dots+f(X_n) = n\left(\sigma a + \mu\right).$ (20)

The function $f$ is real valued, and $\mu := \mathbb{E}f(X)$, $\sigma^2 := \operatorname{Var}f(X)$. The characteristic function of the random variable $f(X)$ is assumed to belong to $L^r$ for some $r \geq 1$. As previously, $a = a_n$ is assumed positive. Let $p_X$ denote the density of the r.v. $X$.

Assume

 $\phi_f(t) := \mathbb{E}\exp\left(t f(X)\right) < \infty$

for $t$ in a non-void neighborhood of $0$. Define the functions $m_f(t)$, $s_f^2(t)$ and $\mu_{f,3}(t)$ as the first, second and third derivatives of $\log\phi_f(t)$.

Denote

 $\pi_f^{\alpha}(x) := \dfrac{\exp\left(t f(x)\right)}{\phi_f(t)}\,p_X(x)$

with $m_f(t) = \alpha$, where $\alpha$ belongs to the support of $P^f$, the distribution of $f(X)$. Conditions which ensure the existence and uniqueness of $t$ are referred to as steepness properties, and are exposed in the literature.
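As in Section 2, the tilting parameter is obtained by inverting $m_f$; since $m_f' = s_f^2 > 0$, bisection again applies. A sketch (ours) for the hypothetical example $f(x) = x^2$ with $X \sim N(0,1)$, where $\phi_f(t) = (1-2t)^{-1/2}$ for $t < 1/2$:

```python
def solve_tilt_f(m_f, target, lo, hi, tol=1e-12):
    """Solve m_f(t) = target by bisection; m_f is increasing (m_f' = s_f^2 > 0)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if m_f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# f(x) = x^2 with X ~ N(0,1): phi_f(t) = (1 - 2t)^(-1/2) for t < 1/2, so that
# m_f(t) = 1/(1 - 2t), mu = E f(X) = 1 and sigma^2 = Var f(X) = 2.
m_f = lambda t: 1.0 / (1.0 - 2.0 * t)
u = 2.0 ** 0.5 + 1.0                      # target sigma * a + mu, with a = 1
t_hat = solve_tilt_f(m_f, u, lo=0.0, hi=0.4999)
```

In this example the root is known in closed form, $t = (1 - 1/u)/2$, which gives an easy check of the solver.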

Assume that (A) holds and that the sequence $\epsilon_n$ satisfies (E1,2).

### 3.1 Approximation of the density of the runs

Define a density $h_{\sigma a+\mu}$ with c.d.f. $H_{\sigma a+\mu}$ on $\mathbb{R}^k$ as follows. Set

 $h_0(y_1\mid y_0) := \pi_f^{\sigma a+\mu}(y_1)$

with $y_0$ arbitrary, and for $1 \leq i \leq k-1$ define $h_i(y_{i+1}\mid y_1^i)$ recursively.

Set $t_i$ the unique solution of the equation

 $m_i := m_f(t_i) = \dfrac{n}{n-i}\left(\sigma a + \mu - \dfrac{u_1^i}{n}\right)$ (21)

where $u_1^i := f(y_1)+\dots+f(y_i)$.

Define

 $h_i(y_{i+1}\mid y_1^i) = C_i\,p_X(y_{i+1})\,\mathfrak{n}\left(\alpha\beta + (\sigma a+\mu),\,\alpha,\,f(y_{i+1})\right)$ (22)

where $C_i$ is a normalizing constant. Here

 $\alpha = s_f^2(t_i)\,(n-i-1)$ (23)
 $\beta = t_i + \dfrac{\mu_{f,3}(t_i)}{2\,s_f^4(t_i)\,(n-i-1)}.$ (24)

Set

 $h_{\sigma a+\mu}(y_1^k) := \prod_{i=0}^{k-1} h_i(y_{i+1}\mid y_1^i).$ (25)

Denote $P_n^f$ the distribution of $X_1^n$ conditioned upon $\left(U_1^n = n(\sigma a+\mu)\right)$, and $p_n^f$ its density when restricted on $\mathbb{R}^k$; therefore

 $p_n^f(Y_1^k) := p\left(X_1^k = Y_1^k \,\middle|\, U_1^n = n(\sigma a+\mu)\right).$ (26)
###### Theorem 7

Assume (A) and (E1,2). Then (i)

 $p_n^f(X_1^k = Y_1^k) = h_{\sigma a+\mu}(Y_1^k)\left(1 + o_{P_n^f}\!\left(\epsilon_n(\log n)^2\right)\right)$

and (ii)

 $p_n^f(X_1^k = Y_1^k) = h_{\sigma a+\mu}(Y_1^k)\left(1 + o_{H_{\sigma a+\mu}}\!\left(\epsilon_n(\log n)^2\right)\right).$

Proof. We only sketch the initial step of the proof of (i), which rapidly follows the same track as that of Theorem 1. Denote $U_1^0 := 0$ and $U_1^i := f(Y_1)+\dots+f(Y_i)$.

As in the proof of Theorem 1, evaluate

 $p\left(X_{i+1}=Y_{i+1} \,\middle|\, U_{i+1}^n = n(\sigma a+\mu)-U_1^i\right) = p(X_{i+1}=Y_{i+1})\,\dfrac{p\left(U_{i+2}^n = n(\sigma a+\mu)-U_1^{i+1}\right)}{p\left(U_{i+1}^n = n(\sigma a+\mu)-U_1^i\right)}.$

Use the tilting invariance under $\pi_f^{m_i}$, leading to

 $p\left(X_{i+1}=Y_{i+1} \,\middle|\, U_{i+1}^n = n(\sigma a+\mu)-U_1^i\right) = \pi_f^{m_i}\left(X_{i+1}=Y_{i+1} \,\middle|\, U_{i+1}^n = n(\sigma a+\mu)-U_1^i\right) = \pi_f^{m_i}(X_{i+1}=Y_{i+1})\,\dfrac{\pi_f^{m_i}\left(U_{i+2}^n = n(\sigma a+\mu)-U_1^{i+1}\right)}{\pi_f^{m_i}\left(U_{i+1}^n = n(\sigma a+\mu)-U_1^i\right)} = p(X_{i+1}=Y_{i+1})\,\dfrac{e^{t_i f(Y_{i+1})}}{\phi_f(t_i)}\,\dfrac{\pi_f^{m_i}\left(U_{i+2}^n = n(\sigma a+\mu)-U_1^{i+1}\right)}{\pi_f^{m_i}\left(U_{i+1}^n = n(\sigma a+\mu)-U_1^i\right)}$

and proceed through the Edgeworth expansions in the above expression, following verbatim the proof of Theorem 1. We omit details. The proof of (ii) follows from Lemma 5.

### 3.2 How far is the approximation valid?

This section provides a rule leading to an effective choice of the crucial parameter $k$, in order to achieve a given accuracy bound for the relative error. The generic r.v. $X$ has density $p_X$, and $f(X)$ has mean $\mu$ and variance $\sigma^2$. The density $p_n^f$ is defined in (26). The accuracy of the approximation is measured through

 $ERE(k) := \mathbb{E}_{h_{\sigma a+\mu}}\!\left[\mathbb{1}_{D_k}(Y_1^k)\,\dfrac{p_n^f(Y_1^k) - h_{\sigma a+\mu}(Y_1^k)}{p_n^f(Y_1^k)}\right]$

and

 $VRE(k) := \operatorname{Var}_{h_{\sigma a+\mu}}\!\left[\mathbb{1}_{D_k}(Y_1^k)\,\dfrac{p_n^f(Y_1^k) - h_{\sigma a+\mu}(Y_1^k)}{p_n^f(Y_1^k)}\right]$ (27)

respectively the expectation and the variance of the relative error of the approximating scheme, when evaluated on $D_k$, a subset of $\mathbb{R}^k$ which bears the trajectories of the conditioned random walk with probability going to $1$; the r.v's $Y_1^k$ are sampled under $h_{\sigma a+\mu}$. Note that the density $p_n^f$ is usually unknown. The argument is somehow heuristic and informal; nevertheless the rule is simple to implement and provides good results. We assume that the set $D_k$ can be substituted by $\mathbb{R}^k$ in the above formulas, therefore assuming that the relative error has bounded variance; a proof under appropriate conditions would require quite a lot of work, but the property seems to hold, at least in all cases considered by the authors. We keep the above notation, omitting therefore any reference to $D_k$.

Consider a two-sigma confidence bound for the relative accuracy for a given $k$, defining

 $CI(k) := \left[ERE(k) - 2\sqrt{VRE(k)},\; ERE(k) + 2\sqrt{VRE(k)}\right].$

Let $\delta$ denote an acceptance level for the relative accuracy. Accept $k$ until $CI(k)$ belongs to $[-\delta, \delta]$. For such $k$ the relative accuracy of the approximation is certified up to the level $\delta$, roughly.
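The acceptance rule can be sketched as follows; `ere` and `vre` are hypothetical arrays holding the estimated $ERE(k)$ and $VRE(k)$ for each candidate run length (the names and toy values are ours, not the paper's):

```python
import math

def max_valid_k(ere, vre, delta):
    """Largest k whose two-sigma interval CI(k) stays within [-delta, delta];
    scanning stops at the first k that violates the acceptance level."""
    best = 0
    for k in range(1, len(ere)):
        half = 2.0 * math.sqrt(vre[k])
        if -delta <= ere[k] - half and ere[k] + half <= delta:
            best = k
        else:
            break
    return best

# toy curves: the error grows with k and exceeds the level delta = 0.1 at k = 3
k_max = max_valid_k([0.0, 0.01, 0.02, 0.2], [0.0, 1e-4, 1e-4, 1e-2], delta=0.1)
```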

The calculation of $ERE(k)$ and $VRE(k)$ should be done as follows.

Write

 $VRE(k) = \mathbb{E}_{p_X}\!\left(\dfrac{h_{\sigma a+\mu}^3(Y_1^k)}{p_n^f(Y_1^k)^2\,p_X(Y_1^k)}\right) - \left[\mathbb{E}_{p_X}\!\left(\dfrac{h_{\sigma a+\mu}^2(Y_1^k)}{p_n^f(Y_1^k)\,p_X(Y_1^k)}\right)\right]^2 =: A - B^2$ (28)–(30)

where the expectations are taken under i.i.d. sampling with density $p_X$. By Bayes formula,

 $p_n^f(Y_1^k) = p_X(Y_1^k)\,\dfrac{p\left(U_{k+1}^n = n(\sigma a+\mu) - U_1^k\right)}{p\left(U_1^n = n(\sigma a+\mu)\right)}.$ (31)

The following Lemma holds; see  and .

###### Lemma 8

Let $X_1,\dots,X_n$ be i.i.d. random variables with common density $p$ on $\mathbb{R}$, satisfying the Cramér condition with m.g.f. $\phi$. Then, with $t$ defined by $m(t) = u$,

 $p_{S_1^n/n}(u) = \dfrac{\sqrt{n}\,\phi^n(t)\exp(-ntu)}{s(t)\sqrt{2\pi}}\left(1+o(1)\right)$

in the range of the large or moderate deviations, i.e. when $u$ stays out of the Central Limit zone and is bounded from above.

Introduce

 $D := \left[\dfrac{\pi_f^{a}(a)}{p_X(a)}\right]^n$

and

 $N := \left[\dfrac{\pi_f^{m_k}(m_k)}{p_X(m_k)}\right]^{n-k}$

with $m_k$ defined in (21). Define $t$ by $m_f(t) = \sigma a + \mu$. By (31) and Lemma 8 it holds

 $p_n^f(Y_1^k) = \sqrt{\dfrac{n}{n-k}}\;p_X(Y_1^k)\,\dfrac{N}{D}\,\dfrac{s_f(t)}{s_f(t_k)}\left(1 + o_{P_X}(1)\right).$

The approximation of $A$ is obtained through Monte Carlo simulation. Define

 $A(Y_1^k) := \dfrac{n-k}{n}\left(\dfrac{h_{\sigma a+\mu}(Y_1^k)}{p_X(Y_1^k)}\right)^3\left(\dfrac{D}{N}\right)^2\dfrac{s_f^2(t_k)}{s_f^2(t)}$ (32)

and simulate $L$ i.i.d. samples $Y_1^k(l)$, $1 \leq l \leq L$, each one made of $k$ i.i.d. replications under $p_X$; set

 $\hat{A} := \dfrac{1}{L}\sum_{l=1}^L A\left(Y_1^k(l)\right).$

We use the same approximation for $B$. Define

 $B(Y_1^k) := \sqrt{\dfrac{n-k}{n}}\left(\dfrac{h_{\sigma a+\mu}(Y_1^k)}{p_X(Y_1^k)}\right)^2\dfrac{D}{N}\,\dfrac{s_f(t_k)}{s_f(t)}$ (33)

and

 $\hat{B} := \dfrac{1}{L}\sum_{l=1}^L B\left(Y_1^k(l)\right)$

with the same samples $Y_1^k(l)$ as above.

Set

 $\overline{VRE}(k) := \hat{A} - (\hat{B})^2$ (34)

which is a fair approximation of $VRE(k)$.
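As a sanity check of the accuracy criterion (ours, not the generic estimator (32)–(34), which avoids evaluating $p_n^f$): in the Gaussian toy case $f(x) = x$, $p = N(0,1)$, the conditional density is known in closed form, so the expectation and variance of the relative error can be estimated by direct Monte Carlo:

```python
import math
import random

def log_norm(x, mean, var):
    """Log of the normal density with the given mean and variance at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def relative_error_stats(n, k, a, reps=2000, seed=2):
    """Estimate ERE(k), VRE(k) for f(x) = x, p = N(0,1): sample Y_1^k under the
    approximation h (Gaussian reduction of (6)) and compare with the exact
    conditional density, a product of N(m_i, (n-i-1)/(n-i)) factors."""
    rng = random.Random(seed)
    errs = []
    for _ in range(reps):
        s, log_h, log_p = 0.0, 0.0, 0.0
        for i in range(k):
            m_i = (n * a - s) / (n - i)               # equation (5)
            alpha = n - i - 1
            mean_h = (a + alpha * m_i) / (alpha + 1)  # centre of the h factor
            var_i = alpha / (alpha + 1.0)             # same variance in h and p_n
            y = rng.gauss(mean_h, math.sqrt(var_i))
            log_h += log_norm(y, mean_h, var_i)
            log_p += log_norm(y, m_i, var_i)
            s += y
        errs.append(1.0 - math.exp(log_h - log_p))    # (p_n - h) / p_n
    mean = sum(errs) / reps
    var = sum((e - mean) ** 2 for e in errs) / (reps - 1)
    return mean, var

ere_k, vre_k = relative_error_stats(n=100, k=10, a=0.5)
```

In this toy case the two densities nearly coincide, so both estimates should be close to zero; the generic procedure above is needed precisely when $p_n^f$ cannot be evaluated.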

The curve $k \mapsto \overline{VRE}(k)$ is a proxy for $VRE(k)$. A similar Monte Carlo approximation is used for

 ERE(k):=