
Process-level quenched large deviations for random walk in random environment


We consider a bounded step size random walk in an ergodic random environment with some ellipticity, on an integer lattice of arbitrary dimension. We prove a level 3 large deviation principle, under almost every environment, with rate function related to a relative entropy.


Résumé (translated from the French). We consider a random walk in an ergodic random environment. The walk is elliptic and has bounded steps. We prove a level 3 large deviation principle, under almost every environment, with a rate function related to a relative entropy.


Quenched LDP for RWRE



Received 4 September 2009; revised 8 April 2010; accepted 16 April 2010


AMS subject classifications: 60K37, 60F10, 82D30, 82C44.

Keywords: random walk, random environment, RWRE, large deviation, environment process, relative entropy, homogenization.

1 Introduction

We describe the standard model of random walk in random environment (RWRE) on the integer lattice $\mathbb{Z}^d$. Let be a Polish space and its Borel $\sigma$-algebra. Let be a group of continuous commuting bijections on : and is the identity. Let be a -invariant probability measure on that is ergodic under this group. In other words, the $\sigma$-algebra of Borel sets invariant under is trivial under .

Denote the space of probability distributions on by and give it the weak topology or, equivalently, the restriction of the product topology. Let be a continuous mapping from to . For define . We call and also an environment because it determines the transition probabilities of a Markov chain.

The set of admissible steps is denoted by . One can then redefine and transition probabilities are defined only for such that .

Given and a starting point , let be the law of the Markov chain on , starting at and having transition probabilities . That is,

is called a random walk in environment and is called the quenched distribution. The joint distribution is . Its marginal on is also denoted by and called the averaged (or annealed) distribution since is averaged out:

The canonical case of the above setting is and .
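The quenched walk in the canonical one-dimensional nearest-neighbor setting is easy to simulate. The following Python sketch is only illustrative (the environment law, the ellipticity floor $[0.1,0.9]$, and all names are our own choices, not from the paper): each site of $\mathbb{Z}$ carries a probability of stepping right, and the walk runs under the quenched law of one fixed environment.

```python
import random

def sample_environment(L, seed=0):
    """Draw an i.i.d. environment on the sites -L..L: omega[z] is the
    probability of stepping right from z (nearest-neighbor walk on Z).
    Values are kept in [0.1, 0.9] as a crude ellipticity condition."""
    rng = random.Random(seed)
    return {z: 0.1 + 0.8 * rng.random() for z in range(-L, L + 1)}

def quenched_walk(omega, n, start=0, seed=1):
    """Run n steps of the walk in the fixed environment omega:
    from x, step to x+1 with probability omega[x], else to x-1."""
    rng = random.Random(seed)
    x, path = start, [start]
    for _ in range(n):
        x += 1 if rng.random() < omega[x] else -1
        path.append(x)
    return path

omega = sample_environment(L=200)
path = quenched_walk(omega, n=100)
```

Fixing omega, as above, samples the quenched distribution; averaging over the environment draw as well would sample the averaged (annealed) distribution.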

Next comes a quick description of the problem we are interested in. Assume we are given a sequence of probability measures on a Polish space and a lower semicontinuous function . Then the large deviation upper bound holds with rate function if

Similarly, rate function governs the large deviation lower bound if

If both hold with the same rate function , then the large deviation principle (LDP) holds with rate . We shall use basic, well known features of large deviation theory and relative entropy without citing every instance. The reader can consult references ((3)), ((4)), ((5)), ((15)), and ((21)).

If the upper bound (resp. lower bound, resp. LDP) holds with some function , then it also holds with the lower semicontinuous regularization of defined by

Thus the rate function can be required to be lower semicontinuous, and then it is unique.
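The lower semicontinuous regularization can be visualized numerically: $f_{\mathrm{lsc}}(x)=\lim_{\delta\downarrow 0}\inf_{|y-x|<\delta}f(y)$, approximated on a grid with a small fixed $\delta$. A minimal Python sketch (the step function and all parameters are our own toy choices):

```python
def lsc_regularization(f_vals, grid, delta):
    """Grid approximation of the l.s.c. regularization:
    f_lsc(x) ~ inf{ f(y) : |y - x| < delta } for small delta."""
    return [min(f_vals[j] for j, y in enumerate(grid) if abs(y - x) < delta)
            for x in grid]

# f has an upward jump at 0: f(x) = 0 for x < 0 and f(x) = 1 for x >= 0.
grid = [k / 10.0 for k in range(-10, 11)]
f = [0.0 if x < 0 else 1.0 for x in grid]
f_lsc = lsc_regularization(f, grid, delta=0.15)
# The regularization lowers the value at the jump point: f_lsc(0) = 0.
```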

Large deviations arrange themselves more or less naturally in three levels. Most of the work on quenched large deviations for RWRE has been at level 1, that is, on large deviations for . Greven and den Hollander ((10)) considered the product one-dimensional nearest-neighbor case, Comets, Gantert, and Zeitouni ((2)) the ergodic one-dimensional nearest-neighbor case, Yilmaz ((24)) the ergodic one-dimensional case with bounded step size, Zerner ((25)) the multi-dimensional product nestling case, and Varadhan ((22)) the general ergodic multidimensional case with bounded step size. Rosenbluth ((17)) gave a variational formula for the rate function in ((22)). Level 2 quenched large deviations appeared in the work of Yilmaz ((24)) for the distributions . Here denotes the step of the walk.

Our object of study, level 3 or process-level large deviations, concerns the empirical process


where denotes the entire sequence of future steps. Quenched distributions are probability measures on the space . This is the space of Borel probability measures on endowed with the weak topology generated by bounded continuous functions.

The levels do form a hierarchy: higher level LDPs can be projected down to give LDPs at lower levels. Such results are called contraction principles in large deviation theory.

The main technical contribution of this work is the extension of a homogenization argument that proves the upper bound to the multivariate level 2 setting. This idea goes back to Kosygina, Rezakhanlou, and Varadhan ((12)) in the context of diffusions with random drift, and was used by both Rosenbluth ((17)) and Yilmaz ((24)) to prove their LDPs.

Before turning to specialized assumptions and notation, here are some general conventions. $\mathbb{Z}_+$, $\mathbb{Z}_-$, and $\mathbb{N}$ denote, respectively, the sets of non-negative, non-positive, and positive integers. denotes the -norm on . is the canonical basis of . In addition to for the space of probability measures on , we write for the set of Markov transition kernels on . Our spaces are Polish and the $\sigma$-algebras Borel. Given and , is the probability measure on defined by

and is its second marginal. For a probability measure , denotes the corresponding expectation operator. Occasionally may replace .
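When the state space is finite these objects are concrete: the joint measure puts mass $\mu(x)q(x,y)$ on the pair $(x,y)$, and the second marginal is the one-step distribution $\mu q$. A small Python sketch (the dictionary encoding and the example measures are ours):

```python
def mu_otimes_q(mu, q):
    """Joint measure: (mu ⊗ q)(x, y) = mu(x) * q(x, y)."""
    return {(x, y): mu[x] * q[x][y] for x in mu for y in q[x]}

def second_marginal(joint):
    """Second marginal: (mu q)(y) = sum_x mu(x) q(x, y)."""
    out = {}
    for (x, y), w in joint.items():
        out[y] = out.get(y, 0.0) + w
    return out

mu = {"a": 0.5, "b": 0.5}
q = {"a": {"a": 0.2, "b": 0.8}, "b": {"a": 0.6, "b": 0.4}}
joint = mu_otimes_q(mu, q)
muq = second_marginal(joint)
# muq is {"a": 0.4, "b": 0.6}; q leaves mu invariant iff muq == mu.
```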

2 Main result

Fix a dimension . Following are the hypotheses for the level 3 LDP. In Section 3 we refine these to state precisely what is used by different parts of the proof.

is finite and is a compact metric space. (2.1)
, and such that . (2.2)
such that . (2.3)

When is finite the canonical is compact. The commonly used assumption of uniform ellipticity, namely the existence of such that for and contains the unit vectors, implies assumptions (2.2) and (2.3).

We need notational apparatus for backward, forward, and bi-infinite paths. The increments of a bi-infinite path in with are denoted by . The sequences and are in 1-1 correspondence. Segments of sequences are denoted by , also for or , and also for random variables: .

In general denotes the pair , but when and are clear from the context we write simply . We will also sometimes abbreviate . The spaces to which elements belong are , and . Their relevant shift transformations are

where . We use the same symbols , , and to act on , , and in the same way.

The empirical process (1.1) lives in but the rate function is best defined in terms of backward paths. Invariance allows us to pass conveniently between these settings. If is -invariant, it has a unique -invariant extension on . Let , the restriction of to its marginal on . There is a unique kernel on that fixes (that is, ) and satisfies



(Uniqueness here is -a.s.) Indeed, on the one hand, the above does leave invariant. On the other hand, if is a kernel supported on shifts and leaves invariant, and if is a bounded measurable function on , then

The RWRE transition gives us the kernel defined by

If satisfies , then -a.s. and their relative entropy is given by


Let denote the marginal of on . Our main theorem is the following.

Theorem 2.1.

Let be an ergodic system. Assume (2.1), (2.2), and (2.3). Then, for -a.e. , the large deviation principle holds for the laws , with rate function equal to the lower semicontinuous regularization of the convex function


We next make some observations about the rate function .

Remark 2.1.

As is often the case for process level LDPs, the rate function is affine. This follows because we can replace with a “universal” kernel whose definition is independent of . Namely, define

Then, on the event where exists define


On the complement, set , for some fixed .

Remark 2.2.

Let us also recall the convex analytic characterization of l.s.c. regularization. Let denote the space of bounded continuous functions on . Given a function , let be its convex conjugate defined by

and let be its convex biconjugate defined by

If is convex and not identically infinite, is the same as its lower semicontinuous regularization ; see Propositions 3.3 and 4.1 of ((8)) or Theorem 5.18 of ((15)). Thus the rate function in Theorem 2.1 is .
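The conjugates can be approximated on finite grids. The Python sketch below (grids and the quadratic test function are our own choices) computes $f^{*}$ and $f^{**}$ by brute force; for a convex l.s.c. function such as $x^{2}$ the biconjugate recovers the function, and in general $f^{**}\le f$.

```python
def conjugate(f_vals, xs, lambdas):
    """Convex conjugate on a grid: f*(lam) = sup_x [ lam*x - f(x) ]."""
    return [max(lam * x - fx for x, fx in zip(xs, f_vals)) for lam in lambdas]

xs = [k / 10.0 for k in range(-20, 21)]        # grid for x in [-2, 2]
f = [x * x for x in xs]                        # convex and l.s.c.
lams = [k / 10.0 for k in range(-40, 41)]      # grid for lambda in [-4, 4]
fstar = conjugate(f, xs, lams)                 # approximates lam^2 / 4
fstarstar = conjugate(fstar, lams, xs)         # biconjugate f**
# f** never exceeds f, and here it agrees with f up to floating error.
```

Note that the biconjugate is just the conjugate applied twice with the roles of the two grids exchanged.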

As expected, the rate function in fact has an alternative representation as a specific relative entropy. For a probability measure on , define the probability measure on by

On any of the product spaces of environments and paths, define the $\sigma$-algebras . Let denote the relative entropy of the restrictions of the probability measures and to the $\sigma$-algebra . Let be the kernel of the environment chain , defined as .

Lemma 2.2.

Let be -invariant. Then the limit


exists and equals .


Fix . Let denote the conditional distribution of under , given . Then by the -invariance,

For we must interpret and simply as . Observe also that the conditional distribution of under , given , is .

By two applications of the conditional entropy formula (Lemma 10.3 of ((21)) or Exercise 6.14 of ((15))),


As , the $\sigma$-algebras generate the $\sigma$-algebra , and consequently


We have taken some liberties with notation and regarded and as measures on the variables , instead of on pairs . This is legitimate because the simple structure of the kernels and , namely (2.4), implies that and almost surely under these measures.

The claim follows by dividing through (2.9) by and letting . ∎
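The additivity underlying Lemma 2.2 can be checked directly on a toy example: for two Markov chains with kernels q and p and an initial law stationary for q, the relative entropy of the time-n path measures grows by the same increment at every step, so dividing by n gives a limit. A Python sketch (the two-state chain and all numbers are our own):

```python
from math import log
from itertools import product

def path_measure(init, kernel, n, states):
    """Law of (X_0, ..., X_n) for a Markov chain with the given kernel."""
    out = {}
    for path in product(states, repeat=n + 1):
        w = init[path[0]]
        for a, b in zip(path, path[1:]):
            w *= kernel[a][b]
        out[path] = w
    return out

def rel_entropy(P, Q):
    """H(P | Q) = sum_x P(x) log( P(x) / Q(x) ) over a finite set."""
    return sum(p * log(p / Q[x]) for x, p in P.items() if p > 0)

states = ("a", "b")
init = {"a": 5 / 12, "b": 7 / 12}        # stationary law of the kernel q below
q = {"a": {"a": 0.3, "b": 0.7}, "b": {"a": 0.5, "b": 0.5}}
p = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.5, "b": 0.5}}
H = [rel_entropy(path_measure(init, q, n, states),
                 path_measure(init, p, n, states)) for n in range(4)]
# H[0] = 0 and the increments H[n+1] - H[n] are all equal, so H[n]/n converges.
```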

Note that the specific entropy in (2.8) is not an entropy between two -invariant measures unless is -invariant. The next lemma exploits the previous one to say something about the zeros of .

Lemma 2.3.

If then for some -invariant .

Note that it is not necessarily true that in the above lemma.

Remark 2.3.

One can show that under (2.1) and (2.2) there is at most one that is -invariant and such that ; see for example ((13)). In fact, in this case . The above lemma shows that the zeros of consist of (if exists) and possibly measures of the form , with being -invariant but such that .


There is a sequence of -invariant probability measures such that and . (If then we can take .) Let denote the marginal distribution on which can be identified with and converges to the corresponding marginal . By the continuity of the kernel , . From these limits and the lower semicontinuity of relative entropy,

This tells us that is -invariant, which in turn implies that is -invariant, and together with the -invariance of implies also that . (The last point can also be seen from (2.9) and (2.10).) ∎

We close this section with some examples.

Let with . Let be the law of a classical random walk; i.e.  with , for some . Then . However, if is not in the set , then, . Note that by the contraction principle, is the zero set of the level-1 rate function. Hence if is product, by ((22)) consists of a singleton or a line segment. Thus we can pick so that the mean does not lie in , and consequently we have measures for which . That is, the rate does not have to pick up the entropy value.
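For the classical walk referenced here, the level-1 rate function is the Cramér rate: the convex conjugate of the logarithmic moment generating function of a single step. The following Python sketch evaluates it on a grid (the symmetric step law and the grids are our own illustrative choices):

```python
from math import exp, log

def cramer_rate(v, step_probs, lam_grid):
    """Level-1 rate of a classical walk with step law step_probs:
    I(v) = sup_lam [ lam*v - log M(lam) ],  M(lam) = sum_z p(z) e^{lam z}."""
    def log_mgf(lam):
        return log(sum(p * exp(lam * z) for z, p in step_probs.items()))
    return max(lam * v - log_mgf(lam) for lam in lam_grid)

steps = {1: 0.5, -1: 0.5}                     # simple symmetric walk on Z
lams = [k / 100.0 for k in range(-500, 501)]  # lambda grid on [-5, 5]
I0 = cramer_rate(0.0, steps, lams)            # 0 at the mean velocity
I_half = cramer_rate(0.5, steps, lams)        # approx 0.1308
```

The rate vanishes exactly at the law of large numbers velocity and grows for atypical velocities, matching the description of the zero set above.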

Lower semicontinuity of relative entropy implies when . This equality can still happen when ; i.e. the l.s.c. regularization can bring the rate down from infinity all the way to the entropy. Here is a somewhat singular example. Assume and let be the constant sequence of -steps in . For each define the (trivially -invariant) probability measure on . Then, for all in the (minimal closed) support of ,


The case is allowed here, which can of course happen if uniform ellipticity is not assumed.

The second equality in (2.11) is clear from definitions, because the kernel is trivial: . Since is defined by l.s.c. regularization and entropy itself is l.s.c., entropy always gives a lower bound for . If then and the first equality in (2.11) is true by definition. If pick a sequence of open neighborhoods . The assumption that lies in the support of implies . Define a sequence of approximating measures by with entropies

The above entropies converge to by continuity of . We have verified (2.11).

3 Multivariate level 2 and setting the stage for the proofs

The assumptions made for the main result are the union of all the assumptions used in this paper. To facilitate future work, we next list the different assumptions that are needed for different parts of the proof.

The lower bounds in Theorem 2.1 above and Theorem 3.1 below do not require the environment space to be compact nor the step set to be finite. They hold under the assumption that is ergodic for and the following two conditions are satisfied.

for all . (3.1)

Note that (3.1) is a regularity condition that says that either all environments allow the move or all prohibit it.

Our proof of the upper bound uses stricter assumptions. The upper bound holds if is ergodic for , is finite, is compact, the moment assumption (2.3) holds, and

, , such that . (3.3)

On its own, (3.3) is weaker than (2.2). However, since the additive group generated by is isomorphic to for some , we always assume, without any loss of generality, that

is the smallest additive group containing . (3.4)

Then, under (3.4), (3.3) is equivalent to (2.2).

The only place where the condition (in (2.3)) is needed is for Lemma 5.1 to hold. See Remark 5.3. The only place where (2.2) (or (3.3)) is needed is in the proof of (5.6) in Lemma 5.5. This is the only reason that our result does not cover the so-called forbidden direction case. A particularly interesting special case is the space-time, or dynamic, environment; i.e. when . A level 1 quenched LDP can be proved for space-time RWRE through the subadditive ergodic theorem, as was done for elliptic walks in ((22)). Yilmaz ((23)) has shown that for i.i.d. space-time RWRE in 4 and higher dimensions the quenched and averaged level 1 rate functions coincide in a neighborhood of the limit velocity. In contrast with large deviations, the functional central limit theorem of i.i.d. space-time RWRE is completely understood; see ((14)), and also ((1)) for a different proof for steps that have exponential tails.

Next we turn to the strategy of the proof of Theorem 2.1. The process level LDP comes by the familiar projective limit argument from large deviation theory. The intermediate steps are multivariate quenched level 2 LDPs. For each define the multivariate empirical measure

This empirical measure lives on the space whose generic element is now denoted by .

We can treat as the position level (level 2) empirical measure of a Feller-continuous Markov chain. Denote by (with expectation ) the law of the Markov chain on with initial state and transition kernel


This Markov chain has empirical measure

that satisfies the following LDP. Define an entropy on by


is convex by an argument used below at the end of Section 4. Recall Remark 2.2 about l.s.c. regularization.

Theorem 3.1.

Same assumptions as in Theorem 2.1. For any fixed , for -a.e. , and for all , the large deviation principle holds for the sequence of probability measures on with convex rate function .

The lower bound in Theorem 3.1 follows from a change of measure and the ergodic theorem, and hints at the correct rate function. Donsker and Varadhan’s ((6)) general Markov chain argument gives the upper bound but without the absolute continuity restriction in (3.5). Thus the main issue is to deal with the case when the rate is infinite. This is nontrivial because the set of measures with is dense in the set of probability measures with the same support as . This is where the homogenization argument from ((12)), ((17)) and ((24)) comes in.
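For a finite chain, the entropy in (3.5) is concrete: minimize the average kernel entropy $\sum_x \mu(x)\,H(q(x,\cdot)\,|\,p(x,\cdot))$ over kernels $q$ that leave $\mu$ invariant. A brute-force two-state Python sketch (the encoding of kernels by their jump probabilities and all numbers are our own):

```python
from math import log

def kernel_entropy(mu, q, p):
    """sum_x mu(x) H( q(x,.) | p(x,.) ), with a two-state kernel encoded
    by its probability of jumping to the other state."""
    h = 0.0
    for x in (0, 1):
        for qq, pp in ((q[x], p[x]), (1 - q[x], 1 - p[x])):
            if qq > 0:
                h += mu[x] * qq * log(qq / pp)
    return h

def dv_rate(mu, p, grid=2000):
    """Minimize the average kernel entropy over kernels q fixing mu.
    For two states, mu q = mu iff mu[0]*q[0] == mu[1]*q[1] (flow balance),
    so the feasible kernels form a one-parameter family."""
    best = float("inf")
    for k in range(1, grid):
        q0 = k / grid
        q1 = mu[0] * q0 / mu[1]
        if q1 <= 1.0:
            best = min(best, kernel_entropy(mu, (q0, q1), p))
    return best

p = (0.5, 0.5)                            # background kernel: jump w.p. 1/2
rate_stationary = dv_rate((0.5, 0.5), p)  # mu invariant for p, so rate 0
rate_biased = dv_rate((0.8, 0.2), p)      # positive rate
```

The zero of the rate sits at the invariant measure of the background kernel, in line with the discussion of the zero set in Section 2.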

We conclude this section with a lemma that contains the projective limit step.

Lemma 3.2.

Assume is invariant for the shifts and satisfies the regularity assumption (3.1). Assume that for each fixed there exists a rate function that governs the large deviation lower bound for the laws , for -almost-every and all . Then, for -a.e. , the large deviation lower bound holds for with rate function , for

When is finite and is compact the same statement holds for the upper bound and the large deviation principle.


Observe first that is the law of under , conditioned on . Since -a.s. we have for all open sets ,

Similarly, in the case of the upper bound, and when is finite, we have for all closed sets ,

We conclude that conditioning is immaterial and, -a.s., the laws of induced by satisfy a large deviation lower (resp. upper) bound governed by . The lemma now follows from the Dawson-Gärtner projective limit theorem (see Theorem 4.6.1 in ((3))). ∎

The next two sections prove Theorem 3.1: the lower bound in Section 4 and the upper bound in Section 5. Section 6 finishes the proof of the main result, Theorem 2.1.

4 Lower bound

We now prove the large deviation lower bound in Theorem 3.1. This section is valid for a general step set that may be infinite and a general Polish environment space. Lemmas 4.1 and 4.2 are valid under (3.1) alone, while the lower bound proof also requires (3.2). Recall that assumption (3.4) entails no loss of generality.

We start with some ergodicity properties of the measures involved in the definition of the function . Recall that and that for a measure , is its marginal on . Denote by the law of under .

Lemma 4.1.

Let be ergodic and assume (3.1) and (3.4) hold. Fix and let be such that . Let be a Markov transition kernel on such that

  • is -invariant (i.e. );

  • for all and -a.e. ;

  • , for -a.e. .

Then, and the Markov chain on with kernel and initial distribution is ergodic. In particular, we have for all