An invariance principle for branching diffusions in bounded domains

Ellen Powell
Abstract

We study branching diffusions in a bounded domain $D$ of $\mathbb{R}^d$ in which particles are killed upon hitting the boundary $\partial D$. It is known that any such process undergoes a phase transition when the branching rate exceeds a critical value: a multiple of the first eigenvalue of the generator of the diffusion. We investigate the system at criticality and show that the associated genealogical tree, when the process is conditioned to survive for a long time, converges to Aldous’ Continuum Random Tree under appropriate rescaling. The result holds under only a mild assumption on the domain, and is valid for all branching mechanisms with finite variance, and a general class of diffusions.

1 Introduction

This paper concerns branching diffusions in a bounded domain of . For us, these are processes in which individual particles move according to the law of some diffusion, are killed upon exiting the domain, and branch into a random number of particles (with distribution , independent of position) at constant rate . Whenever such a branching event occurs each of the offspring then stochastically repeats the behaviour of its parent, starting from the point of fission, and independently of everything else. The configuration of particles at time will be represented by the -valued point process

where is the set of individuals alive at time (its size will be denoted by ). We write for the law of the process initiated from a point . Finally, we will always assume that the offspring distribution satisfies

(1.1)

and that the generator

(1.2)

of the diffusion is uniformly elliptic (see Section 2.1) with coefficients for .
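To make the dynamics described above concrete, here is a minimal simulation sketch. It rests on simplifying assumptions that are not part of the paper: the domain is the interval (0, 1), the diffusion is standard Brownian motion, and time is discretised with a step `dt`, so that a particle branches with probability roughly `rate * dt` per step and killing is only checked at grid times. All names (`simulate_branching_bm`, `offspring`, and so on) are hypothetical.

```python
import math
import random


def simulate_branching_bm(x0, rate, offspring, t_max, dt=1e-3, seed=0):
    """Discretised sketch of branching Brownian motion killed outside (0, 1).

    x0        -- starting point in (0, 1)
    rate      -- constant branching rate
    offspring -- function rng -> non-negative int (number of children)
    t_max     -- time horizon
    Returns the list of positions of the particles alive at time t_max.
    """
    rng = random.Random(seed)
    particles = [x0]
    steps = int(round(t_max / dt))
    for _ in range(steps):
        next_gen = []
        for x in particles:
            # Diffuse: Brownian increment over a time step of length dt.
            y = x + math.sqrt(dt) * rng.gauss(0.0, 1.0)
            # Kill the particle if it has left the domain (checked on the grid only).
            if not (0.0 < y < 1.0):
                continue
            # Branch with probability approximately rate * dt.
            if rng.random() < rate * dt:
                # Children start from the point of fission.
                next_gen.extend([y] * offspring(rng))
            else:
                next_gen.append(y)
        particles = next_gen
        if not particles:
            break  # extinction
    return particles


# Binary branching: every branching event produces exactly two children.
positions = simulate_branching_bm(0.5, 20.0, lambda rng: 2, 0.1, seed=1)
print(len(positions))
```

The offspring law is passed in as a function so that any distribution can be plugged in; the returned list plays the role of the point process of particle positions at the final time.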

It is known that such a system exhibits a phase transition in the branching rate: for large enough there is a positive probability of survival for all time, but for small , including at criticality, there is almost sure extinction. The critical value of is equal to , where is the first eigenvalue of on with Dirichlet boundary conditions (see (2.1)). The main goal of this paper will be to study the system at criticality and find a scaling limit for the resulting genealogical tree. This is the continuous plane tree that is generated purely by the birth and death times of particles in the system, and encodes no information about the spatial motion.

More precisely, for we condition the critical branching diffusion to survive until time at least , and look at the associated genealogical tree , equipped with its natural distance . Rescaling distances by a factor gives us a sequence of laws on random compact metric spaces:

(1.3)

We will prove that this sequence converges in distribution to a conditioned Brownian continuum random tree as , with respect to the Gromov–Hausdorff topology. Indeed, if we let be a Brownian excursion conditioned to reach height at least , and write for the real tree whose contour function is given by , then we obtain the following result.

Theorem 1.1.

Suppose that is a domain¹ and that as in (1.2) is uniformly elliptic with coefficients for . Further suppose that satisfies (1.1), and that where is the first eigenfunction of on (see Section 2.1). Then, at the critical branching rate , and for any starting point ,

in distribution, with respect to the Gromov–Hausdorff topology.

¹ We say that is a {Lipschitz / / } domain (for and ) if, at each point , there exists and a {Lipschitz / / } function such that, relabelling and reorienting axes as necessary, .

Remark 1.2.

One sufficient condition to ensure that the hypotheses of Theorem 1.1 are satisfied is to assume that is for some (see Lemma 2.3). However, this is also satisfied in many other cases, so we leave the assumptions of Theorem 1.1 as general as possible.

On the way to proving Theorem 1.1 we also obtain new proofs of several other results concerning critical branching diffusions, some of which are already known in various forms. The reason for detailing these proofs here is threefold: firstly, it allows us to pin down the regularity required on the domain ; secondly, it provides a new and somewhat more probabilistic approach to the theory, which we believe is interesting in its own right; and finally, the proofs serve to introduce many concepts and ideas that are crucial for the proof of Theorem 1.1.

Let us first look at the phase transition. This result was originally proved by Sevast’yanov [Sev58] and Watanabe [Wat65], in the case when is a constant multiple of . However, it has also been reworked and generalised since then. In [AH83, Chapter 6], a more general version of the result is given for branching Markov processes whose moment semigroup satisfies a certain criterion. One of the main examples discussed is when the process is a branching diffusion on a manifold with killing at the boundary. This is slightly more general than the set up of the present paper, in that the diffusion is on a manifold and the branching mechanism is allowed to be spatially dependent; however, a (fairly abstract) condition on the moment semigroup is required. In [Her78b] the criterion is shown to be satisfied, for example, if the manifold has boundary and the generator of the diffusion is uniformly elliptic with coefficients. Here we will prove the result under a weaker assumption on the domain and generator, but in our slightly more specific framework. Related results have also been shown in [EK04], which studies local extinction versus local growth on compactly contained subsets of a (possibly infinite) domain , and in [HHK12], where is taken to be a bounded interval of the real line.

Theorem 1.3 ([Sev58], [Wat65]).

Let be a bounded domain with Lipschitz boundary and suppose that and are as in Theorem 1.1. Then, for any starting position , if is the first eigenvalue of on with Dirichlet boundary conditions then,

  • for the process survives for all time with positive -probability.

  • for the process becomes extinct -almost surely.

Moreover, if then uniformly in .

The rest of the paper will focus on the behaviour of the system at criticality, starting with an asymptotic for the survival probability. The results of Theorem 1.4, Theorem 1.6 and Corollary 1.7 have also been shown in [AH83, Chapter 6] in the same framework discussed above (see also [Her78a, Her78b], and [Her74] for the one-dimensional case). Here we provide new proofs, which hold under fewer assumptions on the domain , and which we can also modify to give key ingredients for the proof of Theorem 1.1 (see for example Lemma 5.3).

Theorem 1.4.

Suppose that is and that and are as in Theorem 1.3. Then, in the critical case , for all we have

(1.4)

as . Here is the first eigenfunction of on , normalised to have unit norm.

This asymptotic then allows us to study the behaviour of the system when it is conditioned to survive for a long time, which is important for the proof of Theorem 1.1. One tool that we will use is a classical spine change of measure, under which the process has a distinguished particle, the spine, which is conditioned to remain in forever (as in [Pin85]). Along this spine, families of ordinary critical branching diffusions immigrate at rate according to a biased offspring distribution. Note that there is no extinction under this new measure, which we denote by . We will prove that changing measure in this way is in fact somewhat close to conditioning on survival for all time, in the sense of the following proposition.

Proposition 1.5.

Assume the hypotheses of Theorem 1.4. Then for any , and , where is the natural filtration of the process, we have

(1.5)

To our knowledge, Proposition 1.5 does not appear in the existing literature.

Finally, we prove a Yaglom-type limit theorem for the positions of the particles in the system at time , given survival.

Theorem 1.6.

For any measurable function on such that , we have

in distribution as , where is an exponential random variable with mean

One consequence of Theorem 1.6 (or rather its proof) is that it allows us to describe the limiting distribution of the particles in the system at time , given survival. It turns out that this is the law with density , normalised to be a probability distribution.

Corollary 1.7.

Let

be the uniform distribution on all particles alive at time , given survival. Then, for each as in Theorem 1.6, we have that

in distribution, and hence in probability, as , where

1.1 Context

It is interesting to note the analogy between Theorems 1.3-1.6 and classical results from the theory of Galton–Watson processes. Indeed, for critical Galton–Watson processes, Kolmogorov [Kol38] proved an asymptotic for the probability of survival up to time :

where is the population size at time , and the constant depends on the variance of the offspring distribution. Moreover, Aldous [Ald91, Ald93] and Duquesne and Le Gall [LGD02] showed that if one conditions a critical Galton–Watson process to reach a large generation or to have a large total progeny, then the resulting tree has a scaling limit. This limit is in the Gromov–Hausdorff topology, after rescaling distances in the tree appropriately, and the limiting object is the Continuum Random Tree [Ald91]. In fact, this result can be extended to multitype Galton–Watson processes with a finite number of types, as in [Mie08], where the same scaling limit exists. Since a branching diffusion can be thought of as a limit of multitype Galton–Watson processes (considering the types to be positions and discretising the domain appropriately) it is reasonable to conjecture that such a process will have the same limiting genealogy when conditioned to survive for a long time.
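For orientation, Kolmogorov's estimate is commonly stated in the following form. The notation is chosen here for illustration and may differ from the paper's: $Z_n$ denotes the size of generation $n$ of a critical Galton–Watson process (offspring mean one) whose offspring variance $\sigma^2$ satisfies $0 < \sigma^2 < \infty$.

```latex
\mathbb{P}(Z_n > 0) \sim \frac{2}{\sigma^2 n} \qquad \text{as } n \to \infty .
```

Theorem 1.4 plays the analogous role for critical branching diffusions in a bounded domain.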

On the other hand this result must be non-trivial, given what is known and expected for other types of domain. For example, [Kes78, BBS11, BBS13, BBS14, BBS15] have studied branching Brownian motion on the positive half line (with absorption at the origin) where each particle moves with a drift . In this set up there is a critical value of the branching rate , equal to , such that for extinction occurs with probability one. In [BBS14], an asymptotic for the survival probability is calculated, which is very different to that of Theorem 1.4. Moreover, results of [BBS13] in the near critical regime suggest that the critical genealogy in this case should be closely related to the Bolthausen–Sznitman coalescent [BS98]. This is also related to non-rigorous physics predictions of Brunet, Derrida, Mueller and Munier [BDMM06, BDMM]: see [Ber09] for a survey of the area, further references and a general discussion of these predictions. This is of course not just a one-dimensional effect, as the behaviour will trivially be the same when the domain D is a half-space with a drift in the direction of the hyperplane. Hence Theorem 1.1 is not likely to hold if we drop the boundedness assumption on the domain . More generally, this raises the following question:

Question 1.8.

If the domain is allowed to be unbounded, or very irregular, what other behaviours do we see appearing at criticality (whenever we can make sense of this notion)?

In general it is an open and, we believe, extremely interesting problem to classify all the possible behaviours that can occur at criticality, depending on the geometry of the domain.

1.2 Organisation of the Paper and Main Ideas

After setting up the relevant notations and preliminary theory, we begin with the proof of Theorem 1.3. The main idea, which differs from the more analytic proofs given in [Sev58, Wat65, AH83], is to exploit the existence of the martingale

(1.6)

(also appearing in [EK04, HHK12]) that arises naturally from the definition of the process. Since roughly tells us the size of the system at time , and converges almost surely, the behaviour of the exponential term in (1.6) governs the possible survival or extinction of the process.

We then turn to the proofs of Theorems 1.4 to 1.6. The proof of Theorem 1.4 proceeds by a combination of probabilistic arguments, and the analysis of a system of coupled ordinary differential equations. Naively, we expand the survival probability (as a function of , for each fixed ) with respect to the orthonormal basis of given by the eigenfunctions of . Then because the survival probability satisfies a certain partial differential equation (the FKPP equation for branching Brownian motion, [McK75]) we get a family of coupled ODEs from the coefficients. In fact, we do not explicitly use that the survival probability satisfies this PDE (as we can derive the ODEs for the coefficients directly and avoid potentially complicated technical assumptions), but this is the motivation behind the proof. Unfortunately however, the system of ODEs is not immediately easy to analyse, and this is where the probabilistic line of reasoning comes into play. Changing measure using the martingale (to get a spine characterisation of the system as discussed in the introduction) allows us to deduce that the survival probability actually just decays like as , where is the first coefficient in the expansion. Thus, our problem is reduced to the study of a single ODE. From here elementary analysis, combined with some extra information obtained from the probabilistic arguments, yields the result. Proposition 1.5, Theorem 1.6 and Corollary 1.7 then follow fairly straightforwardly.

The remainder of the paper is devoted to the proof of Theorem 1.1. To do this, we take an i.i.d. sequence of critical processes, and concatenate the height functions of their associated continuous genealogical trees. A key idea is to define a suitable analogue of the Lukasiewicz path for Galton–Watson trees: that is, something that will approximate this concatenated height process well, and will converge after rescaling to a reflected Brownian motion. At first it seems too much to hope that such a precise combinatorial structure survives in this spatial context. However, it turns out that we can exploit the martingale by “exploring it in a different order.” Just as roughly measures the size of our population when we let time evolve in the usual way, when we explore the genealogical tree in a “depth first” order, the martingale corresponding to becomes, perhaps surprisingly, a kind of spatial analogue of the height function. After strengthening Corollary 1.7, we can prove that the quadratic variation of this martingale is essentially linear, and thus obtain an invariance principle.

Of course, we have to prove that this martingale is indeed a good approximation to the height process. This is one of the main difficulties, as the reversibility tools that are key to proving the analogous statement for the Lukasiewicz path in the Galton–Watson case are lost. Instead, we must use precise estimates, and a delicate ergodicity argument related to our spine change of measure. This is one of the reasons that our machinery from the proof of Theorem 1.4 is so essential. Tightness arguments then allow us to conclude.

Acknowledgements. I would like to thank Nathanaël Berestycki for suggesting this problem and for many helpful discussions and suggestions.

2 Preliminaries

2.1 Spectral Theory and Diffusions

Let us first assume that is a bounded domain satisfying a uniform exterior cone condition. This means that: (1) is an open connected set of with and boundary ; and (2) there exist such that , we can find with and

Such a condition is satisfied, for example, if is Lipschitz; see e.g. [Dav89, p. 27].

Let

be a self-adjoint differential operator on , which is uniformly elliptic, meaning that there exists a constant such that for all and a.e.

see [Eva98, §6.1]. We also assume that is symmetric, i.e. , and that for all .

We define to be the closure of (the space of infinitely differentiable functions with compact support strictly inside ) with respect to the norm , where is the th partial derivative of in the weak sense. We say that is an eigenvector of with Dirichlet boundary conditions, and associated eigenvalue , if

(2.1)

for every , as in [Eva98, §6.3]. That is, is a weak solution of with zero boundary conditions. Given the assumptions made on and , the following properties then hold (see [Dav89, Theorem 1.6.8] and [GT83, Theorem 9.30]):

  • The eigenvalues of are all real and can be written (repeated according to their finite multiplicity) with as .

  • The associated eigenfunctions (normalised correctly) form an orthonormal basis of . Moreover, the first eigenfunction is strictly positive in , and for all .
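As a concrete illustration of these spectral objects, the following sketch approximates the first Dirichlet eigenpair numerically in a simple special case. Everything specific here is an assumption made for the example only: the domain is the interval (0, π), the operator is half the one-dimensional Laplacian with the sign convention -(1/2) u'' = λ u (so the exact first eigenvalue is 1/2, with eigenfunction proportional to sin x), and the method is a standard finite-difference discretisation combined with inverse power iteration. The function names are hypothetical.

```python
import math


def solve_tridiagonal(diag, off, rhs):
    """Thomas algorithm for a symmetric tridiagonal system with constant
    diagonal entry `diag` and constant off-diagonal entry `off`."""
    n = len(rhs)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def first_dirichlet_eigenpair(n=200, length=math.pi, iterations=50):
    """Smallest eigenpair of -(1/2) u'' = lam * u on (0, length), u = 0 at the
    endpoints, using n interior grid points and inverse power iteration."""
    h = length / (n + 1)
    diag, off = 1.0 / h**2, -0.5 / h**2  # finite-difference matrix A
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        w = solve_tridiagonal(diag, off, v)  # w = A^{-1} v
        norm = math.sqrt(sum(x * x for x in w) * h)  # discrete L^2 norm
        v = [x / norm for x in w]
        # Rayleigh quotient <v, A v>, using that v has unit discrete L^2 norm.
        av = [
            diag * v[i]
            + off * ((v[i - 1] if i > 0 else 0.0) + (v[i + 1] if i < n - 1 else 0.0))
            for i in range(n)
        ]
        lam = sum(a * b for a, b in zip(v, av)) * h
    return lam, v


lam, phi = first_dirichlet_eigenpair()
print(lam)  # close to 0.5
```

In this toy case the computed eigenvector is strictly positive at every interior grid point, consistent with the positivity of the first eigenfunction stated above.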

Now we consider the diffusion associated to on . This is the Markov process on , where we identify the boundary of with the single isolated point , such that:

  • evolves as a diffusion with generator for all ; and

  • for all .

Thus is the diffusion with generator , killed or absorbed upon hitting . We write for its law when started from . Then by [Dav89, Theorems 2.1.4 and 2.3.6], the function

(2.2)

is well defined as a uniform limit on for any , and is the transition density of the process , restricted to . We also have the estimate, [Dav89, Corollary 3.2.8]

(2.3)

for some constant . In particular, for any and any we have

(2.4)

The properties (2.2)–(2.4) of the killed diffusion and associated transition kernel above are consequences of the fact that the symmetric Markov semigroup associated with the killed diffusion is ultracontractive (see [Dav89, §2]) when satisfies a uniform exterior cone condition. In fact, if we assume some more regularity on the domain, then it will satisfy a certain stronger form of contractivity known as intrinsic ultracontractivity, first defined in [DS84]. Intrinsic ultracontractivity is satisfied by the semigroup of the killed diffusion, for example, if the domain is bounded and Lipschitz [Bañ99, Theorem 1]. The key property of intrinsic ultracontractivity that we will use is the following.

Lemma 2.1.

Suppose that is a bounded Lipschitz domain (or more generally, a domain such that the semigroup of is intrinsically ultracontractive). If

is the transition density for conditioned to remain in for all time [Pin85] then for any there exists a constant depending only on the domain such that

for all and , where is the spectral gap for on .

Proof.

See for example [DS84] or [Bañ99, Equation (1.8)]. ∎

We also have the following estimate:

Lemma 2.2.

Suppose we are in the set up of Lemma 2.1. Then for some constants , we have

for all .

Proof.

See [Bañ99, Eq. (1.2)] or [Dav89, p. 89]. ∎

We also have the following result, which gives us extra regularity on the eigenfunctions of , if we assume some extra regularity on the domain.

Lemma 2.3.

Suppose that the boundary of is for some and that is a generator satisfying the conditions assumed throughout this section. Let be an orthonormal basis of eigenfunctions for . Then

for all .

Proof.

See [GT83, Theorem 6.31]. ∎

2.2 Branching diffusions

As stated in the introduction, we can view a branching diffusion in as a point process

taking values in . This is often all that we will need to speak about, but since we are eventually interested in the genealogy of such processes, it will be helpful at various points to view them as elements of a larger state space: the space of marked trees. The set up described in this section closely follows [HH09, HR15].

We begin by recalling the Ulam–Harris labelling system. Let

be the set of finite labels on . A subset is called a tree if:

  • ;

  • and implies that ; and

  • for all there exists such that .

We will refer to elements as particles or individuals in . We think of the element as representing an initial ancestor, and individuals as describing its descendants. For example, if is given by the label then would be the first child of the second child of . For we write for the concatenation of the words and , so for example if , then . We also set for all . We say that is an ancestor of (written ) if there exists such that , and write for the length (or generation) of , where if . Then the above tree condition simply means that: has an initial ancestor or root ; contains all of the ancestors of all of its individuals; and finally, each individual has a finite (possibly zero) number of children, labelled in a consecutive way. We write for the set of trees.
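The three tree conditions can be checked mechanically. The sketch below is illustrative only: it represents labels as Python tuples of positive integers, with the empty tuple `()` standing for the initial ancestor, and the function names are hypothetical. It assumes the conditions take their usual form (root present, closed under taking ancestors, children labelled consecutively from 1) and that a label counts as an ancestor of itself.

```python
def is_ulam_harris_tree(labels):
    """Check the three Ulam-Harris tree conditions for a finite set of labels."""
    labels = set(labels)
    # 1. The root (the empty word) must be present.
    if () not in labels:
        return False
    for u in labels:
        if u == ():
            continue
        # Labels are words over the positive integers.
        if any((not isinstance(j, int)) or j < 1 for j in u):
            return False
        # 2. The parent of u must be present; applied to every label, this
        #    forces all ancestors of u to be present.
        if u[:-1] not in labels:
            return False
        # 3. Children are labelled consecutively: if u = vj with j > 1,
        #    then v(j-1) must also be present.
        if u[-1] > 1 and u[:-1] + (u[-1] - 1,) not in labels:
            return False
    return True


def is_ancestor(v, u):
    """True if v is a prefix of u, i.e. u = vw for some (possibly empty) word w."""
    return u[:len(v)] == v


# The label (2, 1) is the first child of the second child of the root.
print(is_ulam_harris_tree({(), (1,), (2,), (2, 1)}))  # True
print(is_ulam_harris_tree({(), (2,)}))                # False: child 1 is missing
```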

We will want to consider marked trees, where the marks will correspond to the behaviour of particles in our branching diffusion. If we have a tree , we will mark each with a lifetime , and a motion in ,

where is the death time of the particle .

We write

for the set of all marked trees on . With an abuse of notation, if we have a marked tree and we will also sometimes extend the definition of to the whole of the ancestry of . That is, we will set to be equal to if , and is the unique ancestor of alive at time . If has no children, we write for all , where is an additional cemetery point that we introduce for use later on. Finally, we write

for the set of particles alive in at time , and let be the number of such particles. As in the introduction, we let

be the point process on corresponding to the marked tree .

2.2.1 Probability measures on marked trees

Let be the filtration on the space of marked trees defined by

and set

Then is the natural filtration associated with the point process . We let be the probability measure on such that:

  • evolves under the law described in Section 2.1 for , where , is the first time that hits and is an exponential random variable independent of .

  • on the event that . On the complementary event, is distributed as an independent copy of the offspring distribution .

  • At any branching event where a positive number of children are born, all children repeat stochastically, and independently, the behaviour of their parent, starting from the point of fission.

That is, is the law of the system described in the introduction, with offspring distribution and constant branching rate .

2.2.2 The many-to-few formulae

One particularly useful feature of the branching diffusions considered in this section is the family of so-called many-to-few formulae, which allow one to calculate certain expectations for the system with relative ease. We state the two simplest cases here; for the more general formula, see for example [HR15, Lemma 1].

Lemma 2.4 (Many-to-one).

Suppose that is a measurable function on the Borel sets of . Then

where we recall that .

Lemma 2.5 (Many-to-two).

Suppose that and are measurable functions on the Borel sets of . Then

where for and .

For the proof of the above lemmas, see [HR15, Lemma 1], which is stated in a more general setting. For an explanation of how this general statement gives the lemmas above, see [HR15, §4.1, §4.2].

2.2.3 The continuous genealogical plane tree

If we have a marked tree corresponding to a branching diffusion, we will also want to associate with it a continuous genealogical tree . This tree (which we emphasise is different to ) is the main object of Theorem 1.1: it is the plane tree with branch lengths given by the lifetimes of particles in the system.

We first need to give a few definitions. A metric space is said to be a real tree if, for all the following two conditions hold [LG05]:

  1. There exists a unique isometric map with and .

  2. Any continuous injective map that joins and has the same image as .

One way to define a real tree is the following: take a continuous function with and define a “distance” function on by

whenever . It is easy to verify that this defines a pseudometric on . Thus, quotienting by the equivalence relation that identifies points with we obtain a metric space. One can prove, see for example [LG05], that this metric space is a real tree. The function is called the contour function of the tree.
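For a contour function sampled on a grid, this pseudo-distance can be computed directly. The sketch below assumes the standard definition used in [LG05], namely d(s, t) = h(s) + h(t) - 2 min h over the interval between s and t; the function name and the sample contour are hypothetical.

```python
def tree_distance(h, i, j):
    """Pseudo-distance between grid indices i and j for a sampled contour h."""
    lo, hi = min(i, j), max(i, j)
    return h[i] + h[j] - 2 * min(h[lo:hi + 1])


# Contour of a small tree: a trunk of length 1 carrying two branches of length 1.
h = [0, 1, 2, 1, 2, 1, 0]
print(tree_distance(h, 2, 4))  # 2: the two leaves are joined through height 1
print(tree_distance(h, 1, 3))  # 0: both indices sit at the same branch point
print(tree_distance(h, 0, 6))  # 0: the root, visited at the start and at the end
```

Indices at distance zero are exactly those identified by the equivalence relation, which is why the quotient is a genuine metric space.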

In our set up, if we have a marked tree we let be the real tree with contour function described as follows. Let be the set of labels of in depth first order. This is the ordering on such that is less than if and only if, at the first coordinate where the labels of and differ, the coordinate of is less than that of . For any two individuals we let denote their most recent common ancestor: that is, the (unique) with largest, such that and . We can then define, for such that :

  • to be the length of time between the birth time of and the death time of , and

  • to be the length of time between the birth time of and the birth time of .

We set if . Let , and for set

That is: is positive and linear with unit speed on and , and is negative and linear with unit speed on . Finally, for we let interpolate linearly between and . The definition of the function is probably clearest from a picture: we draw a tree with branch lengths corresponding to lifetimes of individuals in the system, and traverse it (with backtracking) at speed one. measures how “high” we are in the tree at time .
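The traversal described above can be sketched in code. This is an illustration under assumptions, not the paper's construction verbatim: the marked tree is given as a dictionary from Ulam–Harris labels (tuples, with `()` for the root) to lifetimes, the spatial motion is ignored, and the function `contour_vertices` is a hypothetical name. It returns the vertices of the piecewise-linear contour obtained by walking around the tree in depth-first order at unit speed.

```python
def contour_vertices(lifetimes):
    """Vertices (time, height) of the piecewise-linear contour function.

    lifetimes -- dict mapping Ulam-Harris labels (tuples) to branch lengths;
                 the root is the empty tuple ().
    """
    vertices = [(0.0, 0.0)]

    def children(u):
        # Children of u are u1, u2, ... labelled consecutively.
        kids = []
        j = 1
        while u + (j,) in lifetimes:
            kids.append(u + (j,))
            j += 1
        return kids

    def visit(u):
        t, height = vertices[-1]
        # Climb the branch of u at unit speed, up to its death time.
        vertices.append((t + lifetimes[u], height + lifetimes[u]))
        # Explore the subtrees of the children in depth-first order.
        for v in children(u):
            visit(v)
        # Backtrack down the branch of u at unit speed.
        t, height = vertices[-1]
        vertices.append((t + lifetimes[u], height - lifetimes[u]))

    visit(())
    return vertices


# Root lives for 1 time unit and has two children with lifetimes 0.5 and 2.
print(contour_vertices({(): 1.0, (1,): 0.5, (2,): 2.0}))
# [(0.0, 0.0), (1.0, 1.0), (1.5, 1.5), (2.0, 1.0), (4.0, 3.0), (6.0, 1.0), (7.0, 0.0)]
```

Each branch is traversed once upwards and once downwards, so the contour returns to zero at twice the total branch length (here 2 × 3.5 = 7).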

Figure 2.1: An example of the continuous tree generated by a branching diffusion. Here and . Every branch of corresponds to a particle , and this branch has length .

2.3 Martingales

Suppose that is bounded satisfying a uniform exterior cone condition, and , , and are as in Section 2.1. Then by (2.2), and using the fact that the eigenfunctions form an orthonormal basis of we see that for any and

(2.5)

where we write throughout the paper. In particular,

(2.6)

for all . One consequence of this is that the process

(2.7)

is a (positive) martingale under . Furthermore, this single-particle martingale implies the existence of a martingale for the entire branching diffusion under . Indeed, a straightforward application of the Markov property for the branching diffusion and Lemma 2.4 yields the following.

Lemma 2.6.

The process

is a positive martingale under , for each . It therefore converges almost surely to an almost surely finite limit .

This martingale is the natural analogue of the well-known martingale for Galton–Watson processes (with offspring mean ). Variants of the martingale for general branching processes have been studied extensively in the literature: see for example [BK77, Lyo97] for the branching random walk case and [CR88, Kyp04, EK04, HH07, HHK12] for branching Brownian motion, among many others.

2.4 Spine theory

It turns out that a helpful approach in many parts of this paper will be to study the behaviour of the system under a change of measure: precisely, the change of measure defined by the martingale from the previous section. To give a useful description of this, we need to view our process on a yet larger state space: the space of marked trees with spines. This is a classical technique first introduced by [CR88], and since used extensively by many authors [HHK12, HH09, Rob10, HR15]. For a thorough exposition of the subject, we refer the reader to [HH09], and the overview in this section will closely follow that given in [HH09, HR15].

Suppose we have a marked tree . A spine on is a subset of (where we recall that is an isolated cemetery point) such that:

  • and for all ;

  • and implies that ; and

  • if and , then there exists a unique with . If , then for all .

We write

for the space of marked trees with spines. Given we let be the unique element and write for its position. With a slight abuse of notation, we set if and also in this case say that if and for some .

2.4.1 Filtrations

There are several different filtrations we can place on . We give brief descriptions of each of these below; see [Rob10] for more rigorous definitions.

  • is the natural filtration of the branching process as before, and does not contain any information about spines. We write .

  • is the natural filtration of the branching process plus the spine. We write .

  • is the filtration generated by the spatial motion of the spine. We write

  • is the filtration that knows everything about the spine until time : which individuals are in the spine, their motions, fission times, and family sizes at fission times along the spine. We write .

2.4.2 Probability measures

We first want to define the probability measure on under which, informally speaking, the law of the tree is the same as under , and then the spine is chosen by picking one of the children uniformly at random at every branching event. More rigorously, if is an , measurable random variable, then can be written [HR15] as

(2.8)

where is measurable with respect to . Given this representation, we define the measure on by setting for each -measurable :

(2.9)

where . Note that .

2.4.3 Change of measure

Recall from Section 2.3 that defines a mean-one martingale under . This implies, see [HH09], that

where is the first time that leaves the domain , is a mean-one -martingale under . In fact, it is easy to check that Thus, if we define a new probability measure on via the martingale change of measure

(2.10)

then we have, defining , that

(2.11)

We have the following description for how the branching diffusion with a distinguished spine behaves under (see for example [CR88] or [HH09]):

  • we begin with one particle at position , which is the spine particle;

  • the spine particle evolves as if under the changed measure

    (2.12)

  • the spine particle branches at rate and is replaced by a number of children having the size-biased distribution , where

  • given that there are children born at such a branching event, one is chosen uniformly at random to be the spine and stochastically repeats the behaviour of its parent. Non-spine particles initiate independent branching diffusions with law , from the point of fission.

We now make a number of remarks about this. Firstly, note that the spine particle under never leaves the domain , and that it always has a positive number of children. This means that the branching diffusion under never becomes extinct, and that the spine particle is never in the cemetery state. Also note that under the change of measure (2.12), the spine particle evolves as a diffusion with transition kernel : the kernel we defined in Lemma 2.1. Its motion is that of the diffusion under , but conditioned to remain in for all time, see [Pin85]. In particular, Lemma 2.1 tells us that if the domain is Lipschitz, its position converges quickly to an equilibrium distribution with density . Finally, we record that (by an easy calculation):

(2.13)

for .

3 The Phase Transition

In this section we will provide a proof of Theorem 1.3.

Proof of Theorem 1.3. First suppose that . Then we know by Lemma 2.6 that almost surely. Thus, by definition of , we can conclude the proof in this case as soon as we can show that . However, for this it is enough to show that is uniformly integrable, and in fact, an elementary calculation using Lemma 2.5 gives that if , then is uniformly bounded in . We leave this calculation to the reader.

Next suppose that . We write

(3.1)

where the second equality comes from Lemma 2.4. Observe that by Lemma 2.1, we also have the existence of a constant , depending only on , such that

(3.2)

for all . Since , this implies that when , the right hand side of (3.1) converges to as (uniformly in ). Hence we have almost sure extinction in this case.

Finally, we deal with the critical case . We make use of the following lemma, which can be found in [Wat65, Lemma 2.1].

Lemma 3.1 ([Wat65]).

For all

The proof we give is the same as that in [Wat65] (we include it only to show that it still works with our current assumptions on and ).

Proof.

Since is integer-valued, and , it is sufficient to prove that for every . Fix and define a sequence of hitting and leaving times , by letting be the first time that , and be the first time (necessarily after ) that . Then inductively, let be the first time after that , and the first time after that . We have to show that as . Set

which is strictly positive by Lemma 2.1, and let be the probability that an random variable is bigger than . Then we have that