Extensions of Billingsley’s Theorem via Multi-Intensities

Richard Arratia
Fred Kochman
Victor S. Miller
June 2012
Abstract

Let $p_1 \ge p_2 \ge \cdots$ be the prime factors of a random integer $N$ chosen uniformly from $1$ to $n$, and let

$$\left(\frac{\log p_1}{\log n},\ \frac{\log p_2}{\log n},\ \dots\right)$$

be the sequence of scaled log factors. Billingsley’s Theorem (1972), in its modern formulation, asserts that the limiting process, as $n \to \infty$, is the Poisson-Dirichlet process with parameter $\theta = 1$.

In this paper we give a new proof, inspired by the 1993 proof by Donnelly and Grimmett, and extend the result to factorizations of elements of normed arithmetic semigroups satisfying certain growth conditions, for which the limiting Poisson-Dirichlet process need not have $\theta = 1$. We also establish Poisson-Dirichlet limits, with $\theta \ne 1$, for ordinary integers conditional on the number of prime factors deviating from the usual value $\log\log n$.

At the core of our argument is a purely probabilistic lemma giving a new criterion for convergence in distribution to a Poisson-Dirichlet process, from which the number-theoretic applications follow as straightforward corollaries. The lemma uses ingredients similar to those employed by Donnelly and Grimmett, but reorganized so as to allow subsequent number theory input to be processed as rapidly as possible.

A by-product of this work is a new characterization of Poisson-Dirichlet processes in terms of multi-intensities.

1 Introduction

In this paper we provide a new criterion for convergence in distribution to PD($\theta$) — the Poisson-Dirichlet process with parameter $\theta$ — and then apply it to extend Billingsley’s theorem on the asymptotic distribution of log prime factors of a random integer to much more general number theoretic contexts.

The new criterion is an application of an existing general weak convergence lemma from [3], restated here as Proposition 1, which supplements Alexandrov’s Portmanteau Theorem on equivalent conditions for weak convergence [4]. That lemma governs, in particular, the convergence of a sequence of discrete nonlattice random variables to a continuum limit possessing a smooth density. Here, we adapt it for direct application to a sequence of random multisubsets of (0,1], with a hypothesis yielding the limiting PD($\theta$).

The following is the simplest version of our new criterion: Given a sequence $(A_n)$ of random multisubsets of $(0,1]$, let $T_n$ denote the sum of the elements of $A_n$, counting multiplicities, and for any set $I \subseteq (0,1]$ let $C_n(I)$ denote the cardinality of the intersection $A_n \cap I$, also counting multiplicities. Also, let $V_n = (V_n(1), V_n(2), \dots)$, where $V_n(i)$ equals the $i$-th largest element of $A_n$ if $i \le |A_n|$, and $V_n(i) = 0$ if $i > |A_n|$.

Lemma 2 then asserts the following: Suppose that $T_n \le 1$ almost surely, for all $n$, and that for any $k \ge 1$ and any collection of disjoint closed intervals $I_1, \dots, I_k \subseteq (0,1]$ satisfying the hypothesis (8) of Section 3.2, we have

(1)

as $n \to \infty$. Then $V_n$ converges in distribution to a PD(1).

The more general version, Lemma 3, allows a limiting PD($\theta$) with arbitrary $\theta > 0$, and has a somewhat more complicated expression on the right-hand side of the inequality. Nonetheless both versions are easy to use in our applications.

In proving Billingsley’s original theorem, for instance, the multiset $A_n$ appearing above consists of the scaled log prime factors of a random integer in $\{1, \dots, n\}$; and so each count $C_n(I)$ is simply the number of prime factors, counting multiplicities, whose scaled logarithm falls into $I$. The main step of the proof, confirmation of the hypothesis (1), reduces to scarcely more than a citation of Mertens’ formula [9]

$$\sum_{p \le x} \frac{1}{p} = \log\log x + M + O\!\left(\frac{1}{\log x}\right), \qquad (2)$$

where $M$ is a constant.
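As a quick numerical illustration of (2), the following sketch (plain Python; the helper names are ours, not from the paper) sums the reciprocals of the primes up to a cutoff and compares the total with $\log\log x$ plus Mertens’ constant.

```python
# Numerical illustration of Mertens' formula (2):
#   sum_{p <= x} 1/p  =  log log x + M + O(1/log x),  with M ~ 0.2614972128.
# Simple sieve of Eratosthenes; `prime_reciprocal_sum` is our own helper name.
import math

def primes_up_to(x: int):
    """Return the list of primes <= x via a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return [p for p in range(2, x + 1) if sieve[p]]

def prime_reciprocal_sum(x: int) -> float:
    return sum(1.0 / p for p in primes_up_to(x))

if __name__ == "__main__":
    M = 0.2614972128  # Mertens' constant
    for x in (10**4, 10**5, 10**6):
        lhs = prime_reciprocal_sum(x)
        rhs = math.log(math.log(x)) + M
        print(f"x = {x:>8}:  sum 1/p = {lhs:.6f},  loglog x + M = {rhs:.6f}")
```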

In the later sections of this paper, we apply our new criterion to give

  • a new proof of Billingsley’s original theorem;

  • a generalization to a class of normed arithmetic semigroups for which an analogue of Landau’s prime ideal theorem is valid, still yielding a PD(1) limit;

  • a generalization to a class of normed arithmetic semigroups satisfying the growth hypotheses of Bredikhin’s theorem ([16], Section 2.5), yielding PD($\theta$) limits with $\theta$ not necessarily equal to $1$; and

  • a final generalization to ordinary integers, conditional on the number of prime factors in the selected integer deviating unusually from the prescription of Turán’s theorem; here too the limiting PD($\theta$) has $\theta \ne 1$.

Our work was inspired by the proof of Billingsley’s theorem given by Donnelly and Grimmett [6], whose ingredients are included amongst our own. Our initial motivation was to place as much of the burden as possible on self-contained probability tools and isolate the use of number theoretic input. One difference in our approach is that, internal to the new convergence lemma, we work directly with the density function of the GEM distribution instead of aiming for a limit process of the component uniforms, as they do.

2 Probability Background

2.1 PD($\theta$) and GEM($\theta$)

For our present purposes the Poisson-Dirichlet point process with parameter $\theta$ can be characterized in the following two equivalent ways:

  • PD($\theta$) is the Poisson point process with scale invariant intensity measure $\theta\,dx/x$ on $(0,1]$, conditioned on the sum of the arrivals being 1;

  • Let $U_1, U_2, \dots$ be a sequence of independent uniforms on $[0,1]$, let $W_i = 1 - U_i^{1/\theta}$ for $i \ge 1$, and let $V_1 = W_1$ and $V_i = W_i (1 - W_1) \cdots (1 - W_{i-1})$ for $i \ge 2$. Let $(V_{(1)}, V_{(2)}, \dots)$ be the outcome of sorting $(V_1, V_2, \dots)$ into descending order, or ranking them, as we will say. Then $(V_{(1)}, V_{(2)}, \dots)$ are the arrivals of the PD($\theta$) in $(0,1]$.

For information on this process and its role in combinatorial modeling, as well as alternative characterizations with proofs of equivalence see, e.g., [1, 2, 11].

The GEM($\theta$) process $(V_1, V_2, \dots)$ appearing in the second characterization has itself been well studied (see, e.g., [1]). It is easy to see that with probability one we have $V_1 + \cdots + V_k < 1$, for all $k$. So, in particular, there are no positive accumulation points and hence ranking is actually possible.
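For readers who wish to experiment, the following sketch (our own code; Beta(1, $\theta$) residual allocation is one standard way to generate GEM($\theta$), and for $\theta = 1$ the Beta(1,1) factors are simply uniforms) draws a truncated GEM($\theta$) sample and ranks it to approximate the leading arrivals of PD($\theta$).

```python
# Minimal sketch: simulate a truncated GEM(theta) by residual allocation
# ("stick-breaking") with Beta(1, theta) factors, then rank to approximate PD(theta).
import random

def gem_sample(theta: float, depth: int, rng: random.Random):
    """Return (V_1, ..., V_depth), a truncated GEM(theta) sample."""
    vs, remaining = [], 1.0
    for _ in range(depth):
        w = rng.betavariate(1.0, theta)   # W_i ~ Beta(1, theta); Uniform(0,1) if theta = 1
        vs.append(w * remaining)          # V_i = W_i * prod_{j<i} (1 - W_j)
        remaining *= 1.0 - w
    return vs

def pd_sample(theta: float, depth: int, rng: random.Random):
    """Rank a truncated GEM(theta) sample; the leading terms approximate PD(theta)."""
    return sorted(gem_sample(theta, depth, rng), reverse=True)

if __name__ == "__main__":
    rng = random.Random(2012)
    v = pd_sample(theta=1.0, depth=50, rng=rng)
    print("largest three parts:", [round(x, 4) for x in v[:3]], " total:", round(sum(v), 6))
```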

We will exploit the following result from [7]; see also [4], p. 42–43.

Lemma 1.

Let $X_n = (X_n(1), X_n(2), \dots)$, $n \ge 1$, be a sequence of processes of nonnegative numbers, each with almost surely finite sum; and for each $n$ let $X_n^{\downarrow}$ be the ranked version of $X_n$. Suppose that $X_n$ converges in distribution to GEM($\theta$). Then $X_n^{\downarrow}$ converges in distribution to PD($\theta$).

For each $k$ the first $k$ coordinates of the GEM($\theta$) possess a joint probability density function $f_k$, with the formula (see (5.28) in Section 5.4 of [1])

$$f_k(x_1,\dots,x_k) = \frac{\theta^k\,(1 - x_1 - \cdots - x_k)^{\theta - 1}}{(1 - x_1)(1 - x_1 - x_2)\cdots(1 - x_1 - \cdots - x_{k-1})} \qquad (3)$$

for $(x_1,\dots,x_k) \in S_k$, where $S_k$ is the open set

$$S_k = \bigl\{(x_1,\dots,x_k) : x_i > 0 \text{ for all } i,\ x_1 + \cdots + x_k < 1,\ x_i \ne x_j \text{ for } i \ne j\bigr\}. \qquad (4)$$

While exclusion of the subdiagonals is not usually imposed in the definition of the domain, excluding a set of Lebesgue measure $0$ does no harm, and it definitely finesses a technical issue arising in the proof of our main lemma.

2.2 A Proposition on Weak Convergence

For the reader’s convenience, we quote Proposition 2.1 from [3], upon which the new result will depend. Here we let $X$ be a random element of $\mathbb{R}^k$ with density of the form $g\,\mathbf{1}_S$, where $S \subseteq \mathbb{R}^k$ is open, the function $g$ is continuous, and $\mathbf{1}_S$ is the indicator function of $S$.

Proposition 1.

Let $X$ be defined as above, and let $X_n$, $n \ge 1$, be arbitrary random elements of $\mathbb{R}^k$.

Suppose that, for every , there exists for which every closed coordinate box

(5)

also satisfies

(6)

Then $X_n \Rightarrow X$. That is, as $n \to \infty$, $X_n$ converges in distribution to $X$.

3 Convergence to PD($\theta$)

3.1 Preliminaries

Our convergence criterion is most conveniently cast in the language of random multisubsets¹ of the interval $(0,1]$. In the course of the proof we will also need to consider size-biased permutations.

¹ By saying $A$ is a multisubset of $U$, we mean that $U$ is a universal set, and for each $u \in U$, the multiplicity of $u$ as an element of $A$ is a nonnegative integer. There is of course an alternate reading of the phrase, with “$A$ is a multisubset of $B$” meaning that both are multisets, and for each $u$ in the underlying universal set, the multiplicity of $u$ in $A$ is at most its multiplicity in $B$.

We consider only multisets whose multiplicities are all finite.

Informally, a process that generates random countable (or finite) multisets has been fully specified provided that for any finite collection $I_1, \dots, I_k$ of Borel subsets of $(0,1]$, the cardinalities $C(I_1), \dots, C(I_k)$ of the intersections, including multiplicities, have well-defined joint probability distributions.² Though infinite cardinalities may occur, for any singleton $\{x\}$ we must have $C(\{x\}) < \infty$. The joint distributions must obey any constraints implied by set inclusions.

² This induces a probability measure on the space whose points are countable multisubsets of $(0,1]$. For further information, including the identification of random multisets with random $\sigma$-finite integer-valued measures on the ambient space, see, e.g., [10], Chapter 12.

Given an at most countable fixed multiset $A$ of numbers in $(0,1]$ (or, indeed, lying anywhere in the positive reals) with finite sum $t$ (where each summand is included according to its multiplicity), a size-biased permutation is an ordered list $(X_1, X_2, \dots)$ generated by the following process: The first element selected equals $x$ with probability proportional to $m(x)\,x$, where $m(x)$ is the multiplicity of $x$ in $A$; explicitly, $\Pr[X_1 = x] = m(x)\,x/t$. Thereafter, conditional on selections already made, for any element $x$ remaining in $A$ the next element selected is $x$ with probability proportional to $m'(x)\,x$, where $m'(x)$ is the multiplicity of $x$ among those elements yet remaining to be selected. If $|A| = k$ is finite, we explicitly set $X_i = 0$ for all $i > k$. (For multisets the count $|A|$ includes multiplicities.)
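For concreteness, the following sketch (our own function names; the multiset is a plain list whose repeated entries encode multiplicities) implements the size-biased permutation of a finite multiset just described.

```python
# Sketch: size-biased permutation of a finite multiset of positive numbers.
# At each step an element is removed with probability proportional to its value.
import random

def size_biased_permutation(multiset, rng: random.Random):
    remaining = list(multiset)
    out = []
    while remaining:
        total = sum(remaining)
        r = rng.uniform(0.0, total)
        acc = 0.0
        for i, x in enumerate(remaining):
            acc += x
            if r <= acc:
                out.append(remaining.pop(i))
                break
        else:                       # guard against floating-point round-off
            out.append(remaining.pop())
    return out

if __name__ == "__main__":
    rng = random.Random(1)
    A = [0.5, 0.25, 0.125, 0.125]            # a multiset with a repeated element
    print(size_biased_permutation(A, rng))   # e.g. [0.5, 0.125, 0.25, 0.125]
```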

We may also take size-biased permutations of random multisets $A$, with sum $T$: The probability $\Pr[X_1 = x_1, \dots, X_j = x_j]$, say, that the first $j$ selections are $x_1, \dots, x_j$, is calculated by first conditioning on the random multiset and calculating recursively, as above, and then taking the expectation as $A$ varies.³

³ The probability distribution of $(X_1, \dots, X_j)$ is itself determined by the joint distributions of cardinalities of intersections, together with the sum $T$: The occurrence or non-occurrence of the event $\{X_1 = x_1, \dots, X_j = x_j\}$ is determined by the multiplicities of $x_1, \dots, x_j$ and the sum of all remaining elements, taken with multiplicities; so the probability of that event is a function of the joint distribution of those quantities, i.e., the joint distribution of the intersection cardinalities $C(\{x_1\}), \dots, C(\{x_j\})$, and $T$. So the expectation of the conditional probability is taken, by definition, with respect to this latter joint distribution.

3.2 The Main Lemma, $\theta = 1$

Since the special case $\theta = 1$ is a bit less complicated than the general case, yet already suffices for the classical version of Billingsley’s result as well as for our first extension, we state and prove the result for this case first.

Given an arbitrary sequence $(A_n)$ of random multisubsets of $(0,1]$, for each $n$ define $V_n = (V_n(1), V_n(2), \dots)$ to be the sequence of elements of $A_n$, including multiple occurrences, ranked by decreasing size, and padded with an infinite string of $0$’s if $A_n$ is finite. That is, we let $V_n(i)$ equal the $i$-th largest element of $A_n$ if $i \le |A_n|$, and $V_n(i) = 0$ if $i > |A_n|$. Also, define $T_n = \sum_{x \in A_n} x$, where the defining sum is taken with multiplicities.
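In computational terms this bookkeeping is elementary; the sketch below (our own helper names, with a multiset again represented as a list whose repeats encode multiplicities) shows the interval counts $C_n(I)$, the total $T_n$, and the ranked zero-padded sequence $V_n$.

```python
# Sketch of the bookkeeping used in Section 3 for a multisubset of (0,1].

def count_in_interval(multiset, a: float, b: float) -> int:
    """C(I): number of elements, with multiplicity, in the closed interval [a, b]."""
    return sum(1 for x in multiset if a <= x <= b)

def total(multiset) -> float:
    """T: the sum of all elements, with multiplicity."""
    return sum(multiset)

def ranked_padded(multiset, length: int):
    """V: elements ranked in decreasing order, padded with zeros to the given length."""
    v = sorted(multiset, reverse=True)
    return (v + [0.0] * length)[:length]

if __name__ == "__main__":
    A = [0.6, 0.2, 0.2, 0.05]
    print(count_in_interval(A, 0.1, 0.3), total(A), ranked_padded(A, 6))
```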

Lemma 2.

Given an arbitrary sequence $(A_n)$ of random multisubsets of $(0,1]$, with associated ranked sequences $V_n$ of elements, assume the following: first,

$$T_n \le 1 \quad \text{almost surely, for every } n, \qquad (7)$$

and second, for any $k \ge 1$ and any collection of disjoint closed intervals $I_1, \dots, I_k \subseteq (0,1]$ satisfying the hypothesis

(8)

we have

(9)

Then $V_n$ converges in distribution to PD(1), the Poisson-Dirichlet distribution with $\theta = 1$.

If in place of (9) we assume

(10)

then the same conclusion holds.

Proof.

Since for we have , it suffices to consider only (9). Taking our cue from Donnelly and Grimmett [6], for each define a process,

whose components are the successive elements of a size-biased permutation of $A_n$, padded with zeros if $A_n$ is finite. We will use Proposition 1, with $X$ equal to the first $k$ coordinates of the GEM(1), in conjunction with (3) and (4), to show that as $n \to \infty$, the first $k$ coordinates of this process converge in distribution to the first $k$ coordinates of a GEM(1), for each $k$. Since this implies that the process converges to a GEM(1), we will then conclude by Lemma 1 that its ranked version, which is $V_n$, converges to a PD(1).
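As an aside, the converse direction is also classical: the size-biased permutation of a PD(1) configuration is a GEM(1) sequence, so in particular its first coordinate is uniform on (0,1). The sketch below (our own code; a finite stick-breaking truncation stands in for PD(1)) illustrates this numerically.

```python
# Sketch: the size-biased permutation of an (approximate) PD(1) configuration
# should look like GEM(1); its first coordinate should be Uniform(0,1).
import random

def approx_pd1(depth, rng):
    vs, rem = [], 1.0
    for _ in range(depth):
        w = rng.random()             # Beta(1,1) = Uniform(0,1) stick-breaking
        vs.append(w * rem)
        rem *= 1.0 - w
    return sorted(vs, reverse=True)  # ranked: approximately PD(1)

def first_size_biased_pick(parts, rng):
    r, acc = rng.uniform(0.0, sum(parts)), 0.0
    for x in parts:
        acc += x
        if r <= acc:
            return x
    return parts[-1]

if __name__ == "__main__":
    rng = random.Random(7)
    picks = [first_size_biased_pick(approx_pd1(60, rng), rng) for _ in range(20000)]
    mean = sum(picks) / len(picks)
    var = sum((x - mean) ** 2 for x in picks) / len(picks)
    # Uniform(0,1): mean 1/2, variance 1/12 ~ 0.0833
    print(f"mean ~ {mean:.3f} (expect 0.500), variance ~ {var:.4f} (expect 0.0833)")
```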

So let be a coordinate box whose component intervals satisfy our hypotheses. Conditional on we see that

Conditional also on the first selections lying in , respectively, since their sum must be at least the conditional probability that

is at least

where we have used the disjointness of the intervals to infer that all of remains available.

Hence, we find that

(11)

Combining hypothesis (9) with the fact that , and using formula (3) with , we see that

To apply Proposition 1, given $\varepsilon > 0$ it will suffice to find large enough so that for all coordinate boxes satisfying (5), we have . Then (9) will imply (6). Note that while nothing in the statement of Proposition 1 explicitly allows us to restrict attention to boxes whose defining intervals are disjoint, as required for the invocation of (9), our crafty choice of domain in (4) makes that automatic. Any closed box lying in $S_k$ also satisfies the sum condition $x_1 + \cdots + x_k < 1$, the other requirement.

Without harm we may restrict to . Since given any , a box satisfying (5) satisfies

(12)

it suffices to take

to get and hence .

With this choice we have satisfied (6), so Proposition 1 applies; and by the discussion beginning the proof we are done. ∎

3.3 Characterization of PD(1) via Multi-intensity

Lemma 2 above gives a sufficient condition for convergence to the Poisson-Dirichlet distribution with parameter $\theta = 1$. We now explain how this gives a new characterization of the PD(1) distribution.

A standard concept for point processes is the intensity measure; in our setup with a random multiset $A$ this is the deterministic measure $\mu$ on the Borel subsets of $(0,1]$, defined by $\mu(I) = \mathbb{E}\,C(I)$. A standard result in measure theory, the $\pi$–$\lambda$ theorem, implies that $\mu$ is determined by its values on closed intervals $[a,b]$, for $0 < a \le b \le 1$. At this level, both the Poisson-Dirichlet process PD(1) and the scale invariant Poisson process with intensity $dx/x$ on $(0,1]$ have the same intensity, with $\mu([a,b]) = \log(b/a)$.

Second-order intensity has been considered, for example in [5]. It is natural to generalize, and define multi-intensity or $k$-fold intensity for $k \ge 1$, by taking arbitrary choices of disjoint closed intervals $I_1, \dots, I_k \subseteq (0,1]$, setting $B = I_1 \times \cdots \times I_k$, and defining

$$\mu_k(B) = \mathbb{E}\bigl[\,C(I_1)\,C(I_2)\cdots C(I_k)\,\bigr].$$

In case there is a function $g_k$ on $(0,1]^k$ such that $\mu_k(B) = \int_B g_k(x_1,\dots,x_k)\,dx_1\cdots dx_k$, we say that the random set has multi-intensity density $g_k$ at $(x_1,\dots,x_k)$. For example, any Poisson process with intensity $\lambda(x)\,dx$ has multi-intensity density $\lambda(x_1)\cdots\lambda(x_k)$. In particular, the scale invariant Poisson process with intensity $dx/x$ on $(0,1]$ has multi-intensity density

$$g_k(x_1,\dots,x_k) = \frac{1}{x_1 x_2 \cdots x_k} \qquad (13)$$

for all choices of distinct $x_1, \dots, x_k \in (0,1]$.
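To make the definition concrete, here is a Monte Carlo sketch (our own code and parameter choices; the process is restricted to $(\delta, 1]$ so that it has finitely many points) estimating a 2-fold intensity density of the scale invariant Poisson process and comparing it with (13).

```python
# Monte Carlo sketch: estimate the k-fold intensity density of the scale invariant
# Poisson process (intensity dx/x) on (delta, 1], and compare with 1/(x_1 ... x_k).
import math
import random

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's method; adequate for the small means used here."""
    l, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= l:
            return k
        k += 1

def scale_invariant_pp(delta: float, rng: random.Random):
    """Points of a Poisson process on (delta, 1] with intensity dx/x."""
    n = sample_poisson(math.log(1.0 / delta), rng)
    return [delta ** rng.random() for _ in range(n)]   # density 1/(x log(1/delta))

def estimate_intensity(intervals, delta, trials, rng):
    """Estimate E[C(I_1)...C(I_k)] / prod |I_j| over independent samples."""
    acc = 0.0
    for _ in range(trials):
        pts = scale_invariant_pp(delta, rng)
        prod = 1
        for (a, b) in intervals:
            prod *= sum(1 for x in pts if a <= x <= b)
        acc += prod
    vol = math.prod(b - a for a, b in intervals)
    return acc / (trials * vol)

if __name__ == "__main__":
    rng = random.Random(42)
    intervals = [(0.30, 0.34), (0.60, 0.64)]          # disjoint closed intervals
    est = estimate_intensity(intervals, delta=0.05, trials=200_000, rng=rng)
    x1, x2 = 0.32, 0.62                               # interval midpoints
    print(f"estimated 2-fold intensity {est:.3f} vs 1/(x1*x2) = {1/(x1*x2):.3f}")
```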

The multi-intensity for the Poisson-Dirichlet is easily derived from the first characterization in Section 2.1, of the PD as the Poisson process conditioned on $T = 1$, where $T$ is the sum of all the points of the Poisson process with intensity $dx/x$ on $(0,1]$. A special simplification arises from the property that the density of $T$, given explicitly by $f_T(t) = e^{-\gamma}\rho(t)$ where $\rho$ is Dickman’s function, satisfies $\rho(t) = 1$ for all $0 \le t \le 1$. For distinct $x_1, \dots, x_k \in (0,1]$ with $x_1 + \cdots + x_k < 1$, by conditioning the Poisson process on having $T = 1$ we have for the PD

$$\frac{1}{x_1 \cdots x_k}\cdot\frac{f_T(1 - x_1 - \cdots - x_k)}{f_T(1)} = \frac{1}{x_1 x_2 \cdots x_k}. \qquad (14)$$

To summarize, by comparing (13) with (14), we see that for $k \ge 2$, the Poisson process and the Poisson-Dirichlet don’t have the same multi-intensity densities, but their densities agree when restricted to $(x_1,\dots,x_k)$ with $x_1 + \cdots + x_k < 1$; off that region the Poisson-Dirichlet multi-intensity vanishes, since the points of a PD configuration sum to 1.
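The contrast can be seen numerically. In the sketch below (our own code; PD(1) is approximated by ranked uniform stick-breaking), the estimated 2-fold intensity density matches $1/(x_1 x_2)$ when the two intervals can be occupied simultaneously, and vanishes when their left endpoints already sum to more than 1, unlike the Poisson prediction.

```python
# Sketch comparing multi-intensities of PD(1) (approximated by uniform stick-breaking)
# with the scale invariant Poisson process, above and below the diagonal x1 + x2 = 1.
import random

def approx_pd1(depth: int, rng: random.Random):
    vs, rem = [], 1.0
    for _ in range(depth):
        w = rng.random()
        vs.append(w * rem)
        rem *= 1.0 - w
    return vs                        # ranking is irrelevant for interval counts

def pair_intensity(i1, i2, trials, rng):
    acc = 0.0
    for _ in range(trials):
        pts = approx_pd1(60, rng)
        c1 = sum(1 for x in pts if i1[0] <= x <= i1[1])
        c2 = sum(1 for x in pts if i2[0] <= x <= i2[1])
        acc += c1 * c2
    vol = (i1[1] - i1[0]) * (i2[1] - i2[0])
    return acc / (trials * vol)

if __name__ == "__main__":
    rng = random.Random(3)
    below = ((0.30, 0.34), (0.60, 0.64))    # right endpoints sum to 0.98 < 1
    above = ((0.45, 0.49), (0.60, 0.64))    # left endpoints sum to 1.05 > 1
    print("sum < 1:", round(pair_intensity(*below, 200_000, rng), 3),
          " (1/(x1*x2) =", round(1 / (0.32 * 0.62), 3), ")")
    print("sum > 1:", round(pair_intensity(*above, 200_000, rng), 3),
          " (Poisson prediction", round(1 / (0.47 * 0.62), 3), ", PD gives 0)")
```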

Corollary 1.

View the Poisson-Dirichlet process as the random multisubset $A$ of $(0,1]$ given by $A = \{V_{(1)}, V_{(2)}, \dots\}$.⁴ Then the PD(1) is the unique random $A$ for which both $\sum_{x \in A} x = 1$, where the sum is taken with multiplicities, and, for each $k$, the multi-intensity density of $A$ is given by the right side of (14).

⁴ The condition $V_{(1)} \ge V_{(2)} \ge \cdots$ implies that the multiset can be reconstructed from the sequence $(V_{(1)}, V_{(2)}, \dots)$.

Proof.

If a multiset satisfies the given hypotheses, then we can apply Lemma 2 with $A_n = A$, for each $n$. Conversely, we have already noted that starting with the PD(1), we have $\sum_{x \in A} x = 1$, and multi-intensity density as given by (14). ∎

3.4 The Main Lemma, Arbitrary $\theta$

We keep the notation of Section 3.2.

Lemma 3.

Let $\theta > 0$. Given an arbitrary sequence of random multisubsets of $(0,1]$, with the associated ranked sequences of elements, make the following assumption: Suppose that for some with it is the case that for any collection of disjoint closed intervals satisfying the hypothesis

(15)

we have both

(16)

and

(17)

Then $V_n$ converges in distribution to PD($\theta$), the Poisson-Dirichlet process with parameter $\theta$.

If in place of (16) we assume

(18)

then the same conclusion holds.

Proof.

As in the proof of Lemma 2, we appeal to Lemma 1, this time using (3) with to specify the target limit density. Also, it suffices to consider only (16) and not (18). If the coordinates of are generated as a size-biased permutation of the elements of , padded with zeros if necessary, then (11) gives a lower bound on . This combines with hypothesis (16) and formula (3) to give

where we have written for .

We now show the preliminary factors can be replaced with : Given , setting

will serve, via (12) for boxes complying with (5), to ensure that

As for bounding

when , complying with (5) for a given also means

for each , where in the rightmost member we have measured distance from the hyperplane . Since we get

and so for sufficiently large we have

as well. Thus when , given pick small enough so that , and then choose to be the larger of and . Proposition 1 now applies, completing the argument. ∎

3.5 Characterization of PD($\theta$) via Multi-intensity

We now treat the situation for general $\theta$, thereby extending the results of Section 3.3. We will be brief and highlight only the changes.

The scale invariant Poisson process with intensity $\theta\,dx/x$ on $(0,1]$ has multi-intensity density

$$g_k(x_1,\dots,x_k) = \frac{\theta^k}{x_1 x_2 \cdots x_k} \qquad (19)$$

for all choices of distinct $x_1, \dots, x_k \in (0,1]$. For any $\theta > 0$, the density for $T$, the sum of all the points, restricted to (0,1], is given (see for example [1], formula (4.20)) by

$$f_T(t) = \frac{e^{-\gamma\theta}}{\Gamma(\theta)}\,t^{\theta - 1}, \qquad 0 < t \le 1.$$

Hence, by conditioning the Poisson process on the event $T = 1$, for distinct $x_1, \dots, x_k$ with $x_1 + \cdots + x_k < 1$, we have multi-intensity density

$$\frac{\theta^k}{x_1 \cdots x_k}\cdot\frac{f_T(1 - x_1 - \cdots - x_k)}{f_T(1)} = \frac{\theta^k\,(1 - x_1 - \cdots - x_k)^{\theta - 1}}{x_1 x_2 \cdots x_k}. \qquad (20)$$

For $k = 1$, this gives the intensity measure $\theta\,(1-x)^{\theta-1}\,dx/x$ of the PD($\theta$), on $(0,1]$, which differs, when $\theta \ne 1$, from the intensity $\theta\,dx/x$ of the corresponding Poisson process.

Corollary 2.

Let $\theta > 0$. View the Poisson-Dirichlet process with parameter $\theta$ as the random multisubset $A$ of $(0,1]$ given by $A = \{V_{(1)}, V_{(2)}, \dots\}$. Then the PD($\theta$) is the unique random $A$ for which both $\sum_{x \in A} x = 1$, where the sum is taken with multiplicities, and, for each $k$, the multi-intensity density of $A$ is given by the right side of (20).

Proof.

If a multiset satisfies the given hypotheses, then we can apply Lemma 3 with $A_n = A$, for each $n$. Conversely, we have already noted that starting with the PD($\theta$), we have $\sum_{x \in A} x = 1$, and multi-intensity density as given by (20). ∎

4 Classic Billingsley

We reprove Billingsley’s original theorem, even though it becomes a special case of a later result.

Theorem 1.

Given $n \ge 2$, let $p_1 \ge p_2 \ge \cdots$ be the prime factors, including multiple factors, of a random integer $N$ chosen uniformly from $1$ to $n$; and for $i \ge 1$ let

$$V_n(i) = \frac{\log p_i}{\log n},$$

where we set $V_n(i) = 0$ if $N = 1$ or $i$ exceeds the total number of prime factors, including multiplicities. Define

$$V_n = (V_n(1), V_n(2), \dots).$$

Then $V_n$ converges to a PD(1) as $n \to \infty$. Equivalently, for each $k$ the $k$-tuple $(V_n(1), \dots, V_n(k))$ converges in distribution to the first $k$ coordinates of a PD(1).

Proof.

Define the multiset $A_n$ to contain the non-zero coordinate entries of $V_n$, and let $\bar A_n$ be the set underlying $A_n$, i.e., with no multiple copies; write $C_n(I)$ and $\bar C_n(I)$ for the corresponding interval counts. We want to use Lemma 2. Since $C_n(I) \ge \bar C_n(I)$ for any $I$, and we want lower bounds, it suffices to investigate $\bar C_n$.

Let be disjoint subintervals of , with . The key step is to see that

(21)

where the sum is over all -tuples of primes lying in .

To verify (21), for integers let be the indicator function of the event . If is our random integer we then have

where each ranges over the primes in , and where we have used the disjointness of the intervals in the last step. Note that since and each , each product in the final sum, and hence also the total number of summands, cannot exceed .

Since is uniform random in we have

uniformly in . Then

establishing (21).

Putting it all together then yields

where we have used Mertens’ formula (2) in the last step. This confirms hypothesis (10). As for (7), since for each outcome $N \le n$ we have $\sum_i \log p_i = \log N \le \log n$, it is always the case that $T_n \le 1$. Therefore, Lemma 2 applies, completing the proof. ∎
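As a sanity check on Theorem 1, the following sketch (our own code; trial division keeps $n$ moderate) samples integers uniformly from $1$ to $n$, ranks the scaled log prime factors, and compares the average largest part with the Golomb–Dickman constant, the expected largest part of a PD(1) configuration. Agreement is only rough, since the convergence is logarithmic in $n$.

```python
# Simulation sketch for Theorem 1: scaled log prime factors of a uniform random
# integer N <= n, ranked in decreasing order.  For PD(1) the expected largest
# part is the Golomb-Dickman constant ~ 0.6243.
import math
import random

def prime_factors(m: int):
    """Prime factors of m with multiplicity, by trial division (fine for m up to ~10**9)."""
    out, d = [], 2
    while d * d <= m:
        while m % d == 0:
            out.append(d)
            m //= d
        d += 1 if d == 2 else 2
    if m > 1:
        out.append(m)
    return out

def scaled_log_factors(m: int, n: int):
    v = sorted((math.log(p) / math.log(n) for p in prime_factors(m)), reverse=True)
    return v or [0.0]

if __name__ == "__main__":
    rng = random.Random(1972)
    n, samples = 10**9, 3000
    largest = [scaled_log_factors(rng.randint(1, n), n)[0] for _ in range(samples)]
    print(f"mean largest scaled log factor ~ {sum(largest)/samples:.3f}"
          f"  (Golomb-Dickman constant ~ 0.6243)")
```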

5 New Billingsley for Normed Arithmetic Semigroups

In this section we extend Billingsley’s theorem to the context of normed arithmetic semigroups satisfying growth conditions general enough to allow a PD($\theta$) limit with $\theta \ne 1$.

5.1 Normed Arithmetic Semigroups

Following the terminology of [12], a normed arithmetic semigroup is a commutative semigroup $G$ whose only unit is the identity element $1$, admitting unique factorization into prime elements,⁵ and equipped with a nonnegative multiplicative norm function $|\cdot|$ for which any set of elements with bounded norm is finite. It follows at once that any element $g \ne 1$ must have norm $|g| > 1$.

⁵ i.e., free generators

Certain growth conditions have been studied, with a view towards providing abstract settings for classical analytic number theory. For $x > 0$, let $N(x)$ be the number of elements $g \in G$ with $|g| \le x$, and let $\pi(x)$ be the number of primes $p$ with $|p| \le x$. The asymptotic linear growth condition, that for some positive constants $A$ and $\nu$, with $\nu < 1$, we have

$$N(x) = A\,x + O(x^{\nu}), \qquad (22)$$

has been studied by, e.g., Knopfmacher in [12] (as well as by Beurling before him) who shows, among many other things, that given (22) we have a generalized Mertens formula ([12], Lemma 2.5) asserting that for positive $x$,

$$\sum_{|p| \le x} \frac{1}{|p|} = \log\log x + c + O\!\left(\frac{1}{\log x}\right), \qquad (23)$$

where the constant $c$ depends on the semigroup. He also proves a prime element theorem, based on Landau’s prime ideal theorem, asserting that as $x \to \infty$ we have

$$\pi(x) \sim \frac{x}{\log x}, \qquad (24)$$

though we will not need that here.

Apart from the ordinary positive integers, of course, the semigroup of integral ideals in a number field is a standard example, with growth condition (22) derived, e.g., in [15], Theorem 39; and many other natural examples are given in [12]. For some additional examples of contemporary interest, see [13, 14].
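For the standard example just mentioned, the counting can be made explicit in the case of the Gaussian integers: the number of nonzero ideals of $\mathbb{Z}[i]$ of norm $m$ equals $\sum_{d \mid m} \chi(d)$, with $\chi$ the nontrivial character mod 4, so the counting function satisfies $N(x) = \sum_{d \le x} \chi(d)\lfloor x/d\rfloor \sim (\pi/4)\,x$, an instance of the linear growth in (22). The sketch below (our own code) checks this numerically.

```python
# Sketch: the semigroup of nonzero ideals of Z[i], with the absolute norm.
# Number of ideals of norm m  =  sum_{d | m} chi(d),  chi the nontrivial character mod 4,
# so N(x) = sum_{d <= x} chi(d) * floor(x/d)  ~  (pi/4) x.
import math

def ideal_count(x: int) -> int:
    """Number of nonzero ideals of Z[i] with norm <= x."""
    total = 0
    for d in range(1, x + 1, 2):            # chi vanishes on even d
        chi = 1 if d % 4 == 1 else -1
        total += chi * (x // d)
    return total

if __name__ == "__main__":
    for x in (10**4, 10**5, 10**6):
        print(f"x = {x:>8}:  N(x)/x = {ideal_count(x)/x:.5f}   pi/4 = {math.pi/4:.5f}")
```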

B. M. Bredikhin has studied normed arithmetic semigroups in which $\pi(x)$ grows like a constant multiple of $x/\log x$, and has shown, in particular, that if

$$\pi(x) = \theta\,\frac{x}{\log x}\,\bigl(1 + o(1)\bigr) \qquad (25)$$

for some $\theta > 0$, then

$$N(x) \sim c\,x\,(\log x)^{\theta - 1} \qquad (26)$$

for some positive constant $c$ depending on the semigroup and on $\theta$. See [16], Section 2.5 for a complete account.

A generalized Mertens formula is given for this case as well, in passing, on p. 93 of [16], namely $\sum_{|p| \le x} \frac{1}{|p|} = \theta \log\log x + O(1)$, but we will need the stronger form

$$\sum_{|p| \le x} \frac{1}{|p|} = \theta \log\log x + C + o(1), \qquad (27)$$

for some constant $C$ depending only on the semigroup.

for some constant depending only on . This follows, however from (25) via a standard Stieltjes integral argument: Write , so that ; also is clearly of bounded variation. Let be less than the minimum norm value of any prime element. Then we have

The first integral on the right is

As for the second integral, knowing that , where

entitles us to write, via formula (4.67) on p. 57 of [8],

an integration by parts trick well-known to analytic number theorists, but not often derived in textbooks. Thus the integral with respect to converges as , and so we may write

Collecting terms gives us (27), with