Extensions of Billingsley’s Theorem via Multi-Intensities
Abstract
Let $p_1 \ge p_2 \ge \cdots$ be the prime factors of a random integer chosen uniformly from $1$ to $n$, and let
$\log p_1/\log n,\ \log p_2/\log n,\ \ldots$
be the sequence of scaled log factors. Billingsley’s Theorem (1972), in its modern formulation, asserts that the limiting process, as $n \to \infty$, is the Poisson-Dirichlet process with parameter $\theta = 1$.
In this paper we give a new proof, inspired by the 1993 proof by Donnelly and Grimmett, and extend the result to factorizations of elements of normed arithmetic semigroups satisfying certain growth conditions, for which the limiting Poisson-Dirichlet process need not have $\theta = 1$. We also establish Poisson-Dirichlet limits, with $\theta \ne 1$, for ordinary integers conditional on the number of prime factors deviating from the usual value $\log\log n$.
At the core of our argument is a purely probabilistic lemma giving a new criterion for convergence in distribution to a Poisson-Dirichlet process, from which the number-theoretic applications follow as straightforward corollaries. The lemma uses ingredients similar to those employed by Donnelly and Grimmett, but reorganized so as to allow subsequent number theory input to be processed as rapidly as possible.
A byproduct of this work is a new characterization of Poisson-Dirichlet processes in terms of multi-intensities.
1 Introduction
In this paper we provide a new criterion for convergence in distribution to PD($\theta$), the Poisson-Dirichlet process with parameter $\theta$, and then apply it to extend Billingsley’s theorem on the asymptotic distribution of log prime factors of a random integer to much more general number-theoretic contexts.
The new criterion is an application of an existing general weak convergence lemma from [3], restated here as Proposition 1, which supplements Alexandrov’s Portmanteau Theorem on equivalent conditions for weak convergence [4]. That lemma governs, in particular, the convergence of a sequence of discrete nonlattice random variables to a continuum limit possessing a smooth density. Here, we adapt it for direct application to a sequence of random multisubsets of (0,1], with a hypothesis yielding the limiting PD($\theta$).
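The following is the simplest version of our new criterion: Given a sequence $A_n$ of multisubsets of $(0,1]$, let $T_n$ denote the sum of the elements of $A_n$, counting multiplicities, and for any set $I \subset (0,1]$ let $|A_n \cap I|$ denote the cardinality of the intersection, also counting multiplicities. Also, let $L_n = (L_n(1), L_n(2), \ldots)$, where $L_n(i)$ is the $i$th largest element of $A_n$ if $|A_n| \ge i$, and $L_n(i) = 0$ if $|A_n| < i$.
Lemma 2 then asserts the following: Suppose that $T_n \le 1$ almost surely, for all $n$, and that for any collection of disjoint closed intervals $I_i = [a_i, b_i] \subset (0,1]$, $i = 1, \ldots, k$, satisfying $b_1 + \cdots + b_k < 1$, for any $k \ge 1$, we have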
(1) 
as $n \to \infty$. Then $L_n$ converges in distribution to a PD(1).
The more general version, Lemma 3, allows a limiting PD($\theta$) with $\theta \ne 1$, and has a somewhat more complicated expression on the right-hand side of the inequality. Nonetheless, both versions are easy to use in our applications.
In proving Billingsley’s original theorem, for instance, the multiset $A_n$ appearing above consists of the log prime factors, $\log p_i/\log n$, of a random integer in $[1, n]$; and so each number $|A_n \cap I|$, for an interval $I = [a, b]$, is simply the number of prime factors, counting multiplicities, falling into $[n^a, n^b]$. The main step of the proof, confirmation of the hypothesis (1), reduces to scarcely more than a citation of Mertens’ formula [9]
(2) $\sum_{p \le x} \dfrac{1}{p} = \log\log x + M + O\!\left(\dfrac{1}{\log x}\right),$
where $M$ is a constant.
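As a numerical sanity check on the kind of estimate being cited, the following sketch assumes the classical form of Mertens’ formula, $\sum_{p \le x} 1/p = \log\log x + M + O(1/\log x)$, with $M \approx 0.2615$ the Meissel-Mertens constant. The function names are illustrative and are not taken from the paper.

```python
import math

# Numerical value of the Meissel-Mertens constant (assumed to be the
# constant appearing in the classical form of Mertens' formula).
MEISSEL_MERTENS = 0.2614972128476428


def primes_up_to(x):
    """Sieve of Eratosthenes: list of all primes p <= x (x an integer >= 2)."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            # Zero out the multiples of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [p for p in range(2, x + 1) if sieve[p]]


def mertens_gap(x):
    """Difference between sum_{p <= x} 1/p and log log x + M."""
    prime_sum = sum(1.0 / p for p in primes_up_to(x))
    return prime_sum - (math.log(math.log(x)) + MEISSEL_MERTENS)
```

For instance, `mertens_gap(10**5)` should be small compared with $1/\log x$, in line with the stated error term.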
In the later sections of this paper, we apply our new criterion to give

a reproof of Billingsley’s original theorem;

a generalization to a class of normed arithmetic semigroups for which an analogue of Landau’s prime ideal theorem is valid, still yielding a PD(1) limit;

a generalization to a class of normed arithmetic semigroups satisfying the growth hypotheses of Bredikhin’s theorem ([16], Section 2.5), yielding PD($\theta$) limits with $\theta \ne 1$; and

a final generalization to ordinary integers, conditional on the number of prime factors in the selected integer deviating unusually from the prescription of Turán’s theorem; here too the limiting PD($\theta$) has $\theta \ne 1$.
Our work was inspired by the proof of Billingsley’s theorem given by Donnelly and Grimmett [6], whose ingredients are included amongst our own. Our initial motivation was to place as much of the burden as possible on self-contained probability tools and isolate the use of number-theoretic input. One difference in our approach is that, internal to the new convergence lemma, we work directly with the density function of the GEM distribution instead of aiming for a limit process of the component uniforms, as they do.
2 Probability Background
2.1 PD($\theta$) and GEM($\theta$)
For our present purposes the Poisson-Dirichlet point process with parameter $\theta > 0$ can be characterized in the following two equivalent ways:

I) PD($\theta$) is the Poisson point process with scale invariant intensity measure $\theta\,dx/x$ on $(0,1]$, conditioned on the sum of the arrivals being 1;

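II) Let $U_1, U_2, \ldots$ be a sequence of independent uniforms on $[0,1]$, let $X_i = U_i^{1/\theta}$ for $i \ge 1$, and let $G_i = X_1 X_2 \cdots X_{i-1}(1 - X_i)$. Let $(L_1, L_2, \ldots)$ be the outcome of sorting $(G_1, G_2, \ldots)$ into descending order, or ranking them, as we will say. Then $L_1, L_2, \ldots$ are the arrivals of the PD($\theta$) in $(0,1]$.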
For information on this process and its role in combinatorial modeling, as well as alternative characterizations with proofs of equivalence, see, e.g., [1, 2, 11].
The GEM($\theta$) process
appearing in the second characterization, has itself been well studied (see, e.g., [1]). It is easy to see that with probability one we have , for all . So, in particular, there are no positive accumulation points and hence ranking is actually possible.
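The second characterization lends itself to simulation. The sketch below assumes the usual stick-breaking reading of GEM($\theta$): each break fraction is $U^{1/\theta}$ for an independent uniform $U$, and the $i$th coordinate is the piece broken off at step $i$. The function names and the truncation to `k` pieces are illustrative choices, not part of the paper.

```python
import random


def gem_prefix(theta, k, rng):
    """First k coordinates of a GEM(theta) sample, by stick breaking."""
    remaining = 1.0  # length of stick not yet broken off
    pieces = []
    for _ in range(k):
        x = rng.random() ** (1.0 / theta)  # fraction of the stick that is kept
        pieces.append(remaining * (1.0 - x))  # piece broken off at this step
        remaining *= x
    return pieces


def ranked_prefix(theta, k, rng):
    """Rank the first k GEM(theta) coordinates in descending order.

    This only approximates a PD(theta) sample: coordinates beyond the first k
    are ignored, which is harmless once the leftover stick is negligibly short.
    """
    return sorted(gem_prefix(theta, k, rng), reverse=True)
```

With `k` around a few hundred the leftover stick is astronomically short for moderate $\theta$, so the leading ranked entries can be treated as the leading PD($\theta$) arrivals for practical purposes.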
Lemma 1.
Let
be a sequence of processes of nonnegative numbers, each with almost surely finite sum; and for each let
be the ranked version of . Suppose converges in distribution to GEM($\theta$). Then converges in distribution to PD($\theta$).
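For each $k \ge 1$ the first $k$ coordinates of the GEM($\theta$) possess a joint probability density function $f_k$, with the formula (see (5.28) in Section 5.4 of [1])
(3) $f_k(x_1, \ldots, x_k) = \dfrac{\theta^k\,(1 - x_1 - \cdots - x_k)^{\theta - 1}}{\prod_{i=1}^{k-1} (1 - x_1 - \cdots - x_i)}$
for $(x_1, \ldots, x_k) \in S_k$, where $S_k$ is the open set
(4) $S_k = \{(x_1, \ldots, x_k) : x_i > 0 \text{ for all } i,\ x_1 + \cdots + x_k < 1,\ x_i \ne x_j \text{ for } i \ne j\}.$
While exclusion of the subdiagonals $x_i = x_j$ is not usually imposed in the definition of $S_k$, excluding a set of Lebesgue measure $0$ does no harm, and it definitely finesses a technical issue arising in the proof of our main lemma.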
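For concreteness, here is a small sketch of the density just described. It assumes the standard stick-breaking form $\theta^k (1 - x_1 - \cdots - x_k)^{\theta - 1} / \prod_{i=1}^{k-1}(1 - x_1 - \cdots - x_i)$ on the open simplex with the subdiagonals removed; the function name is hypothetical.

```python
def gem_density(x, theta=1.0):
    """Joint density of the first k = len(x) GEM(theta) coordinates.

    Returns 0.0 outside the open domain: all coordinates positive,
    total strictly less than 1, and no two coordinates equal.
    """
    k = len(x)
    total = sum(x)
    if any(xi <= 0 for xi in x) or total >= 1 or len(set(x)) < k:
        return 0.0
    partial, denom = 0.0, 1.0
    for xi in x[:-1]:  # partial sums x_1 + ... + x_i for i = 1..k-1
        partial += xi
        denom *= 1.0 - partial
    return theta**k * (1.0 - total) ** (theta - 1.0) / denom
```

For $\theta = 1$ and $k = 1$ this is the uniform density on $(0,1)$, matching the fact that the first GEM(1) coordinate is uniform.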
2.2 A Proposition on Weak Convergence
For the reader’s convenience, we quote Proposition 2.1 from [3], upon which the new result will depend. Here we let $X$ be a random element of $\mathbb{R}^k$ with density of the form $f\,1_S$, where $S \subset \mathbb{R}^k$ is open, the function $f$ is continuous, and $1_S$ is the indicator function of $S$.
Proposition 1.
Let $X$ be defined as above, and let $X_n$, $n = 1, 2, \ldots$, be arbitrary random elements of $\mathbb{R}^k$.
Suppose that, for every , there exists for which every closed coordinate box
(5) 
also satisfies
(6) 
Then $X_n \Rightarrow X$. That is, as $n \to \infty$, $X_n$ converges in distribution to $X$.
3 Convergence to PD($\theta$)
3.1 Preliminaries
Our convergence criterion is most conveniently cast in the language of random multisubsets [Footnote 1] of the interval $(0,1]$. In the course of the proof we will also need to consider size-biased permutations.
Footnote 1: By saying $A$ is a multisubset of $U$, we mean that $U$ is a universal set, and for each $x \in U$, the multiplicity of $x$ as an element of $A$ is a nonnegative integer. There is of course an alternate reading of the phrase, with “$A$ is a multisubset of $B$” to mean that both are multisets, and for each $x$ in the underlying universal set, the multiplicity of $x$ in $A$ is at most its multiplicity in $B$.
We consider only multisets whose multiplicities are all finite.
Informally, a process that generates random countable (or finite) multisets $A$ has been fully specified provided that for any finite collection $B_1, \ldots, B_r$ of Borel subsets of $(0,1]$, the cardinalities $|A \cap B_i|$, including multiplicities, have well-defined joint probability distributions [Footnote 2]. Though infinite cardinalities may occur, for any singleton $\{x\}$ we must have $|A \cap \{x\}| < \infty$. The joint distributions must obey any constraints implied by set inclusions.
Footnote 2: This induces a probability measure on the space whose points are countable subsets of $(0,1]$. For further information, including the identification of random multisets with random finite integer-valued measures on the ambient space, see, e.g., [10], Chapter 12.
Given an at most countable fixed multiset $A$ of numbers in $(0,1]$ (or, indeed, lying anywhere in the positive reals) with finite sum $T$ (where each summand is included according to its multiplicity), a size-biased permutation is an ordered list $(Y_1, Y_2, \ldots)$ generated by the following process: The first element selected equals $x$ with probability proportional to $m x$, where $m$ is the multiplicity of $x$ in $A$; explicitly, $P(Y_1 = x) = m x / T$. Thereafter, conditional on selections already made, for any element $x$ remaining in $A$ the next element selected is $x$ with probability proportional to $m' x$, where $m'$ is the multiplicity of $x$ among those elements yet remaining to be selected. If $A$ is finite, we explicitly set $Y_j = 0$ for all $j > |A|$. (For multisets the count $|A|$ includes multiplicities.)
We may also take size-biased permutations of random multisets $A$, with sum $T$: The probability $q$, say, that the first $k$ selections are $x_1, \ldots, x_k$ is calculated by first conditioning on the random multiset $A$ and calculating recursively, as above, and then taking the expectation as $A$ varies [Footnote 3].
Footnote 3: The probability distribution of $A$ is itself determined by the joint distributions of cardinalities of intersections, together with the sum $T$: The occurrence or nonoccurrence of the event that the first $k$ selections are $x_1, \ldots, x_k$ is determined by the multiplicities of $x_1, \ldots, x_k$ and the sum of all remaining elements, taken with multiplicities; so the probability of that event is a function of the joint distribution of those quantities, i.e., the joint distribution of the intersection cardinalities $|A \cap \{x_i\}|$, and $T$. So the expectation of $q$ is taken, by definition, with respect to this latter joint distribution.
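The selection rule for a fixed finite multiset can be sketched as follows. The multiset is represented here as a plain list with repeated entries standing for multiplicities, so choosing a copy with probability proportional to its size selects the value $x$ with probability proportional to its multiplicity times $x$. The function name is illustrative, and zero-padding for the infinite list is left out.

```python
import random


def size_biased_permutation(multiset, rng):
    """Size-biased permutation of a finite multiset of positive numbers.

    `multiset` is a list; multiplicities are represented by repeated entries.
    At each step one remaining copy is chosen with probability proportional
    to its size, then removed.
    """
    remaining = list(multiset)
    order = []
    while remaining:
        j = rng.choices(range(len(remaining)), weights=remaining, k=1)[0]
        order.append(remaining.pop(j))
    return order
```

For example, with the multiset `[0.5, 0.25, 0.25]` the first selection equals `0.5` with probability $0.5/1 = 1/2$, and equals `0.25` with probability $2 \cdot 0.25/1 = 1/2$.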
3.2 The Main Lemma, $\theta = 1$
Since the special case $\theta = 1$ is a bit less complicated than the general case, yet already suffices for the classical version of Billingsley’s result as well as for our first extension, we state and prove the result for this case first.
Given an arbitrary sequence $A_n$ of random multisubsets of $(0,1]$, for each $n$ define $L_n$ to be the sequence of elements of $A_n$, including multiple occurrences, ranked by decreasing size, and padded with an infinite string of $0$’s if $A_n$ is finite. That is, we let $L_n = (L_n(1), L_n(2), \ldots)$, where $L_n(i)$ is the $i$th largest element of $A_n$ if $|A_n| \ge i$, and $L_n(i) = 0$ if $|A_n| < i$. Also, define $T_n$ to be the sum of the elements of $A_n$, where the defining sum is taken with multiplicities.
Lemma 2.
Given an arbitrary sequence $A_n$ of random multisubsets of $(0,1]$, with $L_n$ the associated ranked sequences of elements, assume the following: first,
(7) 
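and second, for any $k \ge 1$ and any collection of disjoint closed intervals $I_i = [a_i, b_i] \subset (0,1]$, $i = 1, \ldots, k$, satisfying the hypothesis
(8) $b_1 + \cdots + b_k < 1,$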
we have
(9) 
Then $L_n \Rightarrow$ PD(1), the Poisson-Dirichlet distribution with $\theta = 1$.
Proof.
Since for we have , it suffices to consider only (9). Taking our cue from Donnelly and Grimmett [6], for each define a process,
whose components are the successive elements of a size-biased permutation of $A_n$, padded with zeros if $A_n$ is finite. We will use Proposition 1 with $X$ equal to the first $k$ coordinates of the GEM(1), in conjunction with (3) and (4), to show that as $n \to \infty$, the first $k$ coordinates of this process converge in distribution to the first $k$ coordinates of a GEM(1), for each $k$. Since this implies that the process converges to a GEM(1), we will then conclude by Lemma 1 that $L_n$, the ranked version of the process, converges to a PD(1).
So let be a coordinate box whose component intervals satisfy our hypotheses. Conditional on we see that
Conditional also on the first selections lying in , respectively, since their sum must be at least the conditional probability that
is at least
where we have used the disjointness of the intervals to infer that all of remains available.
Hence, we find that
(11) 
Combining hypothesis (9) with the fact that , and using formula (3) with $\theta = 1$, we see that
To apply Proposition 1, given it will suffice to find large enough so that for all coordinate boxes satisfying (5), we have . Then (9) will imply (6). Note that while nothing in the statement of Proposition 1 explicitly allows us to restrict attention to boxes whose defining intervals are disjoint, as required for the invocation of (9), our crafty choice of domain in (4) makes that automatic. Any closed box $[a_1, b_1] \times \cdots \times [a_k, b_k]$ lying in that domain also satisfies $b_1 + \cdots + b_k < 1$, the other requirement.
Without harm we may restrict to . Since given any , a box satisfying (5) satisfies
(12) 
it suffices to take
to get and hence .
3.3 Characterization of PD(1) via Multi-intensity
Lemma 2 above gives a sufficient condition for convergence to the Poisson-Dirichlet distribution, with parameter $\theta = 1$. We now explain how this gives a new characterization of the PD distribution.
A standard concept for point processes is the intensity measure; in our setup with a random multiset $A \subset (0,1]$ this is the deterministic measure $\mu$ on the Borel subsets, defined by $\mu(B) = E|A \cap B|$. A standard result in measure theory, the $\pi$-$\lambda$ theorem, implies that $\mu$ is determined by its values on closed intervals $[a, b]$, for $0 < a < b \le 1$. At this level, both the Poisson-Dirichlet process PD(1) and the scale invariant Poisson process with intensity $dx/x$ on $(0,1]$ have the same intensity, with $\mu([a, b]) = \log(b/a)$.
Second-order intensity has been considered, for example in [5]. It is natural to generalize, and define multi-intensity or $k$-fold intensity for $k \ge 1$, by taking arbitrary choices of disjoint closed intervals $I_1, \ldots, I_k \subset (0,1]$, setting $B = I_1 \times \cdots \times I_k$, and defining
$\mu_k(B) = E \prod_{i=1}^{k} |A \cap I_i|.$
In case there is a function $m_k$ on $(0,1]^k$ such that $\mu_k(B) = \int_B m_k(x_1, \ldots, x_k)\,dx_1 \cdots dx_k$, we say that the random set $A$ has multi-intensity density $m_k(x_1, \ldots, x_k)$ at $(x_1, \ldots, x_k)$. For example, any Poisson process with intensity $\lambda(x)\,dx$ has multi-intensity density $\lambda(x_1) \cdots \lambda(x_k)$. In particular, the scale invariant Poisson process with intensity $dx/x$ on $(0,1]$ has multi-intensity density
(13) $\dfrac{1}{x_1 x_2 \cdots x_k}$
for all choices of distinct $x_1, \ldots, x_k \in (0,1]$.
The multi-intensity for the Poisson-Dirichlet is easily derived from I) in Section 2.1, the characterization of PD as PP conditional on $T = 1$, where $T$ is the sum of all the points of the Poisson process with intensity $dx/x$ on $(0,1]$. A special simplification arises from the property that the density $g$ for $T$, given explicitly by $g(t) = e^{-\gamma}\rho(t)$ where $\rho$ is Dickman’s function, satisfies $g(t) = e^{-\gamma}$ for all $0 < t \le 1$. For distinct $x_1, \ldots, x_k$ with $x_1 + \cdots + x_k < 1$, by conditioning the Poisson process on having $T = 1$ we have for the PD
(14) $\dfrac{1}{x_1 x_2 \cdots x_k} \cdot \dfrac{g(1 - x_1 - \cdots - x_k)}{g(1)} = \dfrac{1}{x_1 x_2 \cdots x_k}.$
To summarize, by comparing (13) with (14), we see that for $k \ge 2$, the Poisson process and the Poisson-Dirichlet do not have the same multi-intensity densities, but their densities agree when restricted to $(x_1, \ldots, x_k)$ with $x_1 + \cdots + x_k < 1$.
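The agreement of the two densities on the region where the coordinates sum to less than 1 can be probed by simulation. The sketch below is a Monte Carlo check for $k = 2$, assuming that a PD(1) sample can be generated by stick breaking with uniform break fractions and that its 2-fold intensity density is $1/(x_1 x_2)$ there, so that $E\,|A \cap I_1|\,|A \cap I_2| = \log(b_1/a_1)\log(b_2/a_2)$ for disjoint $I_i = [a_i, b_i]$ with $b_1 + b_2 < 1$. All names and the truncation level are illustrative.

```python
import math
import random


def pd1_points(rng, sticks=60):
    """Approximate PD(1) point set: the first `sticks` stick-breaking pieces.

    Order does not matter for counting; the neglected pieces have total
    length of order e^{-sticks}, far below any interval used here.
    """
    remaining, pieces = 1.0, []
    for _ in range(sticks):
        u = rng.random()
        pieces.append(remaining * (1.0 - u))
        remaining *= u
    return pieces


def count_in(points, a, b):
    """Number of points in the closed interval [a, b]."""
    return sum(1 for x in points if a <= x <= b)


def estimate_pair_intensity(i1, i2, trials, seed=0):
    """Monte Carlo estimate of E |A ∩ I1| * |A ∩ I2| for PD(1)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pts = pd1_points(rng)
        total += count_in(pts, *i1) * count_in(pts, *i2)
    return total / trials


def predicted_pair_intensity(i1, i2):
    """Integral of 1/(x1*x2) over the box I1 x I2."""
    return math.log(i1[1] / i1[0]) * math.log(i2[1] / i2[0])
```

With $I_1 = [0.1, 0.3]$ and $I_2 = [0.4, 0.6]$ the prediction is $\log 3 \cdot \log 1.5$, and an estimate based on a few tens of thousands of samples should land close to it.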
Corollary 1.
View the Poisson-Dirichlet process as the random multisubset $A$ of $(0,1]$ given by $\{L_1, L_2, \ldots\}$ [Footnote 4]. Then the PD is the unique random $A$ for which both
where the sum is taken with multiplicities, and for each $k \ge 1$, the multi-intensity density of $A$ is given by the right side of (14).
Footnote 4: The condition implies that the multiset can be reconstructed from the sequence .
3.4 The Main Lemma, Arbitrary $\theta$
We keep the notation of Section 3.2.
Lemma 3.
Let $\theta > 0$. Given an arbitrary sequence $A_n$ of random multisubsets of $(0,1]$, with $L_n$ the associated ranked sequences of elements, make the following assumption: Suppose that for some with it is the case that for any collection of disjoint closed intervals $I_i = [a_i, b_i] \subset (0,1]$, $i = 1, \ldots, k$, satisfying the hypothesis
(15) 
we have both
(16) 
and
(17) 
Then $L_n \Rightarrow$ PD($\theta$), the Poisson-Dirichlet process with parameter $\theta$.
Proof.
As in the proof of Lemma 2, we appeal to Proposition 1, this time using (3) with the given $\theta$ to specify the target limit density. Also, it suffices to consider only (16) and not (18). If the coordinates of are generated as a size-biased permutation of the elements of $A_n$, padded with zeros if necessary, then (11) gives a lower bound on . This combines with hypothesis (16) and formula (3) to give
where we have written for .
We now show the preliminary factors can be replaced with : Given , setting
will serve, via (12) for boxes complying with (5), to ensure that
As for bounding
when , complying with (5) for a given also means
for each , where in the rightmost member we have measured distance from the hyperplane . Since we get
and so for sufficiently large we have
as well. Thus when , given pick small enough so that , and then choose to be the larger of and . Proposition 1 now applies, completing the argument. ∎
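3.5 Characterization of PD($\theta$) via Multi-intensity
We now treat the situation for general $\theta > 0$, thereby extending the results of Section 3.3. We will be brief and highlight only the changes.
The scale invariant Poisson process with intensity $\theta\,dx/x$ on $(0,1]$ has multi-intensity density
(19) $\dfrac{\theta^k}{x_1 x_2 \cdots x_k}$
for all choices of distinct $x_1, \ldots, x_k \in (0,1]$. For any $\theta > 0$, the density $g_\theta$ for $T$, restricted to (0,1], is given (see for example [1], formula (4.20)) by
$g_\theta(t) = \dfrac{e^{-\gamma\theta}}{\Gamma(\theta)}\, t^{\theta - 1}, \quad 0 < t \le 1.$
Hence, by conditioning the Poisson process on the event $T = 1$, for distinct $x_1, \ldots, x_k$ with $x_1 + \cdots + x_k < 1$, we have multi-intensity density
(20) $\dfrac{\theta^k}{x_1 x_2 \cdots x_k} \cdot \dfrac{g_\theta(1 - x_1 - \cdots - x_k)}{g_\theta(1)} = \dfrac{\theta^k\,(1 - x_1 - \cdots - x_k)^{\theta - 1}}{x_1 x_2 \cdots x_k}.$
For $k = 1$, this gives the intensity measure of the PD($\theta$), $\theta (1 - x)^{\theta - 1}\,dx/x$ on $(0,1)$, which differs, when $\theta \ne 1$, from the intensity of the corresponding Poisson process.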
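The $k = 1$ case can be checked numerically. The sketch below assumes that the intensity of PD($\theta$) is $\theta (1 - x)^{\theta - 1}/x$ on $(0,1)$ and that PD($\theta$) can be sampled by stick breaking with break fractions $U^{1/\theta}$; it compares a midpoint-rule integral of that intensity with a Monte Carlo count. The names, the truncation level, and the choice $\theta = 2$ in the closed form are illustrative.

```python
import math
import random


def pd_theta_points(theta, rng, sticks=100):
    """Approximate PD(theta) point set from the first `sticks` stick-breaking pieces."""
    remaining, pieces = 1.0, []
    for _ in range(sticks):
        x = rng.random() ** (1.0 / theta)
        pieces.append(remaining * (1.0 - x))
        remaining *= x
    return pieces


def expected_count(theta, a, b, steps=10000):
    """Midpoint-rule integral of theta * (1 - x)^(theta - 1) / x over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for j in range(steps):
        x = a + (j + 0.5) * h
        total += theta * (1.0 - x) ** (theta - 1.0) / x
    return total * h


def closed_form_theta2(a, b):
    """Exact integral of 2 * (1 - x) / x over [a, b], for comparison."""
    return 2.0 * (math.log(b / a) - (b - a))


def empirical_count(theta, a, b, trials, seed=0):
    """Monte Carlo estimate of the expected number of points in [a, b]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        hits += sum(1 for x in pd_theta_points(theta, rng) if a <= x <= b)
    return hits / trials
```

For $\theta = 2$ and the interval $[0.2, 0.6]$ the predicted mean count is $2(\log 3 - 0.4) \approx 1.397$, noticeably larger than the value $2\log 3 \cdot$ (nothing subtracted) that the unconditioned Poisson intensity $2/x$ would give.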
Corollary 2.
Let $\theta > 0$. View the Poisson-Dirichlet process with parameter $\theta$, PD($\theta$), as the random multisubset $A$ of $(0,1]$ given by $\{L_1, L_2, \ldots\}$. Then the PD($\theta$) is the unique random $A$ for which both
where the sum is taken with multiplicities, and for each $k \ge 1$, the multi-intensity density of $A$ is given by the right side of (20).
4 Classic Billingsley
We reprove Billingsley’s original theorem, even though it becomes a special case of a later result.
Theorem 1.
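Given $n$, let $p_1 \ge p_2 \ge \cdots$ be the prime factors, including multiple factors, of a random integer chosen uniformly from $1$ to $n$; and for $i \ge 1$ let
$L_n(i) = \dfrac{\log p_i}{\log n},$
where we set $L_n(i) = 0$ if the chosen integer is $1$ or $i$ exceeds the total number of prime factors, including multiplicities. Define
$L_n = (L_n(1), L_n(2), \ldots).$
Then $L_n$ converges to a PD(1) as $n \to \infty$. Equivalently, for each $k \ge 1$ the $k$-tuple $(L_n(1), \ldots, L_n(k))$ converges in distribution to the first $k$ coordinates of a PD(1).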
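To make the objects in the theorem concrete, the following sketch computes the ranked, scaled log prime factors of a given integer $m \le n$, padded with zeros. It uses plain trial division, which is adequate only for small inputs; the function names are illustrative, and $n \ge 2$ is assumed so that $\log n \ne 0$.

```python
import math


def prime_factors(m):
    """Prime factors of m >= 1, with multiplicity, by trial division."""
    factors, d = [], 2
    while d * d <= m:
        while m % d == 0:
            factors.append(d)
            m //= d
        d += 1
    if m > 1:
        factors.append(m)
    return factors


def scaled_log_factors(m, n, k):
    """First k ranked values log p / log n over the prime factors p of m.

    Entries beyond the number of prime factors (with multiplicity) are 0.
    Assumes 1 <= m <= n and n >= 2.
    """
    ranked = sorted((math.log(p) / math.log(n) for p in prime_factors(m)), reverse=True)
    return (ranked + [0.0] * k)[:k]
```

For example, `scaled_log_factors(360, 1000, 8)` lists the six values coming from $360 = 2^3 \cdot 3^2 \cdot 5$ followed by two zeros, and the entries sum to $\log 360/\log 1000 \le 1$.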
Proof.
Define the multiset $A_n$ to contain the nonzero coordinate entries of $L_n$, and let $A_n^*$ be the set underlying $A_n$, i.e., with no multiple copies. We want to use Lemma 2. Since $|A_n \cap I| \ge |A_n^* \cap I|$ for any $I$ and we want lower bounds, it suffices to investigate $A_n^*$.
Let $I_i = [a_i, b_i]$, $i = 1, \ldots, k$, be disjoint subintervals of $(0,1]$, with $b_1 + \cdots + b_k < 1$. The key step is to see that
(21) 
where the sum is over all $k$-tuples $(q_1, \ldots, q_k)$ of primes with $q_i$ lying in $[n^{a_i}, n^{b_i}]$.
To verify (21), for integers $d \ge 1$ let $\xi_d$ be the indicator function of the event $d \mid N$. If $N$ is our random integer we then have
where each $q_i$ ranges over the primes in $[n^{a_i}, n^{b_i}]$, and where we have used the disjointness of the intervals in the last step. Note that since $b_1 + \cdots + b_k < 1$ and each $q_i \le n^{b_i}$, each product $q_1 \cdots q_k$ in the final sum, and hence also the total number of summands, cannot exceed $n^{b_1 + \cdots + b_k}$.
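For a single interval the quantity under study can be computed exactly, which gives a feel for how the hypothesis of Lemma 2 is met. The sketch below assumes only the elementary count that exactly $\lfloor n/p \rfloor$ of the integers $1, \ldots, n$ are divisible by a prime $p$, so the expected number of distinct prime factors $p$ with $\log p/\log n \in [a, b]$ is $\frac{1}{n}\sum \lfloor n/p \rfloor$ over primes in $[n^a, n^b]$; this is compared with $\log(b/a)$. Function names are illustrative.

```python
import math


def primes_between(lo, hi):
    """All primes p with lo <= p <= hi, via a sieve up to floor(hi)."""
    top = int(math.floor(hi))
    if top < 2:
        return []
    sieve = bytearray([1]) * (top + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(top) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, top + 1, p)))
    return [p for p in range(2, top + 1) if sieve[p] and p >= lo]


def mean_distinct_prime_count(n, a, b):
    """Exact expected number of distinct primes p in [n^a, n^b] dividing a
    uniform random integer from 1 to n, namely (1/n) * sum of floor(n/p)."""
    return sum(n // p for p in primes_between(n**a, n**b)) / n
```

For $n = 10^6$ and $[a, b] = [0.3, 0.6]$ the result is within a few hundredths of $\log 2$; the approach to the limit is slow, consistent with an error term of order $1/\log n$ in Mertens’ formula.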
5 New Billingsley for Normed Arithmetic Semigroups
In this section we extend Billingsley’s theorem to the context of normed arithmetic semigroups satisfying growth conditions general enough to allow a PD($\theta$) limit with $\theta \ne 1$.
5.1 Normed Arithmetic Semigroups
Following the terminology of [12], a normed arithmetic semigroup is a commutative semigroup whose only unit is $1$, admitting unique factorization into prime elements (i.e., free generators), and equipped with a nonnegative multiplicative norm function for which any set of elements with bounded norm is finite. It follows at once that any element other than $1$ must have norm greater than $1$.
Certain growth conditions have been studied, with a view towards providing abstract settings for classical analytic number theory. For , let be the number of elements with , and let be the number of primes with . The asymptotic linear growth condition, that for some positive constants and , we have
(22) 
has been studied by, e.g., Knopfmacher in [12] (as well as by Beurling before him) who shows, among many other things, that given (22) we have a generalized Mertens formula ([12], Lemma 2.5) asserting that for positive ,
(23) 
where the constant depends on the semigroup. He also proves a prime element theorem, based on Landau’s prime ideal theorem, asserting that for we have
(24) 
though we will not need that here.
Apart from the ordinary positive integers, of course, the semigroup of integral ideals in a number field is a standard example, with growth condition (22) derived, e.g., in [15], Theorem 39; and many other natural examples are given in [12]. For some additional examples of contemporary interest, see [13, 14].
B. M. Bredikhin has studied normed arithmetic semigroups in which satisfies for fixed and has shown, in particular, that if
(25) 
for some , then
(26) 
for some positive depending on and where . See [16], Section 2.5 for a complete account.
A generalized Mertens formula is given for this case as well, in passing, on p. 93 of [16], namely
but we will need the stronger form
(27) 
for some constant depending only on . This follows, however, from (25) via a standard Stieltjes integral argument: Write , so that ; also is clearly of bounded variation. Let be less than the minimum norm value of any prime element. Then we have
The first integral on the right is
As for the second integral, knowing that , where
entitles us to write, via formula (4.67) on p. 57 of [8],
an integration by parts trick well-known to analytic number theorists, but not often derived in textbooks. Thus the integral with respect to converges as , and so we may write
Collecting terms gives us (27), with