Random weighted averages, partition structures and generalized arcsine laws
This article offers a simplified approach to the distribution theory of randomly weighted averages or $P$-means $M_P(X) := \sum_j X_j P_j$, for a sequence of i.i.d. random variables $X, X_1, X_2, \ldots$, and independent random weights $P := (P_j)$ with $P_j \ge 0$ and $\sum_j P_j = 1$. The collection of distributions of $M_P(X)$, indexed by distributions of $X$, is shown to encode Kingman’s partition structure derived from $P$. For instance, if $X_p$ has Bernoulli$(p)$ distribution on $\{0,1\}$, the $n$th moment of $M_P(X_p)$ is a polynomial function of $p$ which equals the probability generating function of the number $K_n$ of distinct values in a sample of size $n$ from $P$: $E (M_P(X_p))^n = E p^{K_n}$. This elementary identity illustrates a general moment formula for $P$-means in terms of the partition structure associated with random samples from $P$, first developed by Diaconis and Kemperman (1996) and Kerov (1998) in terms of random permutations. As shown by Tsilevich (1997), if the partition probabilities factorize in a way characteristic of the generalized Ewens sampling formula with two parameters $(\alpha, \theta)$, found by Pitman (1995), then the moment formula yields the Cauchy-Stieltjes transform of an $(\alpha, \theta)$ mean. The analysis of these random means includes the characterization of $(0, \theta)$-means, known as Dirichlet means, due to Von Neumann (1941), Watson (1956), and Cifarelli and Regazzini (1990), and generalizations of Lévy’s arcsine law for the time spent positive by a Brownian motion, due to Darling (1949), Lamperti (1958), and Barlow, Pitman, and Yor (1989).
- 1 Introduction
- 2 Overview
- 3 Transforms
- 4 Some basic theory of $P$-means
- 5 Models for random discrete distributions
Consider the randomly weighted average or $P$-mean of a sequence of random variables $(X_1, X_2, \ldots)$
$$\widetilde{X}_P := \sum_j X_j P_j \qquad (1)$$
where $P := (P_1, P_2, \ldots)$ is a random discrete distribution, meaning that the $P_j$ are random variables with $P_j \ge 0$ and $\sum_j P_j = 1$ almost surely, where $(X_1, X_2, \ldots)$ and $P$ are independent, and it is assumed that the series converges to a well defined limit almost surely. This article is concerned with characterizations of the exact distribution of $\widetilde{X}_P$ under various assumptions on the random discrete distribution $P$ and the sequence $(X_1, X_2, \ldots)$. Interest is focused on the case when the $X_j$ are i.i.d. copies of some basic random variable $X$. Then $\widetilde{X}_P$ is a well defined random variable, called the $P$-mean of $X$, whatever the distribution of $X$ with a finite mean, and whatever the random discrete distribution $P$ independent of the sequence of copies of $X$. These characterizations of the distribution of $P$-means are mostly known in some form. But the literature of random $P$-means is scattered, and the conceptual foundations of the theory have not been as well laid as they might have been. There has been recent interest in refined development of the distribution theory of $P$-means in various settings, especially for the model of distributions of $P$ indexed by two parameters $(\alpha, \theta)$, whose size-biased presentation is known as GEM$(\alpha, \theta)$ after Griffiths, Engen and McCloskey, and whose associated partition probabilities were derived by Pitman (1995). See e.g. Regazzini et al. (2002), Regazzini et al. (2003), Lijoi and Regazzini (2004), James et al. (2008a), James (2010a, b), Lijoi and Prünster (2009). See also Ruggiero and Walker (2009), Petrov (2009), Canale et al. (2017), Lau (2013) for other recent applications of the two-parameter model and closely related random discrete distributions, in which settings the theory of $P$-means may be of further interest. So it may be timely to review the foundations of the theory of random $P$-means, with special attention to $P$ governed by the $(\alpha, \theta)$ model, and references to the historical literature and contemporary developments.
The article is intended to be accessible even to readers unfamiliar with the theory of partition structures, and to provide motivation for further study of that theory and its applications to $P$-means.
The article is organized as follows. Section 2 offers an overview of the distribution theory of $P$-means, with pointers to the literature and following sections for details. Section 4 develops the foundations of a general distribution theory for $P$-means, essentially from scratch. Section 5 develops this theory further for some of the standard models of random discrete distributions. The aim is to explain, as simply as possible, some of the most remarkable known results involving $P$-means, and to clarify relations between these results and the theory of partition structures, introduced by Kingman (1975), then further developed in Pitman (1995), and surveyed in Pitman (2006, Chapters 2, 3, 4). The general treatment of $P$-means in Section 4 makes many connections to those sources, and motivates the study of partition structures as a tool for the analysis of $P$-means.
This article focuses attention on two particular instances of the general random average construction .
The are assumed to be independent and identically distributed (i.i.d.) copies of some basic random variable , with the independent of . Then is called the -mean of , typically denoted or .
The case , with only two non-zero weights and . It is assumed that is independent of . But and might be independent and not identically distributed, or they might have some more general joint distribution.
Of course, more general random weighting schemes are possible, and have been studied to some extent. For instance, Durrett and Liggett (1983) treat the distribution of randomly weighted sums for random non-negative weights not subject to any constraint on their sum, and a sequence of i.i.d. random variables independent of the weight sequence. But the theory of the two basic kinds of random averages indicated above is already very rich. This theory was developed in the first instance for real valued random variables . But the theory extends easily to vector-valued random elements , including random measures, as discussed in the next subsection.
Here, for a given distribution of $P$, the collection of distributions of $M_P(X)$, indexed by distributions of $X$, is regarded as an encoding of Kingman’s partition structure derived from $P$ (Corollary 9). That is, the collection of distributions of $\Pi_n$, the random partition of indices $\{1, \ldots, n\}$ generated by a random sample of size $n$ from $P$. For instance, if $X_p$ has Bernoulli$(p)$ distribution on $\{0,1\}$, the $n$th moment of the $P$-mean of $X_p$ is a polynomial in $p$ of degree $n$, which is also the probability generating function of the number $K_n$ of distinct values in a sample of size $n$ from $P$: $E (M_P(X_p))^n = E p^{K_n}$ (Proposition 10). This elementary identity illustrates a general moment formula for $P$-means, involving the exchangeable partition probability function (EPPF), which describes the distributions of $\Pi_n$ (Corollary 22). An equivalent moment formula, in terms of a random permutation whose cycles are the blocks of $\Pi_n$, was found by Diaconis and Kemperman (1996) for the $(0, \theta)$ model, and extended to general partition structures by Kerov (1998). As shown in Section 5.7, following Tsilevich (1997), this moment formula leads quickly to characterizations of the distribution of $P$-means when the EPPF factorizes in a way characteristic of the two-parameter family of GEM$(\alpha, \theta)$ models defined by a stick-breaking scheme generating $P$ from suitable independent beta factors. Then the moment formula yields the Cauchy-Stieltjes transform of an $(\alpha, \theta)$ mean derived from an i.i.d. sequence of copies of $X$. The analysis of these random means includes the characterization of $(0, \theta)$-means, commonly known as Dirichlet means, due to Von Neumann (1941), Watson (1956), and Cifarelli and Regazzini (1990), as well as generalizations of Lévy’s arcsine law for the time spent positive by a Brownian motion, due to Lamperti (1958), and Barlow, Pitman, and Yor (1989).
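The Bernoulli moment identity can be checked exactly in one tractable case. The sketch below assumes the $(0, \theta)$ (Dirichlet) model, for which two standard facts are used: the number of distinct values $K_n$ follows the Ewens distribution $P(K_n = k) = c(n,k)\,\theta^k / (\theta)_n$, with $c(n,k)$ the unsigned Stirling numbers of the first kind, and the $P$-mean of a Bernoulli$(p)$ variable has the beta$(\theta p, \theta(1-p))$ distribution. The function names are illustrative only.

```python
from fractions import Fraction


def rising(x, n):
    """Rising factorial (x)_n = x (x+1) ... (x+n-1)."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out


def stirling_cycle_row(n):
    """Unsigned Stirling numbers of the first kind c(n, k) for k = 0..n."""
    row = [1]  # c(0, 0) = 1
    for m in range(n):
        new = [0] * (len(row) + 1)
        for k, c in enumerate(row):
            new[k + 1] += c      # element m+1 starts a new cycle
            new[k] += m * c      # or is inserted after one of m existing elements
        row = new
    return row


def pgf_distinct_values(n, theta, p):
    """E[p^{K_n}] under the Ewens(theta) distribution of K_n."""
    row = stirling_cycle_row(n)
    return sum(c * theta**k * p**k for k, c in enumerate(row)) / rising(theta, n)


def beta_moment(n, a, b):
    """n-th moment of a beta(a, b) variable, (a)_n / (a+b)_n."""
    return rising(a, n) / rising(a + b, n)


theta, p = Fraction(3, 2), Fraction(1, 3)
for n in range(1, 8):
    # n-th moment of the mean equals the generating function of K_n at p
    assert pgf_distinct_values(n, theta, p) == beta_moment(n, theta * p, theta * (1 - p))
```

Both sides reduce to $(\theta p)_n / (\theta)_n$, which is a polynomial of degree $n$ in $p$, in agreement with the statement above.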
2.2 Random measures
To illustrate the idea of extending -means from random variables to random measures, suppose that the are random point masses
for a sequence of i.i.d. copies of a random element with values in an abstract measurable space , with ranging over . Then
is a measure-valued random -mean. This is a discrete random probability measure on which places an atom of mass at location for each . Informally, is a reincarnation of as a random discrete distribution on instead of the positive integers, obtained by randomly sprinkling the atoms over according to the distribution of . In particular, if the distribution of is continuous, on the event of probability one that there are no ties between any two -values, the list of magnitudes of atoms of in non-increasing order is identical to the corresponding reordering of the sequence . The original random discrete distribution on positive integers, and the derived random discrete distribution on , are then so similar, that using the same symbol for both of them seems justified. The integral of a suitable real-valued -measurable function with respect to is just the -mean of the real-valued random variable :
Hence the analysis of random probability measures of the form (2) on an abstract space reduces to an analysis of distributions of -means for real-valued . For a listing of the normalized jumps of a standard gamma process , that is a subordinator, or increasing process with stationary independent increments, with
Such a beta variable is conveniently constructed from the standard gamma process by the beta-gamma algebra
where is a copy of that is independent of , and
See Section 5.3 for further discussion.
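The beta-gamma algebra can be illustrated by simulation. The sketch below assumes the standard form of the algebra: for independent gamma variables with shape parameters $a$ and $b$ and unit scale, the ratio of the first to the sum has the beta$(a, b)$ distribution and is independent of the sum, which is gamma with shape $a + b$. Function names, sample size and seed are arbitrary choices for illustration.

```python
import random


def beta_gamma_sample(a, b, size, seed=0):
    """Simulate pairs (ratio, total) built from independent gamma(a) and gamma(b) variables."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(size):
        ga = rng.gammavariate(a, 1.0)   # shape a, scale 1
        gb = rng.gammavariate(b, 1.0)   # shape b, scale 1
        total = ga + gb
        pairs.append((ga / total, total))
    return pairs


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


pairs = beta_gamma_sample(0.5, 1.5, 50000, seed=1)
ratios = [r for r, _ in pairs]
totals = [t for _, t in pairs]
print("mean of ratio (beta mean a/(a+b) = 0.25):", mean(ratios))
print("mean of total (gamma mean a+b = 2):", mean(totals))
print("covariance of ratio and total (0 under independence):", covariance(ratios, totals))
```

A vanishing covariance is only a necessary consequence of independence, so this is a sanity check rather than a proof.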
Replacing the gamma process by a more general subordinator makes the random measure (2) a homogeneous normalized random measure with independent increments (HRMI), as studied by Regazzini et al. (2003) and James et al. (2009) from the perspective of Bayesian inference for $P$ given a random sample of size $n$ from $P$. Basic properties of $P$-means derived from normalized subordinators are developed here in Section 5.2.
2.3 Splitting off the first term
It is a key observation that the $P$-mean of an i.i.d. sequence can sometimes be expressed as a $(P_1, 1 - P_1)$-mean by splitting off the first term. That is the decomposition
with the residual probability sequence defined on the event by first conditioning on and then shifting back to . In general, the residual sequence may be dependent on . Then and will typically not be independent, and analysis of will be difficult. However,
if $P_1$ and the residual sequence are independent, (11)
then , and are mutually independent. So
The right side is the -mean of and , with independent of and , which are independent but typically not identically distributed.
This basic decomposition of a -mean by splitting off the first term leads naturally to discussion of -means for random discrete distributions defined by a recursive splitting of this kind, called residual allocation models or stick-breaking schemes, discussed further in Section 5.1.
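The algebra of splitting off the first term can be made concrete with a finite stick-breaking scheme. The sketch below is an illustration under stated assumptions: the weights are built from factors $H_1, H_2, \ldots$ as $P_1 = H_1$ and $P_j = H_j \prod_{i<j}(1 - H_i)$, the scheme is truncated by taking the last factor equal to 1 so that the weights sum to 1 exactly, and the residual weights are $P_{j+1}/(1 - P_1)$. The names `stick_breaking` and `split_first_term` are hypothetical.

```python
from fractions import Fraction


def stick_breaking(factors):
    """Weights P_1 = H_1 and P_j = H_j * prod_{i<j} (1 - H_i)."""
    weights, remaining = [], Fraction(1)
    for h in factors:
        weights.append(remaining * h)
        remaining *= 1 - h
    return weights


def weighted_mean(weights, values):
    return sum(w * x for w, x in zip(weights, values))


def split_first_term(weights, values):
    """Return (P_1, X_1, mean of X_2, X_3, ... under the residual weights)."""
    p1 = weights[0]
    residual = [w / (1 - p1) for w in weights[1:]]
    return p1, values[0], weighted_mean(residual, values[1:])


factors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4), Fraction(1)]
values = [1, 0, 1, 1]
weights = stick_breaking(factors)
p1, x1, residual_mean = split_first_term(weights, values)

# The full mean is the two-term average of X_1 and the residual mean.
assert weighted_mean(weights, values) == p1 * x1 + (1 - p1) * residual_mean
# The residual weights are again a stick-breaking scheme, built from H_2, H_3, ...
assert [w / (1 - p1) for w in weights[1:]] == stick_breaking(factors[1:])
```

The second assertion shows why independent factors are convenient: the residual sequence depends only on $H_2, H_3, \ldots$, so it is independent of $P_1 = H_1$ whenever the factors are independent.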
2.4 Lévy’s arcsine laws
An inspirational example of splitting off the first term is provided by the work of Lévy (1939) on the distributions of the time spent positive up to time , and the time of the last zero before time , for a standard Brownian motion :
See e.g. Kallenberg (2002, Theorem 13.16) for background. To place this example in the framework of -means:
Let be the length of the meander interval .
Let be the indicator of the event with Bernoulli distribution.
Let for be an exhaustive listing of the lengths of excursion intervals of away from on , with the indicator of the event that for in the excursion interval of length .
If the lengths for are put in a suitable order, for instance by ranking, then will be a sequence of i.i.d. copies of a Bernoulli variable , with independent of the excursion lengths . Then by construction,
is the -mean of a Bernoulli indicator , representing the sign of a generic excursion. This is so for any listing of excursion lengths of on that is independent of their signs. But if puts the meander length first as above, then the residual sequence is identified with the sequence of relative lengths of excursions away from zero of on . But that is also the list of excursion lengths of the rescaled process , with corresponding positivity indicators . Lévy showed that is a standard Brownian bridge, equivalent in distribution to , and that a last exit decomposition of the path of at time makes the length of the meander interval independent of , hence also independent of the residual sequence and the positivity indicators , which are encoded in the path of . Let denote the total time spent positive by this Brownian bridge . So , while also by the previous construction. Then the last exit decomposition provides a splitting of of the general form (12). In this instance,
where on the right side
and are independent, with
a Bernoulli indicator,
the meander length,
the total time spent positive by , and
the last exit time.
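Lévy showed that the meander interval has a length with the beta$(1/2, 1/2)$ distribution, known as the arcsine law, because its cumulative distribution function at $u \in [0,1]$ is $(2/\pi) \arcsin \sqrt{u}$,
while the bridge occupation time has the uniform distribution on $[0,1]$. Lévy then deduced from (13) that the unconditioned occupation time has the same arcsine distribution as the last exit time and the meander length, which is the identity in distribution (15).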
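Lévy's deduction can be verified moment by moment in exact arithmetic. The sketch below assumes the decomposition described above in the form $LX + (1 - L)U$, with $L$ the beta$(1/2,1/2)$ meander length, $X$ a Bernoulli$(1/2)$ indicator and $U$ a uniform bridge occupation time, all independent. It checks that every moment agrees with the corresponding arcsine moment, which suffices because a distribution on $[0,1]$ is determined by its moments. Function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def rising(x, n):
    """Rising factorial (x)_n."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out


def arcsine_mixed_moment(a, b):
    """E[L^a (1-L)^b] for L ~ beta(1/2, 1/2): (1/2)_a (1/2)_b / (a+b)!."""
    half = Fraction(1, 2)
    return rising(half, a) * rising(half, b) / factorial(a + b)


def occupation_moment(n):
    """n-th moment of L*X + (1-L)*U with L, X, U independent as described."""
    # On {X = 0}: the variable is (1-L)*U, and E[U^n] = 1/(n+1).
    negative = arcsine_mixed_moment(0, n) / (n + 1)
    # On {X = 1}: expand (L + (1-L)*U)^n by the binomial theorem.
    positive = sum(
        comb(n, k) * arcsine_mixed_moment(n - k, k) / (k + 1) for k in range(n + 1)
    )
    return (positive + negative) / 2


for n in range(1, 11):
    assert occupation_moment(n) == arcsine_mixed_moment(n, 0)
```

For instance, the second moment on both sides is $3/8$.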
2.5 Generalized arcsine laws
Lévy’s arcsine laws (15) for the Brownian occupation time , the time of the last zero in , and the meander length , and his associated uniform law for the Brownian bridge occupation times , have been generalized in several different ways. One of the most far-reaching of these generalizations gives corresponding results when the basic Brownian motion is replaced by a process with exchangeable increments. Discrete time versions of these results were first developed by Andersen (1953). Feller (1971, §XII.8 Theorem 2) gave a refined treatment, with the following formulation for a random walk with exchangeable increments , started at : the random number of times that the walk is strictly positive up to time has the same distribution as the random index at which the walk first attains its maximum value . In the Brownian scaling limit, Sparre Andersen’s identity implies the equality in distribution , the last time in that Brownian motion attains its maximum on . That the distribution of is arcsine was shown also by Lévy, who then argued that , the time of the last zero of on , by virtue of his famous identity in distribution of reflecting processes
where is the running maximum process derived from the path of .
Many other generalizations of the arcsine law have been developed, typically starting from one of the many ways this distribution arises from Brownian motion, or from one of its many characterizations by identities in distribution or moment evaluations. See for instance Kallenberg (2002, Theorem 15.21) for the result that Lévy’s arcsine law (15) extends to the occupation time of up to time for any symmetric Lévy process with instead of , with replaced by , the last time in that attains its maximum on , and replaced by . See also Takács (1996a, b, 1999, 1998), Petit (1992) and Mansuy and Yor (2008, Chapter 8) regarding the distribution of occupation times of Brownian motion with drift and other processes derived from Brownian motion. See Getoor and Sharpe (1994), Bertoin and Yor (1996), Bertoin and Doney (1997) for more general results on Lévy processes, and Knight (1996) and Fitzsimmons and Getoor (1995), for an extension of the uniform distribution of for Brownian motion to more general bridges with exchangeable increments, and Yano (2006) for an extension to conditioned diffusions. Watanabe (1995) gave generalized arc-sine laws for occupation times of half lines of one-dimensional diffusion processes and random walks, which were further developed in Kasahara and Yano (2005) and Watanabe et al. (2005). Yet another generalization of the arcsine law was proposed by Lijoi and Nipoti (2012).
The focus here is on generalized arcsine laws involving the distributions of -means for some random discrete distribution . The framing of Lévy’s description of the laws of the Brownian occupation times and , as -means of a Bernoulli variable, for distributions of determined by the lengths of excursions of a Brownian motion or Brownian bridge, inspired the work of Barlow, Pitman, and Yor (1989) and Pitman and Yor (1992). These articles showed how Lévy’s analysis could be extended by consideration of the path of for a random time independent of with the standard exponential distribution of . For then by Brownian scaling, while the last exit decomposition at time breaks the path of on into two independent random fragments of random lengths and respectively. Thus
This realizes the instance of the beta-gamma algebra (6) in the path of Brownian motion stopped at the independent gamma distributed random time . A similar subordination construction was exploited earlier by Greenwood and Pitman (1980) in their study of fluctuation theory for Lévy processes by splitting at the time of the last maximum before an independent exponential time . See Bertoin (1996) and Kyprianou (2014) for more recent accounts of this theory. This involves the lengths of excursions of the Lévy process below its running maximum process . Lévy recognized that for a Brownian motion his famous identity in law of processes , as in (16), implied that the structure of excursions of below is identical to the structure of excursions of away from . This leads from the decomposition of at the time of the last zero of on to the corresponding decomposition for , discussed earlier. The same method of subordination was exploited further in Pitman and Yor (1997a, Proposition 21), in a deeper study of random discrete distributions derived from stable subordinators.
The above analysis of the -mean , for an indicator variable , and the list of lengths of excursions of a Brownian motion or Brownian bridge, was generalized by Barlow, Pitman, and Yor (1989) to allow any discrete distribution of with a finite number of values. That corresponds to a linear combination of occupation times of various sectors in the plane by Walsh’s Brownian motion on a finite number of rays, whose radial part is , and whose angular part is made by assigning each excursion of to the th ray with some probability , independently for different excursions. The analysis up to an independent exponential time relies only on the scaling properties of , the Poisson character of excursions of , and beta-gamma algebra, all of which extend straightforwardly to the case when is replaced by a Bessel process or Bessel bridge of dimension , for . Then becomes a list of excursion lengths of the Bessel process or bridge over , while and become independent gamma and gamma variables with sum that is gamma. So the distribution of the final meander length in the stable case is given by
by another application of the beta-gamma algebra (6). The excursion lengths in this case are a list of lengths of intervals of the relative complement in of the range of a stable subordinator of index , with conditioning of this range to contain in the bridge case. In particular, for , the -mean of a Bernoulli indicator represents the occupation time of the positive half line for a skew Brownian motion or Bessel process, each excursion of which is positive with probability and negative with probability . The distribution of such a -mean, say , associated with a stable subordinator of index and a selection probability parameter , was found independently by Darling (1949) and Lamperti (1958). Darling indicated the representation
where is the stable subordinator with
Darling also presented a formula for the cumulative distribution function of , corresponding to the probability density
where and . Later, Zolotarev (1957) derived the corresponding formula for the density of the ratio of two independent stable variables by Mellin transform inversion. This makes a surprising connection between the stable subordinator and the Cauchy distribution, discussed further in Section 3. Lamperti (1958) showed that the density of displayed in (19) is the density of the limiting distribution of occupation times of a recurrent Markov chain, under assumptions implying that the return time of some state is in the domain of attraction of the stable law of index , and between visits to this state the chain enters some given subset of its state space with probability . Lamperti’s approach was to first derive the Stieltjes transform
where . The associated beta distribution of appearing in (17) is also known as a generalized arcsine law. In Lamperti’s setting of a chain returning to a recurrent state, the results of Dynkin (1961), presented also in Feller (1971, §XIV.3), imply that Lamperti’s limit law for occupation times holds jointly with convergence in distribution of the fraction of time since last visit to the recurrent state to the meander length as in (17), along with the generalization to this case of the distributional identity (13), which was exploited by Barlow, Pitman, and Yor (1989). Due to the results of Sparre Andersen mentioned earlier, this beta distribution also arises from random walks and Lévy processes as both a limit distribution of scaled occupation times, and as the exact distribution of the occupation time of the positive half line for a limiting stable Lévy process with for all . But in the context of the model for , this beta distribution appears either as the distribution of the length of the meander interval , as in (17), or as the distribution of a size-biased pick from . See also Pitman and Yor (1992) and (Pitman and Yor, 1997b, §4) for closely related results, and James (2010b) for an authoritative recent account of further developments of Lamperti’s work.
2.6 Fisher’s model for species sampling
A parallel but independent development of closely related ideas, from the 1940’s to the 1990’s, was initiated by Fisher (1943). See Pitman (1996b) for a review. Fisher introduced a theoretical model for species sampling, which amounts to random sampling from the random discrete distribution with the symmetric Dirichlet distribution with parameters equal to on the -simplex of with and . See Section 5.3 for a quick review of basic properties of Dirichlet distributions. Fisher showed that many features of sampling from this symmetric Dirichlet model for have simple limit distributions as with fixed. Ignoring the order of the , the limit model may be constructed directly by supposing that the are the normalized jumps of a standard gamma process on the interval . That model for a random discrete distribution, called here the model, was considered by McCloskey (1965) as an instance of the more general model, discussed in Section 5.2, in which the are the normalized jumps of a subordinator on a fixed time interval , which for a stable subordinator corresponds to the model involved in the Lévy-Lamperti description of occupation times. McCloskey showed that if the atoms of in the model are presented in the size-biased order of their appearance in a process of random sampling, then admits a simple stick-breaking representation by a recursive splitting like (9) with i.i.d. factors . Engen (1975) interpreted this GEM model as the limit in distribution of size-biased frequencies in Fisher’s limit model. This presentation of the model was developed in various ways by Patil and Taillie (1977), Sethuraman (1994), and Pitman (1996a). In this model for in size-biased random order, the basic splitting (12) holds with a residual sequence that is identical in law to the original sequence , hence also .
Then (12) becomes a characterization of the law of by a stochastic equation which typically has a unique solution, as discussed in Feigin and Tweedie (1989), Diaconis and Freedman (1999), Hjort and Ongaro (2005). See also Bacallado et al. (2017) for a recent review of species sampling models.
Ferguson (1973) and Kingman (1975) further developed McCloskey’s model of derived from the normalized jumps of a subordinator, working instead with the ranked rearrangement of with . However, it is easily seen that the distribution of the $P$-mean of a sequence of i.i.d. copies of $X$ is unaffected by any reordering of terms of $P$, provided the reordering is made independently of the copies of $X$. So for any random discrete distribution $P$, and any distribution of $X$, there is the equality in distribution
where can be any random rearrangement of terms of . This invariance in distribution of -means under re-ordering of the atoms of is fundamental to understanding the general theory of -means. In the analysis of by splitting off the first term, the distribution of is the same, no matter how the terms of may be ordered. But the ease of analysis depends on the joint distribution of and , which in turn depends critically on the ordering of terms of . Detailed study of problems of this kind by Pitman (1996a) explained why the size-biased random permutation of terms , first introduced by McCloskey in the setting of species sampling, is typically more tractable than the ranked ordering used by Ferguson and Kingman. The notation will be used consistently below to indicate a size-biased ordering of terms in a random discrete distribution.
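For a discrete distribution with finitely many atoms, a size-biased random permutation can be generated by sampling atoms without replacement, each time picking a remaining atom with probability proportional to its weight. The following is a minimal sketch under that standard definition; the function name and the finite truncation are assumptions made for illustration.

```python
import random


def size_biased_permutation(weights, rng):
    """Return the indices of `weights` in size-biased random order."""
    remaining = list(range(len(weights)))
    order = []
    while remaining:
        total = sum(weights[i] for i in remaining)
        u = rng.random() * total
        acc = 0.0
        # Walk through the remaining atoms until the cumulative weight exceeds u.
        for pos, i in enumerate(remaining):
            acc += weights[i]
            if u < acc:
                break
        # If rounding prevents a break, pos is the last position, which is picked.
        order.append(remaining.pop(pos))
    return order


rng = random.Random(7)
weights = [0.1, 0.7, 0.2]
trials = 20000
first_is_heaviest = sum(
    size_biased_permutation(weights, rng)[0] == 1 for _ in range(trials)
)
print("fraction of trials with the 0.7 atom first:", first_is_heaviest / trials)
```

The first term of the size-biased order is a size-biased pick, so the atom of weight 0.7 should come first in about 70% of trials.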
2.7 The two-parameter family
The articles of Perman et al. (1992) and Pitman and Yor (1997a) introduced a family of random discrete distributions indexed by two parameters , which includes the various examples recalled above in a unified way. Various terminology is used for different encodings of this family of random discrete distributions and associated random partitions.
The distribution of the size-biased random permutation is known as GEM, after Griffiths, Engen and McCloskey, who were among the first to study the simple stick-breaking description of this model recalled later in (150).
The model refers here to this model of a random discrete distribution , whose size-biased presentation is GEM. For such a the associated -mean will be called simply an -mean, with similar terminology for other attributes of the model, such as its partition structure.
Following further work by numerous authors including Cifarelli and Regazzini (1990), Diaconis and Kemperman (1996) and Kerov (1998), a definitive formula characterizing the distribution of an $(\alpha, \theta)$ mean , for an arbitrary distribution of a bounded or non-negative random variable , was found by Tsilevich (1997): for all for which the model is well defined, except if or , the distribution of is uniquely determined by the generalized Cauchy-Stieltjes transform
Companion formulas for the case with , , trace back to Lamperti for a Bernoulli variable, as in (20), while the case with is the case of Dirichlet means due to Von Neumann (1941) and Watson (1956) in the classical setting of mathematical statistics, involving ratios of quadratic forms of normal variables, and developed by Cifarelli and Regazzini (1990) and others in Ferguson’s Bayesian non-parametric setting. These formulas are all obtained as limit cases of the generic two-parameter formula (22), naturally involving exponentials and logarithms due to the basic approximations of these functions by large or small powers as the case may be, e.g. and for . For the transform (22) was obtained earlier by Barlow et al. (1989) in their description of the distribution of occupation times derived from a Brownian or Bessel bridge, by a straightforward argument from the perspective of Markovian excursion theory. But Tsilevich’s extension of this formula to general is not obvious from that perspective. Rather, the simplest approach to Tsilevich’s formula involves analysis of the partition structure associated with the $(\alpha, \theta)$ model, as discussed in Section 5.7.
Further development of the theory of $(\alpha, \theta)$ means was made by Vershik, Yor, and Tsilevich (2001). See also the articles by James, Lijoi and coauthors, listed in the introduction, for the most refined analysis of $(\alpha, \theta)$-means by inversion of the Cauchy-Stieltjes transform.
Typical arguments for identifying the distribution of a -mean involve encoding the distribution by some kind of transform. This section reviews some probabilistic techniques for handling such transforms, by study of some key examples related to ratios of independent stable variables. See Chaumont and Yor (2003) for further exercises with these techniques, and James (2010b) for many deeper results in this vein.
3.1 The Talacko-Zolotarev distribution
Proposition 1 [Talacko-Zolotarev distribution]. Let denote a standard Cauchy variable with probability density for , and
Let be a random variable with the conditional distribution of given the event , with :
with and the distribution of defined as the limit distribution of as . For each fixed with , the distribution of is characterized by each of the following three descriptions, to be evaluated for by continuity in , as detailed later in (34):
(i) by the symmetric probability density
(ii) by the characteristic function
(iii) by the moment generating function
The linear change of variable (23) from the standard Cauchy density of makes
and the fact that for , to calculate
This proves (i). Now (ii) and (iii) are probabilistic expressions of the classical Fourier transform
This Fourier transform is equivalent, by analytic continuation, and the change of variable as above, to the classical Mellin transform of a truncated Cauchy density
Whittaker and Watson (1927, Example 4, p. 119) attribute this Mellin transform to Euler, and present it to illustrate a general technique of computing Mellin transforms by calculus of residues. This Mellin transform also appears as an exercise in complex variables in Morse and Feshbach (1953, Part I, Problem 4.10). Talacko (1956) gave details of the derivation of the Fourier transform (31) by contour integration. A more elementary proof of the key Fourier transform (31) is indicated below. ∎
The Fourier transform (31) appears also in Zolotarev (1957, formula (21)), attributed to Ryzhik and Gradshtein (1951, p. 282), but with a typographical error (the lower limit of integration should be , not ). Chaumont and Yor (2012, 4.23) present some of Zolotarev’s results below their (4.23.4), including (31) with the correct range of integration, but missing a factor of : the on their left side should be as in (31).
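Given the misprints in the literature just noted, a numerical check of the transform is a useful safeguard. The sketch below assumes that the transform in question is, up to normalization, the classical evaluation $\int_{-\infty}^{\infty} e^{i\lambda s}\,(\cosh s + \cos\alpha)^{-1}\,ds = 2\pi \sinh(\alpha\lambda) / (\sin\alpha \, \sinh(\pi\lambda))$ for $0 < \alpha < \pi$ and real $\lambda \neq 0$; this form is an assumption, since the display (31) is not reproduced here. The integral is approximated by the trapezoidal rule on a truncated range, and the function names are illustrative.

```python
import math


def fourier_numeric(alpha, lam, cutoff=40.0, step=1e-3):
    """Trapezoidal approximation of the integral over [-cutoff, cutoff] of
    cos(lam * s) / (cosh(s) + cos(alpha)); the sine part vanishes by symmetry."""
    n = int(round(2 * cutoff / step))
    total = 0.0
    for i in range(n + 1):
        s = -cutoff + i * step
        term = math.cos(lam * s) / (math.cosh(s) + math.cos(alpha))
        total += term / 2 if i in (0, n) else term   # half weight at the endpoints
    return total * step


def fourier_closed_form(alpha, lam):
    """Assumed closed form 2*pi*sinh(alpha*lam) / (sin(alpha) * sinh(pi*lam))."""
    return (2 * math.pi * math.sinh(alpha * lam)
            / (math.sin(alpha) * math.sinh(math.pi * lam)))


for alpha, lam in [(1.0, 0.7), (math.pi / 2, 1.3), (2.5, 0.2)]:
    print(alpha, lam, fourier_numeric(alpha, lam), fourier_closed_form(alpha, lam))
```

For $\alpha = \pi/2$ the closed form reduces to $\pi / \cosh(\pi\lambda/2)$, the familiar transform of $1/\cosh s$, which gives an independent check of the normalization.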
Talacko (1956) regarded the family of symmetric densities for as a one-parameter extension of the case , with
and the limit case with
These probability densities and their associated characteristic functions were found earlier by Lévy (1951) in his study of the random area
swept out by the path of a two-dimensional Brownian motion started at . In terms of the distribution of defined by the above proposition, Lévy proved that
Lévy first derived the characteristic functions and by analysis of his area functional of planar Brownian motion. He showed that the distributions of and are infinitely divisible, each associated with a symmetric pure-jump Lévy process, whose Lévy measure he computed. He then inverted and to obtain the densities and displayed above by appealing to the classical infinite products for the hyperbolic functions. Lévy’s work on Brownian areas inspired a number of further studies, which have clarified relations between various probability distributions derived from Brownian paths whose Laplace or Fourier transforms involve the hyperbolic functions. See Biane and Yor (1987), and Pitman and Yor (2003) for comprehensive accounts of these distributions, their associated Lévy processes, and several other appearances of the same Fourier transforms in the distribution theory of Brownian functionals, and Revuz and Yor (1999, §0.6) for a summary of formulas associated with the laws of and . Note from (26) and (34) that the characteristic function of is derived from by the identity
corresponding to the identity in distribution
where and are assumed to be independent. That is to say, the distribution of is self-decomposable, as discussed further in Jurek and Yor (2004).
An easier approach to these Fourier relations (33) and (34) for and , which extends to the Fourier transform (31) for all , is to recognize the distributions involved as hitting distributions of a Brownian motion in the complex plane. The Cauchy density of in (28) is well known to be the hitting density of on the real axis for a complex Brownian motion started at the point on the unit semicircle in the upper half plane
and stopped at the random time . Let be the usual representation of this complex Brownian motion in polar coordinates, with radial part and continuous angular winding , starting from and . Then by construction
According to Lévy’s theorem on conformal invariance of Brownian motion, the process is a time changed complex Brownian motion :
and . See Pitman and Yor (1986) for further details of this well known construction. The conclusion of the above argument is summarized by the following lemma, which combined with the next proposition provides a nice explanation of the basic Fourier transform (31).
The Talacko-Zolotarev distribution of introduced in Proposition 1 as the conditional distribution of given may also be represented as