On financial applications of the two-parameter Poisson-Dirichlet distribution. Research Note
Abstract
The capital distribution curve is defined as the log-log plot of normalized stock capitalizations ranked in descending order. The curve displays remarkable stability over periods of time.
The theory of exchangeable distributions on set partitions, developed for the purposes of mathematical genetics and recently applied in nonparametric Bayesian statistics, provides a probabilistic-combinatorial approach to the analysis and modeling of the capital distribution curve. The framework of the two-parameter Poisson-Dirichlet distribution contains a rich set of methods and tools, including an infinite-dimensional diffusion process.
The purpose of this note is to introduce the framework of exchangeable distributions on partitions in the financial context. In particular, it is shown that averaged samples from the Poisson-Dirichlet distribution provide an approximation to the capital distribution curves in equity markets. This suggests that the two-parameter model can be employed for modelling the evolution of market weights and prices fluctuating in stochastic equilibrium.
1 Introduction
The capital distribution curve is defined as the log-log plot of stock market weights ranked in descending order. Temporal stability of the shape of this curve is one of the cornerstones of Stochastic Portfolio Theory (SPT), developed by Fernholz, Karatzas et al. ([fernholz2002stochastic], [karatzas2009stochastic] and [fernholz2013second]). In contrast to Modern Portfolio Theory (MPT) and the Capital Asset Pricing Model (CAPM), which are based on normative assumptions, SPT is a descriptive theory: it studies the empirical dynamics and characteristics of equity markets. In particular, SPT captures the tendency of stocks to retain their ranks. The SPT model employs the machinery of rank-interacting Brownian particles and semimartingales.
The framework of partition structures, imported from mathematical genetics and comprising combinatorial and probabilistic methods, provides a complementary approach to the modeling and analysis of the capital distribution curve. It can be summarized as follows.

The market is considered as a large combinatorial structure: a partition of the set of invested units of money. Capitalizations of individual stocks correspond to the block (cluster) sizes of the partition, represented by integers, for instance measured in cents.

The number of set partitions with given block sizes defines the number of ways in which each configuration can be realized combinatorially. In other words, the market can be represented as a giant Young diagram, with the vector of capitalizations determining the (potentially very large) number of ways such a market configuration can be realized.
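As a small illustration of this count, the sketch below computes the number of set partitions of $n = \sum_i c_i$ units whose blocks have given sizes $(c_1, \dots, c_k)$, using the standard formula $n! / (\prod_i c_i! \prod_j m_j!)$, where $m_j$ is the number of blocks of size $j$ (blocks are treated as unlabelled). The function names are hypothetical; because realistic capitalizations measured in cents give astronomically large counts, a log-scale variant based on `math.lgamma` is included.

```python
from collections import Counter
from math import exp, factorial, lgamma


def count_set_partitions(block_sizes):
    """Number of partitions of {1, ..., n}, n = sum(block_sizes), whose
    (unlabelled) blocks have exactly the given sizes:
    n! / (prod_i c_i! * prod_j m_j!), m_j = number of blocks of size j."""
    n = sum(block_sizes)
    denominator = 1
    for size in block_sizes:
        denominator *= factorial(size)
    for multiplicity in Counter(block_sizes).values():
        denominator *= factorial(multiplicity)
    return factorial(n) // denominator


def log_count_set_partitions(block_sizes):
    """Natural logarithm of the same count; usable for capitalizations in cents."""
    n = sum(block_sizes)
    log_count = lgamma(n + 1) - sum(lgamma(size + 1) for size in block_sizes)
    log_count -= sum(lgamma(m + 1) for m in Counter(block_sizes).values())
    return log_count


print(count_set_partitions([3, 2]))                  # 10
print(round(exp(log_count_set_partitions([3, 2]))))  # 10
```

The value 10 for block sizes 3 and 2 matches the toy model of Section 1.7.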
Partition structures are important for several reasons.

First, partition structures provide a model of random transitions with dynamic dimensions: at any time the number of diffusion components may change due to the appearance of a new stock or the bankruptcy of an existing firm.

Second, a partition structure with a nontrivial limiting distribution defines the asymptotic shape of the corresponding combinatorial structure. In particular, the mechanism of shape formation provides an explanation of the stability of the capital distribution curve.
The two-parameter Poisson-Dirichlet model is a remarkable and well-studied instance of partition structures. It possesses an analytically tractable limiting distribution defined in the simplex of ranked weights.

Poisson-Dirichlet distribution. The Dirichlet distribution $\mathrm{Dir}(\alpha_1, \dots, \alpha_n)$ with an $n$-dimensional vector of parameters defines a probability distribution for nonnegative proportions $(x_1, \dots, x_n)$ in the standard simplex $\{x_i \ge 0,\ x_1 + \dots + x_n = 1\}$. Kingman [kingman1975random] considered the limiting behavior of this distribution with a symmetric vector of parameters $\alpha_1 = \dots = \alpha_n = \theta/n$ as $n \to \infty$, and called the distribution of the ranked components the Poisson-Dirichlet $PD(\theta)$ distribution (with one parameter $\theta > 0$). This distribution is defined in the infinite simplex of ranked weights, known as the Kingman simplex:
$$\nabla_\infty = \Big\{ (x_1, x_2, \dots) : x_1 \ge x_2 \ge \dots \ge 0,\ \sum_{i} x_i = 1 \Big\}.$$
Size-biased permutation provides an efficient method of sampling from the Dirichlet and the Poisson-Dirichlet distributions. In the framework of population biology, Engen [engen1978] suggested a modification of the size-biased method, which produced another class of Poisson-Dirichlet distributions. It was called the two-parameter Poisson-Dirichlet distribution, $PD(\alpha, \theta)$, by Perman, Pitman and Yor, who rediscovered it in the context of studying ranked jumps of gamma and stable subordinators (see [perman1992size], [PY]). The monograph by Pitman [pitman2002combinatorial] contains a wealth of information on the two-parameter Poisson-Dirichlet model. As shown by Chatterjee and Pal [chatterjee2010phase], the limiting behaviour of a rank-interacting system of Brownian particles is characterized by the $PD(\alpha, 0)$ distribution.
Aoki pioneered applications of exchangeable distributions in economics ([aoki2001modeling], [Aoki228]), in particular using the finitary characterization by Garibaldi, Costantini, et al. ([garibaldi2004finitary]; see also the book [garibaldi2010finitary]). A Markov chain approach with transitions in the space of partitions was independently developed by Garibaldi, Costantini, et al. [garibaldi2004finitary], [garibaldi2007two]. Petrov [petrov2009two], inspired by the works of Kerov, Fulman [fulman2005stein], and Borodin and Olshanski [borodin2009infinite], constructed a diffusion process preserving the two-parameter Poisson-Dirichlet distribution in the infinite-dimensional ranked simplex.
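A minimal sampling sketch, assuming the standard stick-breaking (GEM) form of the two-parameter model with independent $V_k \sim \mathrm{Beta}(1-\alpha, \theta + k\alpha)$: the pieces $V_k \prod_{j<k}(1 - V_j)$ come out in size-biased order, and ranking them gives an approximate $PD(\alpha, \theta)$ sample. Truncating after `n_atoms` pieces (and dropping the leftover mass) is an approximation introduced here, and `sample_pd` is a hypothetical helper name.

```python
import numpy as np


def sample_pd(alpha, theta, n_atoms=1000, rng=None):
    """Approximate draw of ranked weights from PD(alpha, theta),
    0 <= alpha < 1, theta > -alpha, via truncated GEM(alpha, theta)
    stick-breaking; mass left after n_atoms breaks is dropped."""
    rng = np.random.default_rng(rng)
    k = np.arange(1, n_atoms + 1)
    v = rng.beta(1.0 - alpha, theta + k * alpha)  # V_k ~ Beta(1 - alpha, theta + k*alpha)
    # Stick remaining before the k-th break: prod_{j<k} (1 - V_j).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    pieces = v * remaining                        # size-biased order
    return np.sort(pieces)[::-1]                  # ranked in descending order


weights = sample_pd(alpha=0.5, theta=10.0, n_atoms=2000, rng=1)
print(weights[:5], weights.sum())
```

For $\alpha > 0$ the leftover mass decays only polynomially in the number of breaks, so `n_atoms` should be taken large when the tail of the ranked weights matters.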
This research note aims to illustrate applications of partition structures and the two-parameter model to modeling the stochastic evolution of the capital distribution curve. In particular, the examples section shows that the two-parameter model provides a reasonable approximation of capital distribution curves in equity markets. Moreover, the model also fits the distribution of relative total capitalizations of stock exchanges.
The main results of this paper were presented at the 8th World Congress of the Bachelier Finance Society, 2014. The author is very grateful to Prof. I. Karatzas for useful advice and suggestions.
1.1 Capital distribution curve
The log-log plot of ranked market weights displays

power-law behavior,

concavity of the curve, and

stability over periods of time.
For example, the figure below shows the capital distribution curves of the NASDAQ market on three dates in 2014.
A more detailed chart reveals the behavior of the weights of the top 100 stocks.
Capital distribution curves in the majority of equity markets, as well as the distribution of capitalizations of world stock exchanges, have shapes similar to the one shown in Figure 1. The examples section contains fits of these curves by the model.
1.2 Poisson-Dirichlet distribution and market weights
The log-log plot of ranked samples from the Poisson-Dirichlet law is characterized by

power-law behavior,

concavity of the curve, and

stability around an average shape.
The infinite-dimensional Poisson-Dirichlet distribution generalizes the symmetric finite-dimensional Dirichlet distribution. Moreover, as shown in Section 1.4, both distributions can be represented by normalizing a sequence of random variables $X_1, X_2, \dots$ by their sum,
$$w_i = \frac{X_i}{S}, \qquad S = \sum_{j} X_j,$$
with the property that the weights $(w_i)$ are independent of the sum $S$.
The figure below illustrates the fit of NASDAQ market weights by averages of samples from the two-parameter distribution. The parameters are estimated by the least squares method.
The next figure displays the typical behaviour of ranked random weights.
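The note does not spell out the fitting procedure beyond least squares, so the following is only a sketch under stated assumptions: the squared error is measured between log ranked weights (matching the log-log presentation), the model curve is the average of the top-ranked weights over many truncated stick-breaking samples, and the parameters are chosen by a simple grid search. The same seed is reused at every grid point (common random numbers), so that the loss surface is not dominated by Monte Carlo noise. All function names are hypothetical.

```python
import numpy as np


def averaged_ranked_pd(alpha, theta, n_top, n_samples=200, n_atoms=2000, seed=0):
    """Average over n_samples truncated GEM(alpha, theta) stick-breaking draws
    of the n_top largest weights (an approximation to averaged PD samples)."""
    rng = np.random.default_rng(seed)  # same seed for every (alpha, theta): common random numbers
    k = np.arange(1, n_atoms + 1)
    v = rng.beta(1.0 - alpha, theta + k * alpha, size=(n_samples, n_atoms))
    remaining = np.cumprod(1.0 - v, axis=1)
    remaining = np.hstack([np.ones((n_samples, 1)), remaining[:, :-1]])
    ranked = -np.sort(-(v * remaining), axis=1)  # each row in descending order
    return ranked[:, :n_top].mean(axis=0)


def fit_pd(observed_weights, alphas, thetas, **sim_kwargs):
    """Grid search for (alpha, theta) minimising the sum of squared differences
    between log observed ranked weights and log averaged model weights."""
    observed = np.sort(np.asarray(observed_weights, dtype=float))[::-1]
    best = None
    for alpha in alphas:
        for theta in thetas:
            model = averaged_ranked_pd(alpha, theta, len(observed), **sim_kwargs)
            loss = float(np.sum((np.log(observed) - np.log(model)) ** 2))
            if best is None or loss < best[0]:
                best = (loss, alpha, theta)
    return best  # (loss, alpha, theta)
```

For an array `weights` of observed market weights, a call such as `fit_pd(weights, np.linspace(0.1, 0.9, 9), [1.0, 10.0, 100.0])` returns the best `(loss, alpha, theta)` triple on that grid; a continuous optimizer could replace the grid once a rough region is known.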
1.3 Ranked capitalizations and market weights
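The capitalization of stock $i$ at time $t$ is calculated as the product of the number of shares outstanding $N_i$ and the stock price $p_i(t)$:
$$c_i(t) = N_i\, p_i(t).$$
For capitalizations ordered as $c_{(1)}(t) \ge c_{(2)}(t) \ge \dots \ge c_{(n)}(t)$, the corresponding ranked market weights are determined by
$$w_{(k)}(t) = \frac{c_{(k)}(t)}{M(t)}, \qquad M(t) = \sum_{i=1}^{n} c_i(t),$$
where $M(t)$ is the total market capitalization at time $t$. Stability of the capital distribution curve means
$$w_{(k)}(t + \Delta t) \approx w_{(k)}(t), \qquad k = 1, 2, \dots$$
In other words, ranked weights remain approximately the same despite changes in capitalizations. This implies that for relatively short periods of time, while a stock retains its rank $k$ (and its number of shares outstanding is unchanged), the numéraire approach to pricing approximately holds:
$$\frac{p_{(k)}(t + \Delta t)}{M(t + \Delta t)} \approx \frac{p_{(k)}(t)}{M(t)}, \qquad \text{i.e.} \qquad p_{(k)}(t + \Delta t) \approx p_{(k)}(t)\, \frac{M(t + \Delta t)}{M(t)}.$$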
However, the longer the time period $\Delta t$, the less likely it is that a stock retains its rank. A more advanced approach to modelling market weights and stock capitalizations is based on diffusion theory and on the representation of the two-parameter Poisson-Dirichlet distribution in terms of jumps of subordinators. This representation is known as Proposition 21 in the celebrated paper of Pitman and Yor [PY].
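The ranking, normalization and numéraire approximation of this subsection amount to a few lines of arithmetic. The sketch below uses hypothetical share counts and prices and hypothetical function names; `numeraire_price` assumes the stock keeps its rank and its number of shares over the period.

```python
import numpy as np


def ranked_weights(shares_outstanding, prices):
    """Capitalizations c_i = N_i * p_i ranked in descending order and
    normalised by the total market capitalization M = sum_i c_i."""
    caps = np.asarray(shares_outstanding, dtype=float) * np.asarray(prices, dtype=float)
    total = caps.sum()
    return np.sort(caps)[::-1] / total, total


def numeraire_price(price_now, total_now, total_later):
    """p(t + dt) ~ p(t) * M(t + dt) / M(t); assumes the stock keeps its rank,
    ranked weights are stable and shares outstanding do not change."""
    return price_now * total_later / total_now


weights, total = ranked_weights([100, 50, 10], [2.0, 3.0, 5.0])
projected = numeraire_price(2.0, total, 440.0)
```

Here the capitalizations are 200, 150 and 50, so $M = 400$ and the ranked weights are 0.5, 0.375 and 0.125; if the total grows to 440, the approximation gives a price of 2.2 for the largest stock.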
1.4 Gamma-Dirichlet algebra
There is a close relationship between the gamma and Dirichlet distributions, characterized by a number of important properties, which in the symmetric case can be summarized as follows. Consider $n$ independent and identically distributed gamma variables $X_1, \dots, X_n \sim \mathrm{Gamma}(a, b)$ with shape $a$ and scale $b$. The first, convolution property states that the sum $S = X_1 + \dots + X_n$ also has a gamma distribution, $S \sim \mathrm{Gamma}(na, b)$. The second property states that the normalized components $X_i / S$ are independent of the sum $S$; moreover, as shown by Lukacs [lukacs1955characterization], this characterizing property holds if and only if the $X_i$ are gamma distributed with the same scale $b$. Finally, the normalized vector $(X_1/S, \dots, X_n/S)$ has the symmetric Dirichlet distribution $\mathrm{Dir}(a, \dots, a)$.
Conversely, given a Dirichlet distributed vector $(w_1, \dots, w_n) \sim \mathrm{Dir}(a, \dots, a)$ and an independent gamma variable $S \sim \mathrm{Gamma}(na, b)$, the 'restored' variables $X_i = S\, w_i$ are independent with gamma distributions $\mathrm{Gamma}(a, b)$.
Obviously, these properties also hold in the case of the ordered Dirichlet distribution. For instance, with ranked components $w_{(1)} \ge \dots \ge w_{(n)}$ obtained from the symmetric Dirichlet distribution and an independent $S \sim \mathrm{Gamma}(na, b)$, the restored gamma variables $X_{(i)} = S\, w_{(i)}$ are also ranked in descending order.
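The gamma-Dirichlet relations above can be checked numerically. The sketch below, assuming NumPy's shape/scale parametrization of `Generator.gamma`, draws i.i.d. $\mathrm{Gamma}(a, b)$ vectors, normalizes them into symmetric Dirichlet weights and restores them with the sum; the function names are hypothetical. A near-zero sample correlation between a weight and the sum is only a consequence of independence, not a proof of it.

```python
import numpy as np


def gamma_dirichlet(n, a, b, n_draws, seed=0):
    """Draw i.i.d. Gamma(shape=a, scale=b) vectors X of length n, their sums S
    and normalized weights W = X / S (symmetric Dirichlet(a, ..., a))."""
    rng = np.random.default_rng(seed)
    x = rng.gamma(shape=a, scale=b, size=(n_draws, n))
    s = x.sum(axis=1)        # S ~ Gamma(n * a, b)
    w = x / s[:, None]       # W ~ Dir(a, ..., a), independent of S
    return x, s, w


def restore(w, s):
    """'Restored' gamma variables X_i = S * W_i (S > 0, so ranking is preserved)."""
    return w * s[:, None]


x, s, w = gamma_dirichlet(n=5, a=0.5, b=2.0, n_draws=200_000, seed=42)
print(s.mean())                          # close to n * a * b = 5
print(np.corrcoef(w[:, 0], s)[0, 1])     # close to 0
```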
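A similar characterization of the $PD(\alpha, \theta)$ law, $0 < \alpha < 1$, $\theta > 0$, is provided by Proposition 21 in Pitman and Yor [PY], which can informally be restated as follows. Consider a tempered stable subordinator $(\tau_s)_{s \ge 0}$ with Lévy density $\alpha C x^{-\alpha-1} e^{-x}$, $x > 0$, for a constant $C > 0$, on the random time interval $[0, T]$ with
$$T = \frac{\gamma_{\theta/\alpha}}{C\, \Gamma(1-\alpha)},$$
where $\gamma_{\theta/\alpha} \sim \mathrm{Gamma}(\theta/\alpha, 1)$ is independent of $(\tau_s)$, and denote the ranked jumps of the subordinator in this interval by $J_1 \ge J_2 \ge \dots$. The sum of these jumps equals the value of the tempered subordinator stopped at the random time $T$:
$$\sum_{i \ge 1} J_i = \tau_T.$$
As in the case of the Dirichlet distribution, Proposition 21 in [PY] states that the sum of the jumps has a gamma distribution, $\tau_T \sim \mathrm{Gamma}(\theta, 1)$. The second statement of the proposition is that the normalized jumps $J_i / \tau_T$ are independent of the sum $\tau_T$. Finally, the sequence of normalized jumps $(J_1/\tau_T, J_2/\tau_T, \dots)$ has the Poisson-Dirichlet distribution with parameters $(\alpha, \theta)$. In what follows, Proposition 21 provides a convenient way of modeling the stochastic evolution of stock prices 'restored' from the dynamics of market weights.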
1.5 $PD(\alpha, \theta)$ market model
It is natural to employ the stick-breaking and size-biased sampling methods described elsewhere in this note for modeling a diffusion with stationary Poisson-Dirichlet distribution. This approach was first proposed by Feng and Wang [fengwang], who also proved the reversibility of the corresponding infinite-dimensional process. Recall that the Wright-Fisher diffusion process driven by the SDE
$$dx_t = \tfrac{1}{2}\big(a(1 - x_t) - b\, x_t\big)\, dt + \sqrt{x_t(1 - x_t)}\, dW_t$$
has the reversible stationary beta distribution $\mathrm{Beta}(a, b)$.
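A crude Euler-Maruyama discretization of the Wright-Fisher SDE is sketched below. The step size, the clipping to $[0, 1]$ (not part of the SDE, only a guard against discretization overshoot) and the function name are assumptions made for illustration. Over a long run, the time average of the path should be close to the stationary beta mean $a/(a+b)$.

```python
import numpy as np


def simulate_wright_fisher(a, b, x0, dt=1e-3, n_steps=100_000, seed=0):
    """Euler-Maruyama scheme for
        dx = 0.5 * (a * (1 - x) - b * x) dt + sqrt(x * (1 - x)) dW,
    whose stationary law is Beta(a, b). Clipping to [0, 1] only guards
    against discretization overshoot; it is not part of the SDE."""
    rng = np.random.default_rng(seed)
    dw = rng.standard_normal(n_steps) * np.sqrt(dt)  # Brownian increments
    path = np.empty(n_steps + 1)
    path[0] = x0
    for i in range(n_steps):
        x = path[i]
        drift = 0.5 * (a * (1.0 - x) - b * x)
        vol = np.sqrt(x * (1.0 - x))
        path[i + 1] = min(max(x + drift * dt + vol * dw[i], 0.0), 1.0)
    return path


path = simulate_wright_fisher(a=2.0, b=3.0, x0=0.5)
print(path[10_000:].mean())  # near the Beta(2, 3) mean 2 / 5 = 0.4
```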
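If $w_{(k)}(t)$ denotes the market weight of the $k$-th largest stock at time $t$, then the stochastic evolution of market weights can be determined from the stick-breaking process
$$P_1(t) = V_1(t), \qquad P_k(t) = V_k(t) \prod_{j=1}^{k-1} \big(1 - V_j(t)\big), \quad k \ge 2,$$
with $w_{(k)}(t)$ taken as the $k$-th largest of $P_1(t), P_2(t), \dots$, where the processes $V_k(t)$ are determined by the independent SDEs
$$dV_k(t) = \tfrac{1}{2}\big((1 - \alpha)(1 - V_k(t)) - (\theta + k\alpha)\, V_k(t)\big)\, dt + \sqrt{V_k(t)(1 - V_k(t))}\, dW_k(t), \qquad k = 1, 2, \dots,$$
with stationary beta distributions $\mathrm{Beta}(1 - \alpha, \theta + k\alpha)$, corresponding to the size-biased sampling (stick-breaking) definition of the two-parameter model.
The initial values of the processes $V_k(t)$ are determined from the initial market weights by
$$V_k(0) = \frac{w_{(k)}(0)}{1 - \sum_{j=1}^{k-1} w_{(j)}(0)}.$$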
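The stick-breaking map and its inverse, used above for the initial values, can be written as a pair of small helpers. This is a hypothetical sketch which assumes the initial weights are supplied in the order in which the stick is broken.

```python
import numpy as np


def stick_breaking(v):
    """P_1 = V_1 and P_k = V_k * prod_{j<k} (1 - V_j) for k >= 2."""
    v = np.asarray(v, dtype=float)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining


def inverse_stick_breaking(p):
    """V_k = P_k / (1 - sum_{j<k} P_j): initial values V_k(0) from initial weights."""
    p = np.asarray(p, dtype=float)
    used = np.concatenate(([0.0], np.cumsum(p)[:-1]))
    return p / (1.0 - used)


w0 = [0.5, 0.3, 0.15, 0.05]
v0 = inverse_stick_breaking(w0)   # approximately [0.5, 0.6, 0.75, 1.0]
```

Applying `stick_breaking(v0)` recovers `w0` up to floating-point rounding; when the weights sum to one, the last $V_k$ equals 1, since it takes the whole remaining stick.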
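Local evolution of the overall market capitalization $M(t)$ can be modelled by the diffusion
$$dM(t) = \kappa\big(\theta\beta - M(t)\big)\, dt + \sqrt{2\kappa\beta\, M(t)}\, dB(t), \qquad \kappa > 0,$$
with stationary gamma distribution $\mathrm{Gamma}(\theta, \beta)$ (shape $\theta$, scale $\beta$), where the scale variable $\beta$ is defined by the condition $\theta\beta = M(0)$.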
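Correspondingly, the local behaviour of stock prices is defined by the product of independent processes
$$p_{(k)}(t) = \frac{M(t)\, w_{(k)}(t)}{N_{(k)}},$$
where $N_{(k)}$ denotes the number of shares outstanding of the $k$-th largest stock.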
1.6 The broken-stick model
The broken-stick model is a simple model illustrating how a uniform partition produces inequality patterns. MacArthur [macarthur1957relative] proposed this model to explain relative species abundances in a closed environment.
Let us assume that a stick of unit length represents some finite resource, such as territory, available food or a water reservoir, which must be shared between species. The resource is broken at random by throwing $n - 1$ cutting points uniformly on the stick and breaking it into $n$ pieces. The length of each piece represents the share taken by some class of species. While on average the length of each piece is $1/n$, the ranked lengths of the pieces display interesting behavior.
For instance, if the stick is broken into just two pieces, the length of the smaller piece is never larger than 50%, and since the cut point is uniformly distributed, it is easy to see that the smaller piece on average represents 25% of the length, while the larger one takes 75%. In general, it can be shown that after breaking the stick into $n$ pieces, the expected length of the $k$-th largest piece is
$$\mathbb{E}\big[L_{(k)}\big] = \frac{1}{n} \sum_{i=k}^{n} \frac{1}{i}, \qquad k = 1, \dots, n.$$
In the case of 3 pieces, the expected proportions ranked in descending order are 61.1%, 27.8% and 11.1%. It can be checked by straightforward simulation that dropping 4 points uniformly on the unit interval produces on average the following ranked lengths of the 5 subintervals: 45.7%, 25.7%, 15.7%, 9.0% and 4.0%.
Naturally, sampled proportions fluctuate around these expected lengths. For larger values of $n$, the ranked expected proportions decay rapidly, and it is more convenient to display them on a log-log plot.
This example illustrates that asymmetry in ranked proportions appears even with a completely uniform distribution of the resource.
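The closed-form expectation and the simulation mentioned above can be compared directly. The sketch below uses hypothetical function names; `expected_broken_stick(3)` evaluates the formula for three pieces, and the simulated averages for $n = 5$ should agree with the formula up to Monte Carlo error.

```python
import numpy as np


def expected_broken_stick(n):
    """E[length of k-th largest piece] = (1/n) * sum_{i=k}^{n} 1/i, k = 1..n."""
    inv = 1.0 / np.arange(1, n + 1)
    tail_sums = np.cumsum(inv[::-1])[::-1]   # sum_{i=k}^{n} 1/i
    return tail_sums / n


def simulate_broken_stick(n, n_draws=100_000, seed=0):
    """Drop n - 1 uniform points on [0, 1]; average the ranked lengths of the n pieces."""
    rng = np.random.default_rng(seed)
    cuts = np.sort(rng.uniform(size=(n_draws, n - 1)), axis=1)
    edges = np.hstack([np.zeros((n_draws, 1)), cuts, np.ones((n_draws, 1))])
    pieces = np.diff(edges, axis=1)
    return (-np.sort(-pieces, axis=1)).mean(axis=0)  # descending order, then average


print(np.round(100 * expected_broken_stick(3), 1))   # about 61.1, 27.8, 11.1 (%)
print(simulate_broken_stick(5))                      # close to expected_broken_stick(5)
```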
1.7 Toy model
Let us imagine that there are only two stocks, with capitalizations 3 and 2, in a market with total capitalization 5. Tickers or names do not play an important role and are used only to distinguish the stocks. The ten ways in which 5 units of money can form a state with these capitalizations are represented by the ten Young tableaux shown on the left.
Since these partitions have the same block sizes, it is convenient to use the Young diagram shown on the right to denote all partitions with the same shape. The 10 partitions above arise by adding a new box (the fifth unit of money) to partitions of four units.
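The count of ten and the box-adding step can be verified by brute force. In the sketch below (an illustration of our reading of the construction, not code from the note), a partition of the units 1 to 5 with block sizes 3 and 2 is fixed by choosing its 3-element block. Removing unit 5 shows that 4 of the partitions grow from shape $(3, 1)$ and 6 from shape $(2, 2)$, since unit 5 can join either pair in each of the 3 partitions of four units into pairs.

```python
from itertools import combinations

units = set(range(1, 6))  # five units of money labelled 1, ..., 5

# A partition with block sizes (3, 2) is fixed by its 3-element block.
partitions = [(set(block3), units - set(block3)) for block3 in combinations(sorted(units), 3)]
print(len(partitions))  # 10

# Removing unit 5 from its block shows which diagram with 4 boxes each partition grows from:
# unit 5 in the 3-block leaves shape (2, 2); unit 5 in the 2-block leaves shape (3, 1).
grown_from_2_2 = [p for p in partitions if 5 in p[0]]
grown_from_3_1 = [p for p in partitions if 5 in p[1]]
print(len(grown_from_3_1), len(grown_from_2_2))  # 4 6
```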
Footnotes
 Data source is http://www.google.com/finance#stockscreener