Abstract
The usual $\chi^2$ method of fit-quality assessment is a special case of the more general method of Bayesian model comparison, which involves integrals of the likelihood and prior over all possible values of all parameters. We introduce new parametrisations based on systematic expansions around the stretched exponential, or Fourier-transformed Lévy, source distribution, and utilise the increased discriminating power of the Bayesian approach to evaluate the relative probability of these models to be true representations of recently measured Bose-Einstein correlation data in $e^+e^-$ annihilations at LEP.
From $\chi^2$ to Bayesian model comparison and Lévy expansions of Bose-Einstein correlations in $e^+e^-$ reactions^{1}^{1}Presented at the 7th Workshop on Particle Correlations and Femtoscopy, September 20–24, 2011, Tokyo
Michiel B. De Kock, Hans C. Eggers,
Department of Physics, University of Stellenbosch, ZA–7600 Stellenbosch, South Africa
Tamás Csörgő
Wigner RCP, RMKI, H–1525 Budapest 114, P. O. Box 49, Hungary
1 Bayes factors
The Bayesian definition of probability differs radically from the conventional “frequentist” one, necessitating the overhaul of many concepts and techniques used in statistics and its applications. Since its introduction in 1900 [1], the $\chi^2$ statistic has become the standard criterion for goodness of fit in physics and many other disciplines, while Laplace’s Bayesian approach [2] remained largely forgotten until revived by Jeffreys [3]. Later refinements such as the Maximum Likelihood method occupy a middle ground between the two approaches.
In this contribution, we demonstrate the use of one Bayesian technique in the simple context of fitting data or, more generally, the quantitative assessment of evidence in favour of a hypothesis $H_1$ as a description of given data, compared to a rival hypothesis $H_2$. We do so by analysing the concrete example of binned data for the correlation function $C_2(Q)$ in the four-momentum difference $Q$ as published recently by the L3 Collaboration [4].
Suppose we have data $D = \{x_1, x_2, \dots, x_N\}$ consisting of $N$ measurements of particle four-momentum differences, assumed to be mutually independent as is customary in femtoscopy. Typically, the experimentalist will want to test how well various parametrisations fit the data. For the purposes of Bayesian analysis, a given parametrisation with $r$ free parameters $\theta = (\theta_1, \dots, \theta_r)$ is considered a “model” or “hypothesis” $H$. The starting point is the odds $O_{12}$ in favour of a model $H_1$ compared to a different model $H_2$, defined as the ratio $O_{12} = P(H_1|D)/P(H_2|D)$, while the evidence for $H_1$ versus $H_2$ is the logarithm^{2}^{2}We use natural logarithms; other bases can be substituted as preferred. of the odds. Use of Bayes’ Theorem for both hypotheses yields
$O_{12} = \dfrac{P(H_1|D)}{P(H_2|D)} = \dfrac{P(D|H_1)\,P(H_1)}{P(D|H_2)\,P(H_2)}.$  (1)
The evidence of $H_1$ versus $H_2$ is therefore the same as the logarithm of the Bayes factor $K_{12}$ if there is no a priori reason to prefer $H_1$ above $H_2$, so that $P(H_1) = P(H_2)$. A large Bayes factor says that the evidence for $H_1$ is stronger than the evidence for $H_2$, and vice versa. It can be written as a ratio of integrals over the respective parameter spaces of $H_1$ and $H_2$,
$K_{12} = \dfrac{P(D|H_1)}{P(D|H_2)} = \dfrac{\int P(D|\theta_1,H_1)\,\pi(\theta_1|H_1)\,d\theta_1}{\int P(D|\theta_2,H_2)\,\pi(\theta_2|H_2)\,d\theta_2}.$  (2)
Solving the high-dimensional integrals will often be an arduous task. Fortunately, the independence of the $N$ measurements implies that the likelihood factorises into the product of likelihoods for individual data points, which by assumption have the same form,
$P(D|\theta,H) = \prod_{i=1}^{N} p(x_i|\theta,H).$  (3)
Due to the large exponent $N$, even the slightest non-uniformity in $p(x|\theta,H)$ will lead to the development of a strong peak in parameter space for the overall likelihood, situated at the maximum-likelihood point $\hat\theta$. An asymmetric prior will shift the peak to a value $\theta^*$, but it will not materially affect the width of the peak or its differentiability. Unless the shifted peak falls on a boundary of the parameter space or happens to be non-differentiable, the logarithm of the integrand can therefore be expanded around $\theta^*$ [5]:
$\ln\bigl[P(D|\theta)\,\pi(\theta)\bigr] \simeq \ln\bigl[P(D|\theta^*)\,\pi(\theta^*)\bigr] - \tfrac{1}{2}\,(\theta-\theta^*)^{T}\, H\,(\theta-\theta^*),$  (4)
where $H$ is the Hessian of the expansion,
$H_{jk} = -\left.\dfrac{\partial^2 \ln[P(D|\theta)\,\pi(\theta)]}{\partial\theta_j\,\partial\theta_k}\right|_{\theta=\theta^*},$  (5)
and $\Sigma = H^{-1}$ is the parameter covariance matrix. As more data are accumulated, the peak narrows, so that we can neglect the fact that parameters may have finite ranges. Integrating the above as if it were a Gaussian, one obtains Laplace’s result [2],
$\int P(D|\theta)\,\pi(\theta)\,d\theta \simeq P(D|\theta^*)\,\pi(\theta^*)\,(2\pi)^{r/2}\,|\Sigma|^{1/2},$  (6)
which under the stated assumptions is a good approximation of the full-blown integral appearing in Eq. (2) if $N \gg r$. The logarithm of the Bayes factor then becomes simply the difference of evidences,
$\ln K_{12} = E_1 - E_2,$  (7)
$E_i = \ln P(D|\theta_i^*,H_i) + \ln \pi(\theta_i^*|H_i) + \dfrac{r_i}{2}\ln 2\pi + \dfrac{1}{2}\ln|\Sigma_i|.$  (8)
The evidence $E_i$ can be determined for any single model $H_i$, but has no meaning on its own; only differences are meaningful in quantifying the probability for $H_1$ to be true compared to $H_2$,
$\dfrac{P(H_1|D)}{P(H_2|D)} = K_{12} = e^{E_1 - E_2}.$  (9)
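As a concrete illustration of the Laplace machinery of Eqs. (4)–(9), the following Python sketch evaluates the evidence for a toy one-parameter model against a zero-parameter rival. All numbers here are hypothetical stand-ins (simulated data, an assumed Gaussian prior width), not the L3 values; the peak $\theta^*$, the Hessian of Eq. (5) and the odds of Eq. (9) are obtained numerically.

```python
import numpy as np

# Toy data standing in for the N independent measurements x_i (hypothetical).
rng = np.random.default_rng(1)
data = rng.normal(0.5, 1.0, size=200)

def log_like(mu):
    # ln P(D|mu): product of Gaussian single-point likelihoods, cf. Eq. (3)
    return -0.5 * np.sum((data - mu) ** 2) - 0.5 * len(data) * np.log(2 * np.pi)

def log_prior(mu, width=5.0):
    # Gaussian prior pi(mu); the width is an assumption of this sketch
    return -0.5 * (mu / width) ** 2 - 0.5 * np.log(2 * np.pi * width ** 2)

def log_post(mu):
    return log_like(mu) + log_prior(mu)

# Locate the (prior-shifted) peak theta* by a fine grid search.
grid = np.linspace(-2.0, 3.0, 5001)
mu_star = grid[np.argmax([log_post(m) for m in grid])]

# Hessian of Eq. (5) from a central second difference; Sigma = H^{-1}.
h = 1e-3
H = -(log_post(mu_star + h) - 2.0 * log_post(mu_star) + log_post(mu_star - h)) / h ** 2
Sigma = 1.0 / H

# Laplace evidence, Eqs. (6)-(8), with r = 1 free parameter.
E1 = log_post(mu_star) + 0.5 * np.log(2 * np.pi) + 0.5 * np.log(Sigma)

# Rival hypothesis: mean fixed at zero, no free parameters, no Occam terms.
E2 = log_like(0.0)

odds = np.exp(E1 - E2)  # Eq. (9): odds in favour of the one-parameter model
```

The two Occam terms $\frac{r}{2}\ln 2\pi + \frac{1}{2}\ln|\Sigma|$ penalise the extra parameter; the free-mean model wins here only because its likelihood gain outweighs that penalty.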
2 Relationship to $\chi^2$ and the Maximum Likelihood
The Bayesian results obtained above differ from the traditional Maximum Likelihood Estimate (MLE), which ignores the priors and approximates the integrals of (2) by the maxima of the likelihoods,
$K_{12} \approx \dfrac{\max_{\theta_1} P(D|\theta_1,H_1)}{\max_{\theta_2} P(D|\theta_2,H_2)} = \dfrac{P(D|\hat\theta_1,H_1)}{P(D|\hat\theta_2,H_2)}.$  (10)
The traditional goodness-of-fit is related to the above as follows. The $N$ measurements are binned into $B$ bins with bin midpoints $Q_b$, yielding the histogram version $\{n_b\}$ of the data, with $\sum_b n_b = N$. The most general “parametrisation” of the histogram contents is then the multinomial with the set of Bernoulli probabilities $\{p_b\}$ with $B-1$ degrees of freedom,
$P(D|\{p_b\},H) = N!\,\prod_{b=1}^{B} \dfrac{p_b^{\,n_b}}{n_b!},$  (11)
which on use of the Stirling approximation becomes, up to a normalisation constant,
$\ln P(D|\{p_b\},H) \simeq -\sum_{b=1}^{B} n_b \ln\dfrac{n_b}{N p_b}.$  (12)
Expanding the free parameters $p_b$ around the measured data $n_b/N$ and truncating at second order,
$\ln P(D|\{p_b\},H) \simeq -\dfrac{1}{2}\sum_{b=1}^{B} \dfrac{(n_b - N p_b)^2}{n_b},$  (13)
we can identify the multinomial quantities with the measured correlation functions at mid-bin points by setting^{3}^{3}Here $M$ is an arbitrary large integer to ensure that the $n_b$ are integers. As it eventually cancels out, its size is immaterial. $n_b = M\,\bar C_2(Q_b)$, $N p_b = M\,C_2(Q_b;\theta)$, and $N = M \sum_b \bar C_2(Q_b)$. The $n_b$ in the denominator is almost proportional to the measured bin variances $\sigma_b^2$, so that the quadratic term is
$\ln P \simeq \mathrm{const} - \dfrac{1}{2}\sum_{b=1}^{B} \dfrac{\bigl[\bar C_2(Q_b) - C_2(Q_b;\theta)\bigr]^2}{\sigma_b^2},$  (14)
where $C_2(Q_b;\theta)$, which includes all the constants, is the unnormalised parametrisation for the correlation function in common use. Comparing this to the usual definition
$\chi^2(\theta) = \sum_{b=1}^{B} \dfrac{\bigl[\bar C_2(Q_b) - C_2(Q_b;\theta)\bigr]^2}{\sigma_b^2},$  (15)
we see that the maximum likelihood is approximately equal to
$\max_{\theta} P(D|\theta,H) \approx e^{-\chi^2_{\min}/2},$  (16)
so that $\chi^2$ is seen to be an approximation of the Bayes formulation, using only a single point in the parameter space and thereby effectively assuming a uniform prior. Furthermore, $\chi^2$ relies on the truncated expansion (13); this is probably the approximation most vulnerable to criticism.
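The chain of approximations leading from Eq. (13) to Eq. (16) can be checked numerically. In the sketch below the bin midpoints, errors and Gaussian-type parametrisation are hypothetical stand-ins, not the L3 values: a $\chi^2$ fit is performed, the maximum of the Gaussian-error likelihood is $e^{-\chi^2_{\min}/2}$ up to a $\theta$-independent constant, and the Laplace evidence of Eq. (6) would add exactly the prior and determinant terms that $\chi^2$ discards.

```python
import numpy as np
from scipy.optimize import curve_fit

# Toy binned "correlation function": midpoints, errors and true parameters
# are hypothetical stand-ins for the real analysis.
rng = np.random.default_rng(2)
Q = np.linspace(0.05, 2.0, 40)         # bin midpoints
sigma_b = np.full_like(Q, 0.02)        # standard error per bin

def C2(Q, lam, R):
    # Gaussian-type parametrisation with r = 2 free parameters
    return 1.0 + lam * np.exp(-(R * Q) ** 2)

data = C2(Q, 0.6, 1.2) + rng.normal(0.0, sigma_b)

# chi^2 minimisation: the single-point, uniform-prior approximation
theta, cov = curve_fit(C2, Q, data, p0=[0.5, 1.0], sigma=sigma_b,
                       absolute_sigma=True)
chi2_min = np.sum(((data - C2(Q, *theta)) / sigma_b) ** 2)  # Eq. (15)

# Gaussian-error log likelihood at its maximum, cf. Eq. (16):
# ln L_max = -chi2_min/2 + a constant independent of theta.
norm = -np.sum(np.log(np.sqrt(2.0 * np.pi) * sigma_b))
lnL_max = -0.5 * chi2_min + norm

# The Laplace evidence of Eq. (6) would additionally contribute the
# prior term and the determinant (Occam) factor that chi^2 ignores:
r = len(theta)
occam = 0.5 * r * np.log(2.0 * np.pi) + 0.5 * np.log(np.linalg.det(cov))
```

Because the prior and the determinant are absent, two models with similar $\chi^2_{\min}$ cannot be separated this way, which is precisely the weakness the Bayes factor addresses.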
3 Parametrisations and Lévy-based polynomial expansions
We now apply the above general ideas to the specific case of the various parametrisations shown in Table 1 for the correlation-function data for two-jet events published by the L3 Collaboration [4]. Hypotheses $H_1$ to $H_3$ are taken from the L3 paper. Realising that it is important to quantify the degree of deviation of Bose-Einstein correlation data from the Gaussian or the exponential shape, the L3 Collaboration also studied a “Laguerre expansion” as well as the symmetric Lévy source distribution, characterised by the stretched-exponential correlation function of hypothesis $H_2$. In $H_4$ and $H_5$, we propose a new expansion technique that measures deviations from $H_2$ in terms of a series of “Lévy polynomials” $\ell_n(t|\alpha)$ that are orthogonal with respect to the characteristic function of symmetric Lévy distributions, generalising the results presented in Ref. [6]:
$\int_0^\infty \ell_n(t|\alpha)\,\ell_m(t|\alpha)\,e^{-t^\alpha}\,dt \propto \delta_{nm},$  (17)
where $t$ is the scaled four-momentum difference. These reduce, up to a normalisation constant, to the Laguerre polynomials for $\alpha = 1$. Figure 1 displays two examples for various values of $\alpha$. Polynomials cannot be both orthogonal and derivatives for transcendental weight functions [9], and therefore in $H_6$ and $H_7$ we also investigated non-orthogonal derivative functions of the stretched exponential.^{4}^{4}Note the absence of the long-range correction term $(1+\delta Q)$. L3 demonstrated that this term vanishes if the dip, the non-positive-definiteness of $C_2(Q)-1$, is taken into account by the parametrisation elsewhere, e.g. by the cosine in $H_3$ and by the first-order polynomials in $H_4$ and $H_6$, resulting in $\delta$ values consistent with zero.
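The construction behind Eq. (17) can be sketched by Gram-Schmidt orthogonalisation, under this sketch’s assumption that the inner product uses the stretched-exponential weight $e^{-t^\alpha}$ on $[0,\infty)$; for $\alpha = 1$ the procedure must reproduce the Laguerre polynomials up to normalisation, which serves as a check.

```python
import numpy as np
from scipy.integrate import quad

def inner(f, g, alpha):
    # <f, g> = integral_0^inf f(t) g(t) exp(-t**alpha) dt  (assumed inner product)
    val, _ = quad(lambda t: f(t) * g(t) * np.exp(-t ** alpha), 0.0, np.inf)
    return val

def levy_polynomials(alpha, nmax=2):
    """Gram-Schmidt on the monomials 1, t, t^2, ... against exp(-t**alpha)."""
    polys = []
    for n in range(nmax + 1):
        coef = np.zeros(n + 1)
        coef[n] = 1.0                      # start from the monomial t^n
        p = np.polynomial.Polynomial(coef)
        for q in polys:                    # project out the lower orders
            p = p - (inner(p, q, alpha) / inner(q, q, alpha)) * q
        polys.append(p)
    return polys

# alpha = 1 reduces to the Laguerre case: weight exp(-t) on [0, infinity)
P = levy_polynomials(alpha=1.0)         # P[1] = t - 1, P[2] = t^2 - 4t + 2

# A genuinely Levy-type case, alpha = 0.5: orthogonality still holds
Phalf = levy_polynomials(alpha=0.5)
cross = inner(Phalf[1], Phalf[2], 0.5)  # vanishes up to quadrature accuracy
```

For $\alpha = 1$ one finds $\ell_1 \propto 1-t$ and $\ell_2 = t^2-4t+2$, i.e. the Laguerre polynomials $L_1$ and $2L_2$; for $\alpha = 0.5$ the polynomials differ but remain mutually orthogonal by construction.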
Hypothesis                          $r$
$H_1$: Gauss                         4
$H_2$: Stretched exponential         5
$H_3$: Simplified $\tau$-model       5
$H_4$: 1st-order Lévy polynomial     5
$H_5$: 3rd-order Lévy polynomial     6
$H_6$: 1st-order derivative          5
$H_7$: 3rd-order derivative          6

Table 1: Summary of the parametrisations tested; $r$ is the number of free parameters.
4 Application to L3 binned data
In Table 2, we show the results of applying the Laplace approximation (6) to the L3 two-jet data, which is provided in terms of 100 binned values for the correlation function together with standard errors over its measured $Q$ range in GeV. Throughout, we used a Gaussian prior whose width was determined by numerical integration over one of the L3 data points. To illustrate the contributions of the likelihood, prior and determinant factors entering in (8), we have listed their logarithmic contributions separately in the three columns headed L, P and F. These quantities are therefore the building blocks for calculating the odds between any two competing hypotheses: by (9), the odds follow directly from the difference of the evidence values of the two hypotheses. Also included in Table 2 are the traditional $\chi^2$ per degree of freedom (C) and its associated confidence level (CL).
Hypothesis                          $r$     L      P      F      E      C     CL
$H_1$: Gauss                         4    177.8    3.6   32.2   206.5   2.57   3.4%
$H_2$: Stretched exponential         5    138.5    0.5   34.0   172.0   2.02   1.5%
$H_3$: Simplified $\tau$-model       5     68.2    3.4   37.0   101.8   1.00  49.1%
$H_4$: 1st-order Lévy polynomial     5     66.2    2.2   30.3    98.8   0.97  57.3%
$H_5$: 3rd-order Lévy polynomial     6     65.9    3.8   41.6   111.3   0.97  55.7%
$H_6$: 1st-order derivative          5     67.3    4.2   29.1   100.6   0.98  53.0%
$H_7$: 3rd-order derivative          6     60.4    4.9   31.7    97.0   0.89  77.0%

Table 2: Results of fitting the parametrisations listed in Table 1. Legend: L = log-likelihood contribution, P = log-prior contribution, F = determinant-factor contribution of Eq. (8), E = evidence, C = $\chi^2$ per degree of freedom, CL = confidence level.
It is inappropriate to generalise conclusions based on one specific dataset with its specific circumstances. The fact that in the two-jet L3 data the correlation function drops well below 1.0 at larger $Q$ values, for example, is probably the dominant influence on the goodness of fit. Under this caveat, we make the following observations regarding the results shown in Table 2:
- At first sight, the Bayes factor and the $\chi^2$ methodologies deliver judgements which are rather similar: $H_7$ is consistently ranked best, while $H_1$ and $H_2$ are ranked worst (least likely). The two methodologies yield vastly different numbers when one hypothesis is bad. As shown below, there are surprising variations even among the better ones.
- The determinant plays an important role. For example, factor F for $H_5$ is significantly larger than that of the similar models $H_4$ and $H_6$, even though the three log likelihoods are similar. This can be traced to the fact that the uncertainty in the parameters for $H_5$ is larger, as expressed in the width of its Gaussian (4). While $\chi^2$, based only on the likelihood, can hardly distinguish between $H_4$ and $H_5$, the contribution of the large determinant ensures that the Bayesian odds for $H_4$ versus $H_5$ are 5800:1. In other words, by taking into account not only the best parameter values but also their uncertainties, the Bayes factor could distinguish what $\chi^2$ could not.
- Our Bayes factor calculation takes the experimental standard errors into account by using (14) in the exponent of the likelihood; in other words, we assume that they are Gaussian. We can improve on this approximation by doing a more complete Bayesian analysis using not the binned data but the pair momenta themselves.
- As Fig. 1 shows, the Lévy polynomials introduced here are well suited to describing one-sided, strongly-peaked data. It may be helpful to use them, as we have done here, merely as part of parametrisations of data to which they show some resemblance. More systematic use in Gram-Charlier or other expansions will be faced with issues inherent in all asymptotic series [7, 8].
5 Conclusions
- In hypotheses $H_4$ to $H_7$, we have presented new techniques to study deviations from a stretched-exponential or Fourier-transformed Lévy shape. Details will be published elsewhere.
- The standard measures of fit quality like $\chi^2$ or CL are useful in rejecting models which are inconsistent with a given dataset. Where two or more models are consistent with the data, however, they are unable to select the more probable one. The Bayes factor (9) permits quantification of the evidence (relative probability) for the validity of competing models.
- Besides the likelihood, the prior and determinant also play a role, sometimes decisively so.
- By integrating over parameter space, Bayesian evidence takes into account all possible values of the parameters, while $\chi^2$ and Maximum Likelihood do not.
- Bayes factors depend linearly on the two priors. This is good in that the priors are made explicit, but bad in the sense that results can and do change depending on the choice of priors.
- The omission of priors in $\chi^2$ is to its disadvantage, as it discards important information.
- It may appear that $\chi^2$ does not need any alternative hypothesis to be of use. This is not so, however: the alternative implicit in $\chi^2$ is the “Bernoulli class” of multinomials [10].
Acknowledgements: We thank the L3 collaboration for making its results available electronically [4] and the organizers of WPCF 2011 for support and an excellent atmosphere. This work was supported in part by the South African National Research Foundation and by the Hungarian OTKA grant NK–101438.
References
 [1] K. Pearson, On a criterion that a given system of deviations …is such that it can be reasonably supposed to have arisen in random sampling, Phil. Mag. (5) 50 (1900) 157.
 [2] P.S. Laplace, Mémoires de Mathématique et de Physique, Tome Sixième (1774).
 [3] H. Jeffreys, Theory of Probability, Oxford University Press (1961).
 [4] L3 Collaboration, P. Achard et al., Test of the $\tau$-Model of Bose-Einstein Correlations and Reconstruction of the Source Function in Hadronic Z-boson Decay at LEP, Eur. Phys. J. C71 (2011) 1648 [arXiv:1105.4788], see also http://l3.web.cern.ch/l3/
 [5] R.E. Kass and A.E. Raftery, Bayes Factors, J. American Statistical Association 90 (1995) 773.
 [6] T. Csörgő and S. Hegyi, Model independent shape analysis of correlations in 1, 2 or 3 dimensions, Phys. Lett. B 489 (2000) 15.
 [7] M.B. de Kock, Gaussian and non-Gaussian-based Gram-Charlier and Edgeworth expansions for correlations of identical particles in HBT interferometry, M.Sc. thesis, University of Stellenbosch (2009).
 [8] H.C. Eggers, M.B. de Kock and J. Schmiegel, Determining source cumulants in femtoscopy with Gram-Charlier and Edgeworth series, Mod. Phys. Lett. A 26 (2011) 1771 [arXiv:1011.3950].
 [9] A. Erdélyi et al., Higher Transcendental Functions, Vol. 2, McGrawHill, New York (1953).
 [10] E.T. Jaynes, Probability Theory: The Logic of Science, Cambridge University Press (2003).