Abstract

The usual χ² method of fit quality assessment is a special case of the more general method of Bayesian model comparison, which involves integrals of the likelihood and the prior over all possible values of all parameters. We introduce new parametrisations based on systematic expansions around the stretched exponential or Fourier-transformed Lévy source distribution, and utilise the increased discriminating power of the Bayesian approach to evaluate the relative probability of these models to be true representations of recently measured Bose-Einstein correlation data in e⁺e⁻ annihilations at LEP.

From χ² to Bayesian model comparison and Lévy expansions of Bose-Einstein correlations in e⁺e⁻ reactions¹

¹Presented at the 7th Workshop on Particle Correlations and Femtoscopy, September 20–24, 2011, Tokyo


Michiel B. De Kock, Hans C. Eggers,

Department of Physics, University of Stellenbosch, ZA–7600 Stellenbosch, South Africa


Tamás Csörgő

Wigner RCP, RMKI, H–1525 Budapest 114, P. O. Box 49, Hungary

1 Bayes factors

The Bayesian definition of probability differs radically from the conventional “frequentist” one, necessitating the overhaul of many concepts and techniques used in statistics and its applications. Since its introduction in 1900 [1], the χ² statistic has become the standard criterion for goodness of fit in physics and many other disciplines, while Laplace’s Bayesian approach [2] remained largely forgotten until revived by Jeffreys [3]. Later refinements such as the Maximum Likelihood method occupy a middle ground between the two approaches.

In this contribution, we demonstrate the use of one Bayesian technique in the simple context of fitting or, more generally, the quantitative assessment of evidence in favour of a hypothesis H1 as a description of given data, compared to a rival hypothesis H2. We do so by analysing the concrete example of binned data for the correlation function C2(Q) in the four-momentum difference Q, as published recently by the L3 Collaboration [4].

Suppose we have data D = {x1, x2, …, xN} consisting of N measurements of particle four-momentum differences, assumed to be mutually independent as is customary in femtoscopy. Typically, the experimentalist will want to test how well various parametrisations fit the data. For the purposes of Bayesian analysis, a given parametrisation with r free parameters θ = (θ1, …, θr) is considered a “model” or “hypothesis” H. The starting point is the odds O12 in favour of model H1 compared to a different model H2, defined as the ratio O12 = p(H1|D)/p(H2|D), while the evidence for H1 versus H2 is the logarithm of the odds (we use natural logarithms throughout; other bases can be substituted as preferred). Use of Bayes’ Theorem for both hypotheses yields

O12 = p(H1|D)/p(H2|D) = [p(D|H1)/p(D|H2)] × [p(H1)/p(H2)].    (1)

The evidence for H1 versus H2 is therefore the logarithm of the Bayes factor B12 = p(D|H1)/p(D|H2) if there is no a priori reason to prefer H1 above H2, so that p(H1) = p(H2). A large Bayes factor says that the evidence for H1 is stronger than the evidence for H2, and vice versa. It can be written as a ratio of integrals over the respective parameter spaces of H1 and H2,

B12 = ∫ dθ1 p(D|θ1,H1) p(θ1|H1) / ∫ dθ2 p(D|θ2,H2) p(θ2|H2).    (2)

Solving the high-dimensional integrals will often be an arduous task. Fortunately, the independence of the N measurements implies that the likelihood factorises into a product of likelihoods for the individual data points, which by assumption all have the same form,

p(D|θ,H) = ∏_{i=1}^{N} p(xi|θ,H).    (3)
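As an illustration of Eqs. (2) and (3): for models with only a few parameters, the factorised likelihood makes the marginal-likelihood integrals tractable by direct quadrature. The following sketch is a toy example of our own; the simulated data, the two rival models and the flat prior are all illustrative assumptions, not the L3 analysis.

```python
# Toy Bayes factor by direct quadrature: two one-parameter models H1, H2
# for N independent measurements. Everything here is an invented example.
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=50)   # N = 50 i.i.d. "measurements"

def log_marginal(loglike, prior_pdf, lo, hi):
    """ln p(D|H) = ln of integral of likelihood * prior over the parameter,
    rescaled by the likelihood peak so quad works in a sane numeric range."""
    lmax = max(loglike(th) for th in np.linspace(lo, hi, 400))
    val, _ = integrate.quad(lambda th: np.exp(loglike(th) - lmax) * prior_pdf(th),
                            lo, hi)
    return np.log(val) + lmax

# H1: exponential with unknown scale; H2: half-normal with unknown scale.
loglike1 = lambda th: np.sum(stats.expon.logpdf(data, scale=th))
loglike2 = lambda th: np.sum(stats.halfnorm.logpdf(data, scale=th))
prior = lambda th: stats.uniform.pdf(th, loc=0.1, scale=4.9)  # flat on [0.1, 5]

lnB12 = log_marginal(loglike1, prior, 0.1, 5.0) - log_marginal(loglike2, prior, 0.1, 5.0)
print("ln B12 =", lnB12)
```

The quantity lnB12 computed here by brute force is exactly what the Laplace approximation derived next estimates without explicit integration.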

Due to the large exponent N, even the slightest nonuniformity in p(x|θ,H) will lead to the development of a strong peak of the overall likelihood in parameter space, situated at the maximum likelihood point θ̂. An asymmetric prior will shift the peak of the integrand to a value θ̃, but it will not materially affect the width of the peak or its differentiability. Unless the shifted peak falls on a boundary of the parameter space or happens to be nondifferentiable, the logarithm of the integrand can therefore be expanded around θ̃ [5]:

ln[p(D|θ,H) p(θ|H)] ≃ ln[p(D|θ̃,H) p(θ̃|H)] − ½ (θ−θ̃)ᵀ Σ⁻¹ (θ−θ̃),    (4)

where Σ⁻¹ is the Hessian of the expansion,

(Σ⁻¹)mn = −∂² ln[p(D|θ,H) p(θ|H)] / ∂θm ∂θn, evaluated at θ = θ̃,    (5)

and Σ is the parameter covariance matrix. As more data are accumulated, the peak narrows, so that we can neglect the fact that parameters may have finite ranges. Integrating the above as if it were a Gaussian, one obtains Laplace’s result [2]

p(D|H) = ∫ dθ p(D|θ,H) p(θ|H) ≃ (2π)^{r/2} |Σ|^{1/2} p(D|θ̃,H) p(θ̃|H),    (6)
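For a single parameter (r = 1), Eq. (6) is easy to verify numerically. The sketch below is a toy check with assumed Gaussian data and a Gaussian prior (for which the Laplace approximation happens to be exact), comparing it against brute-force quadrature:

```python
# Compare the Laplace approximation of ln p(D|H), Eq. (6), with direct
# numerical integration for a one-parameter toy model (invented data/prior).
import numpy as np
from scipy import integrate, optimize, stats

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=200)

log_prior = lambda mu: stats.norm.logpdf(mu, loc=0.0, scale=10.0)
log_integrand = lambda mu: np.sum(stats.norm.logpdf(data, loc=mu, scale=1.0)) + log_prior(mu)

# Shifted peak (posterior mode) and the Hessian of Eq. (5) by finite differences.
res = optimize.minimize_scalar(lambda mu: -log_integrand(mu), bounds=(-5, 8), method="bounded")
mu_t, h = res.x, 1e-4
hessian = -(log_integrand(mu_t + h) - 2*log_integrand(mu_t) + log_integrand(mu_t - h)) / h**2
sigma2 = 1.0 / hessian                       # the 1x1 covariance "matrix"

laplace = 0.5*np.log(2*np.pi) + 0.5*np.log(sigma2) + log_integrand(mu_t)
exact = np.log(integrate.quad(lambda mu: np.exp(log_integrand(mu) - log_integrand(mu_t)),
                              -5, 8, points=[mu_t])[0]) + log_integrand(mu_t)
print(laplace, exact)                        # agree closely for N = 200
```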

which under the stated assumptions is a good approximation of the full-blown integral appearing in Eq. (2) for sufficiently large N. The logarithm of the Bayes factor then becomes simply the difference

ln B12 = E1 − E2,    (7)

Ek = ln p(D|θ̃k,Hk) + ln p(θ̃k|Hk) + (rk/2) ln 2π + ½ ln |Σk|.    (8)

Evidence Ek can be determined for any single model Hk, but it has no meaning on its own; only differences are meaningful in quantifying the probability for H1 to be true compared to H2,

p(H1|D)/p(H2|D) = B12 = exp(E1 − E2).    (9)

2 Relationship to χ² and the Maximum Likelihood

The Bayesian results obtained above differ from the traditional Maximum Likelihood Estimate (MLE), which ignores the priors and approximates the integrals in (2) by the maxima of the respective likelihoods,

B12 ≈ p(D|θ̂1,H1) / p(D|θ̂2,H2).    (10)

The traditional goodness-of-fit measure χ² is related to the above as follows. The N measurements are binned into B bins with bin midpoints Qb, yielding the histogram version {nb} of the data, with N = Σb nb. The most general “parametrisation” of the histogram contents is then the multinomial with the set of Bernoulli probabilities {θb}, with B−1 degrees of freedom,

p(D|θ,H) = N! ∏b θb^{nb} / nb!,    (11)

which on use of the Stirling approximation becomes, up to a normalisation constant,

ln p(D|θ,H) ≃ −Σb nb ln[nb/(N θb)].    (12)

Expanding in the free parameters θb around the measured relative frequencies nb/N and truncating at second order,

ln p(D|θ,H) ≃ −Σb (nb − N θb)² / (2 nb),    (13)

we can identify the multinomial quantities with the measured correlation function at the mid-bin points by setting nb = M C2(Qb) and N θb = M CH(Qb|θ), where M is an arbitrary large integer that ensures nb is an integer; as M eventually cancels out, its size is immaterial. The nb in the denominator is almost equal to the measured bin variance, which translates into the published standard error σb of the correlation function, so that the quadratic term is

ln p(D|θ,H) ≃ const − ½ Σb [C2(Qb) − CH(Qb|θ)]² / σb²,    (14)

where CH(Q|θ), which includes all the constants, is the unnormalised parametrisation for C2 in common use. Comparing this to the usual definition

χ²(θ) = Σb [C2(Qb) − CH(Qb|θ)]² / σb²,    (15)

we see that the maximum likelihood is approximately equal to

p(D|θ̂,H) ≈ const × exp(−χ²min / 2),    (16)

so that χ² is seen to be an approximation of the Bayes formulation, using only a single point in parameter space and thereby effectively assuming a uniform prior. Furthermore, χ² truncates the expansion of (13) at second order; this is probably the approximation most vulnerable to criticism.
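The chain of approximations from the multinomial (11) to (16) can be checked numerically. In the following sketch (the bin counts and model probabilities are invented for illustration), the exact multinomial log-likelihood ratio with respect to the saturated model agrees with −χ²/2 at the percent level when all bins are well populated:

```python
# Numerical check that the binned log-likelihood ratio is close to -chi^2/2,
# cf. Eqs. (12)-(15); counts and model probabilities are invented.
import numpy as np

counts = np.array([480, 510, 260, 255, 495])      # n_b, with N = sum(n_b) = 2000
N = counts.sum()
theta = np.array([0.24, 0.25, 0.13, 0.13, 0.25])  # model probabilities, sum to 1

# Multinomial log-likelihood ratio against the saturated model theta_b = n_b/N,
# i.e. the right-hand side of Eq. (12):
lnL = np.sum(counts * np.log(N * theta / counts))
chi2 = np.sum((counts - N * theta)**2 / counts)   # Eq. (15) with sigma_b^2 = n_b

print(lnL, -chi2 / 2)                             # agree at the percent level
```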

3 Parametrisations and Lévy-based polynomial expansions

We now apply the above general ideas to the various parametrisations shown in Table 1 for the correlation function data for two-jet events published by the L3 Collaboration [4]. Hypotheses H1 to H3 are taken from the L3 paper. Realising that it is important to quantify the degree of deviation of Bose-Einstein correlation data from the Gaussian or the exponential shape, the L3 Collaboration also studied a “Laguerre expansion” as well as the symmetric Lévy source distribution, characterised by the stretched-exponential correlation function of hypothesis H2. In H4 and H5, we propose a new expansion technique that measures deviations from H2 in terms of a series of “Lévy polynomials” ℓn(t|α) that are orthogonal with respect to the characteristic function of the symmetric Lévy distribution,

∫₀^∞ ℓn(t|α) ℓm(t|α) e^{−t^α} dt = 0 for n ≠ m,    (17)

generalising the results presented in Ref. [6]. The construction uses the moments μk = ∫₀^∞ t^k e^{−t^α} dt = Γ((k+1)/α)/α. These polynomials reduce, up to a normalisation constant, to the Laguerre polynomials for α = 1. Figure 1 displays two examples. Polynomials cannot be both orthogonal and derivatives of the weight function when the weight function is transcendental [9], and therefore in H6 and H7 we also investigated nonorthogonal derivative functions of the stretched exponential. (Note the absence of the long-range correction term: L3 demonstrated that this term vanishes if the dip, i.e. the non-positive definiteness of C2 − 1, is taken into account by the parametrisation elsewhere, e.g. by the cosine in H3 and by the first-order polynomials in H4 to H7, resulting in fitted values consistent with zero.)
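A minimal numerical sketch of such a construction: monic polynomials orthogonal with respect to the weight e^{−t^α} on [0, ∞), built by Gram-Schmidt from the moments μk = Γ((k+1)/α)/α. The monic normalisation is our own choice for illustration and may differ from the convention used in the fits.

```python
# Orthogonal "Levy polynomials" for the weight w(t) = exp(-t**alpha) on
# [0, inf), built by Gram-Schmidt from the exact moments; the monic
# normalisation is an arbitrary choice made here for illustration.
import numpy as np
from scipy.special import gamma
from scipy.integrate import quad

def levy_polys(alpha, nmax):
    """Coefficient lists (lowest order first) of monic orthogonal polynomials."""
    mu = [gamma((k + 1) / alpha) / alpha for k in range(2 * nmax + 1)]  # moments
    inner = lambda p, q: sum(a * b * mu[i + j]
                             for i, a in enumerate(p) for j, b in enumerate(q))
    polys = []
    for n in range(nmax + 1):
        p = [0.0] * n + [1.0]                    # start from the monomial t**n
        for q in polys:                          # subtract projections on lower orders
            c = inner(p, q) / inner(q, q)
            p = [a - c * (q[i] if i < len(q) else 0.0) for i, a in enumerate(p)]
        polys.append(p)
    return polys

# alpha = 1 reproduces the monic Laguerre polynomial t^2 - 4t + 2:
print(levy_polys(1.0, 2)[2])                     # -> [2.0, -4.0, 1.0]

# Orthogonality under the Levy weight for a non-trivial alpha:
alpha = 0.8
polys = levy_polys(alpha, 3)
val = lambda p, t: sum(c * t**k for k, c in enumerate(p))
ip = quad(lambda t: val(polys[1], t) * val(polys[3], t) * np.exp(-t**alpha),
          0, 200, limit=200)[0]
print(ip)                                        # consistent with zero
```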

Hypothesis                           Number of parameters r
H1  Gauss                            4
H2  Stretched exponential            5
H3  Simplified τ-model               5
H4  1st-order Lévy polynomial        5
H5  3rd-order Lévy polynomial        6
H6  1st-order Lévy derivative        5
H7  3rd-order Lévy derivative        6

Table 1: Summary of parametrisations tested; r is the number of free parameters.

Figure 1: Lévy polynomials of first and third order, multiplied by the weight function e^{−t^α}.

4 Application to L3 binned data

In Table 2, we show the results of applying the Laplace approximation (6) to the L3 two-jet data, which are provided in terms of 100 binned values of the correlation function together with standard errors over the measured range of Q. Throughout, we used a Gaussian prior with a width that was determined by numerical integration over one of the L3 data points. To illustrate the contributions of the likelihood, prior and determinant factors entering in (8), we list their logarithmic contributions separately in the three columns headed L, P and F, together with their sum E. These quantities are the building blocks for calculating the odds between any two competing hypotheses: one can deduce the odds between any two hypotheses by exponentiating the difference of their evidence values. Also included in Table 2 are the traditional χ²/NDF measure (C) and its associated confidence level (CL).

Hypothesis                           r     L      P     F      E     C      CL
H1  Gauss                            4   177.8   -3.6  32.2  206.5  2.57   3.4%
H2  Stretched exponential            5   138.5   -0.5  34.0  172.0  2.02   1.5%
H3  Simplified τ-model               5    68.2   -3.4  37.0  101.8  1.00  49.1%
H4  1st-order Lévy polynomial        5    66.2    2.2  30.3   98.8  0.97  57.3%
H5  3rd-order Lévy polynomial        6    65.9    3.8  41.6  111.3  0.97  55.7%
H6  1st-order Lévy derivative        5    67.3    4.2  29.1  100.6  0.98  53.0%
H7  3rd-order Lévy derivative        6    60.4    4.9  31.7   97.0  0.89  77.0%

Table 2: Results of fitting the parametrisations listed in Table 1. Legend: r = number of parameters; L, P, F = logarithmic contributions of the likelihood, prior and determinant factors of (8); E = L + P + F; C = χ²/NDF; CL = confidence level.
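The building blocks of Table 2 can be assembled along the following schematic lines for any parametrisation. The sketch below is our own illustration with an invented Gaussian-type model, pseudo-data and prior widths; the sign conventions and the use of the BFGS inverse Hessian as the covariance matrix Σ are simplifying assumptions:

```python
# Schematic computation of the L (likelihood), P (prior) and F (determinant)
# terms of Eq. (8) for one hypothesis, here with invented pseudo-data and a
# toy Gaussian-type parametrisation; all conventions are illustrative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
Q = np.linspace(0.05, 4.0, 100)                        # bin midpoints
sigma = np.full_like(Q, 0.01)                          # standard errors
model = lambda th, Q: th[0] * (1.0 + th[1] * np.exp(-(th[2] * Q) ** 2))
C = model([1.0, 0.3, 2.0], Q) + rng.normal(0.0, 0.01, Q.size)   # pseudo-data

prior_width = np.array([1.0, 1.0, 1.0])                # assumed Gaussian prior widths
log_prior = lambda th: np.sum(-0.5 * (th / prior_width) ** 2
                              - 0.5 * np.log(2 * np.pi * prior_width ** 2))

def neg_log_post(th):                                  # -ln[likelihood * prior], cf. Eq. (14)
    return 0.5 * np.sum((C - model(th, Q)) ** 2 / sigma ** 2) - log_prior(th)

fit = minimize(neg_log_post, x0=[1.0, 0.3, 2.0])       # peak of the integrand
Sigma = fit.hess_inv                                   # BFGS estimate of the covariance
r = len(fit.x)

L = 0.5 * np.sum((C - model(fit.x, Q)) ** 2 / sigma ** 2)
P = -log_prior(fit.x)
F = -0.5 * np.log(np.linalg.det(Sigma)) - 0.5 * r * np.log(2 * np.pi)
print("L, P, F, E:", L, P, F, L + P + F)
```

Repeating this for each hypothesis and taking differences of the totals then gives the relative odds of Eq. (9).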

It is inappropriate to generalise conclusions based on one specific dataset with its specific circumstances. The fact that in the two-jet L3 data the correlation function drops well below 1.0 at larger Q, for example, is probably the dominant influence on the goodness of fit. With this caveat, we make the following observations regarding the results shown in Table 2:

  1. At first sight, the Bayes factor and χ² methodologies deliver judgements which are rather similar: H7 is consistently ranked best, while H1 and H2 are ranked worst (least likely). However, the two methodologies yield vastly different numbers when one hypothesis is bad, and, as shown below, there are surprising variations even among the better ones.

  2. The determinant plays an important role. For example, factor F for H5 is significantly larger than that of the similar models H4 and H7, even though the three log likelihoods are similar. This can be traced to the fact that the uncertainty in the parameters of H5 is larger, as expressed in the width of its Gaussian (4). While χ², based only on the likelihood, can hardly distinguish between H4 and H5, the contribution of the large determinant ensures that the Bayesian odds for H4 versus H5 are 5800:1. In other words, by taking into account not only the best parameter values but also their uncertainties, the Bayes factor can distinguish what χ² cannot.

  3. Our Bayes factor calculation takes the experimental standard errors into account by using (14) in the exponent of the likelihood; in other words, we assume that they are Gaussian. This approximation could be improved by a more complete Bayesian analysis using not the binned data but the pair momenta themselves.

  4. As Fig. 1 shows, the Lévy polynomials introduced here are well suited to describe one-sided strongly-peaked data. It may be helpful to use them, as we have done here, merely as part of parametrisations of data to which they show some resemblance. More systematic use in Gram-Charlier or other expansions will be faced with issues inherent in all asymptotic series [7, 8].

5 Conclusions

  1. In hypotheses H4 to H7, we have presented new techniques to study deviations from a stretched-exponential or Fourier-transformed Lévy shape. Details will be published elsewhere.

  2. The standard measures of fit quality such as χ² or the confidence level are useful in rejecting models which are inconsistent with a given dataset. Where two or more models are consistent with the data, however, they are unable to select the more probable one. The Bayes factor (9) permits quantification of the evidence (relative probability) for the validity of one model compared to another.

  3. Besides the likelihood, the prior and determinant also play a role, sometimes decisively so.

  4. The Laplace approximation (4) is usually fairly accurate, but the assumption of Gaussian errors for count data, made by truncating the Taylor expansion (13) in the data, is of dubious quality.

  5. By integrating over parameter space, Bayesian evidence takes into account all possible values of the parameters, while χ² and Maximum Likelihood do not.

  6. Bayes factors depend linearly on the two priors. This is good in that they are made explicit, but bad in the sense that results can and do change depending on the choice of priors.

  7. The omission of priors in χ² is to its disadvantage, as it discards important information.

  8. It may appear that χ² does not need any alternative hypothesis to be of use. This is not so, however: the alternative implicit in χ² is the “Bernoulli class” of multinomials [10].

Acknowledgements: We thank the L3 collaboration for making its results available electronically [4] and the organizers of WPCF 2011 for support and an excellent atmosphere. This work was supported in part by the South African National Research Foundation and by the Hungarian OTKA grant NK–101438.

References

  • [1] K. Pearson, On a criterion that a given system of deviations …is such that it can be reasonably supposed to have arisen in random sampling, Phil. Mag. (5) 50 (1900) 157.
  • [2] P.S. Laplace, Mémoires de Mathématique et de Physique, Tome Sixième (1774).
  • [3] H. Jeffreys, Theory of Probability, Oxford University Press (1961).
  • [4] L3 Collaboration, P. Achard et al., Test of the τ-Model of Bose-Einstein Correlations and Reconstruction of the Source Function in Hadronic Z-boson Decay at LEP, Eur. Phys. J. C71 (2011) 1648 [arXiv:1105.4788], see also http://l3.web.cern.ch/l3/
  • [5] R.E. Kass and A.E. Raftery, Bayes Factors, J. American Statistical Association 90 (1995) 773.
  • [6] T. Csörgő and S. Hegyi, Model independent shape analysis of correlations in 1, 2 or 3 dimensions, Phys. Lett. B 489 (2000) 15.
  • [7] M.B. de Kock, Gaussian and non-Gaussian-based Gram-Charlier and Edgeworth expansions for correlations of identical particles in HBT interferometry, M.Sc., University of Stellenbosch (2009).
  • [8] H.C. Eggers, M.B. de Kock and J. Schmiegel, Determining source cumulants in femtoscopy with Gram-Charlier and Edgeworth series, Mod. Phys. Lett. A 26 (2011) 1771 [arXiv:1011.3950].
  • [9] A. Erdélyi et al., Higher Transcendental Functions, Vol. 2, McGraw-Hill, New York (1953).
  • [10] E.T. Jaynes, Probability Theory: The Logic of Science, Cambridge University Press (2003).