Optimal modeling of 1D azimuth correlations in the context of Bayesian inference

Michiel B. De Kock    Hans C. Eggers Stellenbosch University and National Institute for Theoretical Physics (NITheP), ZA-7600 Stellenbosch, South Africa    Thomas A. Trainor CENPA 354290, University of Washington, Seattle, Washington 98195, United States
Version September 2015
Abstract

Analysis and interpretation of spectrum and correlation data from high-energy nuclear collisions is currently controversial because two opposing physics narratives derive contradictory implications from the same data: one narrative claims collision dynamics is dominated by dijet production and projectile-nucleon fragmentation, the other claims collision dynamics is dominated by a dense, flowing QCD medium. Opposing interpretations seem to be supported by alternative data models, and current model-comparison schemes are unable to distinguish between them. There is clearly a need for a convincing new methodology to break the deadlock. In this study we introduce Bayesian Inference (BI) methods applied to angular correlation data as a basis to evaluate competing data models. For simplicity the data considered are projections of 2D angular correlations onto 1D azimuth from three centrality classes of 200 GeV Au-Au collisions. We consider several data models typical of current model choices, including Fourier series (FS) and a Gaussian plus various combinations of individual cosine components. We evaluate model performance with BI methods and with power-spectrum (PS) analysis. We find that the FS-only model is rejected in all cases by Bayesian analysis, which always prefers a Gaussian. A cylindrical quadrupole is required in some cases but rejected for 0-5%-central Au-Au collisions. Given a Gaussian centered at the azimuth origin, “higher harmonics” for m > 2 are rejected. A model consisting of Gaussian + dipole + quadrupole provides good 1D data descriptions in all cases.

pacs:
25.75.-q, 25.75.Gz, 25.75.Nq, 25.75.Ld, 25.75.Bh
preprint: Version 2.3

I Introduction

A significant and persistent problem has emerged concerning models for high-energy nucleus-nucleus (A-A) collision data from the Relativistic Heavy Ion Collider (RHIC) and the Large Hadron Collider (LHC). Distinct classes of data models with divergent physics implications are invoked to support two narratives: a high-energy physics (HEP)/jets narrative in which the essential phenomenon is dijet production ua1 (); kll (); sarc (); hijing (); kn () and a quark-gluon plasma (QGP)/flow narrative in which the essential phenomenon is a flowing dense QCD medium or QGP and dijets play no significant role perfliq1 (); perfliq2 ().

The HEP/jets narrative emerges spontaneously from an analysis program based on spectrum and correlation data models derived from the observed differential structure of available data ppprd (); hardspec (); porter2 (); porter3 (); axialci (); anomalous (). In contrast, models emerging from the QGP/flow narrative tend to rely on theoretical motivations coupled with data and information selection (e.g. p_t cuts, preferred A-A centralities, ratio measures) poskvol (); 2004 (); blastwave (); trigger (); staras (); starraa (). A comparison of RHIC results and interpretations is presented in Ref. review ().

For example, 2D angular correlations from high-energy nuclear collisions include only a few structures common to all collisions from N-N to central Au-Au at RHIC energies. A simple mathematical model of those structures describes almost all data accurately with no significant residual structure porter2 (); porter3 (); axialci (); anomalous (). No theoretical assumptions motivated the data model. Three of the four principal model elements have been interpreted post facto as representing dijet production and projectile-nucleon dissociation jetspec (); jetspecth (); pptheory (). Interpretation of the fourth element, an independent azimuth quadrupole, remains in question azimuth1 (); gluequad (); quadspec (); davehq (); davehq2 (); nov2 (); nohydro (). Differential analysis of hadron p_t spectra reveals two components modeled by simple functions ppprd (); hardspec (). One component is identified with fragments from dijets described quantitatively by QCD calculations fragevo (). Most spectrum and correlation structures appear to be consistent with the HEP/jets narrative. Alternative models motivated by the QGP/flow narrative include quantity v_2 [Fourier coefficient of a function fitted to 1D projections of 2D angular correlations] interpreted to represent elliptic flow poskvol (); 2004 (), a blast-wave spectrum model interpreted to measure radial flow blastwave (), spectrum ratio R_AA interpreted to indicate jet quenching within a dense QCD medium starraa (), and dihadron correlation analysis via background subtraction interpreted to represent jet structure staras (); trigger (); tzyam (). “Higher harmonic” flows v_m have been inferred recently from azimuth distributions via Fourier-series models gunther (); luzum (); lhcharm ().

The same underlying particle data are therefore characterized and interpreted with competing mathematical models applied to different data selections, variables and measured quantities. Judgments on the validity and relative merits of competing data models have relied historically on comparisons of minimum-χ² values and qualitative arguments based on consistency of a given narrative across selected measured quantities. While such an approach might suffice when the underlying physical processes and models are simple, the complexity of A-A phenomenology and lack of consistent quantitative criteria have impeded progress in resolving conflicts.

To address this problem we require a formal context in which competing data models are evaluated on a statistically sound basis, and a “best” model may be selected that either does not rely on unspoken a priori physics assumptions or renders such assumptions quantifiable. We suggest that this context exists in the form of Bayesian Inference (BI) which provides both a formal mathematical framework and the necessary concepts to represent prior knowledge, evaluate candidate models for different parameter values and thereby establish value judgments on models as a whole bayes1 (); bayes2 (); bayes3 (); mackay (). Each data model is rated not only by how well it describes some data or how much data it describes well, but also by the “cost” of the model in terms of complexity and parameter number (Occam penalty) and associated physical assumptions.

In this study we focus on 1D projections onto azimuth of 2D angular correlations reported in Ref. anomalous (), currently one of the most contentious areas of RHIC/LHC data analysis. We consider several popular data models and evaluate them according to BI methods to determine whether a uniquely preferred data model can be established without recourse to a priori physics assumptions.

This article is arranged as follows: Section II presents the basics of Bayesian Inference. Section III describes Fourier power spectra (PS) and their properties. Section IV summarizes analysis methods applied to correlation data. Section V introduces the correlation data used for this study. Sections VI, VII and VIII apply BI and PS methods to azimuth projections from three centralities of 200 GeV Au-Au collisions. Section IX presents systematic-uncertainty estimates. Sections X and XI present discussion and summary. Appendices A and B consider the geometry of BI analysis and periodic peak arrays respectively.

II Bayesian inference

Bayesian Inference addresses the problem of relating parametrized model functions to available data in an optimal manner. Given specific data values the best set of parameter values for each model is determined based on the likelihood function. Several models are then compared based on each model’s evidence, an integral measure defined below. The most plausible and therefore preferred data model produces the largest evidence value.

II.1 The χ² measure and model fits to data

In the present study we focus on aspects of Bayesian Inference that correspond directly with the methodology of χ² minimization. Given a set of n data points y_i with experimentally determined standard errors σ_i, the conventional χ² statistic evaluating the goodness of fit of a model function f(θ) with K parameters θ is

χ²(θ) = Σ_{i=1}^{n} [y_i − f_i(θ)]² / σ_i².    (1)

As stated in Ref. dekock14 () the χ² measure assumes a Gaussian distribution of data-sample fluctuations about mean values, which we accept as a reasonable approximation for 1D RHIC/LHC data projections. In what follows model functions are represented by f(θ), a vector function (with components f_i) mapping parameter space to data space.

Most model comparisons are based on χ²/DoF, where the number of fit degrees of freedom (DoF) is assumed to be the number of data points n minus the number of free model parameters K. Minimizing χ² without considering the fit DoF is clearly misleading, since there are infinitely many models with n free parameters that might describe the same n data points with χ² = 0 jeffreys (). We require a mechanism to penalize excess model parameters such that a simple few-parameter model that describes the data well may be favored over more-complex models. That mechanism exists in the form of Bayesian Inference.
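To illustrate the problem numerically, the following minimal Python sketch fits truncated cosine series of increasing length to a hypothetical zero-mean azimuth histogram (the amplitudes, width and errors below are invented for illustration and are not the measured data). χ² decreases monotonically as terms are added even after the added terms respond only to noise, so χ² alone cannot select a model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D azimuth autocorrelation: a zero-mean same-side Gaussian plus
# an away-side dipole, sampled on 13 unique bins with Gaussian noise.
phi = np.linspace(0.0, np.pi, 13)
sigma = 0.004 * np.ones_like(phi)                      # assumed bin-wise errors
gauss = np.exp(-0.5 * (phi / 0.6) ** 2)
truth = 0.6 * (gauss - gauss.mean()) - 0.1 * np.cos(phi)
data = truth + rng.normal(0.0, sigma)

def chi2_fs_only(k):
    """Weighted least-squares fit of cosine terms m = 1..k; returns chi^2."""
    basis = np.column_stack([np.cos(m * phi) for m in range(1, k + 1)])
    coeffs, *_ = np.linalg.lstsq(basis / sigma[:, None], data / sigma, rcond=None)
    resid = (data - basis @ coeffs) / sigma
    return np.sum(resid ** 2)

for k in range(1, 9):
    print(k, round(chi2_fs_only(k), 1), " chi2/DoF =", round(chi2_fs_only(k) / (13 - k), 2))
```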

II.2 Logical and rational inference

A distinction may be drawn between logical inference on the one hand, in which nominally-valid conclusions are drawn via a logical chain of argument from premises assumed to be true, and rational inference on the other, in which patterns or events (i.e. data) are used to improve our understanding of the physical system, either augmenting or displacing previous understanding. Both the acquired data and the modified understanding may be uncertain to some degree as measured by probabilities. Rational inference includes induction, in which newly-acquired data are employed to formulate or refine a model, and deduction, in which a fixed model is used to predict values of data not yet acquired inference ().

Bayesian Inference is a formal recipe for rational inference based on Bayes’ theorem jeffreys (); jaynes (). “Understanding” in this context means that reality in the form of data or data-derived quantities is well described by a parametrized model. A given set of parameter values predicts a specific set of possible data values. Previous understanding including uncertainties is represented by the prior, a probability distribution function (PDF) on possible parameter values. As new data are acquired BI provides a means to update the PDF on model parameters to effect improved understanding in the form of the posterior PDF, thereby refining the model by reducing the volume of its parameter space or falsifying the model altogether if the new data fall outside the model’s predicted data volume.

II.3 The probability chain rule and Bayes’ theorem

Bayesian Inference is based on relations among joint, conditional and marginal PDFs and related unnormalized functions distributed on data and model-parameter spaces kendall (). External factors common to all models that may influence the inference process are represented by a comprehensive parameter set suppressed below. Our notation follows that in Refs. mackay () and jaynes ().

A model is defined by a joint PDF P(D, θ), where θ and D range over multidimensional spaces representing model-parameter values and data values respectively. The corresponding conditional PDFs are P(D|θ) and P(θ|D), and the marginal PDFs are P(D) and P(θ). The probability chain rule provides factorizations in the form P(D, θ) = P(D|θ) P(θ) = P(θ|D) P(D). Bayes’ theorem (BT) can then be expressed in either of two forms

P(θ|D) = P(D|θ) P(θ) / P(D)
P(D|θ) = P(θ|D) P(D) / P(θ),    (2)

both of which are valid descriptions of a joint PDF. However, only the first line is applicable to BI analysis, which proceeds from specific data values to an improved parametrized data model, a unique BT application.

II.4 Prior and posterior PDFs – model fits

As applied to BI analysis some quantities in the first line of Eq. (2) must be defined more specifically. In this application quantity D is not a variable on the space of all possible data; it is a specific set of data values y_i with uncertainties or errors σ_i. Factor P(D|θ), a normalized conditional PDF on data space, is redefined as the likelihood function L(θ) on parameter space for the model given specific data D and model function f(θ). P(θ) is the prior PDF on model parameters determined before data are available. P(θ|D) is the posterior PDF on parameters given the new data. Denominator P(D), formally also a PDF on data space, is redefined as the evidence (a number) for the model given specific data, which we denote by the symbol E. With those more-specific definitions the version of Bayes’ Theorem used for BI is

P(θ|D) = L(θ) P(θ) / E,    (3)

which can be read as “A posterior PDF on parameters θ is derived from a prior PDF P(θ) given data D, likelihood L and evidence E.” Any change between prior and posterior represents information acquired by the model from the data. The result is an updated PDF on model parameters determined by newly-acquired specific data values D. The posterior PDF on parameters provides considerably more information about the model than the best-fit parameter set and uncertainties derived from conventional model fits to data.

II.5 Model comparisons and evidence

Beyond determining posterior PDFs on parameters Bayes’ Theorem can be used on a higher level for comparisons among competing data models in the form

P(M|D) = P(D|M) P(M) / P(D),    (4)

where P(M|D) is the plausibility of model M given data values D and P(M) is the prior model probability within some assumed context (suppressed). The main goal of this study is comparison of competing model functions with all other BI elements maintained as similar as possible.

Evidence E is just a normalization parameter in Eq. (3), but its absolute numerical value is important for model comparisons. Because the likelihood is usually a peaked function on parameter space with a single mode near some optimal parameter values, the evidence defined in the first line below can be represented by Laplace’s approximation in the second line laplace ()

E = ∫ dθ L(θ) P(θ)
  ≈ L_max (2π)^{K/2} √det(V) / ΔV_prior,    (5)

where L_max is the maximum likelihood, V is the covariance matrix for the model function with K parameters, and ΔV_prior is the prior volume on parameter space (assuming a uniform prior, Sec. II.6). The negative log evidence is

−2 ln E ≈ χ²_min + 2I,    (6)

with χ²_min the usual fit parameter, and information I defined by

I = ln[ ΔV_prior / ( (2π)^{K/2} √det(V) ) ],    (7)

the information gained by the model from specific data D. Information is the log of a volume ratio as discussed in the next subsection. In general χ² decreases and I increases as parameter number K increases. The sum χ² + 2I should then have a minimum corresponding to the maximum evidence for a specific model. For an optimized predictive model (e.g. a theory) I should be small and χ² should approximate the fit DoF (= data DoF minus model DoF K).
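A minimal numerical sketch of Eqs. (5)–(7) as reconstructed above, assuming uniform (box) priors: the evidence is estimated from the minimum χ², the fitted covariance matrix and the assumed prior widths. All numbers in the example call are invented for illustration, not fitted results.

```python
import numpy as np

def neg2_log_evidence(chi2_min, cov, prior_widths):
    """Laplace-approximation estimate of -2 ln E = chi^2_min + 2 I.

    chi2_min     : minimum chi^2 of the fit
    cov          : K x K covariance matrix of the fitted parameters
    prior_widths : assumed uniform prior interval for each parameter
    """
    K = len(prior_widths)
    v_post = (2.0 * np.pi) ** (K / 2.0) * np.sqrt(np.linalg.det(cov))
    info = np.log(np.prod(prior_widths) / v_post)      # Eq. (7)
    return chi2_min + 2.0 * info, info

# Illustrative (invented) numbers for a hypothetical 3-parameter model.
print(neg2_log_evidence(12.0, np.diag([0.03, 0.03, 0.01]) ** 2, [1.0, 1.0, 0.1]))
```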

Quick and easy comparisons between two models M_1 and M_2 can be obtained by calculating the evidence ratio E_1/E_2, also known as an odds ratio. Assuming equal model priors the Bayes Factor is bayes3 (); dekock11 ()

B_{12} = E_1 / E_2.    (8)

Comparisons among more than two models indexed by k are effected by

P(M_k|D) = E_k P(M_k) / Σ_j E_j P(M_j),    (9)

where E_k replaces P(D|M) in Eq. (4). The model priors P(M_k) could be set equal assuming ignorance, but in practice assigned model priors may differ sharply among competing models, possibly reflecting strong prejudices.

Our use of differences in log Evidence (Bayes factors) rather than isolated values is consistent with the use of Likelihood ratios (e.g. Neyman-Pearson approach). Evidence ratios are an improvement on Likelihood ratios because the latter assume delta-function priors.
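The following sketch simply converts two −2 ln E values into the corresponding odds ratio, as used for the model comparisons reported below; the input values are invented placeholders.

```python
import numpy as np

def odds(neg2_log_e_a, neg2_log_e_b):
    """Odds in favour of model A over model B, exp(ln E_A - ln E_B),
    assuming equal model priors."""
    return np.exp(0.5 * (neg2_log_e_b - neg2_log_e_a))

print(odds(40.0, 44.0))   # a difference of 4 in -2 ln E gives odds of about 7:1
```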

II.6 Bayesian priors and Information

Information is generally defined as the logarithm of a volume ratio, the volumes being subsets of some space of alternatives before and after a message (data) is received conveying information. For instance, if a message reduces the number of possible alternatives by factor 2 then the amount of information received is log₂(2) = 1: one “bit” of information is provided by the message. Several definitions of information have been formulated (e.g. Shannon, Rényi), and the precise correspondence to a volume ratio varies from case to case. In some cases the terms “information” and “entropy” may be used interchangeably such that for example “information gain” may represent the difference between two entropies.

In Eq. (7) factor ΔV_prior is related to the prior volume of a model parameter space, and (2π)^{K/2} √det(V) approximates the posterior volume ΔV_posterior. Thus, information is defined here as the natural log of the prior volume over the posterior volume. A prior PDF based on ignorance (uniform or translation-invariant probability within some assumed boundaries for each parameter) is estimated by the product

ΔV_prior = Π_{k=1}^{K} Δθ_k,    (10)

where the estimated interval Δθ_k for amplitude parameters may be based on differences of data extreme values, but the prior for angle parameters depends on circumstances. In this study the constraint on the Gaussian width is based on the definition of the same-side peak at the azimuth origin.

Since typical correlation-structure amplitudes (e.g. peak-to-peak excursions) are generally of order unity or less, and given the assumed constraint on the Gaussian width, we assign prior intervals of order unity for those cases. Given certain algebraic relations it is reasonable to assume that cosine coefficients and uncertainties may be substantially smaller on average than the Gaussian amplitude and width. For all cosine components in any model we assign a correspondingly smaller prior interval. Given those assignments the basic Model (defined below) is somewhat disadvantaged (smaller prior probability) compared to models based only on cosine terms. Further discussion of prior construction is found in Ref. dekock14 ().

The posterior volume is obtained from the determinant of the covariance matrix which, in the absence of significant covariances, is the product of the variances for the several model parameters. Its square root is then the product of r.m.s. widths on parameters, giving the posterior volume. In this study the Hessian (matrix of second-order derivatives at the maximum of the likelihood function derived from data D) is obtained, and the covariance matrix is constructed from the Hessian elements.
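As a sketch of that numerical route (central finite differences are assumed here purely for illustration; the actual implementation is not specified in the text), the Hessian of −ln L at its minimum is inverted to give the covariance matrix, whose determinant yields the posterior volume of Eq. (7).

```python
import numpy as np

def hessian(neg_log_like, theta_hat, eps=1e-4):
    """Central finite-difference Hessian of -ln L at its minimum theta_hat.
    If only chi^2(theta) is available, pass neg_log_like = 0.5 * chi^2."""
    K = len(theta_hat)
    H = np.zeros((K, K))
    for i in range(K):
        for j in range(K):
            def f(di, dj):
                t = np.array(theta_hat, dtype=float)
                t[i] += di * eps
                t[j] += dj * eps
                return neg_log_like(t)
            H[i, j] = (f(1, 1) - f(1, -1) - f(-1, 1) + f(-1, -1)) / (4.0 * eps ** 2)
    return H

def posterior_volume(H):
    """(2 pi)^(K/2) sqrt(det(cov)) with cov taken as the inverse Hessian."""
    cov = np.linalg.inv(H)
    return (2.0 * np.pi) ** (H.shape[0] / 2.0) * np.sqrt(np.linalg.det(cov))
```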

The information defined in Eq. (7) permits a quantitative expression of Occam’s razor in two ways: (a) For a model with a large prior volume in parameter space (representing many “causes”, some possibly unnecessary) a substantial reduction in the parameter volume on encountering data automatically incurs an Occam penalty by means of larger I. (b) The K-dependence of I implies that while models with more parameters may have a smaller χ² and larger likelihood, the extra parameters are also penalized by increased I, resulting in reduced overall model plausibility.

III Fourier power spectrum

The Fourier power spectrum (PS) is an alternative information measure well understood in the context of signal processing. Comparison of PS results with BI analysis may better convey the technical details and interpretations of the latter.

The Wiener-Khinchin theorem wk () states that the Fourier transform of a two-particle autocorrelation is the corresponding power spectrum of an underlying single-particle distribution. Data autocorrelations A(φ_Δ) are periodic, symmetrized about 0 and π, and described by a PS with elements P_m indexed by wave number m. The PS expansion of autocorrelation data

A(φ_Δ) = P_0 + 2 Σ_{m≥1} P_m cos(m φ_Δ)    (11)

might be viewed as a model function from which the power-spectrum elements could be determined by model fitting. However, in this study the PS elements are obtained directly by integrating the data

P_m = (1/2π) ∫_0^{2π} dφ_Δ A(φ_Δ) cos(m φ_Δ).    (12)

Note that the sum of the P_m represents the “total power” (distributed over the independent PS elements), and P_0 is the mean value of the 1D autocorrelation (which, for data histograms introduced below and used in this study, is set to zero).
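A minimal sketch of that direct integration, assuming the convention reconstructed in Eq. (12) and simple trapezoid quadrature over the unique bins (the exact normalization convention of the original analysis is not preserved in the text):

```python
import numpy as np

def power_spectrum(a):
    """Direct PS elements P_m, m = 0..len(a)-1, from the unique autocorrelation
    bins `a` on [0, pi], using P_m = (1/pi) * integral of a(phi) cos(m phi)."""
    n = len(a)
    phi = np.linspace(0.0, np.pi, n)
    w = np.full(n, np.pi / (n - 1))       # trapezoid weights on [0, pi]
    w[0] *= 0.5
    w[-1] *= 0.5
    return np.array([np.sum(w * a * np.cos(m * phi)) / np.pi for m in range(n)])
```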

The power spectrum for a sample sequence may contain a deterministic “signal” component and a random (white) noise component. The signal may be localized at smaller wave number (index m), while an approximately flat white-noise spectrum is revealed at larger index values if the sample rate or bin number (resolution) is large enough (see Nyquist frequency limit below). The white-noise amplitude should correspond to the estimated statistical (Poisson) error used in fits to a sample sequence and to the r.m.s. error inferred from fit residuals.

The Nyquist limit applied to periodic azimuth implies that the power spectrum must be symmetric about the bin on index m containing the Nyquist frequency. For 24 azimuth bins there are then 13 independent PS elements (including m = 0) and 13 unique autocorrelation data bins whose contents may be correlated by one or more parent processes. For the broader correlation structures considered here the bin number, and therefore the Nyquist limit, is adequate. For the narrower BE/electron peak (defined below) the bin number (hence angle resolution) is insufficient, but that structure is not important for this analysis.

Power spectra PS should be distinguished from Fourier series (FS-only) fit models. A PS consisting of elements P_m evaluated for all index values completely characterizes a data autocorrelation. FS-only models have a varying number of elements indexed by m = 1, …, K, with K the number of parameters for a model.

IV Analysis methods

High-energy nuclear collisions at the RHIC and LHC produce hadrons in each collision ranging in number from a few to thousands (depending on collision centrality) via several physical mechanisms. By studying properties of hadron yields, spectra and correlations we seek to identify and characterize the various underlying mechanisms. In this study we apply BI methods to evaluate several mathematical models of 2D angular correlations projected to 1D azimuth. In this section we summarize basic analysis methods that produce the angular correlation data and our strategy for BI evaluation of the data models.

IV.1 Kinematic variables and spaces

High-energy nuclear collisions are described efficiently within a cylindrical coordinate system (p_t, η, φ), where (relative to the collision axis) p_t is the transverse momentum, φ is the azimuth angle from a reference direction and pseudorapidity η = −ln tan(θ/2) is a measure of polar angle θ, the approximation η ≈ cos θ being valid near η = 0 (θ = π/2). A bounded detector angular acceptance is denoted by intervals (Δη, Δφ) on the primary single-particle space (η, φ).

In general, two-particle correlations are measured on the 6D space (p_t1, η_1, φ_1, p_t2, η_2, φ_2). p_t-integral angular correlations are measured on the 4D space (η_1, φ_1, η_2, φ_2). Within a limited η acceptance and over 2π azimuth the angular correlation structure may be approximately invariant along a sum axis such as η_Σ = η_1 + η_2 (stationarity). In that case averages along the sum axis for each value of the corresponding difference variable, e.g. η_Δ = η_1 − η_2, comprise an autocorrelation. Angular correlations on (η_Δ, φ_Δ) are then measured as 2D densities without significant loss of information inverse ().

IV.2 A-A centrality measures

A-A collision centrality is measured by comparing a measured minimum-bias (MB) event distribution on charge multiplicity n_ch within some fiducial angular acceptance with a Glauber Monte Carlo model of A-A collisions producing MB distributions on nucleon participant number N_part and N-N binary-collision number N_bin powerlaw (). The intermediary is the A-A fractional cross section. For the data employed in this study centrality is designated by fractional cross section in percent, where 100% refers to extreme peripheral collisions and 0% refers to head-on collisions. For the data employed in this study collision events were sorted into eleven centrality bins: ten equal 10% centrality bins with the most-central 10% bin split into two 5% bins. The bins are numbered 0 (most peripheral) through 10 (most central). The three (corrected) centrality intervals used in this study are 0-5% (bin 10), 9-18% (bin 8) and 83-94% (bin 0).

IV.3 Correlation measures

Correlation structure is identified by comparing a 2D pair density ρ with a reference density ρ_ref representing no significant correlations or some uninteresting background structure. ρ_ref can be based for instance on a factorization assumption or on a distribution of mixed pairs ρ_mix formed from different but similar sample events. The difference Δρ = ρ − ρ_ref should reveal correlation structure of interest.

Correlation structure may have several components arising from different collision mechanisms. Correlation amplitudes may vary with collision conditions in characteristic ways, for instance proportional to N_part, N_bin, or some combination. As a placeholder we define a per-particle measure Δρ/√ρ_ref, since √ρ_ref ≈ ρ_0 according to a factorization assumption, where ρ_0 is the mean single-particle charge density near the angular origin. Practically speaking the correlation measure is obtained as

Δρ / √ρ_ref = ρ_0 [ ρ_sib / ρ_mix − 1 ],    (13)

where ρ_sib and ρ_mix denote sibling-pair (same-event) and mixed-pair densities and the ratio inside the square brackets reduces certain instrumental effects anomalous (). In what follows we refer to Δρ/√ρ_ref by the single symbol A to simplify notation.

A 2D autocorrelation in the form A(η_Δ, φ_Δ) is a density on the difference variables. When integrated over η_Δ the autocorrelation becomes a density on φ_Δ alone. Integration of Δρ/√ρ_ref over the 2π azimuth acceptance should then give zero, since ρ_sib has the same pair number as ρ_mix by construction.

IV.4 Bayesian Inference strategy

For each 1D data histogram we construct a PS as a reference for BI analysis and identify within the PS the signal and noise components. PS structure can be related 1-to-1 with BI elements, helping to clarify interpretation of the latter. Based on results from Ref. anomalous () we compare the PS for a fitted 1D Gaussian with each data PS.

For each model function we obtain the minimum χ² (maximum likelihood, describing fit quality) and information I (derived from priors and covariance matrix) from fits to data histograms. We obtain evidence E for each model from the combination of minimum χ² and information I. Competition between χ² and I contrasts goodness of fit (via χ²) with quantitative assessment of model-parameter “cost” or Occam penalty (via I). One model function may achieve a quantitatively better fit to data than another model, but at the cost of extra model parameters that may favor the second model overall.

We emphasize that the number of data DoF in this study is small, only 11 for the projected 1D histograms analyzed here compared to the original 2D histograms with 169 DoF. The small number of data DoF presents unique challenges for data modeling and BI evaluation.

V Correlation data and models

The data we consider were published in the form of 2D binned histograms (autocorrelations) derived from 1.2M 200 GeV Au-Au collision events sorted into eleven centrality classes based on charged-particle multiplicity  anomalous (). Depending on centrality each collision event may include from a few to more than a thousand charged particles within the detector acceptance .

In the present study we consider 1D projections of the 2D histograms onto azimuth difference φ_Δ, denoted A(φ_Δ) to simplify notation. The histogram bin size on azimuth is 2π/24 (24 bins). The position variable is then φ_j = j 2π/24 with j = 0, …, 23. The conjugate index for a PS (Sec. III) is wave number m. The argument of PS cosines is m φ_Δ. The bin size has been optimized to match the observed correlation structure and provides sufficient resolution to retain all information in the data, as indicated for instance by the power spectrum in Fig. 3.

The 2D data are symmetrized on both η_Δ and φ_Δ. Thus, only one quadrant of each 2D histogram is unique. The statistical errors on φ_Δ are uniform except for bins at 0 and π where they are larger. The errors on η_Δ are strongly varying due to the triangular pair acceptance on η_Δ, with the largest errors at the acceptance edges. As noted, the 2D correlation histograms sum to zero by construction. We also adjust the 1D projections onto φ_Δ to zero sum, leading to one less data DoF (12).

V.1 Correlation data histograms

Figure 1 (left panels) shows 200 GeV Au-Au 2D angular correlations for centrality bin 0 (83-94%, N-N collisions) and bin 10 (0-5%). Within the STAR TPC acceptance the p_t-integral correlation data from Au-Au collisions include four principal components: (a) a same-side (SS) 2D peak at the origin on (η_Δ, φ_Δ) well approximated by a 2D Gaussian for all p_t-integral data, (b) an away-side (AS) 1D peak on azimuth well approximated by an AS dipole cos(φ_Δ − π) for all data and uniform to a few percent on η_Δ (having negligible curvature), (c) an azimuth quadrupole cos(2φ_Δ) also uniform on η_Δ to a few percent over the full angular acceptance of the STAR TPC, and (d) a narrow 1D peak on η_Δ. There is also a sharp 2D exponential peak at (0,0). That phenomenological description does not rely on physical interpretations of the components.


Figure 1: (Color online) Left: 2D angular autocorrelations from 200 GeV Au-Au collisions for (a) 83-94% (N-N collisions) and (c) 0-5% centralities. Right: Two-dimensional model fits to the histograms in the left panels obtained with Eq. (V.2).

Based on subsequent comparisons of observed data systematics with theory the components (a) and (b) together are interpreted to represent minimum-bias dijets fragevo (); anomalous (). Component (c) has been conventionally attributed to elliptic flow 2004 (). Component (d) is attributed to projectile-nucleon dissociation. And the 2D exponential is attributed to Bose-Einstein (quantum) correlations and charge-neutral electron pairs from photoconversions (denoted as the BE/electron peak).

V.2 Correlation data models

The fit methods employed here are based on the non-Fisherian ansatz that data can be represented as the sum of a hypothesis (any competing data parametrization) plus noise. 2D histograms from Ref. anomalous () [e.g. Fig. 1 (a) and (c)] were fitted with a data model including several elements applicable to higher RHIC energies and all Au-Au centralities. The 11-parameter model is

The definitions of two parameters in that expression ( and ) are modified from those in Ref. anomalous ().

Figure 1 (right panels) shows typical 2D model fits with Eq. (V.2) compared to corresponding data histograms in the left panels. The fit residuals are consistent with bin-wise statistical errors. The general evolution with centrality is monotonic increase of the SS 2D peak and AS dipole amplitudes (dijet structure), substantial increase of the SS peak width, rapid decrease to zero of the 1D Gaussian on η_Δ (soft component) axialci (); ptscale (); anomalous () and non-monotonic variation of the quadrupole amplitude davehq ().

For the present 1D study we develop simplified versions of the 11-parameter model. In more-central Au-Au collisions the soft component () falls to zero amplitude, and the BE/electron component () becomes very narrow anomalous (). A 2D model applicable to more-central Au-Au collisions then has 6 parameters

F(η_Δ, φ_Δ) = A_0 + A_{2D} exp{ −(1/2)[ (φ_Δ/σ_φ)² + (η_Δ/σ_η)² ] } + A_D cos(φ_Δ − π) + A_Q cos(2φ_Δ).    (15)

The BE/electron component remains significant in a few bins near the origin that can be removed from the fits.

Projection onto 1D azimuth represents a large information reduction. The full 2D histogram with 25×25 bins includes 169 independent bins (one independent quadrant due to symmetrization), whereas 1D projections include at most 13 independent bins. A simplified model derived from the 2D data model but applicable to projected 1D azimuth correlations in more-central A-A collisions includes 5 parameters defined to be consistent with the PS introduced in Sec. III

F(φ_Δ) = A_0 + A_G exp(−φ_Δ² / 2σ_G²) + A_D cos(φ_Δ − π) + A_Q cos(2φ_Δ).    (16)

A further simplification is possible for the most-central (0-5%) bin. The quadrupole amplitude for that centrality is observed to be consistent with zero davehq (); davehq2 (). The 1D model then includes only 4 parameters

F(φ_Δ) = A_0 + A_G exp(−φ_Δ² / 2σ_G²) + A_D cos(φ_Δ − π).    (17)

Integrating Eqs. (17) and (11) over azimuth with differential factor dφ_Δ/2π gives

A_0 + (σ_G/√(2π)) A_G = P_0.    (18)

The 1D data histograms have been adjusted to insure P_0 = 0. A fit to bin-10 data with Eq. (17) determines an offset value A_0. With the other fitted parameter values we obtain

A_0 ≈ −(σ_G/√(2π)) A_G.    (19)

The four-parameter 1D model can then be further reduced to a three-parameter model defined by

F(φ_Δ) = A_G [ exp(−φ_Δ² / 2σ_G²) − σ_G/√(2π) ] + A_D cos(φ_Δ − π),    (20)

where each of the two model components integrates to zero over 2π azimuth. We therefore replace Eq. (17) with Eq. (20), referred to below as the “basic Model.”
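A sketch of the basic Model as reconstructed in Eq. (20); the parameter names and the analytic mean-subtraction term are choices made here for illustration, and periodic Gaussian images are neglected.

```python
import numpy as np

def basic_model(phi, a_g, sigma_g, a_d):
    """Same-side Gaussian with its azimuth mean (approx. sigma_g/sqrt(2 pi))
    removed, plus an away-side dipole; both components integrate to zero."""
    gauss = np.exp(-0.5 * (phi / sigma_g) ** 2) - sigma_g / np.sqrt(2.0 * np.pi)
    return a_g * gauss + a_d * np.cos(phi - np.pi)

# Evaluated with the bin-10 basic-Model parameters listed later in Table 1.
phi = np.linspace(0.0, np.pi, 13)
print(basic_model(phi, 0.57, 0.64, 0.12))
```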

Since all data histograms are corrected to P_0 = 0, the offset DoF is removed and the adjusted 13-bin 1D data histograms have 12 independent DoF. But the bin at φ_Δ = 0 is removed from all model fits to exclude the BE/electron component, reducing the effective data DoF to 11.

The AS dipole component is the limiting case of an AS Gaussian peak array (see App. D for details). The r.m.s. peak width is large enough that only the AS dipole term of the PS representation survives.

We define alternative data models by adding to the basic Model of Eq. (20) successive cosine terms of the form A_m cos(m φ_Δ), where m = 2, 3, 4 for quadrupole, sextupole and octupole respectively. We also define independent “FS-only” models as truncated Fourier series with K cosine terms and no other components.
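Both model families can be generated mechanically, as in the following sketch (function names are illustrative choices): an FS-only model is a truncated cosine series, while the extended models add cosine terms of index m = 2, 3, 4 to an already evaluated basic Model (e.g. the basic_model sketch above).

```python
import numpy as np

def fs_only(phi, coeffs):
    """Truncated Fourier series with terms m = 1..len(coeffs) and nothing else."""
    return sum(a * np.cos(m * phi) for m, a in enumerate(coeffs, start=1))

def add_cosines(base_values, phi, cosine_amps):
    """Add A_m cos(m phi), m = 2, 3, ..., to an evaluated base model."""
    out = np.asarray(base_values, dtype=float).copy()
    for m, amp in enumerate(cosine_amps, start=2):
        out = out + amp * np.cos(m * phi)
    return out
```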

VI Bin-10 0-5% azimuth correlations

We first apply BI methods to the 1D azimuth projection from 0-5% central 200 GeV Au-Au collisions. We fit the data with the basic Model and obtain the data PS. We determine χ² and information I for FS-only models vs parameter number K. We then evaluate evidence E for several competing models and determine the posterior model probabilities.

VI.1 1D azimuth projection

Figure 2 shows a projection of the 2D data histogram from 0-5% central 200 GeV Au-Au collisions onto 1D azimuth (points). 24 bins are shown but only 13 are unique due to symmetrization about zero and π. Estimated statistical errors have been multiplied by factor 2 to make them visible (extend outside the points). Errors are a factor √2 larger for the bins at 0 and π because of symmetrization of the data about those bins. The bin at zero also includes a significant contribution from BE/electrons not included in the models used for this exercise and is therefore excluded from all fits. The bin at π includes a small excess due to a tracking-geometry distortion accommodated in some model fits by addition of a “delta function.”

Figure 2: (Color online) 1D projection onto azimuth (points) from the 2D data histogram for 0-5% central 200 GeV Au-Au collisions in Fig. 1 (c). Statistical errors at 0 and π are larger than the others due to symmetrization of data on the periodic variable. The bin-wise statistical errors (about 0.0037) have been multiplied by 2 to make them visible outside the data points. The (red) dashed curve is obtained from a fit to the data with the basic Model of Eq. (20). A fit with an FS-only model including four or more terms would appear identical on the scale of this plot. A similar remark applies to corresponding data plots for two other centrality bins.

A fit of the basic Model to data is shown by the dashed (red) curve. The fitted model parameters are A_G = 0.57, σ_G = 0.64 and A_D = 0.12 (Table 1) with χ² = 12.5 for 11 − 3 = 8 fit DoF.

VI.2 Data power spectrum

Figure 3 shows the PS (points and blue solid curve) as a Fourier transform of the data autocorrelation in Fig. 2 using Eq. (12). The general structure includes a signal component at smaller wave number and a flat (on average) white-noise spectrum at larger wave number corresponding to the r.m.s. statistical error in the data histogram. The noise-spectrum mean is about 0.001 (dotted line).

Figure 3: (Color online) Power spectrum values (points) derived from the data in Fig. 2 via Eq. (12). The (red) dashed curve is the Gaussian PS described by Eq. (21) with width and amplitude corresponding to the fitted Gaussian in Fig. 2. The interval at larger index m is consistent with a “white-noise” power spectrum (dotted line) representing the statistical noise in Fig. 2.

To aid interpretation of the data PS we include the predicted PS for a 1D Gaussian (red dashed curve) with amplitude and width derived from the basic-Model fit in Fig. 2. The PS amplitudes for a unit-amplitude periodic Gaussian peak array on azimuth are given by (App. D)

P_m(σ_G) = (σ_G/√(2π)) exp(−m² σ_G² / 2),  m ≥ 1.    (21)

As the Gaussian peak width increases the number of significant signal terms in the PS decreases. The Gaussian PS coincides with the data PS at smaller index m ≥ 2, and the data PS at larger index is consistent with statistical noise. The data PS element for m = 1 includes a negative contribution from the AS peak (dipole).
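The following sketch evaluates the Gaussian PS amplitudes in the form reconstructed as Eq. (21) (the normalization is tied to the Eq. (12) convention adopted here; multiply by the fitted Gaussian amplitude for comparison with a data PS). It illustrates how a broader peak concentrates its power in fewer low-m terms.

```python
import numpy as np

def gaussian_ps(m, sigma_g):
    """Relative PS amplitudes of a unit-amplitude periodic Gaussian peak array
    of r.m.s. width sigma_g, neglecting overlap of periodic images."""
    m = np.asarray(m, dtype=float)
    return sigma_g / np.sqrt(2.0 * np.pi) * np.exp(-0.5 * (m * sigma_g) ** 2)

print(gaussian_ps([1, 2, 3, 4, 5], 0.64))   # rapid falloff beyond m ~ 3-4
```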

We can assess the quality of the basic-Model data description by determining the PS of the residuals, not of (data − Model) but of (data − Gaussian) only. The PS of the residuals should be equal to the PS difference in Fig. 3 according to the linearity of Eq. (12).

Figure 4: (Color online) The PS for residuals in the form (data − Gaussian) from Fig. 2, consistent with the white-noise part of the PS in Fig. 3, with mean approximately 0.001. The negative PS value for m = 1 (not shown) corresponds to the AS dipole amplitude from the basic-Model fit in Fig. 2.

Figure 4 shows the PS for (data − Gaussian) referring to the fitted Gaussian in Fig. 2. The PS values for m ≥ 2 are consistent with the white-noise spectrum. The value for m = 1 (not shown) is consistent with the fitted dipole amplitude. From this PS study we have a first indication that the basic Model is sufficient to describe the bin-10 1D azimuth projection.

VI.3 Bayesian model fits with Fourier series

We next apply BI methods to FS-only models of the data histogram in Fig. 2 to establish a BI reference. In this application the number of parameters K represents the largest value of FS index m for a given FS-only model. Varying K represents different FS data models. We obtain the χ² and information I for each FS-only model.

Figure 5: (Color online) χ² (upper solid curve and points) and information 2I (dashed curve and points with uncertainty band) vs number of parameters K for Fourier-series (FS-only) models. The sum χ² + 2I (negative log evidence, dotted curve) is also included. The lower solid curve is χ² values for fits to residuals (data − basic Model) from Fig. 2, consistent with the trend expected for no signal (noise only) in the data.

Figure 5 shows the basic elements of BI model fits. The upper (blue) solid curve and points represent the log likelihood (LL) in the form −2 LL or χ². The (red) dashed curve shows information 2I representing the parameter cost (Occam penalty, Sec. II.6). The (black) dotted curve represents the sum χ² + 2I (negative log evidence). The minimum for −2 ln E (and maximum for evidence E) occurs at the K value indicating the FS-only model preferred by the data. That result is consistent with Fig. 3 indicating how many FS terms should exhaust the PS signal.

The χ² trend indicates that the FS model components are ideally ordered on index m for the signal in these specific data, and is similar to the idealized trend suggested in Fig. 5.1 of Ref. bayes2 (). The largest χ² decreases occur for the smallest index values. The interval with larger (negative) slope at smaller K corresponds to accommodation of the data signal with increasing K. The interval with smaller slope at larger K indicates that additional Fourier terms only accommodate statistical noise. The overall χ² trend then matches the power-spectrum trend in Fig. 3. χ² must go to zero when K equals the number of data DoF (11 in this case). The lower solid curve represents the χ² for fits to the residuals (data − basic Model) from Fig. 2 (no signal present). The χ² values are then consistent with the fit DoF = 11 − K.

VI.4 Bayesian model comparisons

We next extend BI methods to several data models with different combinations of elements and parameters compared to the previous FS-only exercise. We first compare χ² values alone, simulating a conventional model-fit exercise, then extend to comparisons of evidence E.

Figure 6: (Color online) χ² values vs number of parameters K for several data models. The general trend is monotonic decrease with increasing number of model parameters, responding only to statistical noise at larger K.

Figure 6 shows χ² values for various model fits to data. The FS-only description (blue points and line) achieves a substantial decrease over the first few terms but no significant improvement with additional terms. The basic Model with three parameters (red solid square) has χ² = 12.5, somewhat in excess of the number of fit DoF = 8. Addition of more cosines (quadrupole, sextupole, octupole) to the basic Model keeps pace with the FS noise trend with its reduced slope. In this conventional context the extra cosines seem to be required for competitive data description because they reduce the fitted χ², but at what cost?

The basic Model + quadrupole + sextupole + octupole with K = 6 (open diamond) has the same χ² as the FS-only model. As explained below in connection with Table 1 the additional cosine terms effectively displace the Gaussian part of the basic Model. The composite model then functions as an FS-only model, but with increased cost in the Occam penalty.

Figure 7: (Color online) Negative log evidence −2 ln E vs number of parameters K for several models. The basic Model (solid square) is strongly favored over all others (lowest −2 ln E). The hatched band indicates the common uncertainty of priors assigned to cosine terms in all models. FS-only models for all K (solid dots and line) are strongly rejected by the evidence.

Figure 7 shows negative log evidence −2 ln E for several models. Adding an Occam penalty in the form of information gained by each model reveals a different picture. The basic Model with K = 3 has substantially smaller −2 ln E (larger evidence E) than other models, where the cost of extra parameters is not justified by a compensating reduction in χ². The hatched band reflects the estimated uncertainty in I (for the FS-only models) arising from the estimated priors.

Given that χ² values for various models are similar (Fig. 6), the large differences in −2 ln E among models must be dominated by information I, which depends on the covariance matrix and prior PDFs. It might be suggested that such differences arise mainly from the assignment of prior probabilities, but that is not the case. We apply the same prior to a given parameter or parameter class consistently across all models, so that uncertainties in I are strongly correlated across competing models and largely cancel when odds ratios are taken (see Sec. IX.3).

The I trend vs K for FS-only models has a larger slope than the trend for the basic Model plus additional cosine terms. The difference of about 2.5 per parameter corresponds to a factor exp(2.5) ≈ 12 difference in parameter errors for the two models: parameter errors for FS-only models are substantially smaller than those for the basic Model plus cosine terms, accounting for the factor 10-15 difference. As discussed in Sec. X.2 the large Occam penalty for FS-only models is mainly owing to smaller covariance-matrix elements (parameter errors).

Figure 8: (Color online) Normalized plausibilities from Eq. (22) for several models indexed by k. As in Fig. 7 the basic Model (solid square) is strongly favored over all other models while FS-only models for all K are strongly rejected.

Figure 8 shows the plausibility (relative evidence) for each competing model in the form

P(M_k|D) ≈ E_k / Σ_j E_j,    (22)

that reveals the full selectivity of the BI method. For this exercise we assume that model prior probabilities are all equal (and therefore irrelevant). However, implicit assumptions do play a role in RHIC/LHC data modeling and physics interpretations.
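A sketch of the normalization in Eq. (22), assuming equal model priors; the input −2 ln E values below are invented placeholders, not the fitted results.

```python
import numpy as np

def plausibilities(neg2_log_e):
    """Normalized model plausibilities from -2 ln E values (equal priors)."""
    log_e = -0.5 * np.asarray(neg2_log_e, dtype=float)
    w = np.exp(log_e - log_e.max())       # subtract the maximum for stability
    return w / w.sum()

print(plausibilities([40.0, 43.5, 49.0, 52.0]))
```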

The most plausible models are the basic Model (80%) and basic Model + quadrupole (15%). Large Occam penalties reduce competing additional multipole elements to a few percent or less. The model including an octupole (open diamond) leads to major fit instabilities and is rejected. With plausibilities of less than 1% FS-only models are also rejected. In terms of odds the basic Model is preferred over Model + quadrupole by :1, over Model + sextupole by :1 and over all FS-only models by :1.

As noted in Sec. II.5, Bayesian comparisons among models are effected by taking ratios of evidences (odds ratios). Comparisons are visualized efficiently by corresponding differences on a log-evidence scale (Bayes factors) as in Fig. 7 and subsequent equivalent figures. Isolated absolute numbers are not relevant to our method.

VI.5 Model-fit results for bin 10

Table 1 summarizes the best-fit model-parameter values obtained from model fits (minimum χ²) emphasizing the basic Model (column 2) and successive additions of quadrupole, sextupole and octupole components, as well as a delta function at φ_Δ = π to accommodate a data artifact. The parameters are as defined in Sec. V.2. Also shown are χ², the information term 2I and the negative log evidence −2 ln E.

parameter   basic Model   +A_Q      +A_S      +A_O      +A_δ(π)
A_G         0.57          0.73      0.84      0.34      0.57
σ_G         0.64          0.69      0.71      0.09      0.63
A_D         0.12          0.15      0.18      -0.003    0.115
A_Q                       -0.014    -0.025    0.064
A_S                                 0.005     0.024
A_O                                           0.005
A_δ                                                     0.005
χ²          12.5          9.7       10        9         11
2I          28            34        38        44        36
−2 ln E     40.5          42.5      48        53        47
Table 1: Bin-10 model parameters (minimum χ²) for several fit models: (a) basic Model, (b) basic Model plus quadrupole term A_Q cos(2φ_Δ), (c) previous plus sextupole term A_S cos(3φ_Δ), (d) previous plus octupole term A_O cos(4φ_Δ), (e) basic Model plus delta function A_δ at φ_Δ = π. The fit parameters are as defined in Sec. V.2.

Results for the basic Model are in good agreement with the published values from 2D model fits anomalous (). For this centrality the best-fit 2D parameters from the model of Eq. (V.2) are consistent with the 1D values (and with the corresponding Gaussian integral), with χ²/DoF = 2.6. Note that the 1D Gaussian amplitude must be less than the 2D peak amplitude because of the curvature on η_Δ of the SS 2D peak. The χ²/DoF ≈ 2.8 of the 2D model fit is substantially higher than that for the 1D fit with the basic Model [12.5 / (11 − 3) ≈ 1.6] because of significant structure on η_Δ (η_Δ-modulated dipole) not described by the standard 2D data model of Eq. (V.2).

As cosine terms are added to the basic Model a conflict develops between the explicit Gaussian component and a sum of cosines approximating a competing Gaussian. The large parameter differences for the “+A_O” vs “basic Model” columns are discussed further in Sec. X.3.

The last column refers to the basic Model plus a free amplitude in the bin at π (“delta function”). Compared to the basic Model alone there is a reduction of χ² by 1.5 but an increase of 2I by 8, leading to an overall increase of negative log evidence by 6.5. The additional model DoF is rejected by 25:1.
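As a quick check of the quoted odds (using only numbers already given above), the change of +6.5 in −2 ln E converts to odds of about exp(6.5/2) ≈ 26, i.e. roughly 25:1 against the added degree of freedom:

```python
import numpy as np
print(np.exp(6.5 / 2.0))   # ~25.8, consistent with the 25:1 odds quoted above
```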

VII Bin-8 9-18% azimuth correlations

In this second of three examples the statistical errors of the wider centrality bin are reduced by a factor √2 compared to the 0-5% centrality bin. The BE/electron peak is still narrow enough to remain within the single bin at zero. The quadrupole component is significant and positive, shifting the plausibility order of competing models.

VII.1 1D azimuth projection

Figure 9 shows a projection of the 2D data histogram from 9-18% central 200 GeV Au-Au collisions onto 1D (points). As for the previous centrality the bin at zero also includes a significant contribution from BE/electrons not included in the data models and is therefore excluded from the fits. The typical data r.m.s. statistical error is 0.0026, not visible outside the points on this scale.

Figure 9: (Color online) 1D projection onto azimuth (points) from the 2D data histogram for 9-18% central 200 GeV Au-Au collisions. The (red) dashed curve is a fit to the data with the basic Model of Eq. (20) plus independent quadrupole component A_Q cos(2φ_Δ). The bin-wise statistical errors are 0.0026, not visible outside the points.

A fit of the basic Model + quadrupole to data is shown by the dashed (red) curve. The fitted model parameters are , , and with for 11 - 4 = 7 fit DoF.

VII.2 Data power spectrum

Figure 10 shows the power spectrum (points and blue solid curve) derived from the data in Fig. 9. As for the 0-5% centrality bin we include the predicted power spectrum (red dashed curve) for a 1D Gaussian (SS peak) with amplitude and width parameters derived from the fit to data in Fig. 9. The data PS is again consistent with statistical noise at larger index m. The PS element for m = 1 includes a negative contribution from the AS dipole. The element for m = 2 includes a significant positive contribution from a quadrupole component not associated with the SS peak davehq ().

Figure 10: (Color online) Power spectrum values (points) derived from the data in Fig. 9 via Eq. (12). The (red) dashed curve is the Gaussian PS described by Eq. (21) with amplitude and width from the basic Model + quadrupole fitted to data in Fig. 9. The interval at larger index m is consistent with a “white-noise” power spectrum (dotted line) representing the statistical noise in Fig. 9.

Just as for bin 10 we assess the quality of the basic-Model data description by determining the PS of the residuals of (data − Gaussian) only, where Gaussian is the fitted Gaussian in Fig. 9. The PS for (data − Gaussian) is consistent with a white-noise spectrum at larger index m. The values for m ≤ 2 are consistent with the fitted positive quadrupole and negative dipole amplitudes. The basic Model augmented by quadrupole component A_Q cos(2φ_Δ) fully exhausts the data signal and is therefore a sufficient model.

VII.3 Bayesian model fits with Fourier series

The log-likelihood LL trend in the form χ²(K) for FS-only model fits to data from bin 8 (not shown) is similar to that for bin-10 data in Fig. 5. Information representing the parameter cost is also similar. The minimum of −2 ln E occurs at a small K value, consistent with Fig. 10 where we again find that a few-term FS-only model should completely describe the signal in the bin-8 data. The FS-only model should then be competitive with the basic Model + quadrupole in terms of fit quality and parameter number, two elements of BI evaluation.

VII.4 Bayesian model comparisons

Figure 11 shows χ² values from conventional data modeling. The FS-only model achieves a substantial reduction over the first few terms but no significant improvement for additional terms. The basic Model with K = 3 (solid red square) has a χ² much elevated above the number of fit DoF = 8 and is rejected on that basis. Addition of a quadrupole component (solid green diamond) brings χ² down to an acceptable value. Addition of more cosines (sextupole, octupole) to the basic Model + quadrupole tracks the FS-only noise accommodation.

Figure 11: (Color online) χ² values vs number of parameters K for several data models. The general trend is again monotonic decrease with increasing parameter number.

The basic Model + sextupole (solid red triangle) has the same χ² value as that for basic Model + quadrupole. The Gaussian + dipole + sextupole combination can interact to accommodate the independent quadrupole component in the data, since the octupole component of the Gaussian is only a few sigma above the statistical noise. Interactions among the basic-Model Gaussian and additional cosine terms are discussed in Sec. X.3.

Figure 12: (Color online) Negative log evidence −2 ln E vs number of parameters K for several models. The basic Model + quadrupole (solid diamond) is strongly favored over others (lowest −2 ln E, largest evidence). The basic Model alone (solid square) is strongly rejected by the evidence, as are FS-only models for all K (blue points and line).

Figure 12 shows negative log evidence −2 ln E for various models. The basic Model + quadrupole (solid green diamond) corresponding to model DoF K = 4 has substantially smaller −2 ln E (larger evidence E) than other model combinations. It is clearly preferred over the basic Model alone by a large odds ratio. For other models the cost of extra parameters is not justified by reductions in χ². The quadrupole model component is preferred over a sextupole due to differences in the fit covariance matrix for the two models. All FS-only models are again rejected by large factors.

VIII Bin-0 83-94% azimuth correlations

In this third of three cases, essentially representing N-N (p-p) collisions, we encounter a major challenge for BI analysis from several sources: (a) The SS peak on azimuth contains two contributions that cannot be separated easily by discarding the bin at the origin as they were for bins 8 and 10, (b) the signal amplitude is much smaller relative to statistical noise (15:1) than it was for more-central collisions (200:1), and (c) the SS peak is substantially broader on azimuth.

VIII.1 1D azimuth projection

Figure 13 shows a projection of the 2D data histogram from 83-94% central 200 GeV Au-Au collisions (points). Unlike previous cases the SS peak includes a significant contribution from BE/electrons that is not included in the models (conversion electron pairs do fall mainly within the single bin at the origin). A fit of the basic Model to data is shown by the dashed (red) curve. The fitted model parameters are , and with for 11 - 3 = 8 fit DoF.

Figure 13: (Color online) 1D projection onto azimuth (points) from the 2D data histogram for 83 - 94% central 200 GeV Au-Au collisions in Fig. 1 (a). The (red) dashed curve is a fit to the data with the basic Model of Eq. (20). The statistical errors are 0.0026.

VIII.2 Data power spectrum

Figure 14 shows the power spectrum (points and blue solid curve) derived from the data in Fig. 13. As for previous centrality bins we include a predicted power spectrum (red dashed curve) for a 1D Gaussian (SS peak) with amplitude and width parameters derived from the basic Model fitted to data in Fig. 13.

Figure 14: (Color online) Power spectrum values (points) derived from the data in Fig. 13 via Eq. (12). The (red) dashed curve is the Gaussian PS described by Eq. (21) with Gaussian width and amplitude corresponding to the basic Model fitted to data in Fig. 13. The interval at larger index m is consistent with a “white-noise” power spectrum (dotted line) representing the statistical noise in Fig. 13.

Because the bin-0 SS peak is broader on azimuth (thus narrower on index m) and the S/N is much smaller, the PS signal is not significant at m = 4 or even m = 3. A FS-only model in the form of dipole + quadrupole should be sufficient to displace the basic Model. For bins 10 and 8 FS-only models are clearly excluded in favor of the basic Model, but for bin 0 the basic Model and a FS can both describe the two data DoF. Thus, we expect BI analysis to prefer the FS-only model.

VIII.3 Bayesian model fits with Fourier series

Figure 15 shows the FS-only χ² trend for bin-0 data (upper solid blue curve and points). As expected, χ² drops to the noise trend (lower solid curve) by K = 2. Additional terms accommodate statistical noise. Information follows the expected monotonic increase with K. Negative log evidence has a minimum for K = 2. Thus, an FS-only model with two terms is preferred by the data, as expected from the PS in Fig. 14.

Figure 15: (Color online) χ² (upper solid curve and points) and information 2I (dashed curve and points) vs number of parameters K for FS-only models. The sum χ² + 2I (dotted curve and points) is also included. The χ² trend for basic-Model fit residuals (lower solid curve) is approximately consistent with the expected noise trend.

VIII.4 Bayesian model comparisons

Figure 16 shows χ² trends for several competing models applied to the bin-0 data in Fig. 13. The basic Model and FS-only model describe the data equally well, and we expect the simpler FS-only model to be preferred when an Occam penalty is included. For these bin-0 data the addition of a “delta” component at the origin (open square) leads to substantial improvement in the fit quality, consistent with Fig. 13.

Figure 16: (Color online) χ² values vs number of parameters K for several data models. The basic Model (solid square) is equivalent to the FS-only model (solid dot).

Figure 17 shows the log evidence trend. That the basic Model (solid red square) is preferred over the FS-only model (lowest blue point) despite the cost of the extra model parameter is a major surprise, given the evidence ratio (odds) in its favor. That result prompted a detailed study reported in Sec. X.2 on how information I is related to model priors and data, with supporting material provided in App. A.

Figure 17: (Color online) Negative log evidence −2 ln E vs number of parameters K for several models. The basic Model (solid square) is favored over all others (lowest −2 ln E, largest evidence), especially over the FS-only model expected to prevail for this centrality (lowest solid dot).

The ability in this case to discriminate between the basic Model and FS models, despite 1D data with low S/N ratio, is a significant achievement for BI analysis. The correctness of the basic-Model preference is confirmed by analysis of 2D data histograms. From the 2D analysis of Ref. anomalous () we learn that the SS 2D peak is necessary for all centralities. In contrast, a 1D FS-only model would fail dramatically for any 2D data, but that is not apparent from 1D projections alone.

IX Systematic uncertainties

Bayesian Inference methods provide a powerful system for discriminating among competing complex data models with a consistent set of evaluation rules. Close examination of method details and evaluation of uncertainties is required to insure confidence in the results.

IX.1 Uncertainties for data histograms

For 2D histograms from Ref. anomalous () the angular acceptance was divided into 25 bins on the η_Δ axis and 25 bins on φ_Δ, a trade-off between statistical error magnitude and angular resolution. The histograms are by construction symmetric about η_Δ = 0 and φ_Δ = 0, π. The 25 bins on φ_Δ are arranged to insure centering of major peaks on azimuth bin centers. 2D bin-wise statistical errors for 200 GeV data are smallest near the angular origin. Because of the triangular dependence of the pair acceptance, statistical errors increase with |η_Δ| toward the acceptance edges. Errors are uniform on φ_Δ except that errors are larger by factor √2 for angle bins at φ_Δ = 0, π and η_Δ = 0 because of reflection symmetries.

Statistical errors are approximately independent of centrality for the per-particle statistical measure Δρ/√ρ_ref over nine 10% centrality bins (0-8). An additional factor √2 increase applies to the two most-central centrality bins (9, 10), which split the top 10% of the total cross section. After projection onto 1D azimuth for this study the centrality bin 10 errors are about 0.0037 except for the azimuth bins at 0 and π. Errors for the other centrality bins (0, 8) are a factor √2 less, or 0.0026. χ² values for optimized models in this study determined with those statistical errors are generally consistent with the number of fit DoF = data DoF minus K (number of model parameters), as demonstrated in Fig. 5. Thus, the statistical and systematic uncertainties for data histograms used in this study are both small and well understood.

IX.2 Uncertainties for information estimation

Information uncertainty is largely related to the choice of prior PDFs for the various model parameters and to the fitted-parameter uncertainties. We repeat the information definition from Eq. (7),

I = \ln\left( \frac{V_{\rm prior}}{V_{\rm posterior}} \right),    (23)

the natural log of prior volume over posterior volume in the model parameter space. The covariance matrix for a K-parameter model is obtained from the Hessian describing the curvatures of the likelihood function near its maximum. The likelihood function is in turn determined by the model in combination with the specific data. If the model and prior are defined and the data specified, the posterior volume is also well defined.

Assuming translation invariance within the parameter-space volume where the likelihood is significantly nonzero, the prior PDF for each parameter is taken to be uniform across a bounded interval \Delta\alpha_k into which the corresponding fitted parameter value should almost certainly fall. The prior volume for K parameters is then

V_{\rm prior} = \prod_{k=1}^{K} \Delta\alpha_k .    (24)

In principle, a prior is defined before data are obtained and thus should not depend on specific data. However, it is fair to invoke general knowledge () about the typical amplitudes of structures in such data. We know from experience that typical structure amplitudes (e.g. peak-to-peak excursions) are generally . That applies for example to the Gaussian amplitude in the basic Model, and to the Gaussian width based on the definition of the SS peak, implying that in those cases.

What matters more than absolute estimates of the prior intervals are the relations among different models and model parameters. If prior-interval estimates are excessive for a particular model it may be unduly penalized. Given the above assignment for a Gaussian amplitude, what is a fair assignment for cosine coefficients? To that end we examine Eq. (III). The autocorrelation to be modeled on the left receives contributions at the origin from several FS components including factors of 2. Thus, it is reasonable to assume that cosine coefficients and their uncertainties may be substantially smaller on average than the Gaussian amplitude and its uncertainty. For all cosine components we assign and indicate prior-related uncertainties by including and as limiting cases for cosines (e.g. curve and hatched band in Fig. 5).
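
The scaling argument can be made explicit with a schematic decomposition (our notation, offered only as a rough illustration): if a single peak of height A at the azimuth origin is represented by a truncated Fourier series with N cosine terms F_m, then at the origin

\rho(0) \approx A = F_0 + 2\sum_{m=1}^{N} F_m \quad\Longrightarrow\quad |F_m| \lesssim A/2N \ \text{on average},

so individual cosine coefficients are expected to be several times smaller than the peak amplitude, justifying a correspondingly smaller prior interval.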

If we assume equal prior intervals \Delta\alpha and equal variances \sigma_\alpha^2 for the model parameters, and negligible covariances among parameters, the information simplifies (neglecting convention-dependent constants) to

I \approx \sum_{k=1}^{K} \ln\left( \frac{\Delta\alpha_k}{\sigma_k} \right) \approx K \ln\left( \frac{\Delta\alpha}{\sigma_\alpha} \right).    (25)

In fits with FS-only models we observe . Given we have , while if we reduce to (assumed for all cosine terms) . If we further reduce the prior interval, the fitted parameter values in some cases contradict the prior, implying that the chosen interval is too small. We can then state that for all FS-only models . For the basic Model with added cosines the parameter uncertainties are more typically . In that case we obtain . Those results imply that addition of a model parameter is justified ( is significantly reduced) if the resulting decrease in is significantly greater than for FS-only models and for the basic Model plus optional cosines.
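
The bookkeeping above can be summarized in a short numerical sketch; the function and all input numbers are hypothetical, chosen only to illustrate the mechanics of Eqs. (23)-(25) and not to reproduce the fit values quoted in this section.

import numpy as np

def occam_information(prior_widths, covariance):
    # Information I = ln(V_prior / V_posterior), cf. Eqs. (23)-(24).
    # prior_widths: widths of the uniform prior interval for each parameter
    # covariance:   fitted-parameter covariance matrix (inverse Hessian)
    # The posterior volume is approximated by sqrt(det covariance); any
    # convention-dependent constant per dimension is omitted.
    v_prior = float(np.prod(prior_widths))                    # Eq. (24)
    cov = np.asarray(covariance, dtype=float)
    v_posterior = float(np.sqrt(np.linalg.det(cov)))
    return float(np.log(v_prior / v_posterior))               # Eq. (23)

# Illustrative values: three parameters with equal priors and negligible
# covariances, i.e. the diagonal limit of Eq. (25).
cov = np.diag([0.005**2, 0.005**2, 0.002**2])
priors = [0.02, 0.02, 0.01]
print(occam_information(priors, cov))   # = ln(4) + ln(4) + ln(5) here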

IX.3 Uncertainties for odds ratios

As noted in Sec. II.5 odds ratios can be used to state quantitatively the BI relation between two models in the form of a probability ratio , where equality to the second ratio assumes equal model priors for the two cases. In terms of log evidence the Bayes Factor is , and the odds is then .

The uncertainty (error) in an odds ratio is determined by the uncertainties in the compared evidences, in turn dominated by uncertainties in the covariance matrix/Hessian and the prior PDFs. Uncertainties for the Hessian matrix are discussed in App. B and serve as the sole basis for the odds errors stated in the text.

Uncertainties for the prior PDFs are discussed in the previous subsection. The priors for the SS Gaussian amplitude and width are set to the minimum values consistent with experience, disfavoring the basic Model a priori and implying that any odds favoring the basic Model is a lower limit. A common uncertainty of a factor 2 either way is assumed for a cosine coefficient in any model. Because an odds estimate is a probability ratio, systematic errors correlated between numerator and denominator cancel to first order, whereas uncorrelated random errors should combine quadratically. That property can be seen as an advantage of odds as a basis for model comparisons and minimizes the uncertainty contribution from cosine elements common to two compared models.

If models with different values (parameter number) are compared, the unpaired systematic error is not canceled. For instance, the odds ratio between the basic Model () and the FS-only model () for bin 10 includes a linear dependence on the FS-only prior uncertainty for one additional cosine term. However, the comparison of basic Model plus quadrupole vs FS-only model for bin 8 (both ) eliminates that uncertainty contribution.
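
The odds bookkeeping described above is sketched below with hypothetical log-evidence values and uncertainties; the odds errors quoted in the text derive instead from the Hessian uncertainties of App. B.

import numpy as np

def odds_ratio(log_evidence_a, log_evidence_b, err_a=0.0, err_b=0.0):
    # Odds favoring model A over model B assuming equal model priors, so the
    # odds equals the Bayes factor exp(ln E_A - ln E_B).  err_a and err_b are
    # the *uncorrelated* uncertainties of the two log evidences; systematics
    # common to both models cancel in the ratio and are not propagated here.
    log_bayes_factor = log_evidence_a - log_evidence_b
    odds = float(np.exp(log_bayes_factor))
    frac_err = float(np.hypot(err_a, err_b))   # fractional error of the odds
    return odds, frac_err

# Illustrative (made-up) log-evidence values, not results from this analysis:
odds, frac = odds_ratio(-10.0, -16.9, err_a=0.3, err_b=0.4)
print(f"odds ~ {odds:.0f}:1, fractional uncertainty ~ {frac:.2f}")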

X Discussion

We consider several issues that have arisen in application of BI methods to azimuth-correlation data models, including the surprising performance of the 1D basic Model in peripheral collisions, the consistent strong preference for the basic Model by BI analysis, the competition between Gaussian and cosine terms in data models, and the implications from this study for two theoretical narratives.

X.1 Comparing bin 10 and bin 0

The data structure for centrality bin 10 in Fig. 2 could be modeled (a) as two peaks at 0 and , (b) as a Fourier series only, or (c) as a combination of such elements. The two peaks described by the basic Model are expected in a HEP/jets narrative describing high-energy nuclear collisions. FS models are expected in a QGP/flow narrative and are capable of describing any structure on periodic azimuth. Competition among data models thus reflects competition between theoretical narratives.

In Fig. 3 we learn that all information in the data PS is confined to . Higher terms in an FS model describe only statistical noise. Comparing a PS Gaussian model with the data PS, we find that four points are predicted by a Gaussian fitted to the data, and one point corresponds to the fitted dipole within the basic Model. The basic Model fully represents the data signal, as demonstrated in Fig. 4, but so does a FS-only model. Intermediate combinations of Gaussian + cosines also describe the data well. Models with more parameters continue to reduce as in Fig. 6, and might be preferred on that basis.

However, when an Occam penalty is introduced in the form of information, dramatic differences among models appear, as in Figs. 7 and 8. In the latter figure the basic-Model probability is %, the next highest being basic Model + quadrupole with %. Adding more cosine components may reduce , but not to an extent that compensates for the large Occam penalties (increase in ). The additions are essentially “fitting the noise” and are strongly rejected by BI analysis.

A different situation emerges for bin 0. In Fig. 14 the data signal is confined to for two reasons: (a) the S/N ratio is reduced by a factor 13, and (b) the SS peak azimuth width is increased by 30%, so the conjugate PS signal peak width is reduced by that factor. Consequently the “bandwidth” of the data PS signal is reduced from to . A FS-only model with two parameters should then be strongly preferred by BI analysis over the basic Model, given equivalent priors for the two models.
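
The width-bandwidth relation invoked here follows from the standard transform pair for a narrow periodic Gaussian (our notation; we assume the SS peak width \sigma_\phi is small compared with \pi so that periodic images can be neglected):

F_m = \frac{1}{\pi}\int_{-\pi}^{\pi} A\, e^{-\phi^2/2\sigma_\phi^2} \cos(m\phi)\, d\phi \approx A\,\sigma_\phi \sqrt{2/\pi}\; e^{-m^2 \sigma_\phi^2/2},

so the PS signal peak has conjugate width approximately 1/\sigma_\phi on index m, and a 30% wider azimuth peak implies a correspondingly narrower PS peak.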

However, that is not what we find in Fig. 17. The basic Model maintains a significant advantage over a FS-only model, the odds ratio being in favor of the basic Model, even with one more parameter. That surprising result led to the detailed comparisons in the next subsection and the study in App. A.

The 1D projection is not the only information we have about the source of these data. The unprojected 2D histogram in Fig. 1 (a) clearly indicates that a SS 2D peak model is required by bin-0 data, and a 2D FS-only model would be rejected by a large factor anomalous (); multipoles (). The 2D observations contribute a larger Bayesian context () applicable to this 1D BI study. The FS-only model is ruled out for all 2D histograms as shown in previous studies multipoles ().

X.2 Why BI analysis favors the basic Model

The basic Model alone (for near-central collisions) or the basic Model plus quadrupole (for noncentral collisions) is strongly preferred by BI analysis over FS-only models, even for the most-peripheral collisions where the 1D data include only two significant DoF. The choice of priors is not the reason; priors are applied consistently for each parameter type within any model. The large difference in evidence values is dominated by differences in the fit covariance matrix. The r.m.s. parameter errors for FS models are consistently 10-15 times smaller than for the basic Model. Evidence differences correspond to differences in model predictivity, as illustrated in the following comparison and App. A.

Figure 18 shows sketches of joint data-parameter spaces for an FS-only model (left) and the basic Model (right) corresponding to bin-10 data. The parameter errors for the FS-only model are typically . The parameter errors for the basic Model in Table 1 are 0.007 for SS peak amplitude and width and 0.002 for dipole amplitude, but with added cosine terms the errors increase to . The data errors for bin 10 are . In terms of angles defined in Eq. (28) 1/3 for the basic Model (20°) and 5 for the FS-only model (80°). The errors and are represented by the dashed rectangles and the angles of the diagonals (solid lines) in the two panels, as in Fig. 19.

Figure 18: Joint parameter-data space for two data models. These panels are zoomed out from the scale of Fig. 19 to reveal the PDFs distributed over the entire parameter and data spaces. Given prior intervals the model angles determine the magnitudes of the predicted data intervals , the predicted data volume , and therefore the evidence . The , and hence the Jacobian of a model function, largely determine the predictivity of a model. For these two models the typical differ by factor 12.

In Fig. 18 the prior PDFs are represented by the vertical dash-dotted lines in each panel and the arrows labeled . For all cosine amplitudes the prior is . For the SS peak amplitude and width the priors are . We can estimate the evidences or predicted data volumes based on the argument in App. A where the relation between data-space volume and parameter-space volume is determined by angle factors . For the basic Model (right panel) even the larger priors () are mapped to smaller data intervals (), whereas for FS-only models (left panel) smaller priors () are mapped to larger data intervals (). The result is much smaller predicted data volumes for the basic Model, and consequently much higher evidence and plausibility compared to FS-only models.

The same argument applies to changes in evidence or information with increasing parameter number (e.g. added cosine terms). From Eq. (A.4) (assuming comparable values for competing models) the information increment from one added parameter is approximately

\Delta I \approx \ln\left( \frac{\Delta\alpha_{K+1}}{\sigma_{K+1}} \right).    (26)

With for all cosine terms, for FS-only models and for basic Model + cosines, the typical increment per cosine term is for FS-only models and 2.5 for basic Model + cosines, e.g. consistent with Fig. 7. Thus, evidence and information trends are determined mainly by the relative parameter errors reflecting the Jacobians and model algebra.

The fitted-parameter errors reflect the algebraic structure of the model, as discussed in App. A. Because a Fourier series is orthogonal, each coefficient is determined independently. Since the Fourier model elements individually do not resemble the data, a very “fragile” assembly of terms is required, one that easily overfits the data (treats noise as signal). Only a small range of FS-only parameter values can reproduce a given data set, and the parameter variances are consequently very small.

In contrast, the basic Model includes a Gaussian (motivated by the data structure) with a nonlinear width parameter that covaries with other parameters. Thus, larger ranges of basic-Model parameters can reproduce the data adequately, and the parameter variances are correspondingly larger. The basic Model is more “robust” because on average its model elements individually look more like isolated data components. If both models give the same chi-squared fit, the basic Model is preferred by BI analysis because on average it is far more likely to describe the data accurately (a larger fraction of the prior-delimited parameter space provides an acceptable description of the given data).

We conclude that the key issue for Bayesian model comparisons is model predictivity. The basic Model is highly predictive (therefore falsifiable), describing two peaks (fixed at 0 and ), with one peak as wide as possible and the other somewhat narrower. Two peak amplitudes and a width are the only parameters. The basic Model is consistent with the HEP/jets narrative but was inferred from data without any theory assumptions. In contrast, FS-only models can describe any structure on azimuth, have no predictivity (are not falsifiable) and are therefore strongly rejected by BI analysis. Model predictivity [smallness of predicted data volume ] is determined largely by the algebraic structure of the data model (Jacobian) as revealed by fitted-parameter errors compared to data errors via the elements.

X.3 Competition: extra cosine terms vs SS Gaussian

The bin-10 results in Table 1 can be used to examine the consequences of adding one or more cosine terms to the basic Model when there is no corresponding data signal. The is reduced in general, suggesting an improved data description. However, in some cases the model parameters undergo large changes, seeming to indicate that the model parameters are very uncertain. To understand the apparent contradiction we consider the bin-10 “worst case” model (basic Model + quadrupole + sextupole + octupole) appearing in the next-to-last column of Table 1.

The model difference (“worst case” minus basic Model) for each cosine coefficient is , , and for . The differences correspond to the predicted Gaussian PS values in Fig. 3 (red dashed curve). In effect, changes in the cosine coefficients of the FS-only model are equivalent to the fitted Gaussian already describing the data signal correctly in the basic Model. The SS Gaussian required by the data is effectively excluded from the data model by the added cosine terms, reduced to a minor role sextupole ().

The bin-10 result reveals a competition between the basic Model and a truncated FS to describe signal + noise. The competing truncated FS offers more flexibility in accommodating noise compared to the monolithic Gaussian. The FS may “win” in terms of , but a well-chosen model element (Gaussian) describes only the signal and excludes the noise. Referring to Fig. 6 the (“+ ”) model (open diamond) has the same value as the FS-only model (solid point) because the former is effectively a FS. The Gaussian, with two parameters, has been excluded from the fit model owing to noise competition, but its two parameters still contribute to the Occam penalty. BI analysis then rejects the unnecessary cosine terms in favor of the basic Model.

X.4 Evidence extending beyond single histograms

A model may describe data from some A-A centralities well but others poorly. Nevertheless, the model may be retained by convention because of desirable features (such as flow interpretations). Other forms of data selection ( cuts, 1D projections, ratio measures) present similar issues. In response we propose to extend BI methods beyond single data histograms, combining results into one comprehensive evaluation for competing data models.

The mechanism is suggested by the nature of Bayesian evidence . The evidence is a probability, and by the rules governing probabilities the joint evidence for several cases should be the product of elementary evidences (assuming approximate independence). For instance, the evidence for a model of 200 GeV Au-Au collisions should be the product of evidences for individual centralities. If a model claiming to describe all data components is falsified for one component then it is falsified for all. More generally, a model that provides an adequate description for all cases may be preferred over a model that is favored for some cases but strongly disfavored for others.

That principle extends not only to A-A centralities but to different collision energies, A-B collision systems, spectrum and correlation measures and data cuts. Evidence as a product measure introduces an “and” condition for data description. A candidate model must address all available data within its parameter space or be rejected.
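
The product rule can be stated as a one-line sketch with illustrative (made-up) log evidences; a model that is adequate everywhere can prevail over one that is favored in some cases but strongly falsified in another.

def joint_log_evidence(log_evidences):
    # Joint log evidence for one model over several approximately independent
    # data sets (centralities, energies, collision systems): the individual
    # evidences multiply, so the log evidences add.
    return float(sum(log_evidences))

# Illustrative log evidences for two models across three centrality bins:
model_a = [-12.0, -15.5, -11.2]   # adequate for every centrality
model_b = [-10.0, -14.0, -30.0]   # favored twice, strongly disfavored once
print(joint_log_evidence(model_a) - joint_log_evidence(model_b))  # > 0 favors A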

X.5 Implications for theoretical narratives

As noted in the introduction HEP/jets and QGP/flow narratives currently compete to describe and interpret high-energy nuclear-collision data through choices of data model and emphasis on specific data and measured quantities. The HEP/jets narrative predicts two dijet-related peaks on 1D azimuth, just what the basic Model describes. Almost all 1D azimuth correlation data from the RHIC are described by the basic Model + quadrupole with modest parameter variations. The QGP/flow narrative prefers various forms of the FS-only data model interpreted physically in a flow context, from a single cosine (, index ) to several cosines interpreted to include “higher harmonic” flows (index ).

In the present study we apply BI methods to 1D azimuth data models associated with the two narratives. BI analysis strongly favors the basic Model in all cases, combined with an additional quadrupole term except for the most-central data. The FS-only model is strongly rejected in all cases. As discussed in Sec. X.2 and App. A the main reason for BI rejection is lack of predictivity for FS-only models, whereas the basic Model is strongly predictive and therefore falsifiable. The present BI analysis thus seems to support the HEP/jets narrative and reject the QGP/flow narrative per their data models.

It could be argued that application of BI methods to data models represents an arbitrary choice motivated by interest in a specific outcome. However, we are faced with the requirement to evaluate conflicting data models according to some neutral criteria. Chi-squared minimization always prefers more-complex data models that may reveal little about data structure and possible physical mechanisms. Flow interpretations are always possible for FS-only models, but such models cannot exclude a dijet interpretation since they are able to describe any data configuration.

Additional criteria are therefore required to test data models. Guidance as to the choice is provided by the role of rational inference within the scientific method. It is recognized that physical theories cannot be proven, only falsified by data, which requires that candidate theories be predictive. Unpredictive theories are not falsifiable and are therefore rejected as candidates. In a Bayesian context predictivity is measured by information and evidence, as demonstrated in this study. For a well-tested physical theory encountering new data the information gain is small, and the predicted data-space volume is small. If the theory is instead falsified by the data, the resulting plausibility is dramatically different.

In the present analysis we encounter not competing physical theories but competing data models serving as proxies. BI analysis evaluates data models according to predictivity, i.e. the degree of restriction on allowed data configurations. We conclude that the basic Model with optional quadrupole component is very predictive, corresponding to small information gain from newly-received data and consequent small predicted data-space volume. FS-only models are not predictive, can accommodate any data configuration, and are therefore rejected.

XI Summary and Conclusions

Based on data from the relativistic heavy ion collider (RHIC) and the large hadron collider (LHC), claims have been made for formation in high-energy nucleus-nucleus (A-A) collisions of a strongly-coupled quark-gluon plasma (sQGP) with small viscosity – a “perfect liquid.” Such claims are based mainly on measurements of Fourier coefficients of cosine terms used to describe two-particle correlations on azimuth and interpreted to represent flows, especially that representing elliptic flow. In the flow context dijets play a comparatively negligible role in final-state correlation structure.

Modeling azimuth correlations by truncated Fourier series or individual cosine terms is not unique. Other model functions can describe the same data equally well and do suggest alternative physical interpretations, especially substantial contributions from dijet production. In effect, two physics narratives compete to describe and interpret the same data. In one narrative collision dynamics is dominated by dijet production. In the other narrative collision dynamics is dominated by a dense, flowing QCD medium. Opposing narratives appear to be supported by their respective data models. To break the deadlock a method is required to evaluate model functions according to neutral criteria and identify a preferred model.

In this study we introduce Bayesian Inference (BI) to evaluate competing model functions. BI analysis relies on a combination of the usual goodness-of-fit parameter and information derived from the fit covariance matrix. The information quantifies changes in the data model arising from acquisition of new data and represents an Occam penalty for excessive model complexity. Their combination leads to an evidence parameter that determines the plausibility of each model when confronted with new data values. The goal is to rank data models according to BI criteria without resorting to a priori physics assumptions.
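
In the Gaussian (Laplace) approximation the evidence separates schematically into a goodness-of-fit term and the Occam term (a sketch in our normalization; constants and convention-dependent factors are omitted):

-\ln E \approx \tfrac{1}{2}\chi^{2}_{\rm min} + I ,

so the most plausible model is the one minimizing the combined goodness-of-fit and Occam-penalty contributions.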

We apply several representative model functions to angular correlation data and evaluate the model performance with BI methods. The data are published 2D angular correlations from three centrality classes of 200 GeV Au-Au collisions on . 2D histograms are projected onto periodic azimuth by integration over pseudorapidity . The three collision centralities include the centrality extremes (most central and most peripheral) and an intermediate centrality that requires a separate azimuth-quadrupole model element in the data model.

Model functions include (a) a “basic Model” consisting of a same-side (SS) peak modeled by a Gaussian at and an away-side (AS) peak at modeled by a cylindrical dipole , (b) the basic Model plus one or more additional cosine terms, and (c) several Fourier-series (FS-only) models consisting only of one or more cosine terms.