The PDF4LHC Working Group Interim Report

Sergey Alekhin, Simone Alioli, Richard D. Ball, Valerio Bertone, Johannes Blümlein, Michiel Botje, Jon Butterworth, Francesco Cerutti, Amanda Cooper-Sarkar, Albert de Roeck, Luigi Del Debbio, Joel Feltesse, Stefano Forte, Alexander Glazov, Alberto Guffanti, Claire Gwenlan, Joey Huston, Pedro Jimenez-Delgado, Hung-Liang Lai, José I. Latorre, Ronan McNulty, Pavel Nadolsky, Sven Olaf Moch, Jon Pumplin, Voica Radescu, Juan Rojo, Torbjörn Sjöstrand, W.J. Stirling, Daniel Stump, Robert S. Thorne, Maria Ubiali, Alessandro Vicini, Graeme Watt, C.-P. Yuan

This document is intended as a study of benchmark cross sections at the LHC (at 7 TeV) at NLO using modern PDFs currently available from the 6 PDF fitting groups that have participated in this exercise. It also contains a succinct user guide to the computation of PDFs, uncertainties and correlations using available PDF sets.

A companion note provides an interim summary of the current recommendations of the PDF4LHC working group for the use of parton distribution functions (PDFs) and of PDF uncertainties at the LHC, for cross section and cross section uncertainty calculations.

Deutsches Elektronen-Synchrotron, DESY, Platanenallee 6, D-15738 Zeuthen, Germany

Institute for High Energy Physics, IHEP, Pobeda 1, 142281 Protvino, Russia

School of Physics and Astronomy, University of Edinburgh, JCMB, KB, Mayfield Rd, Edinburgh EH9 3JZ, Scotland

Physikalisches Institut, Albert-Ludwigs-Universität Freiburg, Hermann-Herder-Straße 3, D-79104 Freiburg i. B., Germany

NIKHEF, Science Park, Amsterdam, The Netherlands

Department of Physics and Astronomy, University College, London, WC1E 6BT, UK

Departament d’Estructura i Constituents de la Matèria, Universitat de Barcelona, Diagonal 647, E-08028 Barcelona, Spain

Department of Physics, Oxford University, Denys Wilkinson Bldg, Keble Rd, Oxford, OX1 3RH, UK

CERN, CH–1211 Genève 23, Switzerland; Antwerp University, B–2610 Wilrijk, Belgium; University of California Davis, CA, USA

CEA, DSM/IRFU, CE-Saclay, Gif-sur-Yvette, France

Dipartimento di Fisica, Università di Milano and INFN, Sezione di Milano, Via Celoria 16, I-20133 Milano, Italy

Deutsches Elektronen-Synchrotron, DESY, Notkestraße 85, D–22607 Hamburg, Germany

Physics and Astronomy Department, Michigan State University, East Lansing, MI 48824, USA

Institut für Theoretische Physik, Universität Zürich, CH-8057 Zürich, Switzerland

Taipei Municipal University of Education, Taipei, Taiwan

School of Physics, University College Dublin Science Centre North, UCD Belfield, Dublin 4, Ireland

Department of Physics, Southern Methodist University, Dallas, TX 75275-0175, USA

Physikalisches Institut, Universität Heidelberg, Philosophenweg 12, D–69120 Heidelberg, Germany

Department of Astronomy and Theoretical Physics, Lund University, Sölvegatan 14A, S-223 62 Lund, Sweden

Cavendish Laboratory, University of Cambridge, CB3 0HE, UK

Institut für Theoretische Teilchenphysik und Kosmologie, RWTH Aachen University, D-52056 Aachen, Germany

Theory Group, Physics Department, CERN, CH-1211 Geneva 23, Switzerland

1. Introduction

The LHC experiments are currently producing cross sections from the 7 TeV data, and thus need accurate predictions for these cross sections and their uncertainties at NLO and NNLO. Crucial to the predictions and their uncertainties are the parton distribution functions (PDFs) obtained from global fits to deep-inelastic scattering, Drell-Yan and jet data. A number of groups have produced publicly available PDFs using different data sets and analysis frameworks. It is one of the charges of the PDF4LHC working group to evaluate and understand differences among the PDF sets to be used at the LHC, and to provide a protocol for both experimentalists and theorists to use the PDF sets to calculate central cross sections at the LHC, as well as to estimate their PDF uncertainty. This note is intended to be an interim summary of our level of understanding of NLO predictions as the first LHC cross sections at 7 TeV are being produced (comparisons at NNLO for $W$, $Z$ and Higgs production can be found in Ref. [1]). The intention is to modify this note as improvements in data/understanding warrant.

For the purpose of increasing our quantitative understanding of the similarities and differences between available PDF determinations, a benchmarking exercise between the different groups was performed. This exercise was very instructive in understanding many differences in the PDF analyses: different input data, different methodologies and criteria for determining uncertainties, different ways of parametrizing PDFs, different numbers of parametrized PDFs, different treatments of heavy quarks, different perturbative orders, different ways of treating $\alpha_s$ (as an input or as a fit parameter), different values of physical parameters such as $\alpha_s$ itself and heavy quark masses, and more. This exercise was also very instructive in understanding where the PDFs agree and where they disagree: it established a broad agreement of PDFs (and uncertainties) obtained from data sets of comparable size and it singled out relevant instances of disagreement and of dependence of the results on assumptions or methodology.

The outline of this interim report is as follows. The first three sections are devoted to a description of current PDF sets and their usage. In Sect. 2. we present several modern PDF determinations, with special regard to the way PDF uncertainties are determined. First we summarize the main features of various sets, then we provide an explicit users’ guide for the computation of PDF uncertainties. In Sect. 3. we discuss theoretical uncertainties on PDFs. We first introduce various theoretical uncertainties, then we focus on the uncertainty related to the strong coupling $\alpha_s$; also in this case we give both a presentation of choices made by different groups and a users’ guide for the computation of combined PDF+$\alpha_s$ uncertainties. Finally in Sect. 4. we discuss PDF correlations and the way they can be computed.

In Sect. 5. we introduce the settings for the PDF4LHC benchmarks on LHC observables, present the results from the different groups and compare their predictions for important LHC observables at 7 TeV at NLO. In Sect. 6. we conclude and briefly discuss prospects for future developments.

2. PDF determinations - experimental uncertainties

Experimental uncertainties of PDFs determined in global fits (usually called “PDF uncertainties” for short) reflect three aspects of the analysis, and differ because of different choices made in each of these aspects: (1) the choice of data set; (2) the type of uncertainty estimator which is used to determine the uncertainties and which also determines the way in which PDFs are delivered to the user; (3) the form and size of the parton parametrization. First, we briefly discuss the available options for each of these aspects (at least, those which have been explored by the various groups discussed here) and summarize the choices made by each group; then, we provide a concise user guide for the determination of PDF uncertainties for available fits. We will in particular discuss the following PDF sets (when several releases are available the most recent published ones are given in parentheses in each case): ABKM/ABM [2, 3], CTEQ/CT (CTEQ6.6 [4], CT10 [5]), GJR [6, 7], HERAPDF (HERAPDF1.0 [8]), MSTW (MSTW08 [9]), NNPDF (NNPDF2.0 [10]). There is a significant time-lag between the development of a new PDF and the wide adoption of its use by experimental collaborations, so in some cases we report not on the most up-to-date PDF from a particular group, but instead on the most widely used.

2.1 Features, tradeoffs and choices

2.11 Data Set

There is a clear tradeoff between the size and the consistency of a data set: a wider data set contains more information, but data coming from different experiments may be inconsistent to some extent. The choices made by the various groups are the following:

  • The CTEQ, MSTW and NNPDF data sets considered here include both electroproduction and hadroproduction data, in each case both from fixed-target and collider experiments. The electroproduction data include electron, muon and neutrino deep-inelastic scattering data (both inclusive and charm production). The hadroproduction data include Drell-Yan (fixed-target virtual photon and collider $W$ and $Z$ production) and jet production. (Although the comparisons included in this note are only at NLO, we note that, to date, the inclusive jet cross section, unlike the other processes in the list above, has been calculated only to NLO, and not to NNLO. This may have an impact on the precision of NNLO global PDF fits that include inclusive jet data.)

  • The GJR data set includes electroproduction data from fixed-target and collider experiments, and a smaller set of hadroproduction data. The electroproduction data include electron and muon inclusive deep-inelastic scattering data, and deep-inelastic charm production from charged leptons and neutrinos. The hadroproduction data include fixed-target virtual photon Drell-Yan production and Tevatron jet production.

  • The ABKM/ABM data sets include electroproduction from fixed-target and collider experiments, and fixed–target hadroproduction data. The electroproduction data include electron, muon and neutrino deep–inelastic scattering data (both inclusive and charm production). The hadroproduction data include fixed–target virtual photon Drell-Yan production. The most recent version, ABM10 [11], includes Tevatron jet data.

  • The HERAPDF data set includes all HERA deep-inelastic inclusive data.

2.12 Statistical treatment

Available PDF determinations fall in two broad categories: those based on a Hessian approach and those which use a Monte Carlo approach. The delivery of PDFs is different in each case and will be discussed in Sect. 2.2.

Within the Hessian method, PDFs are determined by minimizing a suitable log-likelihood $\chi^2$ function. Different groups may use somewhat different definitions of $\chi^2$, for example by including entirely, or only partially, correlated systematic uncertainties. While some groups account for correlated uncertainties by means of a covariance matrix, other groups treat some correlated systematics (specifically but not exclusively normalization uncertainties) as a shift of data, with a penalty term proportional to some power of the shift parameter added to the $\chi^2$. The reader is referred to the original papers for the precise definition adopted by each group, but it should be borne in mind that because of all these differences, values of the $\chi^2$ quoted by different groups are in general only roughly comparable.

With the covariance matrix approach, we can define

$$\chi^2 = \frac{1}{N_{\rm dat}} \sum_{i,j} (d_i - \bar{d}_i)\, ({\rm cov}^{-1})_{ij}\, (d_j - \bar{d}_j),$$

where $d_i$ are data, $\bar{d}_i$ theoretical predictions, $N_{\rm dat}$ is the number of data points (note the inclusion of the factor $1/N_{\rm dat}$ in the definition) and ${\rm cov}_{ij}$ is the covariance matrix. Different groups may use somewhat different definitions of the covariance matrix, by including entirely or only partially correlated uncertainties. The best fit is the point in parameter space at which $\chi^2$ is minimum, while PDF uncertainties are found by diagonalizing the (Hessian) matrix of second derivatives of the $\chi^2$ at the minimum (see Fig. 1) and then determining the range of each orthonormal Hessian eigenvector which corresponds to a prescribed increase of the $\chi^2$ function with respect to the minimum.
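As an illustration of the covariance-matrix definition, the following is a minimal sketch in Python with NumPy. The function name `chi2_per_point` and the toy numbers are hypothetical, and the sketch assumes the normalization by the number of data points mentioned in the text; it does not reproduce any particular group's treatment of correlated systematics.

```python
import numpy as np

def chi2_per_point(data, theory, cov):
    """Return (d - t)^T cov^{-1} (d - t) / N_dat for one data set."""
    d = np.asarray(data, dtype=float)
    t = np.asarray(theory, dtype=float)
    c = np.asarray(cov, dtype=float)
    r = d - t
    # Solve cov * y = r rather than forming the inverse explicitly.
    return float(r @ np.linalg.solve(c, r)) / d.size

# Toy usage: two uncorrelated points with variances 4 and 1.
value = chi2_per_point([1.0, 2.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])
```

Solving the linear system instead of inverting the matrix is a numerical convenience only; the result is the same quadratic form.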

In principle, the variation of the $\chi^2$ which corresponds to a 68% confidence level (one sigma) is $\Delta\chi^2 = 1$. However, a larger variation $\Delta\chi^2 = T^2$, with $T > 1$ a suitable “tolerance” parameter [12, 13, 14], may turn out to be necessary for more realistic error estimates for fits containing a wide variety of input processes/data, and in particular in order for each individual experiment which enters the global fit to be consistent with the global best fit to one sigma (or some other desired confidence level such as 90%). Possible reasons why this is necessary could be related to data inconsistencies or incompatibilities, underestimated experimental systematics, insufficiently flexible parton parametrizations, theoretical uncertainties or approximations in the PDF extraction. At present, HERAPDF and ABKM use $\Delta\chi^2 = 1$, GJR uses a fixed tolerance $T > 1$ defined at one sigma (with a correspondingly larger value at 90% c.l.), CTEQ6.6 uses $T = 10$ at 90% c.l. (corresponding to $T \approx 6.1$ at one sigma) and MSTW08 uses a dynamical tolerance [9], i.e. a different value of $T$ for each eigenvector.

Within the NNPDF method, PDFs are determined by first producing a Monte Carlo sample of $N_{\rm rep}$ pseudo-data replicas. Each replica contains a number of points equal to the number of original data points. The sample is constructed in such a way that, in the limit $N_{\rm rep} \to \infty$, the central value of the $i$-th data point is equal to the mean over the $N_{\rm rep}$ values that the $i$-th point takes in each replica, the uncertainty of the same point is equal to the standard deviation over the replicas, and the correlation between any two original data points is reproduced by their covariance over the replicas. From each data replica, a PDF replica is constructed by minimizing a $\chi^2$ function. PDF central values, uncertainties and correlations are then computed by taking means, variances and covariances over this replica sample. Each PDF replica is obtained as the minimum $\chi^2$ which satisfies a cross-validation criterion [15, 10], and is thus larger than the absolute minimum of the $\chi^2$. This method has been used in all NNPDF sets from NNPDF1.0 onwards.
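The statistical requirement on the pseudo-data sample can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that replicas are drawn from a multivariate Gaussian with the experimental covariance matrix; the actual NNPDF procedure (for instance its treatment of normalization uncertainties) is described in Refs. [15, 10] and is not reproduced here. The function name `make_replicas` and the toy inputs are hypothetical.

```python
import numpy as np

def make_replicas(central, cov, n_rep, seed=0):
    """Draw n_rep pseudo-data replicas, one row per replica."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(central, dtype=float)
    covariance = np.asarray(cov, dtype=float)
    # Gaussian assumption: mean = measured values, covariance = experimental one.
    return rng.multivariate_normal(mean, covariance, size=n_rep)

# Toy usage: two correlated data points.
central = [10.0, 20.0]
cov = [[1.0, 0.5], [0.5, 2.0]]
replicas = make_replicas(central, cov, n_rep=20000)
sample_mean = replicas.mean(axis=0)             # approaches the central values
sample_cov = np.cov(replicas, rowvar=False)     # approaches the input covariance
```

For a large number of replicas the sample mean and covariance reproduce the inputs, which is the property required of the replica sample in the text.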

2.13 Parton parametrization

Existing parton parametrizations differ in the number of PDFs which are independently parametrized and in the functional form and number of independent parameters used. They also differ in the choice of individual linear combinations of PDFs which are parametrized. As far as the functional form is concerned, the most common choice is that each PDF at some reference scale $Q_0$ is parametrized as

$$f_i(x, Q_0) = N_i\, x^{\alpha_i} (1-x)^{\beta_i}\, g_i(x), \qquad (1)$$

where $g_i(x)$ is a function which tends to a constant both for $x \to 0$ and $x \to 1$, such as for instance $g_i(x) = 1 + \epsilon_i \sqrt{x} + D_i x + E_i x^2$ (HERAPDF). The fit parameters are $\alpha_i$, $\beta_i$ and the parameters in $g_i$. Some of these parameters may be chosen to take a fixed value (including zero). The general form Eq. (1) is adopted in all PDF sets which we discuss here except NNPDF, which instead lets

$$f_i(x, Q_0) = c_i(x)\, {\rm NN}_i(x), \qquad (2)$$

where ${\rm NN}_i(x)$ is a neural network, and $c_i(x)$ is a “preprocessing” function. The fit parameters are the parameters which determine the shape of the neural network (a 2-5-3-1 feed-forward neural network for NNPDF2.0). The preprocessing function is not fitted, but rather chosen randomly in a space of functions of the general form Eq. (1) within some acceptable range of the parameters $\alpha_i$ and $\beta_i$, and with $g_i = 1$.
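To make the standard functional form concrete, here is a minimal Python sketch that evaluates a parametrization of the type $N x^{\alpha}(1-x)^{\beta} g(x)$ with $g(x) = 1 + \epsilon\sqrt{x} + D x + E x^2$. The function name `pdf_param` and all parameter values are hypothetical and chosen only for illustration; they are not fitted values from any PDF set.

```python
import numpy as np

def pdf_param(x, norm, alpha, beta, eps=0.0, d=0.0, e=0.0):
    """Evaluate norm * x^alpha * (1 - x)^beta * g(x) at the reference scale."""
    x = np.asarray(x, dtype=float)
    # g(x) tends to a constant for x -> 0 and for x -> 1.
    g = 1.0 + eps * np.sqrt(x) + d * x + e * x**2
    return norm * x**alpha * (1.0 - x)**beta * g

# Toy usage: with eps = d = e = 0 the function g is identically 1.
simple = pdf_param(0.25, norm=2.0, alpha=0.5, beta=1.0)
shaped = pdf_param(0.25, norm=2.0, alpha=0.5, beta=1.0, eps=1.0)
```

Setting `eps`, `d` and `e` to zero corresponds to fixing those parameters, as the text notes some groups do for part of their parameter set.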

The basis functions and number of parameters are the following.

  • ABKM parametrizes the two lightest flavours and antiflavours, the total strangeness and the gluon (five independent PDFs) with 21 free parameters.

  • CTEQ6.6 and CT10 parametrize the two lightest flavours and antiflavours, the total strangeness and the gluon (six independent PDFs) with respectively 22 and 26 free parameters.

  • GJR parametrizes the two lightest flavours and antiflavours and the gluon with 20 free parameters (five independent PDFs); the strange distribution is assumed to be either proportional to the light sea or to vanish at a low scale $Q_0 < 1$ GeV at which PDFs become valence-like.

  • HERAPDF parametrizes the two lightest flavours, $\bar{u}$, the combination $\bar{d} + \bar{s}$ and the gluon with 10 free parameters (six independent PDFs); strangeness is assumed to be proportional to the $\bar{d}$ distribution. HERAPDF also studies the effect of varying the form of the parametrization and of varying the relative size of the strange component, and thus determines a model and parametrization uncertainty (see Sect. 3.23 for more details).

  • MSTW parametrizes the three lightest flavours and antiflavours and the gluon with 28 free parameters (seven independent PDFs) to find the best fit, but 8 are held fixed in determining uncertainty eigenvectors.

  • NNPDF parametrizes the three lightest flavours and antiflavours and the gluon with 259 free parameters (37 for each of the seven independent PDFs).

2.2 PDF delivery and usage

The way uncertainties should be determined for a given PDF set depends on whether it is a Monte Carlo set (NNPDF) or a Hessian set (all other sets). We now describe the procedure to be followed in each case.

2.21 Computation of Hessian PDF uncertainties

For Hessian PDF sets, both a central set and error sets are given. The number of eigenvectors is equal to the number of free parameters. Thus, the number of error PDFs is equal to twice that. Each error set corresponds to moving by the specified confidence level (one sigma or 90% c.l.) in the positive or negative direction of each independent orthonormal Hessian eigenvector.

Consider a variable $X$; its value using the central PDF for an error set is given by $X_0$. $X_i^+$ is the value of that variable using the PDF corresponding to the “$+$” direction for the eigenvector $i$, and $X_i^-$ the value for the variable using the PDF corresponding to the “$-$” direction.

Fig. 1: A schematic representation of the transformation from the PDF parameter basis to the orthonormal eigenvector basis [13].

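The asymmetric uncertainties are then

$$\Delta X^+_{\max} = \sqrt{\sum_{i=1}^{N} \left[\max\left(X_i^+ - X_0,\; X_i^- - X_0,\; 0\right)\right]^2}, \qquad \Delta X^-_{\max} = \sqrt{\sum_{i=1}^{N} \left[\max\left(X_0 - X_i^+,\; X_0 - X_i^-,\; 0\right)\right]^2}. \qquad (3)$$

$\Delta X^+_{\max}$ adds in quadrature the PDF error contributions that lead to an increase in the observable $X$, and $\Delta X^-_{\max}$ the PDF error contributions that lead to a decrease. The addition in quadrature is justified by the eigenvectors forming an orthonormal basis. The sum is over all $N$ eigenvector directions. Ordinarily, one of $X_i^+ - X_0$ and $X_i^- - X_0$ will be positive and one will be negative, and thus it is trivial as to which term is to be included in each quadratic sum. For the higher number (less well-determined) eigenvectors, however, the “$+$” and “$-$” eigenvector contributions may be in the same direction. In this case, only the more positive term will be included in the calculation of $\Delta X^+_{\max}$ and the more negative in the calculation of $\Delta X^-_{\max}$ [24]. Thus, there may be fewer than $N$ non-zero terms for either the “$+$” or “$-$” directions. A symmetric version of this is also used by many groups:

$$\Delta X = \frac{1}{2} \sqrt{\sum_{i=1}^{N} \left(X_i^+ - X_i^-\right)^2}. \qquad (4)$$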

In most cases, the symmetric and asymmetric forms give very similar results. The extent to which the symmetric and asymmetric errors do not agree is an indication of the deviation of the $\chi^2$ distribution from a quadratic form. The lower number eigenvectors, corresponding to the best known directions in eigenvector space, tend to have very symmetric errors, while the higher number eigenvectors can have asymmetric errors. The uncertainty for a particular observable then will (will not) tend to have a quadratic form if it is most sensitive to lower number (higher number) eigenvectors. Deviations from a quadratic form are expected to be greater for larger excursions, i.e. for 90% c.l. limits than for 68% c.l. limits.
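The Hessian prescription can be sketched in a few lines of Python. The sketch assumes that the observable has already been evaluated with the central set ($X_0$) and with the “$+$” and “$-$” error set of each eigenvector ($X_i^\pm$); the upward (downward) uncertainty sums in quadrature, over eigenvectors, the larger positive (negative) shift, and the symmetric uncertainty is half the quadrature sum of the differences $X_i^+ - X_i^-$. The function name `hessian_uncertainties` and the toy numbers are hypothetical.

```python
import numpy as np

def hessian_uncertainties(x0, x_plus, x_minus):
    """Return (up, down, symmetric) PDF uncertainties on an observable."""
    dp = np.asarray(x_plus, dtype=float) - x0    # shifts from "+" sets
    dm = np.asarray(x_minus, dtype=float) - x0   # shifts from "-" sets
    zero = np.zeros_like(dp)
    # Per eigenvector, keep only the largest upward (downward) shift, if any.
    up = np.sqrt(np.sum(np.maximum.reduce([dp, dm, zero]) ** 2))
    down = np.sqrt(np.sum(np.maximum.reduce([-dp, -dm, zero]) ** 2))
    sym = 0.5 * np.sqrt(np.sum((dp - dm) ** 2))
    return float(up), float(down), float(sym)

# Toy usage: the second eigenvector shifts the observable upwards in both directions.
up, down, sym = hessian_uncertainties(10.0, [11.0, 10.5], [9.0, 10.2])
```

In the toy input the second eigenvector contributes only to the upward uncertainty, which is the situation described in the text for the less well-determined eigenvectors.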

The HERAPDF analysis also works with the Hessian matrix, defining experimental error PDFs in an orthonormal basis as described above. The symmetric formula Eq. (4) is most often used to calculate the experimental error bands on any variable, but it is possible to use the asymmetric formula as for MSTW and CTEQ. (For HERAPDF1.0 these errors are provided at 68% c.l. in the LHAPDF file HERAPDF10_EIG.LHgrid.)

Other methods of calculating the PDF uncertainties independent of the Hessian method, such as the Lagrange Multiplier approach [12], are not discussed here.

2.22 Computation of Monte Carlo PDF uncertainties

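For the NNPDF Monte Carlo set, a Monte Carlo sample of PDFs is given. The expectation value of any observable $\mathcal{F}[\{q\}]$ (for example a cross section) which depends on the PDFs is computed as an average over the ensemble of PDF replicas, using the following master formula:

$$\langle \mathcal{F}[\{q\}] \rangle = \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} \mathcal{F}[\{q^{(k)}\}], \qquad (5)$$

where $N_{\rm rep}$ is the number of replicas of PDFs in the Monte Carlo ensemble. The associated uncertainty is found as the standard deviation of the sample, according to the usual formula

$$\sigma_{\mathcal{F}} = \left( \frac{N_{\rm rep}}{N_{\rm rep} - 1} \left( \langle \mathcal{F}[\{q\}]^2 \rangle - \langle \mathcal{F}[\{q\}] \rangle^2 \right) \right)^{1/2} = \left( \frac{1}{N_{\rm rep} - 1} \sum_{k=1}^{N_{\rm rep}} \left( \mathcal{F}[\{q^{(k)}\}] - \langle \mathcal{F}[\{q\}] \rangle \right)^2 \right)^{1/2}. \qquad (6)$$

These formulae may also be used for the determination of central values and uncertainties of the parton distributions themselves, in which case the functional $\mathcal{F}$ is identified with the parton distribution $q$: $\mathcal{F}[\{q\}] \equiv q$. Indeed, the central value for the PDFs themselves is given by

$$q^{(0)} \equiv \langle q \rangle = \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} q^{(k)}. \qquad (7)$$

NNPDF provides both sets of $N_{\rm rep} = 100$ and $N_{\rm rep} = 1000$ replicas. The larger set ensures that statistical fluctuations are suppressed so that even oddly-shaped probability distributions such as non-Gaussian or asymmetric ones are well reproduced, and more detailed features of the probability distributions such as correlation coefficients or uncertainties on uncertainties can be determined accurately. However, for most common applications such as the determination of the uncertainty on a cross section the smaller replica set is adequate, and in fact central values can be determined accurately using a yet smaller number of replicas, with the full set of $N_{\rm rep} = 100$ only needed for the reliable determination of uncertainties.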

NNPDF also provides a set 0 in the NNPDF20_100.LHgrid LHAPDF file, as in previous releases of the NNPDF family, while replicas 1 to 100 correspond to PDF sets 1 to 100 in the same file. This set 0 contains the average of the PDFs, determined using Eq. (7): in other words, set 0 contains the central NNPDF prediction for each PDF. This central prediction can be used to get a quick evaluation of a central value. However, it should be noted that for any $\mathcal{F}[\{q\}]$ which depends nonlinearly on the PDFs, $\langle \mathcal{F}[\{q\}] \rangle \neq \mathcal{F}[\{q^{(0)}\}]$. This means that a cross section evaluated from the central set is not exactly equal to the central cross section (though it will be, for example, for deep-inelastic structure functions, which are linear in the PDFs). Hence, use of set 0 is not recommended for precision applications, though in most cases it will provide a good approximation. Note that set $q^{(0)}$ should not be included when computing an average with Eq. (5), because it is itself already an average.
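The Monte Carlo prescription amounts to a sample mean and a sample standard deviation over the replicas, with set 0 excluded. A minimal Python sketch follows; it assumes the observable has already been evaluated on each replica and uses the unbiased estimator with $N_{\rm rep} - 1$ in the denominator. The function name `mc_mean_and_sigma` and the toy replica values are hypothetical.

```python
import numpy as np

def mc_mean_and_sigma(values):
    """Mean and standard deviation over replicas 1..N_rep (set 0 excluded)."""
    f = np.asarray(values, dtype=float)
    # ddof=1 gives the N_rep - 1 normalization of the sample standard deviation.
    return float(f.mean()), float(f.std(ddof=1))

# Toy usage: three "replicas" of a PDF value q and a nonlinear observable F = q^2.
q_replicas = np.array([1.0, 2.0, 3.0])
mean_q, sigma_q = mc_mean_and_sigma(q_replicas)
mean_f, _ = mc_mean_and_sigma(q_replicas ** 2)   # average of the observable
f_of_mean = mean_q ** 2                          # observable on the "set 0" average
```

In this toy case the average of the observable differs from the observable evaluated on the average PDF, which is why the central set only approximates the central value of a nonlinear quantity.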

Equation (6) provides the 1-sigma PDF uncertainty on a general quantity which depends on PDFs. However, an important advantage of the Monte Carlo method is that one does not have to rely on a Gaussian assumption or on linear error propagation. As a consequence, one may determine directly a confidence level: e.g. a 68% c.l. for $\mathcal{F}[\{q\}]$ is simply found by computing the $N_{\rm rep}$ values of $\mathcal{F}$ and discarding the upper and lower 16% values. In a general non-Gaussian case this 68% c.l. might be asymmetric and not equal to the standard deviation (one-sigma uncertainty). For the observables of the present benchmark study the 1-sigma and 68% c.l. PDF uncertainties turn out to be very similar and thus only the former are given, but this is not necessarily the case in general. For example, the one-sigma error band on the NNPDF2.0 large-$x$ gluon and the small-$x$ strangeness is much larger than the corresponding 68% c.l. band, suggesting non-Gaussian behavior of the probability distribution in these regions, in which PDFs are being extrapolated beyond the data region.
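The confidence-level construction described above can be sketched as follows in Python: sort the replica values of the observable and discard equal fractions from the top and the bottom. The function name `central_interval` is hypothetical, and rounding the number of discarded replicas to the nearest integer is an assumption of this sketch rather than a prescription from the text.

```python
import numpy as np

def central_interval(values, cl=0.68):
    """Return (low, high) after discarding the upper and lower (1 - cl)/2 tails."""
    f = np.sort(np.asarray(values, dtype=float))
    n_cut = int(round(0.5 * (1.0 - cl) * f.size))  # replicas dropped on each side
    kept = f[n_cut:f.size - n_cut]
    return float(kept[0]), float(kept[-1])

# Toy usage: 100 replica values 1..100 (in arbitrary order); 16 are dropped per side.
low, high = central_interval(np.arange(100, 0, -1))
```

Unlike the standard deviation, the resulting interval need not be symmetric about the mean.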

3. PDF determinations - Theoretical uncertainties

Theoretical uncertainties of PDFs determined in global fits reflect the approximations in the theory which is used in order to relate PDFs to measurable quantities. The study of theoretical PDF uncertainties is currently less advanced than that of experimental uncertainties, and only some theoretical uncertainties have been explored. One might expect that the main theoretical uncertainties in PDF determination should be related to the treatment of the strong interaction: in particular to the values of the QCD parameters, specifically the value of the strong coupling $\alpha_s$ and of the quark masses $m_c$ and $m_b$, and to uncertainties related to the truncation of the perturbative expansion (commonly estimated through the variation of renormalization and factorization scales). Further uncertainties are related to the treatment of heavy quark thresholds, which are handled in various ways by different groups (fixed flavour number vs. variable flavour number schemes, and in the latter case different implementations of the variable flavour number scheme), and to further approximations such as the use of $K$-factor approximations. Finally, more uncertainties may be related to weak interaction parameters (such as the $W$ mass) and to the treatment of electroweak effects (such as QED PDF evolution [16]).

Of these uncertainties, the only one which has been explored systematically by the majority of the PDF groups is the $\alpha_s$ uncertainty. The way the $\alpha_s$ uncertainty can be determined using CTEQ, HERAPDF, MSTW, and NNPDF will be discussed in detail below. HERAPDF also provides model and parametrization uncertainties which include the effect of varying $m_b$ and $m_c$, as well as the effect of varying the parton parametrization, as will also be discussed below. Sets with varying quark masses and their implications have recently been made available by MSTW [17], the effects of varying $m_c$ and $m_b$ have been included by ABKM [2] and preliminary studies of the effect of $m_b$ and $m_c$ have also been presented by NNPDF [18]. Uncertainties related to factorization and renormalization scale variation and to electroweak effects are so far not available. For the benchmarking exercise of Sec. 5., results are given adopting common values of electroweak parameters, and at least one common value of $\alpha_s$ (though results for other values of $\alpha_s$ are also given), but no attempt has yet been made to benchmark the other aspects mentioned above.

3.1 The value of $\alpha_s$ and its uncertainty

We thus turn to the only theoretical uncertainty which has been studied systematically so far, namely the uncertainty on $\alpha_s$. The choice of value of $\alpha_s$ is clearly important because it is strongly correlated to PDFs, especially the gluon distribution (the correlation of $\alpha_s$ with the gluon distribution using CTEQ, MSTW and NNPDF PDFs is studied in detail in Ref. [19]). See also Ref. [2] for a discussion of this correlation in the ABKM PDFs. There are two separate issues related to the value of $\alpha_s$ in PDF fits: first, the choice of $\alpha_s$ values for which PDFs are made available, and second the choice of the preferred value of $\alpha_s$ to be used when giving PDFs and their uncertainties. The two issues are related but independent, and for each of the two issues two different basic philosophies may be adopted.

Concerning the range of available values of $\alpha_s$:

  • PDF fits are performed for a number of different values of $\alpha_s$. Though a PDF set corresponding to some reference value of $\alpha_s$ is given, the user is free to choose any of the given sets. This approach is adopted by CTEQ (0.118), HERAPDF (0.1176), MSTW (0.120) and NNPDF (0.119), where we have denoted in parentheses the reference (NLO) value of $\alpha_s(M_Z)$ for each set.

  • $\alpha_s$ is treated as a fit parameter and PDFs are given only for the best-fit value. This approach is adopted by ABKM (0.1179) and GJR (0.1145), where in parentheses the best-fit (NLO) value of $\alpha_s(M_Z)$ is given.

Fig. 2: Values of $\alpha_s(M_Z)$ for which fits are available. The default values and uncertainties used by each group are also shown. Plot by G. Watt [27].

Concerning the preferred central value and the treatment of the $\alpha_s$ uncertainty:

  • The value of $\alpha_s(M_Z)$ is taken as an external parameter, along with other parameters of the fit such as heavy quark masses or electroweak parameters. This approach is adopted by CTEQ, HERAPDF1.0 and NNPDF. In this case, there is no a priori central value of $\alpha_s(M_Z)$ and the uncertainty on $\alpha_s(M_Z)$ is treated by repeating the PDF determination as $\alpha_s$ is varied in a suitable range. Though a range of variation is usually chosen by the groups, any other range may be chosen by the user.

  • The value of $\alpha_s(M_Z)$ is treated as a fit parameter, and it is determined along with the PDFs. This approach is adopted by MSTW, ABKM and GJR08. In the last two cases, the uncertainty on $\alpha_s$ is part of the Hessian matrix of the fit. The MSTW approach is explained below.

As a cross-check, CTEQ [20] has also used the world average value of $\alpha_s(M_Z)$ as an additional input to the global fit.

The values of $\alpha_s(M_Z)$ for which fits are available, as well as the default values and uncertainties used by each group, are summarized in Fig. 2 (there is implicitly an additional uncertainty due to scale variation; see for example Ref. [26]). The most recent world average value of $\alpha_s(M_Z)$ is given in Ref. [22] (we note that the values used in the average are from extractions at different orders in the perturbative expansion). However, a more conservative estimate of the uncertainty on $\alpha_s$ was felt to be appropriate for the benchmarking exercise summarized in this note, for which we have taken $\Delta\alpha_s = \pm 0.002$ at 90% c.l. (corresponding to 0.0012 at one sigma). This uncertainty has been used for the CTEQ, NNPDF and HERAPDF studies. For MSTW, ABKM and GJR the preferred $\alpha_s$ uncertainty for each group is used, though for MSTW in particular this is close to 0.0012 at one sigma. It may not be unreasonable to argue that a yet larger uncertainty may be appropriate.

When comparing results obtained using different PDF sets it should be borne in mind that if different values of $\alpha_s$ are used, cross section predictions change both because of the dependence of the cross section on the value of $\alpha_s$ (which for some processes such as top production or Higgs production in gluon-gluon fusion may be quite strong), and because of the dependence of the PDFs themselves on the value of $\alpha_s$. Differences due to the PDFs alone can be isolated only when performing comparisons at a common value of $\alpha_s$.

3.2 Computation of PDF+$\alpha_s$ uncertainties

Within the quadratic approximation to the dependence of $\chi^2$ on parameters (i.e. linear error propagation), it turns out that even if the PDF uncertainty and the $\alpha_s$ uncertainty are correlated, the total one-sigma combined PDF+$\alpha_s$ uncertainty including this correlation can be found without approximation by computing the one-sigma PDF uncertainty with $\alpha_s$ fixed at its central value and the one-sigma $\alpha_s$ uncertainty with the PDFs fixed at their central value, and adding the results in quadrature [20], and similarly for any other desired confidence level.

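For example, if $\Delta X^{\rm PDF}$ is the PDF uncertainty for a cross section $X$ and $\Delta X^{\alpha_s}$ is the $\alpha_s$ uncertainty, the combined uncertainty is

$$\Delta X = \sqrt{\left(\Delta X^{\rm PDF}\right)^2 + \left(\Delta X^{\alpha_s}\right)^2}.$$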


Other treatments can be used when deviations from the quadratic approximation are possible. Indeed, for MSTW, because of the use of dynamical tolerance, linear error propagation does not necessarily apply. For NNPDF, because of the use of a Monte Carlo method, linear error propagation is not assumed: in practice, addition in quadrature turns out to be a very good approximation, but an exact treatment is computationally simpler. We now describe in detail the procedure for the computation of $\alpha_s$ and PDF uncertainties (and for HERAPDF also of model and parametrization uncertainties) for various parton sets.

3.21 CTEQ - Combined PDF and $\alpha_s$ uncertainties

CTEQ takes $\alpha_s^0 = 0.118$ as an external input parameter and provides the CTEQ6.6alphas [20] (or the CT10alpha [5]) series, which contains 4 sets extracted using different values of $\alpha_s(M_Z)$. The uncertainty associated with $\alpha_s$ can be evaluated by computing any given observable $X$ with $\alpha_s(M_Z) = \alpha_s^0 \pm \delta\alpha_s$ in the partonic cross-section and with the PDF sets that have been extracted with these values of $\alpha_s$. The differences

$$\Delta X^{\alpha_s}_{\pm} = X(\alpha_s^0 \pm \delta\alpha_s) - X(\alpha_s^0)$$

are the $\alpha_s$ uncertainties according to CTEQ. In [20] it has been demonstrated that, in the Hessian approach, the combination in quadrature of PDF and $\alpha_s$ uncertainties is correct within the quadratic approximation. In the studies in Ref. [20], CTEQ did not find appreciable deviations from the quadratic approximation, and thus the procedure described below will be accurate for the cross sections considered here.

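Therefore, for CTEQ6.6 the combined PDF+$\alpha_s$ uncertainty is given by

$$\Delta X^{{\rm PDF}+\alpha_s}_{\pm} = \sqrt{\left(\Delta X^{\rm PDF}_{\pm}\right)^2 + \left(\Delta X^{\alpha_s}_{\pm}\right)^2},$$

where $\Delta X^{\rm PDF}_{\pm}$ is the PDF uncertainty computed at the central value of $\alpha_s$ and $\Delta X^{\alpha_s}_{\pm}$ is the $\alpha_s$ uncertainty obtained from the differences above.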
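A minimal Python sketch of this combination is given below. It assumes that the observable has been evaluated with the central $\alpha_s$ set and with one upward and one downward $\alpha_s$ variation set, and that the PDF uncertainty at central $\alpha_s$ is already known. The function names and the toy cross-section values are hypothetical.

```python
import math

def alpha_s_shifts(x_central, x_alpha_up, x_alpha_down):
    """Shifts of the observable when alpha_s is raised or lowered."""
    return x_alpha_up - x_central, x_alpha_down - x_central

def combine_in_quadrature(delta_pdf, delta_alpha_s):
    """Combined PDF+alpha_s uncertainty within the quadratic approximation."""
    return math.hypot(delta_pdf, delta_alpha_s)

# Toy usage: a cross section of 100 (arbitrary units) with a PDF uncertainty of 3.
shift_up, shift_down = alpha_s_shifts(100.0, 104.0, 96.5)
total_up = combine_in_quadrature(3.0, shift_up)
total_down = combine_in_quadrature(3.0, shift_down)
```

The same partonic cross section must be recomputed with each $\alpha_s$ value, as the text stresses; the sketch only handles the final combination.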


3.22 MSTW - Combined PDF and $\alpha_s$ uncertainties

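MSTW fits $\alpha_s$ together with the PDFs and obtains best-fit values $\alpha_s^0({\rm NLO})$ and $\alpha_s^0({\rm NNLO})$, each with a standard deviation $\sigma$. Any correlation between the PDF and the $\alpha_s$ uncertainties is taken into account with the following recipe [23]. Besides the best-fit sets of PDFs, which correspond to $\alpha_s^0$, four more sets of PDFs are provided, both at NLO and at NNLO. The latter are extracted setting as input $\alpha_s = \alpha_s^0 \pm 0.5\sigma$ and $\alpha_s^0 \pm \sigma$, where $\sigma$ is the standard deviation indicated above. Each of these extra sets contains the full parametrization to describe the PDF uncertainty. Comparing the results of the five sets, the combined PDF+$\alpha_s$ uncertainty is defined as:

$$\Delta^{{\rm PDF}+\alpha_s}_{+} = \max_{\alpha_s}\left\{F^{\alpha_s}(S_0) + \left(\Delta F^{\alpha_s}_{\rm PDF}\right)_{+}\right\} - F^{\alpha_s^0}(S_0),$$

$$\Delta^{{\rm PDF}+\alpha_s}_{-} = F^{\alpha_s^0}(S_0) - \min_{\alpha_s}\left\{F^{\alpha_s}(S_0) - \left(\Delta F^{\alpha_s}_{\rm PDF}\right)_{-}\right\},$$

where max and min run over the five values of $\alpha_s$ under study, $F^{\alpha_s}(S_0)$ is the central prediction for a given $\alpha_s$, and the corresponding PDF uncertainties are used.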
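
The envelope recipe above can be sketched as follows. This is an illustrative Python fragment, not MSTW code: the function name, the dictionary layout (keys are $\alpha_s$ offsets in units of $\sigma$) and all numbers are hypothetical.

```python
def mstw_combined_uncertainty(results, central_key=0.0):
    """Envelope of (central prediction +/- PDF uncertainty) over alpha_s sets.

    results maps an alpha_s label to (F, dF_plus, dF_minus), where F is the
    central prediction for that alpha_s and dF_plus, dF_minus are the
    (positive) PDF uncertainties obtained with the same alpha_s.
    Returns (plus, minus) relative to the prediction at the best-fit alpha_s.
    """
    f0 = results[central_key][0]
    upper = max(f + d_plus for f, d_plus, _ in results.values())
    lower = min(f - d_minus for f, _, d_minus in results.values())
    return upper - f0, f0 - lower


# Hypothetical inputs: keys are alpha_s offsets in units of sigma.
toy = {
    -1.0: (9.60, 0.30, 0.32),
    -0.5: (9.80, 0.28, 0.30),
    0.0: (10.00, 0.25, 0.27),
    0.5: (10.15, 0.27, 0.26),
    1.0: (10.30, 0.30, 0.25),
}
plus, minus = mstw_combined_uncertainty(toy)
print(f"+{plus:.2f} / -{minus:.2f}")
```

Because the envelope is taken after adding the PDF uncertainty of each set, the result can exceed the quadrature sum whenever the PDF uncertainty at the shifted $\alpha_s$ values is not small.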

The central sets and the sets with shifted $\alpha_s$ are all obtained using the dynamical tolerance prescription for PDF uncertainty, which determines the uncertainty when the quality of the fit to any one data set (relative to the best fit for the preferred value of $\alpha_s(m_Z)$) becomes sufficiently poor. Naively one might expect that the PDF uncertainty for the $\alpha_s^0 \pm \sigma$ sets might then be zero, since one is by definition already at the limit of allowed fit quality for one data set. If this were the case, the procedure of adding PDF and $\alpha_s$ uncertainties in quadrature would be a very good approximation. However, in practice there is freedom to move the PDFs in particular directions without the data set at its limit of fit quality becoming worse fit, and some variations can be quite large before any data set becomes sufficiently badly fit for the criterion for uncertainty to be met. This can lead to significantly larger combined uncertainties than the simple quadratic prescription would give. In particular, since there is a tendency for the best fit to have a too low value of $dF_2/d\ln Q^2$ at low $x$, at higher $\alpha_s$ value the small-$x$ gluon has freedom to increase without spoiling the fit, and the combined uncertainty is large in the upwards direction for Higgs production.

3.23 HERAPDF - $\alpha_s$, model and parametrization uncertainties

HERAPDF provides not only $\alpha_s$ uncertainties, but also model and parametrization uncertainties. Note that parametrization uncertainty will be accounted for, at least in part, by other groups through the use of a significantly larger number of initial parameters, the use of a large tolerance (CTEQ, MSTW) or a more general parametrization (NNPDF), as discussed in Sect. 2.13. However, model uncertainties related to heavy quark masses are not determined by other groups.

The model errors come from variation of the choices of: charm mass $m_c$; beauty mass $m_b$; minimum $Q^2$ of data used in the fit, $Q^2_{\rm min}$; and fraction of strange sea in total d-type sea, $f_s$, at the starting scale. The model errors are calculated by taking the difference between the central fit and each model variation and adding these differences in quadrature, separately for positive and negative deviations. (For HERAPDF1.0 the model variations are provided as members 1 to 8 of the LHAPDF file HERAPDF10_VAR.LHgrid.)

The parametrization errors come from: variation of the starting scale $Q_0^2$; and variations of the basic 10 parameter fit to 11 parameter fits in which an extra parameter is allowed to be free for each fitted parton distribution. In practice only three of these extra parameter variations have significantly different PDF shapes from the central fit. The parametrization errors are calculated by storing the difference between each parametrization variant and the central fit and constructing an envelope representing the maximal deviation at each $x$ value. (For HERAPDF1.0 the parametrization variations are provided as members 9 to 13 of the LHAPDF file HERAPDF10_VAR.LHgrid.)

HERAPDF also provides an estimate of the additional error due to the uncertainty on $\alpha_s(m_Z)$. Fits are made with the central value of $\alpha_s(m_Z)$ varied up and down. The 90% c.l. $\alpha_s$ error on any variable should be calculated by adding in quadrature the differences between its value as calculated using the central fit and its values using these two alternative values of $\alpha_s(m_Z)$; 68% c.l. values may be obtained by scaling the result down by 1.645. (For HERAPDF1.0 these variations are provided as members 9, 10, 11 of the LHAPDF file HERAPDF10_ALPHAS.LHgrid, for the three values of $\alpha_s(m_Z)$, respectively. Additionally, members 1 to 8 provide PDFs for values of $\alpha_s(m_Z)$ ranging from 0.114 to 0.122.) The total PDF+$\alpha_s$ uncertainty for HERAPDF should be constructed by adding in quadrature experimental, model, parametrization and $\alpha_s$ uncertainties.
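
The three HERAPDF ingredients (signed quadrature for model errors, an envelope for parametrization errors, and a final quadrature sum) can be sketched as below. This is a hedged illustration only: the helper names and all input numbers are hypothetical, and treating the two $\alpha_s$ variations with the same signed-quadrature helper is an assumption made for this sketch.

```python
import math


def signed_quadrature(central, variations):
    """Add deviations in quadrature, separately for upward and downward shifts."""
    up = math.sqrt(sum((v - central) ** 2 for v in variations if v > central))
    down = math.sqrt(sum((v - central) ** 2 for v in variations if v < central))
    return up, down


def envelope(central, variations):
    """Maximal upward and downward deviation from the central value."""
    up = max([v - central for v in variations] + [0.0])
    down = max([central - v for v in variations] + [0.0])
    return up, down


def total_uncertainty(*components):
    """Quadrature sum of (up, down) pairs."""
    up = math.sqrt(sum(c[0] ** 2 for c in components))
    down = math.sqrt(sum(c[1] ** 2 for c in components))
    return up, down


# Hypothetical values of one observable for each variation.
central = 10.0
experimental = (0.20, 0.20)
model = signed_quadrature(central, [10.3, 9.6, 10.4, 9.9])
param = envelope(central, [10.1, 10.5, 9.8])
# alpha_s error at 90% c.l., then scaled to 68% c.l. by the factor 1.645.
as_90 = signed_quadrature(central, [10.33, 9.67])
alphas = (as_90[0] / 1.645, as_90[1] / 1.645)

print(total_uncertainty(experimental, model, param, alphas))
```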

3.24 NNPDF - Combined PDF and $\alpha_s$ uncertainties

For the NNPDF2.0 family, PDF sets obtained with values of $\alpha_s(m_Z)$ in the range from 0.114 to 0.124 in steps of 0.001 are available in LHAPDF. These sets are denoted by NNPDF20_as_0114_100.LHgrid, NNPDF20_as_0115_100.LHgrid, … and have the same structure as the central NNPDF20_100.LHgrid set: PDF set number 0 is the average PDF set, as discussed above,

$$q^{(0)}_{\alpha_s} \equiv \langle q_{\alpha_s} \rangle = \frac{1}{100} \sum_{k=1}^{100} q^{(k)}_{\alpha_s},$$

for the different values of $\alpha_s$, while sets from 1 to 100 are the 100 PDF replicas corresponding to this particular value of $\alpha_s$. Note that in general not only the PDF central values but also the PDF uncertainties will depend on $\alpha_s$.

The methodology used within the NNPDF approach to combine PDF and $\alpha_s$ uncertainties is discussed in Refs. [19, 28]. One possibility is to add in quadrature the PDF and $\alpha_s$ uncertainties, using PDFs obtained from different values of $\alpha_s$, which as discussed above is correct in the quadratic approximation. However, use of the exact correlated Monte Carlo formula turns out to be actually simpler, as we now show.

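If the sum in quadrature is adopted, for a generic cross section $\sigma$ which depends on the PDFs and the strong coupling, $\sigma({\rm PDF}, \alpha_s)$, we have

$$(\delta\sigma)^{\pm}_{\alpha_s} = \sigma\left({\rm PDF}^{(\pm)}, \alpha_s^{(0)} \pm \delta_{\alpha_s}\right) - \sigma\left({\rm PDF}^{(0)}, \alpha_s^{(0)}\right),$$

where ${\rm PDF}^{(\pm)}$ stands schematically for the PDFs obtained when $\alpha_s$ is varied within its 1-sigma range, $\alpha_s^{(0)} \pm \delta_{\alpha_s}$. The PDF+$\alpha_s$ uncertainty is

$$(\delta\sigma)^{\pm}_{{\rm PDF}+\alpha_s} = \sqrt{\left[(\delta\sigma)^{\pm}_{\alpha_s}\right]^2 + \left[(\delta\sigma)^{\pm}_{\rm PDF}\right]^2}, \tag{14}$$

with $(\delta\sigma)^{\pm}_{\rm PDF}$ the PDF uncertainty on the observable $\sigma$ computed from the set with the central value of $\alpha_s$.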

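The exact Monte Carlo expression is instead found by noting that the average over Monte Carlo replicas of a general quantity $F({\rm PDF}, \alpha_s)$, which depends on both $\alpha_s$ and the PDFs, is

$$\langle F \rangle_{\rm rep} = \frac{1}{N_{\rm rep}} \sum_{j=1}^{N_{\alpha}} \sum_{k_j=1}^{N_{\rm rep}^{\alpha_s^{(j)}}} F\left({\rm PDF}^{(k_j,j)}, \alpha_s^{(j)}\right), \tag{15}$$

where ${\rm PDF}^{(k_j,j)}$ stands for the $k_j$-th replica of the PDF fit obtained using $\alpha_s^{(j)}$ as the value of the strong coupling; $N_{\rm rep}$ is the total number of PDF replicas,

$$N_{\rm rep} = \sum_{j=1}^{N_{\alpha}} N_{\rm rep}^{\alpha_s^{(j)}},$$

and $N_{\rm rep}^{\alpha_s^{(j)}}$ is the number of PDF replicas for each value $\alpha_s^{(j)}$ of $\alpha_s$. If we assume that $\alpha_s$ is Gaussianly distributed about its central value with width equal to the stated uncertainty, the number of replicas for each different value of $\alpha_s$ is

$$N_{\rm rep}^{\alpha_s^{(j)}} \propto \exp\left(-\frac{\left(\alpha_s^{(j)} - \alpha_s^{(0)}\right)^2}{2\,\delta_{\alpha_s}^2}\right),$$

with $\alpha_s^{(0)}$ and $\delta_{\alpha_s}$ the assumed central value and 1-sigma uncertainty of $\alpha_s(m_Z)$. Clearly with a Monte Carlo method a different probability distribution of $\alpha_s$ values could also be assumed. For example, taking nine distinct values $\alpha_s^{(j)}$ and assuming 100 replicas for the central value ($\alpha_s^{(j)} = \alpha_s^{(0)}$), the Gaussian weight fixes the number of replicas to be used for each of the other eight values.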
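
As a small numerical illustration of the Gaussian replica allocation, the sketch below computes the number of replicas per $\alpha_s$ value. The central value 0.119, the uncertainty 0.0012 and the grid of nine values are assumptions chosen for this example, not values quoted from the text above; the function name is hypothetical.

```python
import math


def replicas_per_alphas(alphas_values, central, sigma, n_central=100):
    """Number of replicas for each alpha_s value, Gaussian-weighted.

    The central value receives n_central replicas; the others are scaled by
    exp(-(alpha_s - central)^2 / (2 sigma^2)) and rounded to integers.
    """
    return [
        round(n_central * math.exp(-((a - central) ** 2) / (2.0 * sigma ** 2)))
        for a in alphas_values
    ]


# Assumed inputs for illustration: nine values in steps of 0.001.
values = [0.115 + 0.001 * j for j in range(9)]
counts = replicas_per_alphas(values, central=0.119, sigma=0.0012)
print(counts)       # [0, 4, 25, 71, 100, 71, 25, 4, 0]
print(sum(counts))  # total number of replicas entering the average: 300
```

With these assumed inputs the outermost values receive no replicas at all, so in practice only seven of the nine sets contribute.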

The combined PDF+$\alpha_s$ uncertainty is then simply found by using Eq. (6) with averages computed using Eq. (15). The difference between Eq. (15) and Eq. (14) measures deviations from linear error propagation. The NNPDF benchmark results presented below are obtained using Eq. (15), with the $\alpha_s(m_Z)$ uncertainty taken at one sigma. No significant deviations from linear error propagation were observed.
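
A minimal sketch of the pooled Monte Carlo average follows: replicas from sets with different $\alpha_s$ are merged, each set contributing its allotted number of replicas, and the mean and standard deviation are taken over the merged ensemble. The function name and the toy numbers are hypothetical, and the use of the sample standard deviation (with the $N/(N-1)$ factor) is an assumption of this sketch.

```python
import statistics


def pooled_mean_and_std(values_by_alphas, n_by_alphas):
    """Mean and standard deviation over replicas pooled across alpha_s sets.

    values_by_alphas : dict alpha_s label -> list of replica values of F
    n_by_alphas      : dict alpha_s label -> number of replicas to use
    """
    pooled = []
    for label, values in values_by_alphas.items():
        # Use only the first n replicas of each set.
        pooled.extend(values[: n_by_alphas[label]])
    return statistics.mean(pooled), statistics.stdev(pooled)


# Toy replica values of some observable for three alpha_s sets.
toy_values = {
    "low": [9.5, 9.7, 9.6],
    "central": [10.0, 10.2, 9.8, 10.1],
    "high": [10.4, 10.5, 10.3],
}
toy_counts = {"low": 1, "central": 4, "high": 1}
mean, std = pooled_mean_and_std(toy_values, toy_counts)
print(f"{mean:.3f} +/- {std:.3f}")
```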

It is interesting to observe that the same method can be used to determine the combined uncertainty of PDFs and other physical parameters, such as heavy quark masses.

4. PDF correlations

The uncertainty analysis may be extended to define a correlation between the uncertainties of two variables, say $X(\vec a)$ and $Y(\vec a)$. As for the case of PDF uncertainties, PDF correlations can be determined both from PDF determinations based on the Hessian approach and from those based on the Monte Carlo approach.

4.1 PDF correlations in the Hessian approach

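Consider the projection of the tolerance hypersphere onto a circle of radius 1 in the plane of the gradients $\vec\nabla X$ and $\vec\nabla Y$ in the parton parameter space [13, 24]. The circle maps onto an ellipse in the $XY$ plane. This “tolerance ellipse” is described by Lissajous-style parametric equations,

$$X = X_0 + \Delta X \cos\theta,$$

$$Y = Y_0 + \Delta Y \cos(\theta + \varphi),$$

where the parameter $\theta$ varies between 0 and $2\pi$, $X_0 \equiv X(\vec a_0)$, and $Y_0 \equiv Y(\vec a_0)$. $\Delta X$ and $\Delta Y$ are the maximal variations $\delta X \equiv X - X_0$ and $\delta Y \equiv Y - Y_0$ evaluated according to the Hessian uncertainty formula, and $\varphi$ is the angle between $\vec\nabla X$ and $\vec\nabla Y$ in the $\{a_i\}$ space, with

$$\cos\varphi = \frac{\vec\nabla X \cdot \vec\nabla Y}{\Delta X\, \Delta Y} = \frac{1}{4\,\Delta X\, \Delta Y} \sum_{i=1}^{N} \left(X_i^{(+)} - X_i^{(-)}\right)\left(Y_i^{(+)} - Y_i^{(-)}\right). \tag{20}$$

The quantity $\cos\varphi$ characterizes whether the PDF degrees of freedom of $X$ and $Y$ are correlated ($\cos\varphi \approx 1$), anti-correlated ($\cos\varphi \approx -1$), or uncorrelated ($\cos\varphi \approx 0$). If units for $X$ and $Y$ are rescaled so that $\Delta X = \Delta Y$ (e.g., $\Delta X = \Delta Y = 1$), the semimajor axis of the tolerance ellipse is directed at an angle $\pi/4$ (or $3\pi/4$) with respect to the $X$ axis for $\cos\varphi > 0$ (or $\cos\varphi < 0$). In these units, the ellipse reduces to a line for $\cos\varphi = \pm 1$ and becomes a circle for $\cos\varphi = 0$, as illustrated by Fig. 3. These properties can be found by diagonalizing the equation for the correlation ellipse. Its semiminor and semimajor axes (normalized to $\Delta X = \Delta Y$) are

$$\{a_{\rm minor},\, a_{\rm major}\} = \frac{\sin\varphi}{\sqrt{1 \pm |\cos\varphi|}}.$$

The eccentricity $\epsilon \equiv \sqrt{1 - (a_{\rm minor}/a_{\rm major})^2}$ is therefore approximately equal to $\sqrt{|\cos\varphi|}$ as $|\cos\varphi| \to 1$.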
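
The correlation cosine can be evaluated directly from the values of the two observables on the pairs of Hessian error sets. The sketch below is an independent illustration, not the corr.C code mentioned later in this section; the function names and the input layout (one list for the "+" members and one for the "−" members of each eigenvector) are assumptions.

```python
import math


def hessian_delta(plus, minus):
    """Symmetric Hessian uncertainty: 0.5 * sqrt(sum_i (X_i(+) - X_i(-))^2)."""
    return 0.5 * math.sqrt(sum((p - m) ** 2 for p, m in zip(plus, minus)))


def correlation_cosine(x_plus, x_minus, y_plus, y_minus):
    """cos(phi) between two observables evaluated on the same error sets."""
    dx = hessian_delta(x_plus, x_minus)
    dy = hessian_delta(y_plus, y_minus)
    dot = sum(
        (xp - xm) * (yp - ym)
        for xp, xm, yp, ym in zip(x_plus, x_minus, y_plus, y_minus)
    )
    return dot / (4.0 * dx * dy)


# Toy example with two eigenvectors: Y varies along a different direction.
x_plus, x_minus = [1.1, 1.0], [0.9, 1.0]
y_plus, y_minus = [2.0, 2.3], [2.0, 1.7]
print(correlation_cosine(x_plus, x_minus, y_plus, y_minus))
```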

Fig. 3: Correlation ellipses for a strong correlation (left), no correlation (center) and a strong anti-correlation (right) [4].

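A magnitude of $|\cos\varphi|$ close to unity suggests that a precise measurement of $X$ (constraining $\delta X$ to be along the dashed line in Fig. 3) is likely to constrain tangibly the uncertainty $\delta Y$ in $Y$, as the value of $Y$ shall lie within the needle-shaped error ellipse. Conversely, $\cos\varphi \approx 0$ implies that the measurement of $X$ is not likely to constrain $\delta Y$ strongly. (Footnote 5: The allowed range of $\delta Y/\Delta Y$ for a given $\delta \equiv \delta X/\Delta X$ is $r_Y^{(-)} \le \delta Y/\Delta Y \le r_Y^{(+)}$, where $r_Y^{(\pm)} \equiv \delta \cos\varphi \pm \sqrt{1 - \delta^2}\, \sin\varphi$.)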

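The values of $\Delta X$, $\Delta Y$, and $\cos\varphi$ are also sufficient to estimate the PDF uncertainty of any function $f(X, Y)$ of $X$ and $Y$ by relating the gradient of $f(X, Y)$ to $\partial_X f \equiv \partial f/\partial X$ and $\partial_Y f \equiv \partial f/\partial Y$ via the chain rule:

$$\Delta f = \left|\vec\nabla f\right| = \sqrt{\left(\Delta X\, \partial_X f\right)^2 + 2\, \Delta X\, \Delta Y \cos\varphi\; \partial_X f\, \partial_Y f + \left(\Delta Y\, \partial_Y f\right)^2}. \tag{23}$$

Of particular interest is the case of a rational function $f(X, Y) = X^m/Y^n$, pertinent to computations of various cross section ratios, cross section asymmetries, and statistical significance for finding signal events over background processes [24]. For rational functions Eq. (23) takes the form

$$\frac{\Delta f}{f_0} = \sqrt{\left(m \frac{\Delta X}{X_0}\right)^2 - 2 m n \frac{\Delta X}{X_0} \frac{\Delta Y}{Y_0} \cos\varphi + \left(n \frac{\Delta Y}{Y_0}\right)^2}.$$

For example, consider a simple ratio, $f = X/Y$. Then $\Delta f/f_0$ is suppressed ($\Delta f/f_0 \approx |\Delta X/X_0 - \Delta Y/Y_0|$) if $X$ and $Y$ are strongly correlated, and it is enhanced ($\Delta f/f_0 \approx \Delta X/X_0 + \Delta Y/Y_0$) if $X$ and $Y$ are strongly anticorrelated.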
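
The effect of the correlation on a ratio can be checked numerically with a short sketch. The function name and the relative uncertainties (3% and 4%) are hypothetical; the formula is the rational-function case of the chain rule for $f = X^m/Y^n$.

```python
import math


def rational_relative_uncertainty(rel_x, rel_y, cos_phi, m=1, n=1):
    """Relative PDF uncertainty of f = X^m / Y^n.

    rel_x, rel_y : relative uncertainties Delta X / X_0 and Delta Y / Y_0
    cos_phi      : correlation cosine between X and Y
    """
    return math.sqrt(
        (m * rel_x) ** 2
        - 2.0 * m * n * rel_x * rel_y * cos_phi
        + (n * rel_y) ** 2
    )


# Simple ratio X/Y with assumed 3% and 4% uncertainties.
for cos_phi in (1.0, 0.0, -1.0):
    rel = rational_relative_uncertainty(0.03, 0.04, cos_phi)
    print(f"cos(phi) = {cos_phi:+.0f}: {100 * rel:.1f}%")
```

In this toy case the ratio uncertainty ranges from the difference of the two relative uncertainties (full correlation) to their sum (full anticorrelation), with the quadrature sum in between.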

As would be true for any estimate provided by the Hessian method, the correlation angle is inherently approximate. Eq. (20) is derived under a number of simplifying assumptions, notably in the quadratic approximation for the $\chi^2$ function within the tolerance hypersphere, and by using a symmetric finite-difference formula for $\{\partial_i X\}$ that may fail if $X$ is not monotonic. With these limitations in mind, we find the correlation angle to be a convenient measure of interdependence between quantities of diverse nature, such as physical cross sections and parton distributions themselves. For example, in Section 5.22, the correlations for the benchmark cross sections are given with respect to that for $Z$ production. As expected, the $W^+$ and $W^-$ cross sections are very correlated with that for the $Z$, while the Higgs cross sections are uncorrelated ($m_{\rm Higgs}$=120 GeV) or anti-correlated ($m_{\rm Higgs}$=240 GeV). Thus, the PDF uncertainty for the ratio of the cross section for a 240 GeV Higgs boson to that of the cross section for $Z$ boson production is larger than the PDF uncertainty for Higgs boson production by itself.

A simple code (corr.C) is available from the PDF4LHC website that calculates the correlation cosine between any two observables given two text files that present the cross sections for each observable as a function of the error PDFs.

4.2 PDF correlations in the Monte Carlo approach

General correlations between PDFs and physical observables can be computed within the Monte Carlo approach used by NNPDF using standard textbook methods. To illustrate this point, let us compute the correlation coefficient $\rho[A, B]$ for two observables $A$ and $B$ which depend on PDFs (or are PDFs themselves). This correlation coefficient in the Monte Carlo approach is given by

$$\rho[A, B] = \frac{\langle A B \rangle_{\rm rep} - \langle A \rangle_{\rm rep} \langle B \rangle_{\rm rep}}{\sigma_A\, \sigma_B},$$

where the averages are taken over the ensemble of the values of the observables computed with the different replicas in the NNPDF2.0 set, and $\sigma_{A}$, $\sigma_{B}$ are the standard deviations of the ensembles. The quantity $\rho$ characterizes whether two observables (or PDFs) are correlated ($\rho \approx 1$), anti-correlated ($\rho \approx -1$) or uncorrelated ($\rho \approx 0$).
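
A minimal sketch of this replica correlation coefficient is given below. The function name and the toy replica values are hypothetical. The population standard deviation is used here so that the coefficient lies exactly in $[-1, 1]$; this normalization choice is an assumption of the sketch.

```python
import statistics


def mc_correlation(a_values, b_values):
    """Correlation coefficient of two observables over Monte Carlo replicas.

    a_values[k] and b_values[k] must be computed with the same replica k.
    """
    n = len(a_values)
    mean_a = statistics.mean(a_values)
    mean_b = statistics.mean(b_values)
    mean_ab = sum(a * b for a, b in zip(a_values, b_values)) / n
    sigma_a = statistics.pstdev(a_values)
    sigma_b = statistics.pstdev(b_values)
    return (mean_ab - mean_a * mean_b) / (sigma_a * sigma_b)


# Toy replica values of two observables.
a = [1.00, 1.05, 0.97, 1.02, 0.99]
b = [2.10, 2.18, 2.02, 2.15, 2.05]
print(f"{mc_correlation(a, b):.3f}")
```

The same function applies when one of the two lists holds the $\alpha_s$ value attached to each replica of a pooled ensemble, which gives the correlation between $\alpha_s$ and a PDF or observable.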

This correlation can be generalized to other cases, for example to compute the correlation between PDFs and the value of the strong coupling $\alpha_s(m_Z)$, as studied in Refs. [19, 28], for any given values of $x$ and $Q^2$. For example, the correlation between the strong coupling and the gluon at $x$ and $Q^2$ (or in general any other PDF) is defined as the usual correlation between two probability distributions, namely

$$\rho\left[\alpha_s(m_Z),\, g(x, Q^2)\right] = \frac{\left\langle \alpha_s(m_Z)\, g(x, Q^2) \right\rangle_{\rm rep} - \left\langle \alpha_s(m_Z) \right\rangle_{\rm rep} \left\langle g(x, Q^2) \right\rangle_{\rm rep}}{\sigma_{\alpha_s(m_Z)}\, \sigma_{g(x, Q^2)}},$$

where averages over replicas include PDF sets with varying $\alpha_s$ in the sense of Eq. (15). Note that the computation of this correlation takes into account not only the central gluons of the fits with different $\alpha_s$ but also the corresponding uncertainties in each case.

5. The PDF4LHC benchmarks

A benchmarking exercise was carried out in which all PDF groups were invited to participate. This exercise considered only the then most up-to-date published (or most commonly used) versions of NLO PDFs from 6 groups: ABKM09 [2][3], CTEQ6.6 [4], GJR08 [7], HERAPDF1.0 [8], MSTW08 [9], NNPDF2.0 [10]. The benchmark cross sections were evaluated at NLO at both 7 and 14 TeV. We report here primarily on the 7 TeV results.

All of the benchmark processes were to be calculated with the following settings:

  1. at NLO in the $\overline{\rm MS}$ scheme

  2. all calculations done in the 5-flavor ZM-VFNS scheme, though each group uses a different treatment of heavy quarks

  3. at a center-of-mass energy of 7 TeV

  4. for the central value predictions, and for 68% and 90% c.l. PDF uncertainties

  5. with and without the $\alpha_s$ uncertainties, with the prescription for combining the PDF and $\alpha_s$ errors to be specified

  6. repeating the calculation with a central value of $\alpha_s(m_Z)$ of 0.119.

To provide some standardization, a gzipped version of MCFM5.7 [25] was prepared by John Campbell, using the specified parameters and exact input files for each process. It was allowable for other codes to be used, but they had to be checked against the MCFM output values.

The processes included in the benchmarking exercise are given below.

  1. $W^+$, $W^-$ and $Z$ cross sections and rapidity distributions, including the cross section ratios $W^+/W^-$ and $(W^+ + W^-)/Z$ and the $W$ asymmetry as a function of rapidity ($[W^+(y) - W^-(y)]/[W^+(y) + W^-(y)]$).

    The following specifications were made for the $W$ and $Z$ cross sections:

    1. $M_Z$ = 91.188 GeV

    2. $M_W$ = 80.398 GeV

    3. zero width approximation used

    4. $G_F$ = 0.116637 × $10^{-4}$ GeV$^{-2}$

    5. $\sin^2\theta_W$ = 0.2227

    6. other EW couplings derived using tree level relations

    7. BR($Z \to \ell\ell$) = 0.03366

    8. BR($W \to \ell\nu$) = 0.1080

    9. CKM mixing parameters from Eq. 11.27 of the PDG2009 CKM review

    10. scales: $\mu_R = \mu_F = M_Z$ or $M_W$

  2. $gg \to H$ total cross sections at NLO in the Standard Model

    The following specifications were made for the Higgs cross section.

    1. $M_H$ = 120, 180 and 240 GeV

    2. zero Higgs width approximation, no branching ratios taken into account

    3. top loop only, with $m_{\rm top}$ = 171.3 GeV in $\sigma_0$

    4. scales: $\mu_R = \mu_F = M_H$

  3. $t\bar t$ cross section at NLO

    1. $m_{\rm top}$ = 171.3 GeV

    2. zero top width approximation, no branching ratios

    3. scales: $\mu_R = \mu_F = m_{\rm top}$
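
As a quick consistency check of the electroweak inputs listed for the $W$ and $Z$ cross sections, the quoted weak mixing angle agrees with the on-shell tree-level relation $\sin^2\theta_W = 1 - M_W^2/M_Z^2$. The sketch below also derives an electromagnetic coupling from $G_F$; that second relation is a standard tree-level formula used here as an assumption, and is not stated in the list of settings.

```python
import math

# Inputs as specified for the W and Z benchmark cross sections.
M_Z = 91.188       # GeV
M_W = 80.398       # GeV
G_F = 0.116637e-4  # GeV^-2

# On-shell tree-level weak mixing angle.
sin2_theta_w = 1.0 - (M_W / M_Z) ** 2

# Assumed tree-level relation: alpha = sqrt(2) G_F M_W^2 sin^2(theta_W) / pi.
alpha_em = math.sqrt(2.0) * G_F * M_W ** 2 * sin2_theta_w / math.pi

print(f"sin^2(theta_W) = {sin2_theta_w:.4f}")
print(f"1/alpha        = {1.0 / alpha_em:.2f}")
```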

The cross sections chosen are all important cross sections at the LHC, for standard model benchmarking for the case of the $W/Z$ and top cross sections and discovery potential for the case of the Higgs cross sections. Both $q\bar q$ and $gg$ initial states are involved. The NLO $W$ and $Z$ cross sections have a small dependence on the value of $\alpha_s(m_Z)$, while the dependence is sizeable for both $t\bar t$ and Higgs production.

5.1 Comparison between benchmark predictions

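Now we turn to compare the results of the various PDF sets for the LHC observables with the common benchmark settings discussed above. To perform a more meaningful comparison, it is useful to first introduce the idea of differential parton-parton luminosities. Such luminosities, when multiplied by the dimensionless cross section $\hat s\, \hat\sigma$ for a given process, provide a useful estimate of the size of an event cross section at the LHC. Below we define the differential parton-parton luminosity $dL_{ij}/d\hat s\, dy$:

$$\frac{dL_{ij}}{d\hat s\, dy} = \frac{1}{s}\, \frac{1}{1 + \delta_{ij}} \left[f_i(x_1, \mu)\, f_j(x_2, \mu) + (1 \leftrightarrow 2)\right].$$

The prefactor with the Kronecker delta avoids double-counting in case the partons are identical. The generic parton-model formula

$$\sigma = \sum_{i,j} \int_0^1 dx_1\, dx_2\; f_i(x_1, \mu)\, f_j(x_2, \mu)\; \hat\sigma_{ij}$$

can then be written as

$$\sigma = \sum_{i,j} \int \left(\frac{d\hat s}{\hat s}\right) \left(\frac{dL_{ij}}{d\hat s}\right) \left(\hat s\, \hat\sigma_{ij}\right).$$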

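The rapidity-integrated luminosity $dL_{ij}/d\hat s$ can be evaluated numerically as sketched below. This is an illustration under stated assumptions: the parton densities are toy functions (a real calculation would take them from a PDF library), the kinematics $x_{1,2} = \sqrt{\tau}\, e^{\pm y}$ with $\tau = \hat s/s$ is the standard choice, and the function names are hypothetical.

```python
import math


def toy_gluon(x):
    """Toy parton density, for illustration only (not a fitted PDF)."""
    return x ** (-1.2) * (1.0 - x) ** 5 if 0.0 < x < 1.0 else 0.0


def parton_luminosity(f_i, f_j, s_hat, s, identical, n_steps=2000):
    """dL_ij/ds_hat, integrating the differential luminosity over rapidity y.

    x1 = sqrt(tau) * exp(y), x2 = sqrt(tau) * exp(-y), tau = s_hat / s,
    with |y| < -0.5 * ln(tau). Uses the midpoint rule.
    """
    tau = s_hat / s
    y_max = -0.5 * math.log(tau)
    step = 2.0 * y_max / n_steps
    total = 0.0
    for k in range(n_steps):
        y = -y_max + (k + 0.5) * step
        x1 = math.sqrt(tau) * math.exp(y)
        x2 = math.sqrt(tau) * math.exp(-y)
        total += f_i(x1) * f_j(x2) + f_i(x2) * f_j(x1)
    # The factor 1/(1 + delta_ij) avoids double counting identical partons.
    prefactor = 1.0 / (s * (2.0 if identical else 1.0))
    return prefactor * total * step


s = 7000.0 ** 2  # GeV^2, for a 7 TeV collider
for sqrt_s_hat in (100.0, 200.0, 400.0):
    lum = parton_luminosity(toy_gluon, toy_gluon, sqrt_s_hat ** 2, s, identical=True)
    print(f"sqrt(s_hat) = {sqrt_s_hat:5.0f} GeV: dL/ds_hat = {lum:.3e} GeV^-2")
```
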
Relative quark-antiquark and gluon-gluon PDF luminosities are shown in Figures 4 and 5. CTEQ6.6, NNPDF2.0, HERAPDF1.0, MSTW08, ABKM09 and GJR08 PDF luminosities are shown, all normalized to the MSTW08 central value, along with their 68% c.l. error bands. The inner uncertainty bands (dashed lines) for HERAPDF1.0 correspond to the (asymmetric) experimental errors, while the outer uncertainty bands (shaded regions) also include the model and parameterisation errors. It is interesting to note that the error bands for each of the PDF luminosities are of similar size. The predictions of $W/Z$, $t\bar t$ and Higgs cross sections are in reasonable agreement for CTEQ, MSTW and NNPDF, while the agreement with ABKM, HERAPDF and GJR is somewhat worse. (Note however that these plots do not illustrate the effect that the different $\alpha_s(m_Z)$ values used by different groups will have on (mainly) $t\bar t$ and Higgs cross sections.) It is also notable that the PDF luminosities tend to differ at low $x$ and high $x$, for both $q\bar q$ and $gg$ luminosities. The CTEQ6.6 distributions, for example, may be larger at low $x$ than MSTW2008, due to the positive-definite parameterization of the gluon distribution; the MSTW gluon starts off negative at low $x$ and $Q^2$, and this has an impact on both the gluon and sea quark distributions at larger $Q^2$ values. The NNPDF2.0 $q\bar q$ luminosity tends to be somewhat lower, in the $W$, $Z$ region for example. Part of this effect might come from the use of a ZM heavy quark scheme, although other differences might be relevant.

Fig. 4: The $q\bar q$ luminosity functions and their uncertainties at 7 TeV, normalized to the MSTW08 result. Plot by G. Watt [27].
Fig. 5: The $gg$ luminosity functions and their uncertainties at 7 TeV, normalized to the MSTW08 result. Plot by G. Watt [27].

After having performed the comparison between PDF luminosities, we turn to the comparison of LHC observables. Perhaps the most useful manner to perform this comparison is to show the cross sections as a function of $\alpha_s$, with an interpolating curve connecting different values of $\alpha_s(m_Z)$ for the same group, when available [27] (see Figs. 6-9). Following the interpolating curve, it is possible to compare cross sections at the same value of $\alpha_s$. The predictions for the CTEQ, MSTW and NNPDF $W$ and $Z$ cross sections at 7 TeV (Figs. 6-7) agree well, with the NNPDF predictions somewhat lower, consistent with the behaviour of the $q\bar q$ luminosity observed in Fig. 4. The cross sections from HERAPDF1.0 and ABKM09 are somewhat larger. (Footnote 6: Updated versions of these plots, including an extension to NNLO, will be presented in a forthcoming MSTW publication. See also Ref. [1].) The impact from the variation of the value of $\alpha_s$ is relatively small. Basically, all of the PDFs predict similar values for the $W/Z$ cross section ratio; much of the remaining uncertainty in this ratio is related to uncertainties in the strange quark distribution. This will serve as a useful benchmark at the LHC. A larger variation in predictions can be observed for the $W^+/W^-$ ratio (see Fig. 7). This quantity depends on the separation of the quarks into flavours and the separation between quarks and antiquarks. The data providing this information extend only over a limited range in $x$, and consist partially of neutrino DIS off nuclear targets. Hence, different groups provide different results because they fit different choices of data, make different assumptions about nuclear corrections and make different assumptions about the parametric forms of nonsinglet quarks relevant at lower $x$.

Fig. 6: Cross section predictions at 7 TeV for and production. All cross sections plotted here use a value of . Plot by G. Watt [27].
Fig. 7: Cross section predictions at 7 TeV for the and production. All cross sections plotted here use a value of . Plot by G. Watt [27].

The predictions for Higgs production from $gg$ fusion (Figs. 8-9) depend strongly on the value of $\alpha_s$: the anticorrelation between the gluon distribution and the value of $\alpha_s$ is not sufficient to offset the growth of the cross section (which starts at $O(\alpha_s^2)$ and undergoes a large $O(\alpha_s^3)$ correction). The CTEQ, MSTW and NNPDF predictions are in moderate agreement but CTEQ lies somewhat lower, to some extent due to the lower choice of $\alpha_s(m_Z)$. Compared at the common value of $\alpha_s(m_Z) = 0.119$, the CTEQ prediction and that of either MSTW or NNPDF have one-sigma PDF uncertainties which just about overlap for each value of $m_H$. If the comparison is made at the respective reference values of $\alpha_s$, but without accounting for the $\alpha_s$ uncertainty, the discrepancies are rather worse, and indeed, even allowing for $\alpha_s$ uncertainty, the bands do not overlap. Hence, both the difference in PDFs and the dependence of the cross section on the value of $\alpha_s$ are responsible for the differences observed. A useful measure of this is to note that the difference in the central values of the MSTW and CTEQ predictions for a common value of $\alpha_s(m_Z)$ for a 120 GeV Higgs (a typical discrepancy) is equivalent to a change in $\alpha_s(m_Z)$ of about 0.0025. The worst PDF discrepancy is similar to a change of about 0.004. The predictions from HERAPDF are rather lower, reflecting the behaviour of the gluon luminosity of Fig. 5. The ABKM and GJR predictions are also rather lower, but the $\alpha_s$ dependence of results is not explicitly available for these groups, hence it is hard to tell how much of the discrepancy is due to the fact that these groups adopt low values of $\alpha_s$.

Production of a $t\bar t$ pair (Fig. 9, right plot) probes the gluon-gluon luminosity at a higher value of $\sqrt{\hat s}$, with smaller higher order corrections than present for Higgs production through $gg$ fusion. The cross section predictions from CTEQ6.6, MSTW2008 and NNPDF2.0 are all seen to be in good agreement, especially when evaluated at the common value of $\alpha_s(m_Z)$ of 0.119.

Fig. 8: Cross section predictions at 7 TeV for a Higgs boson ($gg$ fusion) for a Higgs mass of 120 GeV (left) and 180 GeV (right). Plot by G. Watt [27].
Fig. 9: Cross section predictions at 7 TeV for a Higgs boson of mass 240 GeV (left) and for $t\bar t$ production (right). Plot by G. Watt [27].

5.2 Tables of results from each PDF set

In the subsections below, we provide tables of the benchmark cross sections from the PDF groups participating in the benchmark exercise. Only results for 7 TeV will be provided for this interim version of the note.

5.21 ABKM09 NLO 5 Flavours

In the following sub-section, the tables of relevant cross sections for the ABKM09 PDFs are given. Results are given for the value of $\alpha_s(m_Z)$ determined from the fit. The charm and bottom masses are fixed inputs (in GeV), and the heavy quark mass uncertainties are incorporated into the PDF uncertainties.

The results obtained with the ABKM09 NLO 5 flavours set are reported in Tables 1-2.

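| Process | Cross section | Combined PDF and $\alpha_s$ errors |
|---|---|---|
| $W^+$ | 6.3398 | 0.0981 |
| $W^-$ | 4.2540 | 0.0657 |
| $Z$ | 0.9834 | 0.0151 |
| $t\bar t$ | 139.55 | 7.96 |
| $gg \to H$ (120 GeV) | 11.663 | 0.314 |
| $gg \to H$ (180 GeV) | 4.718 | 0.147 |
| $gg \to H$ (240 GeV) | 2.481 | 0.092 |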

Table 1. Benchmark cross section predictions and uncertainties for ABKM09 NLO for $W^\pm$, $Z$, $t\bar t$ and Higgs production (120, 180, 240 GeV) at 7 TeV. The central prediction is given in column 2. Errors are quoted at the 68% c.l. The PDF and $\alpha_s$ errors are evaluated simultaneously. Higgs boson cross sections are corrected for finite top mass effects (1.06, 1.15 and 1.31 for masses of 120, 180 and 240 GeV respectively).

| $y$ | $W^+$ | PDF + $\alpha_s$ error | $W^-$ | PDF + $\alpha_s$ error | $Z$ | PDF + $\alpha_s$ error |
|---|---|---|---|---|---|---|
| -4.4 | 0.002 | 0.0005 | 0.0001 | 0.00004 | 0.00002 | 0.000004 |
| -4.0 | 0.102 | 0.0084 | 0.0198 | 0.00262 | 0.00472 | 0.000324 |
| -3.6 | 0.394 | 0.0114 | 0.1228 | 0.01140 | 0.03321 | 0.000909 |
| -3.2 | 0.687 | 0.0324 | 0.2663 | 0.03815 | 0.07542 | 0.002259 |
| -2.8 | 0.878 | 0.0368 | 0.4017 | 0.04089 | 0.10946 | 0.002440 |
| -2.4 | 0.940 | 0.0298 | 0.5328 | 0.01768 | 0.13367 | 0.002566 |
| -2.0 | 0.935 | 0.0180 | 0.6249 | 0.01945 | 0.14787 | 0.002834 |
| -1.6 | 0.915 | 0.0215 | 0.6923 | 0.01479 | 0.15581 | 0.002905 |
| -1.2 | 0.895 | 0.0219 | 0.7344 | 0.01717 | 0.16042 | 0.004083 |
| -0.8 | 0.881 | 0.0241 | 0.7625 | 0.02627 | 0.16298 | 0.003530 |
| -0.4 | 0.867 | 0.0241 | 0.7729 | 0.02364 | 0.16373 | 0.004749 |
| 0.0 | 0.863 | 0.0402 | 0.7774 | 0.02215 | 0.16463 | 0.003186 |
| 0.4 | 0.870 | 0.0411 | 0.7733 | 0.01379 | 0.16352 | 0.005058 |
| 0.8 | 0.871 | 0.0254 | 0.7603 | 0.01647 | 0.16260 | 0.003751 |
| 1.2 | 0.891 | 0.0461 | 0.7348 | 0.02070 | 0.16092 | 0.003715 |
| 1.6 | 0.926 | 0.0589 | 0.6920 | 0.01416 | 0.15539 | 0.004267 |
| 2.0 | 0.934 | 0.0234 | 0.6255 | 0.01680 | 0.14750 | 0.003665 |
| 2.4 | 0.938 | 0.0161 | 0.5279 | 0.01737 | 0.13373 | 0.003013 |
| 2.8 | 0.873 | 0.0244 | 0.4045 | 0.01109 | 0.10944 | 0.002216 |
| 3.2 | 0.692 | 0.0173 | 0.2658 | 0.00600 | 0.07541 | 0.001574 |
| 3.6 | 0.393 | 0.0123 | 0.1254 | 0.00765 | 0.03353 | 0.001316 |
| 4.0 | 0.100 | 0.0057 | 0.0178 | 0.00434 | 0.00441 | 0.000361 |
| 4.4 | 0.002 | 0.0004 | 0.0001 | 0.00003 | 0.00001 | 0.000003 |

Table 2. Benchmark cross section predictions ($d\sigma/dy$ in nb) for ABKM09 NLO for $W^\pm$ and $Z$ production at 7 TeV, as a function of boson rapidity $y$.

5.22 CTEQ6.6

In the following sub-section, the tables of relevant cross sections for the CTEQ6.6 PDFs are given (Tables 3-6). The predictions for the central value of $\alpha_s(m_Z)$ are given in bold. Errors are quoted at the 68% c.l. For CTEQ6.6, this involves dividing the normal 90% c.l. errors by a factor of 1.645.

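| $\alpha_s(m_Z)$ | $W^+$ | $W^-$ | $Z$ |
|---|---|---|---|
| 0.116 | 5.957 | 4.044 | 0.9331 |
| 0.117 | 5.993 | 4.068 | 0.9384 |
| 0.119 | 6.064 | 4.114 | 0.9485 |
| 0.120 | 6.105 | 4.139 | 0.9539 |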

Table 3: Benchmark cross section predictions for CTEQ6.6 for $W^\pm$ and $Z$ production at 7 TeV, as a function of $\alpha_s(m_Z)$. The results for the central value of $\alpha_s(m_Z)$ for CTEQ6.6 (0.118) are shown in bold.

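| $\alpha_s(m_Z)$ | $gg \to H$ (120 GeV) | $gg \to H$ (180 GeV) | $gg \to H$ (240 GeV) | $t\bar t$ |
|---|---|---|---|---|
| 0.116 | 11.25 | 4.69 | 2.52 | 149.2 |
| 0.117 | 11.42 | 4.76 | 2.57 | 153.0 |
| 0.119 | 11.75 | 4.91 | 2.66 | 160.5 |
| 0.120 | 11.92 | 4.99 | 2.70 | 164.3 |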

Table 4: Benchmark cross section predictions for CTEQ6.6 for $gg \to H$ production (masses of 120, 180 and 240 GeV), and for $t\bar t$ production, at 7 TeV, as a function of $\alpha_s(m_Z)$. The results for the central value of $\alpha_s(m_Z)$ for CTEQ6.6 (0.118) are shown in bold. Higgs production cross sections have been corrected for the finite top mass effect (a factor of 1.06 for 120 GeV, 1.15 for 180 GeV and 1.3