The Bjorken sum rule with Monte Carlo and Neural Network techniques
Abstract:
Determinations of structure functions and parton distribution functions have been recently obtained using Monte Carlo methods and neural networks as universal, unbiased interpolants for the unknown functional dependence. In this work the same methods are applied to obtain a parametrization of polarized Deep Inelastic Scattering (DIS) structure functions. The Monte Carlo approach provides a bias–free determination of the probability measure in the space of structure functions, while retaining all the information on experimental errors and correlations. In particular the error on the data is propagated into an error on the structure functions that has a clear statistical meaning. We present the application of this method to the parametrization from polarized DIS data of the photon asymmetries and from which we determine the structure functions and , and discuss the possibility to extract physical parameters from these parametrizations. This work can be used as a starting point for the determination of polarized parton distributions.
1 Introduction
In QCD the description of scattering processes at large momentum transfer () involving (polarized) hadrons in the initial state is based on the factorization theorem. The latter allows a separation between the high–energy dynamics, described by coefficient functions which are calculable in perturbative QCD, from low–energy, nonperturbative effects, binding partons into hadrons, which are encoded into (polarized) parton distribution functions (PDFs).
The growth in statistics and increase in precision of data from experiments involving polarized hadrons scattering calls for a more accurate determination of polarized PDFs and their errors. A crucial problem in this respect is the determination of the uncertainty on a function (i.e. a probability measure on a space of functions) from a finite set of experimental data points. In the standard PDF extraction approach to the problem the infinite–dimensional space of continuous functions is mapped into a finite–dimensional space of parameters by choosing a particular basis in the space of functions and truncating the basis to a finite number of elements. This procedure entails some degree of arbitrariness. Any sensible choice must strike a balance between two competing requirements: on the one hand a small number of parameters introduces a bias in the determination of both the functional form and the errors, as the chosen parametrization would not allow enough flexibility; on the other hand a large number of parameters could spoil the convergence of the fit, or be too sensitive to the statistical fluctuation of the experimental data.
This problem has been addressed by the NNPDF Collaboration in the case of unpolarized Deep Inelastic Scattering (DIS) structure functions in Refs. [1, 2], and in the case of the unpolarized PDFs in Refs. [3, 4, 5] using a method based on statistical inference and neural networks as an interpolating tool.
While avoiding technical complications linked to the extraction of PDFs from observables, the determination of structure functions addresses the main issue of devising a faithful estimation of errors on a function extracted from experimental data. The main ingredient in the studies above is the usage of Monte Carlo methods to obtain a representation of the probability measure in the space of structure functions. An ensemble of artificial data is generated, which reproduces all the statistical features (i.e. variances and correlations) of the original experimental data. Each set of artificial data is called a replica. A structure function, parametrized by a neural network, is then fitted to each replica. The net result of this procedure is an ensemble of fitted functions. This ensemble of fitted functions provides a representation of the measure in the space of structure functions. Errors and correlations of any observable involving the structure functions are obtained averaging over the ensemble of fits. Moreover suitable statistical estimators can be defined from the Monte Carlo ensemble which provide a quantitative description of the possible biases and inconsistencies in the fitting procedure. This method has been described in great detail in Refs. [1, 2, 3, 4] to which the interested reader should refer.
The aim of this work is to apply the same techniques to obtain a bias–free parametrization of the photon asymmetries and from available polarized DIS data and extract from them the corresponding structure functions and . We provide further testing of the Monte Carlo method, and produce statistically meaningful error bars for the structure function. Besides allowing us to address all systematics related to the data and the method, such a parametrization might be an ideal input for a fit based on factorization schemeinvariant evolution equations to determine , as proposed in Refs. [6, 7]. As shown in this work, a careful treatment of statistical and systematic errors leads to a reliable extraction of physically meaningful parameters such as , and the higher–twist contributions to the structure functions. While these are not the best determinations available for these parameters, the results we obtain are in agreement with other determinations, and show the robustness of the Monte Carlo method.
We shall now discuss in turn the two steps that are needed to produce the Monte Carlo sample of fitted functions: first the treatment of the experimental data, and then the actual fitting procedure. The experimental data points included in the fit are discussed in Section 2; Section 3 summarizes briefly the NNPDF approach and the characteristics of the neural networks used for this particular study. The results of our fits, together with their phenomenological implications are presented and discussed in Sections 4 and 5.
2 Experimental data
The cross section asymmetry for parallel and antiparallel configurations of longitudinal beam and target polarizations is given by:
(1) 
and it is related to the virtual–photon asymmetries by:
(2) 
The photon depolarization factor depends on kinematic factors and on the ratio:
(3) 
where and are the longitudinal and the transverse cross sections respectively (see e.g. Refs. [8, 9] for a detailed definition of all the quantities).
The polarized structure functions and are related to the virtualphoton asymmetries by:
(4)  
(5) 
where
(6) 
is the unpolarized structure function, , and denotes the nucleon mass.
The main features of each experimental data set used in the present analysis are summarized in Tab. 1, and their kinematical coverage of the plane is shown in Fig. 1. We observe that the kinematical coverage of the available data is rather small, especially when compared to the one of the available unpolarized DIS data, thus we will have a sizable region of the kinematical plane in which the fit extrapolates the behaviour extracted from the region covered by data.
From Tab. 1 we infer that the systematic errors are on average one order of magnitude smaller than the statistical ones. This justifies the procedure of neglecting correlations for systematic errors and the procedure of summing errors in quadrature when computing the figure of merit () to be minimized in the fitting procedure.
Finally we notice that E155 data have been corrected to yield by adding in Eq. (4) the contribution evaluated with the Wandzura–Wilczek relation and using the parametrization of given in Ref. [14]: this shift is also added as a source of uncertainty in the total error of the data set.
Experiment  range  range  Type  Ref.  
Proton  
EMC  0.015  0.466  3.5  29.5  10  0.077  0.024  0.028  [10]  
SMC  0.001  0.480  0.3  58.0  15  0.026  0.003  0.012  [11]  
SMC low  0.0001  0.121  0.02  23.1  15  0.033  0.002  0.006  [12]  
E143  0.031  0.75  1.27  9.52  28  0.045  0.016  0.012  [13]  
E155  0.015  0.75  1.22  34.72  24  0.043  0.018  0.026  [14]  
HERMES06  0.0058  0.7311  0.26  14.29  45  0.126  0.019  0.017  [15]  
Deuteron  
COMPASS  0.0051  0.474  1.18  47.5  12  0.034  0.017  0.011  [16]  
SMC  0.001  0.480  0.3  58.0  15  0.032  0.003  0.006  [11]  
SMC low  0.0001  0.121  0.02  23.1  15  0.069  0.005  0.005  [12]  
E143  0.031  0.75  1.27  9.52  28  0.066  0.011  0.008  [13]  
E155  0.015  0.75  1.22  34.72  24  0.091  0.009  0.011  [14]  
HERMES06  0.0058  0.7311  0.26  14.29  45  0.089  0.007  0.009  [15] 
3 The NNPDF approach
In this section we briefly review the approach used to extract an unbiased determination of the asymmetry and the structure function from the available inclusive polarized DIS data, following the analysis performed by the NNPDF Collaboration for the determination of the unpolarized structure function [1, 2] and the parton densities [3, 4, 5].
The core idea underlying the NNPDF approach is based on using Monte Carlo methods to build a representation of the probability measure in the space of structure functions, and parametrizing the space of structure functions using neural networks. We refer the interested reader to the papers cited above for a detailed description of the methods and in the following we will briefly discuss the settings used in this analysis. It is worthwhile emphasizing that the Monte Carlo method does not require the use of neural networks, and would yield a robust determination of the errors with any parametrization of the structure function, provided the parametrization is sufficiently flexible. A comparison of Monte Carlo analyses based on different parametrizations was performed in the framework of the HERALHC workshop by comparing the standard H1 and NNPDF analyses. The greater flexibility of the neural network parametrization compared to fixed functional forms is reflected in the larger error bands obtained using the NNPDF method, especially when considering the region not covered by data (i.e. the extrapolation region). These features are illustrated by the results in Sects. 3.2 and 3.4 of Ref. [17].
3.1 MonteCarlo replicas
We generate MonteCarlo replicas of the experimental data according to
(7) 
where are Gaussian distributed random numbers, is the quadratic sum of the normalization errors and is the total error, obtained by summing in quadrature the statistical and systematic errors, the latter assumed to be uncorrelated.
Following Ref. [18], the covariance matrix for experimental data points is evaluated using:
(8) 
while during the fit for each replica, we minimize:
(9) 
where
(10) 
The number of Monte Carlo replicas of the data is determined by requiring that the average over the replicas reproduces the features (central values, errors and correlations) of the original experimental data to a required accuracy. The quantitative check is performed by means of the statistical estimators described in the appendix of Ref. [2] and the results for sets of 10, 100 and 1000 replicas are collected in Tab. 2 for the proton target data and in Tab. 3 for the deuteron target data. We observe that all the considered estimators have the correct scaling behaviour as the number of replica grows. We also point out that the large percentage error on the deuteron central values is due to a bulk of data whose values are close to zero.
10  100  1000  
14.20%  3.21%  1.83%  
0.974203  0.998114  0.999682  
35.45%  12.44%  4.20%  
0.0699  0.0766  0.0768  
0.989956  0.997808  0.999793  
0.1469  0.1567  0.1585  
0.638102  0.944682  0.993523  
0.00166  0.00154  0.00160  
0.898803  0.986219  0.998858 
10  100  1000  
89.71%  24.90%  5.97%  
0.977524  0.98633  0.999865  
35.03%  11.75%  4.4%  
0.0689  0.0739  0.0734  
0.977501  0.997965  0.999705  
0.0878  0.0904  0.0861  
0.612155  0.932158  0.992952  
0.00133  0.00145  0.00134  
0.959275  0.995339  0.999479 
3.2 Neural Networks as unbiased interpolants
Artificial neural networks, see e.g. Ref. [19], are a class of algorithms which provide a robust and universal approximant to incomplete or noisy data, with the only requirement of continuity. Neural networks are universal approximators for measurable functions [20]. This means that any continuous function can be approximated to any degree of accuracy by a sufficiently large neural network with one hidden layer and nonlinear neuron activation function.
One of the main reasons to use neural networks in place of any other redundant parametrization is the existence of efficient techniques for training them, i.e. determining the parameters of the network (thresholds and weights) so that it reproduces a given set of inputoutput data. Equivalently one could say that a sufficiently large neural network provides a description of the data which is largely free of functional bias.
The analysis presented here uses a class of neural networks known as multilayer feedforward perceptrons, trained using a genetic algorithm [21, 22]. The networks we employed have one hidden layer and a 241 architecture, which gives us a total of 17 free parameters for each network to be determined during the training. The guidance principle in the choice of the network architecture to be used is that it should provide a redundant parametrization for the data to be fitted, i.e. the network should have enough flexibility to fit not only the underlying physical law but also the statistical fluctuations of the experimental data. This property is crucial in ensuring that the fit results are not biased by the specific parametrization. The lack of functional bias is established a posteriori by verifying that fits performed with networks with different architectures lead to statistically equivalent results. This is achieved using the statistical estimators introduced in the NNPDF Collaboration’s studies; the results of these comparisons are presented and discussed later.
The training of the individual networks to the Monte Carlo replicas is performed by minimizing the figure of merit given in Eq. (9). Given the extensive size and complex structure of the parameter space (a neural network with parameters, weights and thresholds has equivalent global minimum configurations), the most efficient training algorithm turns out to be a genetic algorithm. The details of the implementation are discussed in Ref. [3].
As already pointed out in various references the fact that we adopt a redundant parametrization and that the figure of merit minimized in the training procedure is monotonically decreasing might lead to overfitting the data: the neural network reproduces not only the underlying physical law but also the statistical noise of the data sample. To prevent this from happening and to determine the optimal fit we adopt a criterion to stop our fit based on the crossvalidation method. Once again our procedure is completely analogous to the one used for the unpolarized NNPDF fits.
For each replica of the experimental data we subdivide the data into a training and a validation set, respectively containing a fraction and of randomly chosen data points of each experiment.
We train one neural network on each replica of the data using the of the training set as a figure of merit to be minimized. In parallel we compute the of the validation set. We stop the training when we find that the smeared over a given number of generations is decreasing for the training set while increasing for the validation set. A graphical illustration of such a behaviour for one of the replicas in the reference fit is given in Fig. 2.
4 Phenomenology
The study of the first moments of polarized structure functions is of phenomenological interest, since they can be used to extract information on the fraction of polarization carried by partons and on physical couplings. In the scheme we have
(11)  
where and are the first moments of the nonsinglet and singlet Wilson coefficient functions, respectively, and
(12)  
(13)  
(14) 
Using isotopic spin invariance, it can be shown that is the axial coupling that governs neutron decay. Accurate measurements yield (see e.g. Ref. [9]):
(15) 
The difference of the moments for proton and neutron leads to the Bjorken sum rule
(16) 
where is the target mass correction and is the correction due to higher twists. Target mass corrections have been studied in Refs. [23, 24, 25, 26] and can be evaluated for any moment at the first order in using [24]:
(17) 
where and is the structure function taken at zero mass of the nucleon. The higher–twist contribution is simply
(18) 
where can be extracted from experimental data at low such as the CLAS data [27]. Finally, the coefficient function of Eq. (16) has been calculated in Ref. [28] and up to order is given by:
(19)  
For the running coupling we use the expanded solution of the renormalization group equation, up to NNLO we have:
(20)  
with
(21) 
(22) 
and the beta function coefficients given by
(23) 
where
(24)  
and .
5 Results
In this section we present our parametrization of the proton and deuteron asymmetries and the structure functions extracted from them.
We assess the quality of the fit by comparing our extraction with the experimental data included in the analysis, and by studying the stability of our results under variations of the parametrization used for the networks.
Then, as an example of a possible application of our result to a phenomenological analysis, we study the extraction of the physical parameters (the strong coupling constant and the axial coupling ) from the Bjorken sum rule. In order to give a faithful error on the extracted quantities we study the impact of the different assumptions which are needed to reconstruct the structure functions and then to evaluate the Bjorken sum rule. Results are compared to existing estimates.
5.1 The final fit and its statistical features
Proton  Deuteron  

EMC  0.370  COMPASS  0.885 
SMC  0.480  SMC  1.100 
SMC low  1.150  SMC low  0.774 
E143  0.904  E143  1.530 
E155  0.717  E155  0.661 
HERMES06  0.456  HERMES06  0.881 
Total  0.666  Total  0.986 
In Tab. 4 we show the for each target and each experimental data set included in the present analysis. We first observe the overall good quality of our fit. For the proton the small values of for EMC, SMC and HERMES can be explained by a possible overestimate of experimental errors. For the deuteron all the are of order 1, except for the E155 data set which has a value of significantly smaller than one. The somewhat larger value of for the E143 deuteron data set can be understood by looking at Figs. 3 and 4 where we present a comparison of our fit to experimental data in different kinematical regions. We observe that in the case of E143 the deuteron data show small incompatibilities among themselves, and the large value of is a reflection of this. It is interesting to remark that a careful analysis of the value for each experiment allows the identification of potential incompatible data. This feature had already been pointed out in the unpolarized studies by the NNPDF Collaboration.
In Tabs. 5, 6, and 7 we study the self–stability of the fit and the stability against the variation of the parametrization with respect to a smaller and a larger architecture. To this extent we define four different regions: one where we expect our fit to be an interpolation of the available data (Data region) and three where its behaviour is extrapolated to regions of the plane not covered by present data:

Data: and ;

Low: and ;

Low: and ;

High: and .
We observe that all the estimators for self–stabilities are of order unity (or smaller), meaning that different subsets within the whole ensemble of replicas have the same statistical features.
When we compare our final fit to a fit performed using networks with a smaller architecture, we notice that the the two fits are statistically equivalent. The same happens for the comparison with a fit done with networks with a larger architecture, with the only exception of the errors on the deuteron fit in the extrapolation (all distances are order 1.5), which show some minor instability.
Proton  Data  Low  Low  High 

Deuteron  Data  Low  Low  High 
Proton  Data  Low  Low  High 

Deuteron  Data  Low  Low  High 
Proton  Data  Low  Low  High 

Deuteron  Data  Low  Low  High 
5.2 Structure functions reconstruction
In order to reconstruct the structure function from data on the asymmetry as given in Eq. (4) some additional assumptions are needed. In the following we assess the impact of our assumptions for , and on the determination of first moment of . These checks are done using an ensemble of 100 replicas, which is enough to this purpose, and in a range of and which is entirely in the data region in order to avoid any extrapolation effects. Finally, we compare our result for 1000 replicas with the sum rules obtained by experimental collaborations.
The first assumption whose impact we consider is the one on the structure function , which is evaluated from the Wandzura–Wilczek relation [29]
(25) 
Inserting this expression into Eq. (4) gives
(26) 
which needs to be evaluated iteratively. To this purpose we take the initial value evaluated with , and we use
(27) 
Proton  

Deuteron  
From Tab. 8 we see that for values above one iteration is enough to stabilize the result for the first moment of computed in the data region. For lower scales, say , at least two iterations of the WandzuraWilczek relation are needed in order to obtain a stable result. In the following the index will be omitted as the number of iterations used should be evident from the scale at which the first moment of is evaluated.
For the unpolarized structure function we use the parametrization given in Ref. [2] for the proton and the one given in Ref. [1] for the deuteron. Since these parametrizations have also been extracted using a Monte Carlo procedure, ensembles of replicas are available for ; hence the result for is evaluated as:
(28) 
which takes into account both the uncertainty on and the one on (with we denote the expression in Eq. (25) evaluated for the th replica). Since there is no correlation between the extraction of and the one of the replicas of , and can be sampled independently.
In order to estimate the contribution of the uncertainty on to the uncertainty on , we can recompute as
(29) 
where for each th replica of we use the averaged value of the unpolarized structure function
(30) 
This procedure clearly freezes the fluctuations in , which is kept fixed to its average value. The result is given in Tab. 9, where we see that the contribution to the uncertainty on the first moment of due to is negligible. In the following we will always use as given from Eq. (28).
Proton  

Eq. (28)  
Eq. (29)  
Deuteron  
Eq. (28)  
Eq. (29) 
Finally a parametrization of is needed in order to extract from . Here we use given in Ref. [30, 31]. Such a parametrization provides also an error estimate, which we use to assess the impact of on the total uncertainty of the first moment of . In Tab. 10 we compare the sum rule evaluated with the central value of with the one obtained by taking into account the error on . This is achieved by letting fluctuate within its own error in the Monte Carlo sample; for the th replica we use:
(31) 
where is the error on the parametrization, and is a univariate Gaussian random number. Since is a parametrization of experimental data, we take the error as a statistical one, with no correlation between different replicas, and thus we use a different random number each time a value of is needed. From the results collected in Tab. 10 we conclude that the error on is also negligible.
Proton  

Deuteron  
We will now compare our results for the integral of at different scales and over different ranges obtained using our ensemble of 1000 replicas with those obtained by different experimental collaborations.
Target  SMC98  This Analysis 

p  
d 
In Tab. 11 results for the proton and the deuteron sum rules are compared to the result of Ref. [11]. We observe that the results are compatible within errors, and that our evaluation has a larger error.
In Tab. 12 we compare our result with the ones in Refs. [13], and [15]. First we notice that the errors are of the same size, while our central values are systematically smaller, with a significant difference for the proton at low . A substantial part of effect can be attributed to the different parametrization used for the unpolarized structure function. Indeed, if we evaluate the sum rule of E143 with the SMC98 parametrization [32, 33, 11], at we obtain which is less than one sigma away from the result in Refs. [13]; the same happens for the HERMES06 case, using the ALLM parametrization [34] at we get . This can be understood looking at Fig. 5 where we compare the different parametrizations for used in the different analysis.
It is clear that, while the different parametrizations agree in the kinematical region covered by experimental data, they differ significantly at low in the large region where there are no data and an extrapolation is needed. For the ALLM and the SMC98 parametrizations the large behaviour is determined by the chosen functional form; the NNPDF parametrization interpolates by continuity from the last experimental point to the kinematical constrain . The difference among the parametrizations is then enhanced once we multiply by the asymmetry to reconstruct the polarized structure function .


Since no neutron target data have been used in our fit, the neutron structure function, is evaluated from the proton and deuteron ones as
(32) 
where is the probability of the deuteron to be in the D state; we use