Reinterpreting the development of extensive air showers initiated by nuclei and photons
Abstract
Ultrahigh energy cosmic rays (UHECRs) interacting with the atmosphere generate extensive air showers (EAS) of secondary particles. The depth corresponding to the maximum development of the shower, , is a wellknown observable for determining the nature of the primary cosmic ray which initiated the cascade process. In this paper, we present an empirical model to describe the distribution of for EAS initiated by nuclei, in the energy range from eV up to eV, and by photons, in the energy range from eV up to eV. Our model adopts the generalized Gumbel distribution motivated by the relationship between the generalized Gumbel statistics and the distribution of the sum of nonidentically distributed variables in dissipative stochastic systems. We provide an analytical expression for describing the distribution for photons and for nuclei, and for their first two statistical moments, namely and . The impact of the hadronic interaction model is investigated in detail, even in the case of the most uptodate models accounting for LHC observations. We also briefly discuss the differences with a more classical approach and an application to the experimental data based on information theory.
a,b,c]Manlio De Domenico^{1}^{1}footnotetext: Now at Departament d’Enginyeria Informática i Matemátiques, Universitat Rovira i Virgili, Tarragona, Spain., d]Mariangela Settimo, e]Simone Riggi, f]Eric Bertin
Prepared for submission to JCAP
Reinterpreting the development of extensive air showers initiated by nuclei and photons

Laboratorio sui Sistemi Complessi, Scuola Superiore di Catania,
Via Valdisavoia 9, 95123 Catania, Italy 
Istituto Nazionale di Fisica Nucleare, Sez. di Catania,
Via S. Sofia 64, 95123 Catania, Italy 
Dipartimento di Fisica e Astronomia, Universitá degli Studi di Catania,
Via S. Sofia 64, 95123 Catania, Italy 
Laboratoire de Physique Nucléaire et de Hautes Energies (LPNHE), Universités Paris 6 et Paris 7, CNRSIN2P3, Paris, France,
University of Siegen, Germany 
INAF, Osservatorio Astrofisico di Catania, Italy

Université de Lyon, Laboratoire de Physique, École Normale Supérieure de Lyon, CNRS, 46 allée d’Italie, F69007 Lyon, France
Keywords: Ultrahigh energy cosmic rays, extensive air shower, mass composition, generalized Gumbel distribution, highenergy interactions, hadronic models
1 Introduction
The composition of cosmic rays at ultrahigh energies is a fundamental observable to test and constrain the present theories concerning the origin, nature and production site properties of a such extreme radiation. The measurement of the composition is experimentally based on the correlation of shower observables to the mass of the primary particle. Among them, the atmospheric depth of shower maximum () and the number of muons at ground level () exhibit the strongest correlation. An exhaustive review on experimental techniques over a wide energy range is available in [1].
The current results from the Pierre Auger Observatory [2, 3], HiRes [4], Telescope Array (TA) [5] and Yakutsk [6] agree at the level of their quoted systematics suggesting a light composition below 10 eV. Whereas the measurements from Auger indicate a mixed composition changing from light to heavier component at energies of about 10 eV, the HiRes and TA results is compatible with a constant composition dominated by light elements. A detailed comparison between these experiments and a discussion on the agreement between their results is given in [7] as a result of a joined effort between the collaborations. All these experiments make use of the mean and the rms of the distributions as a function of energy to get indications of the nature of the primary cosmic ray. On the other hand the interpretation of in terms of nuclear mass rely on the predicted values from Monte Carlo simulations of the shower development in atmosphere (see, for instance, [8]). They are strongly based on detailed models of the hadronic interactions at very high energies and therefore require to extrapolate the interaction parameters from laboratory experiments performed at lower energies. This introduces an additional source of uncertainty in the interpretation of the composition measurements. In addition, the analysis of composition data typically requires to employ a large sample of shower simulations with more than one hadronic model on the market, different primary nuclei and a wide range of primary energies and angles.
This context motivates the development of an analytical model to describe the distribution of the depth of the shower maximum parameter not only to speedup the composition analysis based on (elongation rate, RMS(), distributions fitting) but also to overcome the limitations related to the low event statistics.
In this paper we introduce a method to describe the distribution of by applying statistical methods typically developed for special dissipative stochastic systems, i.e., dynamical systems of many particles governed by stochastic rules, and subjected to energy injection and dissipation, keeping the whole process far from thermodynamic equilibrium.
In the literature [9], the fluctuations observed in the position of the , for a given primary particle and energy, can be related to the stochastic fluctuations in (i) the position of the first interaction point in the top layers of the atmosphere and (ii) the secondary interactions occurring along the shower development.
Thus, by assuming that such interactions are not correlated to each other, the distribution can be described by the convolution of these two terms:
(1.1) 
where follows an exponential distribution with interaction length , refers to the shower development and is its probability distribution. Since the shower development involves a large number of particles, the fluctuations in the secondary subcascades are assumed to be Gaussiandistributed and the distribution is parameterized with the numerical function:
(1.2) 
where and are the parameters of the Gaussian function, and erf is the standard error function. However, such an assumption is a rather delicate issue, because the fluctuations involve many physical processes as, for instance, the multiplicity of produced secondaries and the energy loss of the leading particle.
In this paper, we adopt an alternative approach to describe the fluctuations observed in the position of the by relating the shower development to the dynamics of special dissipative stochastic systems. In this context, it has been recently shown [12, 13] that the sum of correlated and/or nonidentically distributed random variables in such systems follows a generalized Gumbel distribution, at least in the framework of specific models. In particular we find that this distribution describes also the distribution of , thus providing the basis for a new statistical description of the shower development. Intriguingly, this result applies for any primary nucleus and photons, unrevealing the universality of the underlying process.
In this paper we use Monte Carlo simulations of several primary particles, generated in the energy range between 10 and 10 eV and assuming different hadronic interaction models, to extract the expected distribution. Hence we fit the generalized Gumbel to the simulated and we derive the dependence of its parameters against physical observables, namely the primary energy and the primary mass.
The paper is organized as follows. The description of our model is given in Sec. 2. In Sec. 3 we introduce the parameterization of the generalized Gumbel function. More specifically, we provide a practical method to obtain a likely value of for all nuclei and photons, and for three different hadronic models widely adopted for such studies, i.e., QGSjet II03 [14], Sibyll 2.1 [15] and Epos 1.99 [16]. Moreover, we also use the latest updated versions, including the corrections observed from LHC results [17], namely QGSjet II04 [18] and EposLHC [19]. Finally in Sec. 4 we discuss the potentiality of the proposed method.
2 A model to describe shower maxima behavior
The description of the development of the EAS is not trivial because of the number of processes and correlated properties (cross sections, inelasticity, multiplicity,…) that are involved. A simplified and schematic description of the development of the particle cascade was initially proposed by Heitler [20] for the case of the electromagnetic showers, induced by a photons or electron (positron). Unless explicitly written, hereafter we denote with electrons both electrons and its antiparticle. More recently the model has been generalized to the case of hadronic showers (i.e. induced by nuclei) in [10, 11]. We refer to the original papers for a full description of the Heitler model and of its generalized version. A sketch of the two models, is illustrated in Fig. 1. In the case of the electromagnetic component (Fig. 1a) each particle, namely photons (), electrons () and positrons (), traverse the atmosphere for a radiation length () before interacting or decaying until the energy falls below the critical energy . Therefore, each particle produces two new particles according to the dominant processes (pair production for photons and Bremsstrahlung radiation for electrons). In this model two other assumptions are done: the two produced particles equally share the parent’s energy and the interaction length is the same for all particles. From this simple model, a rough estimate of the is obtained as , with E the initial energy of the particle inducing the cascade.
In the case of the hadronic component (Fig. 1b) the process involves the production of many particles of different species, the multiplicity of newly generated particles is not constant, and the interaction length plays the role of the radiation length. However, the development of the hadronic component shares with the electromagnetic case the same general structure, although different physical interactions are involved.
In this section we investigate the basic Heitler model and its generalized counterpart from the point of view of statistics. The Heitler model, with its assumptions, is a simplification valid only for limited aims and contexts. Indeed, with no regards for the particle type, it is more realistic to assume that, at each step of the shower development, the probability of interaction (or decay) per unit length follows the exponential law , where is the mean free path. On the other hand, the number of secondary particles rapidly increases with the atmospherical depth until the critical energy is reached. The presence of nuclear particles of different species makes difficult an exact description of the statistics of their number. However, to obtain , we are interested in the atmospherical depth corresponding to the maximum number of secondary particles, where the critical energy is reached. In practice, the energy of the primary particle is deposited through the atmosphere at each step of the shower development and corresponds to the atmospherical depth where is maximum, i.e., when is reached. In the Heitler model, new particles are generated at the same time for increasing . However, such an assumption is rather unphysical for different reasons. Firstly, because of the stochasticity of the process, some particles might not interact or decay at the th step of the development: therefore, they still survive at a larger atmospherical depth, causing a fluctuation in the expected number of newly created particles. Secondly, the process is dissipative, i.e., some energy is lost by developing the system between th and th steps. Hence, the number of particles at atmospherical depth is the sum of the number of newly created particles at depth and the number , , etc., of surviving particles from the smaller atmospherical depths, i.e., is the sum of correlated random numbers.
Before introducing the Gumbel distribution and its interpretation for dissipative systems, it is of interest to briefly recall the standard central limit theorem (CLT). The CLT for independent and identically distributed (i.i.d.) random variables , with finite variance, states that the statistics of their sum
(2.1) 
can be described by a Gaussian distribution in the limit . More precisely, since each variable has the same finite mean and variance , the rescaled variable
(2.2) 
converges in law when to the normal distribution, if and .
However, when at least one among these conditions (i.i.d and finite second moment) is not fulfilled, the CLT is no longer valid. The case when the second moment is not finite still leads to a class of universal functions, the socalled Lévystable distributions, depending on two shape parameters and two scale parameters, and the CLT can be generalized accordingly. The cases involving correlated and/or nonidentically distributed random variables is much more complicated and, in general, there are no simple criteria to ensure the applicability of the CLT or its extended version (we refer to, e.g., [21] for a recent review about this topic). However, it has been recently observed that for a given class of physical systems, namely systems with dissipative interactions, the energy distribution (where the energy is defined as a sum of random variables) becomes close to a Gumbel distribution [22, 23, 24, 25, 26, 27]. In some very simplified cases, the Gumbel distribution has been shown to be the exact one [28, 12]. The Gumbel distribution initially originates from the field of extreme value statistics (see Appendix A), and its relation to problems of sums and to dissipative systems has been clarified recently [12, 13].
In the most simple instances, a typical stochastic dissipative system can be modeled by a onedimensional lattice where: i) energy is injected at a boundary site with given rates, ii) energy is transferred from site to site and with known rates, and iii) energy is dissipated at site with given rate. For specific choices of the rates, the total energy, corresponding to the sum of the energy of each site, follows the generalized Gumbel distribution [12] defined by:
(2.3) 
where and are a location and a scale parameters, related to the mean and the spread of the distribution, respectively. For more general choices of the rates, the exact distribution is not known, but the Gumbel distribution is expected to remain a reasonably good approximation of the true distribution, and thus appears as a natural candidate to fit the data. A more detailed discussion about the Gumbel distribution and its first and second moments is given in Appendix A, while more details about the estimation of the parameters are given in the next section.
We argue that the generalized Heitler model, previously described, can be considered as such a dissipative system, where the quantities playing a fundamental role are reinterpreted as follows: i) the energy injection is equal to the primary energy at site and it is zero at the last site , corresponding to the atmospherical depth where the critical energy is reached; ii) the energy of each site is proportional to the number of secondary particles in that site; iii) the transfer of energy coincides with the shower development from the site to its neighbor ; iv) the dissipation at site corresponds to the missing energy per site.
Motivated by this analogy between the two models, we can adopt the generalized Gumbel distribution to describe the statistics of the maximum number of particles in the shower development.
In the next section we use Monte Carlo simulations of EAS to show that the generalized Gumbel distribution can be used with remarkable accuracy also to describe the distribution of values corresponding to .
3 Parameterizing the distribution of UHECR
For testing our model and its validity in reproducing the distribution, we have performed Monte Carlo simulations of the EAS, for different primary particle types, hadronic interaction models and energies. More specifically, the procedure used in this section follows two steps: (1.) the Gumbel density function is parameterized as a function of physical observables; (2.) the goodness of the parameterized function is tested against independent simulations. It is worth mentioning that we make use of values obtained by fitting the shower profile with the GaisserHillas function [29].
For the first step, we have performed simulations using the Conex code [30] for a large set of primary types (p, He, C, N, O, Si, Ca, Fe and photons). For nuclear primaries we also consider three different hadronic interaction models (QGSjet II03, Sibyll 2. and Epos 1.99, and their updated versions (QGSjet II04 and EposLHC), at high energies. Simulations are performed at fixed energies, from to eV in step of 0.5 (0.1) in the logarithm of E. In order to test the goodness of our parameterization, and its extrapolation to intermediate nuclei not involved in the fit procedure, we have generated independent sets of simulations and showers initiated by nuclei (namely Li, Ne and Mn) different from those adopted to obtain the parameterization.
As a first step we fit the generalized Gumbel distribution to the distributions obtained from simulations, for different primaries and energy intervals. We have adopted a loglikelihood maximization procedure to estimate the three parameters , and (we refer to Appendix A for alternative methods). The fit was carried out for all energies and primary nuclei, and the obtained values of the parameters have been parameterized as a function of energy and nuclear mass. Some representative examples are shown in Fig. 2 for the cases of photon (left panel), proton (middle panel) and iron (right panel) with energy 10 EeV and the QGSJet II hadronic interaction model. The distribution obtained from Monte Carlo simulations (markers) is well reproduced by the generalized Gumbel function (red solid line). In Sec. 1 we have mentioned a model based on the convolution of an exponential function, related to the X, and a Gaussian distribution, related to the fluctuations of the shower development. For comparison, we have also shown in Fig. 2 the bestfit curves corresponding to such an approach (blue dashed curve). Both functions provide a good description of the distribution even if the generalized Gumbel distribution better reproduces the tails for small and large values of .
Thus we parameterize the density function as a function of energy and primary type. However, given the significantly different development of hadronic and electromagnetic showers, the details of this parameterization are discussed below for nuclei and photons separately.
3.1 Case of nuclei
As introduced above, to parameterize the distribution for nuclei, we have simulated several extensive air showers in a wide range of energies and for different values of the nuclear mass and then we have parameterized the density as a function of and . For this study we used 8 primary masses, corresponding to the p, He, C, N, O, Si, Ca and Fe.
We have assumed the following empirical parameterizations:
(3.1)  
(3.2)  
(3.3) 
where and are the energy and mass of the primary particle, eV is a reference value and the dependence of the parameters on the nuclear mass is empirically found as:
(3.4)  
(3.5)  
(3.6) 
For each available hadronic model, we have fitted the distribution of as a function of and . Therefore, a set of 21 parameters describes the full set of simulations. These parameters are summarized in Tab. 1 for three hadronic interaction models and their updated versions. It is worth noting that some experiments make use of a different estimation, determined by a quadratic interpolation around the maximum of the energy deposit profile. The resulting value of might differ from the one derived using a GaisserHillas fit of profile by a few g cm. For sake of completeness, we also provide the parameters obtained with this alternative estimation, in Appendix B. However, in the following we will make use of the first parameterization by means of Tab. 1. We checked that, for each energy and primary mass, the residuals between the parameterized function and the bestfit have not any significant bias. In Fig. 3 we show the values of the three parameters , and as a function of energy, for a few representative examples of primary particles, for the case of Sibyll 2.1. The values of and are related to the mean and the variance of the underlying distribution, respectively (see Appendix A). As expected, the parameter linearly increases with energy and it decreases for increasing nuclear mass. The parameter decreases for increasing nuclear mass while the parameter shows the opposite trend, increasing from lighter to heavier nuclear mass. It is worth remarking that we find the same behaviour for all the hadronic models considered in this study.
The absolute residuals of the difference between the and values obtained from generalized Gumbel expectation and the values obtained from simulations (using GaisserHillas fit, i.e., Tab. 1) are plotted against energy in Fig. 4. We show the results for Sibyll2.1 (middle panel) and the two uptodate hadronic models considered in this study, namely Epos LHC (left panel) and QGSJet II04 (right panel), as a function of the energy, for a few representative examples of primary particles. Similar results are obtained for QGSJet II and Epos 1.99 hadronic models. The absolute residuals are smaller than about 3 gr cm independently of energy, improving the accuracy of previous studies [9]. It is worth remarking the goodness of our parameterization in reproducing the distribution over a wide range of nuclear masses and energy.
QGSJet II  

758.444  10.692  1.253  48.892  0.02  0.179  
39.033  7.452  2.176  4.390  1.688  0.170  
0.857  0.686  0.040  0.179  0.076  0.0130  
QGSJet II04  
761.383  11.719  1.372  57.344  1.731  0.309  
35.221  12.335  2.889  0.307  1.147  0.271  
0.673  0.694  0.007  0.060  0.019  0.017  
Sibyll 2.1  
770.104  15.873  0.960  58.668  0.124  0.023  
31.717  1.335  0.601  1.912  0.007  0.086  
0.683  0.278  0.012  0.008  0.051  0.003  
Epos 1.99  
780.013  11.488  1.906  61.911  0.098  0.038  
28.853  8.104  1.924  0.083  0.961  0.215  
0.538  0.524  0.047  0.009  0.023  0.010  
EposLHC  
775.589  7.047  2.427  57.589  0.743  0.214  
29.403  13.553  3.154  0.096  0.961  0.150  
0.563  0.711  0.058  0.039  0.067  0.004 
QGSJet II  

2.346  0.348  0.086  
QGSJet II04  
0.355  0.273  0.137  
Sibyll 2.1  
1.423  0.977  0.191  
Epos 1.99  
0.405  0.163  0.095  
EposLHC  
0.820  0.169  0.027 
Moreover we tested our model against nuclear masses not involved during the fitting procedure previously described. The results of this test are shown in Fig. 5 where we compare the predictions of our model (lines) against the distributions of obtained from simulations (points) for the cases of Li, Ne and Mn nuclei, at eV, in the case of EposLHC (left panel), Sibyll2.1 (middle panel) and QGSJet II04 (right panel) hadronic models. The agreement between expectations and simulations is remarkable, even when extrapolating to nuclei not adopted to derive the parameterization.
3.2 Case of photons
In this section, we extend the study done for nuclei to the case of photon induced showers. As briefly mentioned before, even if the development of electromagnetic showers can be modelled in a simpler way, several effects (LandauPomerachukMigdal, LPM [31, 32], and preshowering in the geomagnetic field [33, 34, 35]) plays an important role at high energy and can significantly modify the distribution of . The average for photons, in the energy range considered in this paper, is typically separated from those of nuclear primaries by 200 g cm. At energy above 10 EeV the for photons further increases because of the LPM effect which suppresses Bremsstrahlung and pairproduction crosssections. Moreover the LPM has the effect of increasing the fluctuations in the shower development. At higher energy ( 50 EeV), photons may convert in the geomagnetic field generating a preshower before entering the atmosphere, and the primary energy is distributed among secondary products (e, e, photons), resulting in a smaller and in a more hadronlike behavior. However, the probability of preshowering is a function of the local geomagnetic field, the energy and arrival direction of the primary photon. Thus a significant difference in the average may be expected in different location on the Earth.
An example of the bestfit curve of the Gumbel distribution to the for photons has been shown in Fig. 2 (left panel). We performed a fit of with =1 and and as in Eq. (3.7) and Eq. (3.8).
(3.7) 
(3.8) 
where , , and are the fitted parameters. Their values, obtained from the fit, are reported in Tab. 2. We found that the use of a generalized Gumbel function () does not improve significantly these results while increase the number of fitted parameters.
3.16e4  5.50e3  3.16e2  6.14  
41.0  17.4  4.10 
The values of estimated from our model is in good agreement (within 5 g cm) with the from an independent set of simulations over the full energy range, as shown in the left panel of Fig. 6. As shown in figure, the method can not be extended above 50 EeV where the LPM and the preshowering effects start to be significant. An example of distribution at high energy (about 80 EeV) is shown in Fig. 6 (right panel) for the parameterized (dots) and for simulations (lines). Because of the LPM effect, the tail of the distributions at large is more enhanced while, the preshowering is responsible of the fraction of events having small values of (around 900 g cm). For this plot the conversion probability is the one calculated in the geomagnetic field as in the Auger South site [36]. The distribution of for “unconverted photons” (dotdashed line) is also shown for comparison.
4 Application and Discussion
In the previous sections we have introduced the Gumbel function and we have shown that it can describe well the distribution of for a wide range of primary nuclei and for photons. We have also shown that the derived parameters are correlated to the primary energy and nuclear masses and we have briefly discussed the limitations of the method. In this section we want to briefly discuss one among the possible applications of the method introduced in the previous sections. More specifically, we propose a procedure to study the chemical composition of observed UHECR with reconstructed energy and a given measured value of .
As anticipated in the introduction, several experimental results, making an extensive use of the mean and the rms of the distribution, suggest that the composition of UHECR is mixed, changing from light to a heavier component, in the energy range between 10 eV and 10 eV.
In the following, let us assume that the observed elongation rate of UHECR, as well as the observed spectrum, is the result of the mixing of different species, with nuclear masses (). For each energy , we describe the distribution of each species by means of a Gumbel probability density (see Eq. (A.3) in App. A) parameterized as in Sec. 3.1 in the case of UHE nuclei and as in Sec. 3.2 in the case of UHE photons.
It is worth remarking that, for a fixed value of the energy and a fixed value of , the functions can not be interpreted as the probability of observing a primary with mass at that energy. Our hypothesis simply consists of assuming that any observed UHECR has a mass among those one considered in the model with masses. Therefore, for any given value of and the sum over all species must sum to 1, in order to obtain a meaningful physical interpretation. Hence, we define
(4.1) 
as the probability that the observed extensive air shower has been initiated by an UHECR with reconstructed energy , measured and mass . In Fig. 7 we show the probability as a function of for EeV, in a model with masses (left panels), masses (central panels) and masses (right panels). Moreover, we show the expectation from Sibyll 2.1 (middle panels) and the uptodate hadronic models considered in this study, namely EposLHC (top panels) and QGSJet II04 (bottom panels).
As a practical application to real data, for instance, information shown in Fig. 7 can be used to estimate the fraction of observed events corresponding to each nuclear mass. In high energy physics, a standard procedure of fitting template distributions obtained from Monte Carlo simulations is given in [37]. In this study, we show a very simplified application in the case of synthetic set of events, generated according to different scenarios. For each scenario we simulate random realizations of per energy bin, ranging from 1 EeV to 35 EeV. In order to create a more realistic sample of events, we smear each value of by means of a Gaussian distribution centered on with dispersion 50 g cm. More specifically, we consider two scenarios with nuclear masses, namely proton (p) and iron (Fe), and four scenarios with nuclear masses, namely proton, helium (He), nitrogen (N) and iron, where we vary the fractions corresponding to each nucleus. We choose these fractions in order to have i) scenarios dominated by lighter composition; ii) scenarios dominated by heavier composition; iii) scenarios with equally distributed composition. It is worth remarking that, for a fixed scenario, we do not vary the fractions with energy.
Here we propose a new procedure to determine with good approximation the chemical composition of a distribution of by assigning weights () to each mass , with the constraint that . The sum, opportunely normalized over all species and for any value of , of the functions , provides a probability density which can be used to approximate the observed distribution by varying the set of nuclear masses and of weights. The goodness of the approximation is estimated by means of the KullbackLeibler divergence [38]
(4.2) 
an information theoretical measure of the difference between two probability distributions. The KullbackLeibler divergence is not symmetric, it is lowerbounded by 0, e.g., when , and it is not upperbounded. Recently, the KullbackLeibler divergence has been adopted to analyze the clustering of UHECR, by quantifying the deviation from isotropic expectation at a given angular scale [39]. It has been shown that an alternative, although equivalent, way to perform the maximum loglikelihood fit is to minimize such a divergence (see Appendix A in [39] and references therein for further detail). More interestingly, the KullbackLeibler divergence has many deep theoretical interpretation, among which i) it quantifies the expected information gain about the “true” distribution when using the model ; ii) it corresponds to the expected loglikelihood ratio [40]; iii) it represents the natural framework to define the Akaike Information Criterion, one of the most efficient methods to perform model selection [41, 42].
For our purpose, thus, we are interested in minimizing the KullbackLeibler divergence, i.e., we look for the set of mass and weight parameters providing the lowest value of , hence, the best approximation to the observed distribution .
The exhaustive search over the whole parameter space is computationally intensive, hence, we make use of a faster but less precise procedure. In fact, the weights are randomly sampled from a uniform distribution between 0 and 1 and constrained to sum up to 1. A total of sets of random weights, for each mass scenario, is already sufficient for the present study. As a representative application, we show in Fig. 8 the distribution obtained from a toy model built with values of from four nuclear masses, p, He, N and Fe, generated with Sibyll 2.1 at 10 EeV. In this example scenario, the fractions corresponding to each nucleus are 90%, 1%, 1% and 8% respectively. The solid line indicates the reconstructed model obtained by means of the KullbackLeibler divergence minimization, showing an excellent agreement.
Scenario  

(p=90%,Fe=10%)  0.87  0.13 
(p=10%,Fe=90%)  0.12  0.88 
(p=10%,He=20%,N=20%,Fe=50%)  0.11  0.51 
(p=25%,He=25%,N=25%,Fe=25%)  0.31  0.33 
(p=90%,He=2.5%,N=2.5%,Fe=5%)  0.86  0.07 
(p=95%,He=1%,N=1%,Fe=3%)  0.90  0.05 
We consider datasets of events simulated with EposLHC and corresponding to different scenarios, including a Gaussian smearing of 20 g cm. We consider two scenarios with nuclear masses (p,Fe) and four scenarios with nuclear masses (p,He,N,Fe). More specifically, we consider the particular case with EeV, and we show the estimated fractions of proton and iron in Tab. 3 by means of KullbackLeibler divergence minimization. By considering the presence of a smearing, the estimated fractions are in excellent agreement with the injected fractions, in each scenario, and similar performances hold for all values of the energy.
5 Summary and outlook
We have introduced a new method suitable to describe the distribution of shower maxima for extensive air showers initiated by ultrahigh energy cosmic rays. Our model is motivated by an interesting relationship between the generalized Heitler model and the distribution of the sum of nonidentically distributed variables in dissipative stochastic systems. In particular, we have shown that the generalized Gumbel distribution provides an excellent framework to parameterize for a wide range of nuclear masses, from UHE nuclei to photons, over a large energy range.
In this paper we also provide the parameters to describe the shower maxima distribution of nuclei and photons from eV to eV, assuming different hadronic interaction models, including the ones recently updated taking into account the LHC results. We tested our parameterizations with independent simulations of intermediate nuclear masses finding a remarkable agreement. For photons we found that a Gumbel distribution with provides a reasonable description of the distribution in a energy regime where the LPM and the preshowering effects are not significantly affecting the shower development. Both for nuclei and photons, the resulting and differ by less than a few g cm from the simulated value.
In section 4, we introduced a novel method, based on information theory, to be used with our parameterization for the study of the mass composition of UHECRs. The current experimental results suggest a mixed composition from light to heavier nuclei at the highest energies. The interpretation of the experimental measurements relies on the Monte Carlo simulations of the hadronic interaction models and on the assumed scenarios of nuclear masses mixtures. Our parameterization can be used to generate in a fast way the different scenarios of masses and interaction models to reproduce the data. Moreover, since our parameterization provides analytic probability functions, it can be use to estimate, for a given scenario, the fraction of each nuclear species in the observed events. We also performed a Monte Carlo test to check the goodness of this method by generating a distribution of events for a mixed composition scenario and we found that the reconstructed fractions of nuclear masses are well in agreement with the generated ones. The extension of this approach to the experimental data is straightforward but it is out of the scope of this paper.
Acknowledgment. Authors thank M. Unger for precious comments and suggestions. M.S. acknowledges support by the BMBF Verbundforschung Astroteilchenphysik and by the Helmholtz Alliance for Astroparticle Physics (HAP).
Appendix A Extreme value theory
Extreme value theory is the research area dealing with the statistical analysis of the extremal values of a stochastic variable. Let () be i.i.d. random outcomes of a distribution . If , the probability that the maximum is less than or equal to a value is:
It can be shown that the limiting distribution is degenerate and should be normalized [43]. However, if there exists sequences of real constants and such that
converges to a finite limit ,
(A.1) 
then the function G(x) is the generalized extreme value (GEV) or FisherTippett distribution
(A.2) 
defined for if and for if , where and are the location, scale and shape parameters, respectively. The Gumbel distribution is related to the distribution of maxima [44, 45] and it is retrieved for [43]. The corresponding probability density is easily obtained from as
(A.3) 
The two parameters (location) and (scale) can be related to the mean and to the standard deviation of the distribution, by means of the following relations:
(A.4)  
(A.5) 
where is the Euler constant. The generalized Gumbel distribution, dealing with the statistics of the th extremal value and the statistics of the sum of random correlated variables, includes an additional shape parameter , being defined by
(A.6) 
being a normalization factor depending on . The case corresponds to the classical Gumbel distribution described above. It is worth remarking here that the generalized Gumbel distribution has been recently adopted to successfully describe the statistics of many physically relevant situations, like physics of disordered systems [46], chemical fracture [47], hydrology [48], seismology [49], finance [50, 51] or anisotropy of ultra high energy cosmic rays [39], to quote only some of them.
The method of moments allows to relate the parameters of the generalized Gumbel distribution to its mean and variance. A convenient form of Eq. (2.3) for fitting procedures is obtained by standardizing the samples to zero mean and unitary variance, if they exist and are finite, through the transformation . Following Ref. [52], it is possible to show that all parameters of the generalized Gumbel distribution in the standardized case are a function of :
(A.7) 
being the Euler gamma function and the digamma function. Such a parameterization is particularly useful in applications because only the parameter has to be determined by fitting standardized data, although the parameters and remain to be estimated.
Hence, let us define , and . It is possible to show [52] that the first two noncentral moments of the generalized Gumbel distribution are given by
(A.8)  
(A.9) 
respectively. The corresponding cumulants, i.e., the mean and the variance , are given by and . Finally, we obtain the relations
(A.10)  
(A.11) 
completing the procedure to determine all the parameters of the generalized Gumbel distribution fitting the observation. For , an by noting that and , the mean and the variance of the classical Gumbel distribution are retrieved.
Appendix B Alternative parameterization
Throughout this work we assumed the definition of as the atmospheric depth at which the profile of the extensive air shower reaches its maximum . As mentioned in section 3.1, an alternative definition of the value, hereafter denoted as , is also adopted
in literature and in shower simulation tools. is defined as the depth at which the shower energy deposit profile reaches its maximum and, for instance, it
can be estimated by means of a quadratic interpolation around the maximum of .
Such a definition is mostly used by the experimentalists involved in measurements with fluorescence
telescopes since the fluorescence light emitted along the shower development is proportional to the dE(X)/dX.
We compared the values of and obtained from our simulations for different hadronic models, primary energies and nuclei and we found a perfect linear correlation between the two variables with a systematic negative offset  ranging from 5 to 10 g/cm, depending on the scenario. Such differences do not alter the nature of the expected probability distribution, that is still a generalized Gumbel, while they correspondingly affect the values of the parameters (for instance ).
It is worth mentioning that the residuals are similar to the ones shown in Fig. 4 obtained by using the GaisserHillas fit. Therefore, for sake of completeness we additionally provide in Tab. 4 the values of the resulting parameters when is adopted instead of .
QGSJet II  

756.881  10.982  1.259  49.665  0.296  0.251  
40.751  7.169  2.209  5.120  2.061  0.228  
0.901  0.700  0.048  0.200  0.066  0.011  
QGSJet II04  
760.023  12.107  1.364  57.973  1.836  0.349  
36.355  12.199  2.876  0.600  1.221  0.276  
0.699  0.697  0.007  0.070  0.028  0.021  
Sibyll 2.1  
768.815  16.440  0.954  60.039  0.560  0.044  
33.472  0.615  0.535  1.287  0.242  0.078  
0.730  0.267  0.009  0.029  0.054  0.009  
Epos 1.99  
778.090  11.873  1.930  62.926  0.310  0.083  
30.205  7.914  1.982  0.110  0.675  0.081  
0.570  0.557  0.029  0.018  0.065  0.011  
EposLHC  
774.647  7.659  2.385  57.943  0.810  0.273  
30.727  12.734  2.953  0.371  1.516  0.300  
0.590  0.691  0.069  0.046  0.038  0.007 
QGSJet II  

2.222  0.150  0.058  
QGSJet II04  
0.337  0.203  0.137  
Sibyll 2.1  
1.010  0.668  0.147  
Epos 1.99  
0.233  0.047  0.055  
EposLHC  
1.029  0.157  0.022 
References
 [1] K.H. Kampert and M. Unger, Astropart.Phys. 35, (2012), 660.
 [2] J. Abraham et al. [The Pierre Auger Collaboration], Phys. Rev. Lett. 104, (2010) 091101.
 [3] P. Facal for the Pierre Auger Coll. ICRC 2011, arXiv:1107.4804.
 [4] R. Abbasi et al. [The HiRes Collaboration], Phys. Rev. Lett. 104, (2010) 161101.
 [5] C. Jui et al. [The TA Collaboration], Proc. APS DPF Meeting, arXiv:1110.0133.
 [6] E.G. Berezhko et al., Astrop. Phys. 36, (2012) 31.
 [7] E. Barcikowski, J. Bellido, J. Belz, Y. Egorov, S. Knurenko, V. de Souza, , Y. Tsunesada and M. Unger for the HiRes, Pierre Auger, Telescope Array and Yakutsk Collaborations, Mass Composition Working Group Report, UHECR 2012 Symposium, CERN 2012, EPJ Web of Conferences (in press).
 [8] P. Abreu et al. [The Pierre Auger Collaboration], JCAP 1302, (2013), 026.
 [9] C.J.T. Peixoto, V. de Souza and J. Bellido, (2013), arXiv:1301.5555.
 [10] J. Matthews, Astrop. Phys. 22 (2005) 387.
 [11] T. Pierog, R. Engel, D. Heck, Czech. J. Phys. 56 (2006), 161.
 [12] E. Bertin, Phys. Rev. Lett. 95 (2005), 170601.
 [13] E. Bertin, M. Clusel, J. Phys. A: Math. Gen. 39 (2006), 7607.
 [14] S. Ostapchenko, Phys. Rev. D 74, (2006) 014026.
 [15] E.J. Ahn, R. Engel, T.K. Gaisser, P. Lipari, T. Stanev, Phys. Rev. D 80 (2009) 094003.
 [16] T. Pierog, K. Werner, Phys. Rev. Lett. 101, (2008) 171101.
 [17] D. d’Enterria et al, Astrop. Phys. 35 (2011) 98.
 [18] S. Ostapchenko, Phys. Rev. D 83, (2011) 014018.
 [19] K. Werner, I. Karpenko, and T. Pierog, Phys. Rev. Lett. 106 (2011) 122004
 [20] W. Heitler, The Quantum Theory of Radiation, third ed., Oxford University Press, London, 1954, p. 386.
 [21] M. Clusel and E. Bertin, Int. J. Mod. Phys. B 22 (2008), 3311.
 [22] S. T. Bramwell, P. C. W. Holdsworth, J.F. Pinton, Nature (London) 396, 552 (1998).
 [23] S. T. Bramwell et al., Phys. Rev. E 63, 041106 (2001).
 [24] S. Aumaitre, S. Fauve, S. mcNemara, P. Poggi, Eur. Phys. J. B 19, 449 (2001).
 [25] A. Noullez, J.F. Pinton, Eur. Phys. J. B 28, 231 (2002).
 [26] J. J. Brey, M. I. García de Soria, P. Maynar, M. J. RuizMontero, Phys. Rev. Lett. 94, 098001 (2005).
 [27] E. Bertin, P.C.W. Holdsworth (2013), arXiv:1303.4689.
 [28] T. Antal, M. Droz, G. Györgyi, Z. Rácz, Phys. Rev. Lett. 87, 240601 (2001).
 [29] T.K. Gaisser, A.M. Hillas, Proc. 15th ICRC, Plovdiv, Bulgaria, 1326 Aug 1977, 8 353.
 [30] T. Bergmann et al., Astrop. Phys. 26, (2007) 420.
 [31] L.D. Landau, I.Ya. Pomeranchuk, Dokl. Akad. Nauk, SSSR 92, (1953) 535.
 [32] A.B. Migdal, Phys. Rev. 103 (1956) 1811.
 [33] T. Erber, Rev. Mod. Phys. 38 (1966), 626.
 [34] B. McBreen and C.J. Lambert, Phys. Rev. D 24, (1981) 2536.
 [35] P. Homola et al., Comput. Phys. Commun. 184, (2013), 1468 and arXiv:1302.6939.
 [36] J. Abraham et al. [The Pierre Auger Collaboration], Nucl. Instrum. Meth. 523 (2004) 50.
 [37] R.Barlow and C.Beeston, Comp. Phys. Comm. 77, (1993) 219.
 [38] S. Kullback and R.A. Leibler, Ann. Math. Stat. 22 (1951) 79.
 [39] M. De Domenico, A. Insolia, H. Lyberis and M. Scuderi, J. Cosm. Astrop. Phys. 03, (2011), 008.
 [40] S. Eguchi and J. Copas, J. Multiv. Anal. 97 (2006) 2034–2040.
 [41] H. Akaike, Information theory and an extension of the maximum likelihood principle in 2nd International Symposium on Information Theory (Edited by B. N. Petrov and F. Csaki) (1973), 267281. Akademia Kiado, Budapest.
 [42] J. B. Johnson and K. S. Omland, Trends in ecology & evolution 19 (2004) 101–108
 [43] L. De Haan and A. Ferreira, Extreme value theory: an introduction, Springer Verlag (2006).
 [44] E.J. Gumbel, Statistical theory of extreme values and some practical applications: A series of lectures, National Bureau of Standards Washington, DC (1954).
 [45] E.J. Gumbel, Statistics of extremes, Dover Pub. (2004).
 [46] J.P. Bouchaud and M. Mezard, J. Phys. A 30, 7997 (1997).
 [47] A. Baldassarri, A. Gabrielli and B. Sapoval, Europhys. Lett. 59, (2002) 232.
 [48] R. W. Katz, M. B. Parlangi and P. Naveau, Advances in Water Res. 25, 1287 (2002).
 [49] D. Sornette, L. Knopoff, Y. Kagan and C. Vannest, J. Geophys. Res. 101, 13883 (1996).
 [50] F. Longin, J. Bank. Finance 24, 1097 (2000).
 [51] J.P. Bouchaud and M. Potters, Theory of Financial Risk and Derivative Pricing, Cambridge University Press, Cambridge, 2003, 2nd edition.
 [52] S.C. Chapman, G.Rowlands and N.W.Watkins, Nonl. Proc. Geoph. 9 (2002), 409.