The expectation of cosmic ray proton and helium energy spectrum below 4 PeV measured by LHAASO
Large High Altitude Air Shower Observatory(LHAASO) is a composite cosmic ray observatory consisting of three detector arrays: kilometer square array (KM2A) which includes the electromagnetic detector array and muon detector array, water Cherenkov detector array (WCDA) and wide field of view Cherenkov telescope array (WFCTA). One of the main scientific objectives of LHAASO is to precisely measure the cosmic rays energy spectrum of individual components from eV to eV. The hybrid observation will be employed by LHAASO experiment, in which the lateral and longitudinal distributions of the extensive air shower can be observed simultaneously. Thus many kinds of parameters can be used for primary nuclei identification. In this paper, high purity cosmic ray simulation samples of light nuclei component are obtained through Multi-Variable Analysis. The apertures of 1/4 LHAASO array for pure proton and mixed proton and helium (H&He) samples are and respectively. A prospect of proton and H&He spectra from 100 TeV to 4 PeV is discussed.
HAASO; hybrid measurement; energy spectrum; composition; TMVA.
An unsolved problem in cosmic ray observation is the “knee” structure in the energy spectrum, namely a significant bending of the spectrum from the power-law index of approximately -2.7 to -3.1 around a few PeV. The causes of the knee structure are closely related to the origin, acceleration and propagation mechanism of cosmic rays, but there is no consistent measurement result so far. Several experiments have measured the all-particle spectra around the knee region, such as KASCADE-Grand , Tibet-AS , EAS-TOP , etc. These experiments exhibited a similar knee-like structure at energies of about 4 PeV, within a factor of two in the flux values .
In addition, the energy spectra of individual components are much different. For example, the unfolded proton spectrum measured by the KASCADE experiment shows the steepening at 2 PeV or 3PeV based on QGSJET01 model and SIBYLL2.1 model respectively . However, the hybrid experiment of ARGO-YBJ and two prototypes of WFCTA shows that the energy spectrum for H&He has a knee-like structure at 700 TeV . The main reasons for this situation are the lack of absolute energy scale and the way to identify the type of the primary particles of cosmic rays. LHAASO [7, 8], located in Daocheng Haizishan, 4410 m a.s.l., Sichuan Province, China, will throw light on this traditional problem.
The LHAASO site is at an ideal altitude for knee physics research because the atmospheric depth is close to the maximum of the development of cosmic ray extensive air shower (EAS) , so that the fluctuation of EAS and the dependence of the interaction model will be the minimum. Similar to the ARGO-YBJ experiment, WCDA can obtain absolute-energy-scale through moon shadow observation . Because the detection threshold of WCDA can be as low as several hundred GeV, it can be directly compared with the measurement results of space experiments to study the error of energy scale.
Furthermore, LHAASO contains four types of detectors, which can detect electromagnetic particles, muons, Cherenkov and fluorescent photons in the EAS. Multi-Variable measurements of the EAS can effectively distinguish the components of cosmic rays. Thus, the energy spectra of the individual components can be precisely measured by LHAASO through Multi-Variable Analysis (MVA).
LHAASO is planed to obtain the consecutive measurement of energy spectra by four stages. First of all, WCDA delivers the absolute-energy-scale to WFCTA by hybrid observation of cosmic rays from 10 TeV to 100 TeV. In this energy band, the energy spectra measured by LHAASO can overlap with the spectra of space experiments. The second stage is mainly for the knee observation of light nuclei components of cosmic rays in the energies from 100 TeV to 10 PeV by the hybrid detection of WFCTA, WCDA, and KM2A. In the third stage, the layout of WFCTA will be changed to measure the knee of the heavy nuclei from 10 PeV to 100 PeV . WCDA will not be included in the hybrid detection due to saturation. The last stage is for the second knee study from 100 PeV to 1 EeV, in which WFCTA will be operated in fluorescence observation mode .
In the first three observation stages, the elevation angles of the WFCTA are 90, 60, 45 respectively. Their corresponding atmospheric depths are respectively around the shower maximum positions of cosmic rays in their own observation stages. This paper is devoted to the prospect of energy spectra observation of pure proton and H&He in the second stage through simulation.
2 The Hybrid Experiment
LHAASO was formally approved on December 31, 2015 and now is under construction. The detectors layout of LHAASO experiment is shown in Fig. 1. LHAASO consists of four types of detectors: electromagnetic detectors (red dots), muon detectors (blue dots), water Cherenkov detector (blue squares) and wide field of view (FOV) Cherenkov telescopes(black rectangles). There are many vacancies in the muon detectors (blue dots) distribution, which is due to the terrain making it impossible to install detectors.
WCDA , located at the center of LHAASO, is a water Cherenkov detector array with a total area of 78,000 . It detects the Cherenkov light produced in water by cascade processes of secondary particles in EAS. WCDA consists of three water ponds: two have area of and one has area of . All of the ponds have the water depth of 4.5 m. Each pond is divided into small cells with an area of , having two Photomultiplier Tubes (PMT) anchored on the bottom at the center of the cell. In one of the ponds, the pond in the lower left quarter in Fig. 1, a 1-inch PMT is placed right by an 8-inch PMT to enlarge the dynamic range to 200,000 photoelectrons. It allows the measurement of cosmic rays with energy up to 10 PeV . This pond is called WCDA++ to achieve the hybrid detection with the WFCTA.
WFCTA is located at 85 meters away from the center of WCDA++. It detects the Cherenkov and fluorescence photons in the air shower. The angle of elevation of the telescope’s main axis is set to . Each telescope includes a spherical mirror to collect Cherenkov and fluorescence photons and reflect them onto the camera  with the effective area of . The camera is located at the focal point of the spherical mirror and its function is to convert Cherenkov/fluorescence photons into electrical signals. The optical sensor of the camera is SiPM , with 1024 pieces in total. The pixel of each SiPM is and the camera watches a FOV of . All the parts of the telescope are placed in a container for the convenience of movement and arrangement. Two telescope prototypes have been operated successfully at Yangbajing cosmic ray observatory in Tibet .
KM2A contains two sub-arrays: electromagnetic detector array (ED) and muon detector array (MD). ED array  covers an area of 1.3 consisted of 5195 plastic scintillator detectors with a spacing of 15 meters. One ED detector has an effective area of 1 . As its name describes, the ED detects the electromagnetic particles in the EAS. MD array is an underground water Cherenkov detector array . There are 1171 muon detectors also covering 1 area with a spacing of 30 meters. The area of one muon detector is 36 , and it detects the information of muon in the EAS.
The 1/4 LHAASO array includes the WCDA++, six Cherenkov telescopes, 1272 EDs and 300 MDs. The layout of the 1/4 array is shown in Fig. 1. This 1/4 array will run for a few years to achieve the physical goal of light nuclei components spectra measurement.
Since full-coverage array can provide more accurate geometric information of air showers, the secondary particles are mainly measured by WCDA++ in the second hybrid detection stage. It means that the shower core position in this stage is outside of the ED array. Therefore, ED is not included in this simulation.
3 Simulation and Event Reconstructions
The cascade processes of primary cosmic rays in the atmosphere are simulated by the CORSIKA  program with the version of 6990. The EGS4 model is chosen for the electromagnetic interaction. The QGSJET02 and GHEISHA models are chosen for the high and low energy hadronic processes respectively. Both the information of Cherenkov photons and secondary particle at the level of observatory are recorded to simulate the hybrid observation. Five components, proton, helium, CNO, MgAlSi, and iron are generated with energies from 10 TeV to 10 PeV according to a power law spectrum with an index of -2.7. The directions of the showers are from to in zenith and from to in azimuth. The shower core position is evenly distributed in an area of 260 m260 m. The primary energy spectrum is normalized to the exponent of -2.7 from 10 TeV to 10 PeV, as shown in Fig. 2. Different components are normalized to the same with proton.
In addition, five components are also normalized according to Hörandel model. The relationship between the original energy spectra and the two normalized spectra is that the number of original proton events, the number of proton events with exponent of -2.7 and the number of proton events with exponent from Hörandel model are the same at 100 TeV. The events number of other energy bins and other components are normalized in proportion. The normalized statistics of each component are equivalent to the observation data of one year with 10% duty cycle according to Hörandel model .
The simulation of the response of three LHAASO detector arrays for EAS is performed separately. In the simulation, the actual coordinates of the detector arrays are transferred to the CORSIKA coordinate system. The program of photons ray-tracing is used for WFCTA simulation. The trigger pattern of the telescope is set to 7, namely seven neighboring pixels should be triggered. The fast simulation program  is used for the simulation of WCDA++ and MD. Only photoelectrons are sampled according to the energy, direction and particle id of the secondary particles in EAS. The time response is not included. Then the simulated results of different detectors are merged before reconstruction.
3.2 Event Reconstruction
According to the actual data acquisition, as long as the Cherenkov telescope is triggered, the events measured by three detector arrays are reconstructed.
Firstly, the shower core position is given by WCDA++ through NKG fitting . Due to the lack of time response in WCDA++ fast simulation program, there is no information about the shower fronts. Therefore the arrival direction of the shower is obtained by a Gauss sampling, of which angular resolution is 0.3.
Secondly, the perpendicular distance between the telescope and the shower axis () is calculated. Next is Cherenkov image cleaning. The first step is to remove the fired pixels whose photoelectrons measured is less than 30 in the image. The second step is to find the fired pixel with the largest number of photoelectrons, and use it as the center to traverse the whole image outward. Then remove all the islanded fired pixels.
After image cleaning, the Hillas parameters , are given, as well as the total photoelectrons () in the image. is a good energy estimator. Before energy reconstruction, should be normalized to and , namely
where is the space angle between the shower direction and the optical axis of the telescope, and the parameters and depend on the primary component of cosmic rays.
3.3 Event Selection
There are three basic principles for events selection: firstly, the shower core position falls in WCDA++; secondly, the Cherenkov image is complete; thirdly, the muon lateral distribution is well fitted.
In detail, the reconstructed shower core position located in arround WCDA++ is selected. For a Cherenkov image, the number of fired pixels are more than 10; and the angular distance between the weighted center of the image and the center of the camera plane is less than . For muon lateral distribution fitting, the fitted normalized muon size () should be more than .
The number of events in each energy bin after selection is shown in Fig. 3 (left). The dotted line represents the number of proton events involved in the simulation, which is consistent with the black line in Fig. 2. The solid lines represent the number of events after selection, and different colors represent different components. It points out that the hybrid observation becomes nearly fully efficient above 100 TeV for all components.
Detection efficiency is defined as the ratio of the number of selected events to the number of simulation events in each energy bin. That is the black solid line divided by the blue dotted line in Fig. 3 (left). In this simulation, the detection efficiency is about 15% above 100 TeV for all components, as shown in Fig. 3(right). The red dots show the detection efficiency of H&He and the black dots show the the detection efficiency of proton. Besides, it is obvious that the error bars of the last two points in Fig. 3 (right) are large due to the small statistics. Therefore the results of are masked in the subsequent analysis.
4 Proton and Helium Event Selection
In this section, the parameters of the three types of detector arrays are analyzed according to the development characteristics of the EAS. The component sensitive parameters of LHAASO hybrid observation are introduced. Then, the Toolkit for Multivariate Analysis (TMVA) [25, 26] is used and the selection results are presented.
4.1 Component Sensitive Parameters
4.1.1 Parameters from WFCTA
It is known that iron-induced air showers develop larger and faster for the smaller interaction mean free path while primary particles travel into the atmosphere . Thus, the proton initiated showers develop to its maximum later than one of iron showers. The atmosphere depth of shower maximum, , is a traditional mass sensitive parameter. Here, , the exactly parameter used to reconstruct the , is applied instead of the reconstructed . is the angular distance between the shower arriving direction and the gravity center of the Cherenkov image. But it can not be used directly to classify primary particles because of the and shower energy () dependence. After normalization, the structure of the mass sensitive parameter is as follows:
Moreover, the proton-induced showers exhibit an elongated elliptic shape . And the ratio of length and width of the cherenkov image is also a traditional and effective parameter:
The distributions of and for proton and iron showers are shown in Fig. 4.
4.1.2 Parameters from MD
The muon size of a shower heavily depends on the atomic number () of the primary particle: , where is approximately 0.9  and indicates the proton shower.
Therefore, the muon size by fitting (), the total number of detected muons () and the number of fired MDs (NMD) are significant variables to identify the primary particles:
is the perpendicular distance between the center of the MD array and the shower axis. The distributions of three parameters , and for proton and iron showers are shown in Fig. 5.
4.1.3 Parameters from WCDA
It is also known that the iron initialed shower is more extensive in the same observation level. It means that the lateral extension of secondary particles in iron-induced air shower is larger. WCDA++, the fully covered detector array, can well detect the lateral distribution of secondary particles. The formulas for the average lateral distribution and fluctuation are expressed as Eq. 2 to Eq. 5.
where is the distance between the shower core position and the fired cell in WCDA++. is photoelectronic measured by the fired cell.
where the and are the centroid of distribution of secondary particles in WCDA++:
These parameters are not only related to the primary component but also to the shower direction and the shower energy, hence it should be corrected before MVA:
The distributions of three parameters , and for proton and iron showers are shown in Fig. 5.
In addition, nearby the shower core axis, the particle density of iron-shower is much less than that of proton-shower . Hence the photoelectrons in the brightest cell () measured by WCDA++ is sensitive to components:
The total photoelectrons () measured by WCDA++ is also a mass sensitive variable, even it is weaker than :
The distributions of two parameters and for proton and iron showers are shown in Fig. 6.
Other parameters have also been studied, such as the photoelectrons measured by the brightest pixel in air Cherenkov image (), the total photoelectrons located 45 meters away from the shower core in the WCDA++ (), the muon density detected from 80 m to 100 m away from the shower core position (), and etc. Nevertheless, the particle classification of these variables is weak. After tuning of parameters, these ten parameters are candidates used as input parameters for the BDTG classifiers.
4.1.4 Correlation of Parameters
The parameters described above are not independent. There is a correlation between some parameters. Parameters provided by the same detector array are highly correlated, such as, and , and . The correlations of parameters from WCDA and MD for five components of cosmic rays are shown in Fig. 7. The parameters of and are negative correlation (left plot) and and are position correlation (right plot). According to the development characteristics of the EAS, it is easy to understand that the more extensive of the lateral distribution of the secondary particles, the less particles nearby the shower core axis; and the more muon detectors fired, the more muon contents in the shower.
The correlation of parameters provided by the different detector arrays is weak, such as, and , and . The correlations of parameters from different detector arrays for five components are shown in Fig. 8. The distributions are approximate to circle. However, the correlation between parameters of different components is slightly different; parameters for proton tend to be more correlated, as shown by the black dots in Fig. 8.
The correlation matrix of these ten variables for proton initialed showers is shown in Fig. 9. The linear correlation coefficient 100% means a linear positive correlation and -100% means linear negative correlation. 0 means no correlations among two parameters. It should be noticed that the parameter is basically independent of other parameters, as shown in the third column or in the eighth row in Fig. 9. This is because reflects the overall development characteristic, not just the lateral or longitude distribution of the air shower.
After removing the highly correlated (97% or 98% correlations) parameters, six parameters, , , , , and , are used for TMVA training and analysis.
4.2 TMVA analysis
Drawing on the experience of ARGO-YBJ & WFCT hybrid detection, two parameters are used for particle identification  in the event-by-event cut. However, too many variables are no longer suitable for particle identification through event-by-event cut because it is complicated to get the optimal cut value of each parameter. Therefore, MVA is taken.
As an important branch of statistics, multivariate analysis has been applied to most of the disciplines. TMVA is specially developed for high energy physics based on the ROOT-integrated environment. It is powerful for signal and background classification. In accelerator physics, it can effectively screen out the b-tagging signals from a large number of background particles in the jet . And likewise, it enables to identify the components of cosmic rays .
TMVA works based on machine learning. It integrates multiple advanced algorithm classifiers, such as Boosted Decision Trees (BDT), Artificial Neural Networks (ANN), Support Vector Machine (SVM), and etc. Users need to input variable samples and select the algorithm. Then the machine training and testing are carried out. The result is to output one variable for users to select signals.
Here Boosted Decision Trees with Gradient boosting(BDTG) is chosen, which is the most widely used so far. The classification of and from other mass components is carried out independently.
The selected hybrid events are divided into two equal parts. One is used for TMVA training and testing; the other is used as data. The statistics of signal (proton or H&He) and background (others) are shown in the Table 1.
|NO. of events||Proton||H&He|
To avoid overtraining, several parameters are adjusted to achieve the best performance of the classifier. Parameters of BDTG classifiers for the separation of H/other nuclei are as follows:
Number of trees in the forest is 380;
Minimum training events required in a leaf node are 30;
Max depth of the decision tree is 2.
Parameters of BDTG classifiers for the separation of H&He/other nuclei are as follow:
Number of trees in the forest is 300;
Minimum training events required in a leaf node are 50;
Max depth of the decision tree is 2.
The other parameters are set as default values.
The training results are shown in Fig. 10. can be well separated from other components; However, separation of proton from other nuclei is barely satisfactory. The background rejection versus signal efficiency (“ROC curve”) is obtained, by cutting on the BDTG classifier outputs for the events of the proton and H&He samples, as shown in Fig. 11. It is obviously that under the same signal efficiency, the background rejection of other heavy nuclei (H&He vs. other nuclei) is higher.
The training results are applied to the other half of the data sample. Generally, the signal events can be selected according to the best cut points provided by TMVA. However, because our final goal is to obtain the selection efficiency under different energy bins, the distribution of the output of BDTG classifier versus reconstructed energy is studied firstly. As shown in Fig. 12, the separation of the output BDTG between H&He and heavy nuclei become larger as primary energy increases. The mean value of the signal (red dots) in each energy bin is slightly away from the background (black dots) and the RMS of the signal is gradually decreased. The causes for the separation becoming larger is that the performance of the input parameters gets better in higher energies.
After the cut described above, the aperture and contamination of hybrid observation are calculated. The aperture of pure proton is about and the aperture of H&He is about , as shown in Fig. 13 (left). The contamination of proton sample is about 10% and the contamination of H&He sample is less than 3% according to the Hörandel model, as shown in Fig. 13 (right).
Considering the uncertainty due to the hadronic interaction models, there are two batches of data, QGSJET-FLUKA and EPOS-FLUKA, were used for error analysis. It is found that the aperture of H&He is consistent within .
5 Proton and Helium Spectrum Expectation of LHAASO
After the primary particle identification, the accuracy of event reconstruction for light components is obtained. For proton and H&He, the shower core resolution is less than 3 m. The energy reconstruction is obtained by WFCTA, as described in Section 3. There is a linear correlation between shower energy E and the corrected number of Cherenkov photoelectrons, . For proton, and ; the reconstructed energy is . The variables and for H&He energy reconstruction are approximately equal to that of proton. The reconstructed bias and resolution for proton and H&He are shown in Fig. 14. The Bias is about 4% and the resolution is less than 20%.
Based on the aperture described above, the cosmic ray spectra of proton and H&He are predicted according to the Hörandel model  and the ARGO-YBJ & WFCT model  which is an experimental model. The exposure time is set to one year with 10% duty cycle. Since the experimental model ARGO-YBJ & WFCT only shows the energy spectrum of H&He, the spectrum of pure proton is obtained by dividing by 2, that is, the ratio of proton and helium is 1:1. The bending of the knee keeps unchanged.
As shown in Fig. 15, the left panel shows the spectrum with the Hörandel model; and the right panel shows the ARGO-YBJ & WFCT model. The black dots show proton and the red dots show . The statistical error are very small with the aperture and effective time presented above. The corresponding event rate in one year is shown in Fig. 16.
At 800 TeV, 5121/6461 H&He events and 1130/1476 proton events can be selected in one year according to the Hörandel model and the ARGO-YBJ & WFCT model respectively. At 2 PeV, 1023/849 H&He events and 222/210 proton events can be selected in one year according to the Hörandel model and the ARGO-YBJ & WFCT model respectively.
Therefore, if the knee of the cosmic ray light component is below 1 PeV, the 1/4 LHAASO array can give good measurements within one year. If the knee of light component is higher than 3 PeV, more observation time or more effective methods is needed to get enough statistics. Considering the application of SiPM, it allows observation in the moon night of WFCTA. The duty cycle of the hybrid observation can be extended. Moreover, during the construction of LHAASO, more Cherenkov telescopes can also increase the statistics effectively .
The simulation for the second stage of 1/4 LHAASO hybrid detection is operated. Three types of detectors, WFCTA, WCDA and MD, are included in this study. Parameters from each detector array are studied and tuned in detail. After removing the highly correlated parameters, six parameters are used as input of the BDTG classifier for TMVA machine training and testing. The results show that the classification of pure proton is weaker than that of H&He.
After cutting, high purity light component samples were selected. The aperture of pure proton is 900 , the contamination is around 10% according to the Hörandel model; the aperture of H&He is 1800 , the contamination is less than 3%. For the aperture H&He, the uncertainty of the different strong interaction models is about 5%. Moreover, the bias of energy reconstruction is about 4% and the resolution is less than 20% for both pure proton and H&He samples.
According to the energy spectrum expectation of the ARGO-YBJ & WFCT model, the spectrum of light component in cosmic rays will be measured accurately in a short time by 1/4 LHAASO array.
This work is supported by the National Key R&D Program of China (NO. 2018YFA0404201 and NO. 2018YFA0404202). This work is also supported by the Key Laboratory of Particle Astrophysics, Institute of High Energy Physics, CAS. Projects No. Y5113D005C, No. 11563004 and No. 11775248 of National Natural Science Foundation (NSFC) also provide support to this study.
-  M. Bertaina et al., Nuclear Physics B (Proc. Suppl.) 256-257 (2014) 149-160.
-  M. Amenomori et al.(for the Tibet AS Coll.), Astrophysical Journal, 2008, arXiv:0801.1803v1.
-  M. Aglietta et al.(EAS-TOP Coll.), Astroparticle Physics, 10:l-9.
-  Johannes Blümer, Ralph Engel, Jörg R.Hörandel, Progress in Particle and Nuclear Physics 63(2009) 293-338.
-  T. Antoni et al., Astroparticle Physics, 2005, Vol. 24(1-2):Pages 1-25.
-  B. Bartoli et al.(for ARGO-YBJ and LHAASO Coll.), Phys. Rev. D, 2015, 92(9): 25-31.
-  Z. Cao (for LHAASO Coll.), Chin.Phys. C, 2010, (34)249-252.
-  Z. Cao (for LHAASO Coll.), 31th ICRC, 2009.
-  Peter K.F.Grieder, ISBN978-3-540-76940-8(2010).
-  B. Bartoli et al.(for ARGO-YBJ Coll.), Astrop.Phys., 90 (2017) 20-27.
-  L.L. Ma, 33th ICRC, 2013, p.1013.
-  J.L. Liu, 33th ICRC, 2013, p.0880.
-  Q. An et al.(for LHAASO Coll.), Nucl.Instrum.Meth. A, 2013, 724: 12-19.
-  C. Liu et al. (for LHAASO Coll.), 35th ICRC, 2017, p.301.
-  S. S. Zhang et al.(for LHAASO Coll.), TIPP 2017.
-  Baiyang Bi et al.(for LHAASO Coll.), Nucl.Instrum.Meth. A, 899 (2018) 94-100.
-  S.S.Zhang et al., Nucl.Instrum.Meth. A, 2012, 629(1)57-65.
-  Zhao Jing et al.(for LHAASO Coll.). Chin.Phys. C, 2014, 38(3) 036002.
-  X. Zuo et al.(for LHAASO Coll.), Nucl.Instrum.Meth. A, 2015, 789: 143.
-  D. Heck and T. Pierog, Extensive Air Shower Simulation with CORSIKA: A User’s Guide, 2015.
-  Jörg R.Hörandel, Astroparticle Physics, 2003, 19(2): 193-220.
-  Xiurong Li et al. (for LHAASO Coll.), 35th ICRC, 2017, p.381.
-  Yu.A. Fomin et al., Nuclear Physics B (Proc. Suppl.) 175-176 (2008) 334-337.
-  Hillas, A. 1985, 19th ICRC Vol.3, 445-448.
-  A. Hoecker, P. Speckmayer et al, TMVA Users Guide, 2013, arXiv:physics/0703039.
-  P Speckmayer, A Höcker et al., Journal of Physics: Conference Series 219 (2010) 032057.
-  B. Bi, Z. Cao et al.(for LHAASO Coll.), 2016, Frascati Physics Series Vol. 64.
-  T Lampén, F Garcia et al., Journal of Physics: Conference Series 119 (2008) 032028
-  Zizhao Zong et al.(for LHAASO Coll.), 35th ICRC, 2017(547).