A simulation study to distinguish prompt photon from $\pi^0$ and beam halo in a granular calorimeter using deep networks
In a hadron collider environment, identification of prompt photons originating in a hard partonic scattering process and rejection of non-prompt photons coming from hadronic jets or from beam related sources is the first step in the study of processes with photons in the final state. Photons coming from the decay of $\pi^0$s produced inside a hadronic jet and photons produced in catastrophic bremsstrahlung by beam halo muons are two major sources of non-prompt photons. In this paper the potential of deep learning methods for separating prompt photons from beam halo and $\pi^0$s in the electromagnetic calorimeter of a collider detector is investigated, using an approximate description of the CMS detector. It is shown that, using only calorimetric information as images with a Convolutional Neural Network, beam halo (and $\pi^0$) can be separated from photons with 99.96% (97.7%) background rejection for 99.00% (90.0%) signal efficiency, which is much better than what is achieved with traditionally employed variables.
S. Ghosh,(a) A. Harilal,(b,c,1) A. R. Sahasransu,(b,d) R. K. Singh(b) and S. Bhattacharya(a)
(1) Corresponding author.
Prepared for submission to JINST
(a) High Energy Nuclear and Particle Physics Division, Saha Institute of Nuclear Physics, HBNI,
1/AF Bidhannagar, Kolkata, India
(b) Department of Physical Sciences, Indian Institute of Science Education and Research Kolkata,
Mohanpur, 741246, India
(c) Department of Physics, Carnegie Mellon University, Pittsburgh, USA
(d) Vrije Universiteit Brussel, Belgium
Keywords: Calorimeters, Detector modelling and simulations, Data processing methods, Performance of High Energy Physics Detectors, Deep Learning, Convolutional Neural Network
1 Introduction
Understanding the symmetry breaking mechanism in the standard model (SM) and searching for new physics beyond the standard model are major goals of the physics program of the Large Hadron Collider (LHC). Photons and electrons play important roles in these searches as they provide clean signatures in a hadronic environment. Distinguishing prompt photons (coming from the hard scattering) from photons coming from neutral meson ($\pi^0$) decays or from other anomalous sources (e.g. bremsstrahlung photons from beam halo muons) is of critical importance. An example is the search for the Higgs boson using its $H \to \gamma\gamma$ decay mode, which was one of the main channels for the discovery of the Higgs boson. The diphoton final state produced by this decay has a large background from hadronic jets in which most of the jet energy is carried by a single $\pi^0$. The sensitivity of this search depends largely on how well this background can be rejected. Another example is the search for dark matter and large extra dimensions in final states with a photon and large missing transverse momentum. In this channel, along with the photons coming from pion decays, beam halo photons pose an additional challenge.
Currently employed traditional techniques use combinations of shower shape variables, constructed by physicists to capture the difference in the spatial patterns of signal and background events from the raw output of an electromagnetic calorimeter. These variables are only a few out of an infinite set of possible variables. Classification using supervised learning algorithms such as an artificial neural network (ANN) or a boosted decision tree (BDT), with high level features built by physicists as inputs, has also been done successfully. A prime example of such an analysis is the search for the Higgs boson in the Higgs to diphoton channel performed by the CMS experiment at the LHC, where a boosted decision tree was deployed for the identification of prompt photons. However, with the advent of new image recognition networks it should now be possible to benefit from these modern techniques, and significant efforts in this direction have been made only recently. There are a number of recent instances in High Energy Physics where classification problems have been recast as computer vision problems, such as neutrino classification [2, 3] and jet image classification [4, 5, 6, 7].
The analysis presented in this paper derives its motivation from such recent applications of emerging Deep Learning (DL) techniques like the Convolutional Neural Network (CNN) [8]. In this approach the machine learns to construct many high level feature variables starting from the raw detector image. Every filter in the CNN projects out one such high level variable. A CNN is thereby employed to extract the maximum amount of information from the raw output of the electromagnetic calorimeter, which can lead to better performance in discriminating between the two classes. In this study, data taken directly from the detector are used as images and a CNN is run on them without providing any high level physics information. The performance of the CNN is compared with that of an ANN fed with specialized physics variables, so as to evaluate the efficacy of these deep learning algorithms in identifying features from the data without much human input.
The rest of the paper is planned as follows: in Section 2, an outline of photon identification in high energy physics and the various classes of photons considered in this paper are given. In Section 3, the details of the detector simulation are described, and in Section 4, the full analysis and results are presented, followed by a conclusion of this study in Section 5.
2 Photon Identification
2.1 Prompt photons
Prompt photons are photons produced in quark-gluon Compton scattering, quark-antiquark annihilation or gluon-gluon fusion. Photons from the decay of hadrons such as the $\pi^0$, photons from the fragmentation process and photons from bremsstrahlung of charged particles can fake a prompt photon and form a large background over the signal. The tracker material in front of the electromagnetic calorimeter (ECAL) causes some photons to convert to $e^+e^-$ pairs in the tracker before reaching the ECAL. Unlike the photons that pass through the tracker unconverted, these converted photons have hits in the tracker and have an ECAL energy deposition that is more spread out in $\phi$ due to the bending of the trajectories in the strong magnetic field. Typical energy deposit maps of the different classes of photons are shown in Figure 1.
2.2 Photons from $\pi^0$ decay and fragmentation
Diphoton decay of a neutral meson ($\pi^0$) inside a hadronic jet is the single largest background to prompt photons. In this study, the two photons from a $\pi^0$ above 10 GeV hit the same crystal in the ECAL and appear like one photon. The hadronic multijet production cross section at the LHC is orders of magnitude higher than that of a typical new physics process. These jets copiously produce neutral mesons, a small fraction of which can fake a prompt photon. For the $H \to \gamma\gamma$ search this background is almost two orders of magnitude larger than the signal, assuming a jet faking photon rate of ~ .
The energy deposits of prompt photons and of photons from $\pi^0$ and beam halo in the ECAL, as seen from the interaction point of our simulation, are shown in Figure 2.
2.3 Beam halo photons
One of the main backgrounds in high energy particle detectors, known as machine induced background (MIB), comes from particles entering the detector from the accelerator. These particles, which are produced in the hadronic and electromagnetic showers resulting from beam protons interacting with collimators or with residual gas molecules in the vacuum pipe, are called beam halo. Pions, being the lightest hadrons, are produced easily in these hadronic interactions and constitute the majority of the beam halo. Being short lived, the neutral pions decay into two photons and the charged pions decay into muons. Some of these muons are very energetic, with energies of hundreds of GeV, i.e. greater than the critical energy of muons in the lead tungstate crystals of the calorimeter. These high energy muons undergo bremsstrahlung as they interact with the atoms of the crystals and emit photons. These halo photons result in final states with a single photon and no other object to balance its energy and momentum, thus mimicking final states with invisible particles recoiling against a photon. These photons travel parallel to the beam pipe and so have a shower elongated along $\eta$.
3 Detector simulation
A model detector comprising a calorimeter and a tracker has been constructed [10] using GEANT4 [11] to resemble CMS as described in its technical design report (TDR) [12]. The calorimeter construction includes the barrel region between pseudorapidity . It is made of parametrized volumes, each a part of a sphere, arranged as an array of crystals placed in a cylindrical arrangement. Each parametrized volume has coverage, azimuthal angle ($\phi$) coverage and radial length. The size of the crystals is in the front face and varies from to in the back face. These values are within of the ECAL crystal sizes in the TDR. A greatly simplified version of a tracker has been implemented as concentric cylinders of silicon of varying thickness. The first three layers have a thickness of followed by four layers of and six more layers of thickness. Additional material has been added to obtain a material budget similar to that of the CMS tracker in the pseudorapidity region . A uniform magnetic field of 4 T has been applied along the positive z-axis. The cross-sectional view of the geometry is shown in Figure 3.
The simulation has a flag for filtering out photons which convert anywhere inside the tracking volume.
The standard physics list FTFP_BERT has been used for simulating the physics processes, to keep the simulation as close as possible to the general purpose detectors CMS and ATLAS [13]. These include bremsstrahlung, pair production and photo-electric effect for photons, as well as Compton scattering for . The particle is assumed to deposit its entire energy at a point if it cannot travel a distance of mm further from the point.
4 Analysis and Results
This section describes the details of the analysis procedure and the results. The information from the cylindrical ECAL has been represented as a 2D image in the $\eta$-$\phi$ space, as an $n \times n$ matrix of cells ($n$ = 11 or 25, depending on the problem, as described in the sections below) around a local maximum in energy deposit (seed crystal). The cells represent calorimeter crystals and the values of the cells represent the energy contained in them. The values have been further normalized to the seed crystal energy before feeding the matrix of cells as an input to the networks. Three different network analyses have been performed for each classification problem:
- A shallow ANN with the traditional shape variables constructed out of cell values, as input.
- The normalized cell values fed to an ANN or DNN.
- The $n \times n$ matrix of normalized cell energies fed to a CNN.
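The image construction described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `crystal_image`, the `[i_eta, i_phi]` array layout, the use of the global maximum as the seed, and zero padding at the edges of the array are all assumptions made for the example.

```python
import numpy as np

def crystal_image(energies, n):
    """Cut an n x n window (n odd) around the highest-energy crystal.

    energies: 2D array of crystal energies indexed as [i_eta, i_phi] (assumed layout).
    Returns the window normalized to the seed crystal energy.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that the seed sits at the centre")
    half = n // 2
    # Zero-pad so that windows near the edge of the array stay n x n
    # (an assumption; wrapping around in phi would be an alternative).
    padded = np.pad(energies, half, mode="constant")
    # The seed is taken here as the global maximum of the array.
    i_eta, i_phi = np.unravel_index(np.argmax(energies), energies.shape)
    window = padded[i_eta:i_eta + n, i_phi:i_phi + n]
    return window / energies[i_eta, i_phi]

# Toy example: one seed crystal with two neighbouring deposits.
grid = np.zeros((20, 30))
grid[7, 12] = 8.0
grid[7, 13] = 1.0
grid[8, 12] = 0.5
image = crystal_image(grid, 11)
print(image.shape, image[5, 5], image[5, 6])
```

After normalization the seed crystal sits at the centre of the image with value 1, and every other cell holds its energy as a fraction of the seed energy.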
In the first analysis the shape variables are constructed from intuition, utilizing the knowledge about the narrow lateral profile of an electromagnetic shower. Some of the variables are modelled on the standard shower shape variables used in the CMS ECAL [14, 15] and on other prior studies done on homogeneous, granular calorimeter hodoscopes. The main study with the cylindrical geometry described in section 3 has been cross checked with a planar geometry with a magnetic field parallel to the face of the crystals.
4.1 Network Architecture and Ranking
- Convolutional Neural Network (CNN):
A CNN has been constructed with two convolutional layers with filters of size $3 \times 3$ and stride $1 \times 1$, with the rectified linear unit (RELU) [18] activation function applied to the outputs along with L2 regularisation, followed by max pooling with a pool window of size $2 \times 2$. This is followed by a fully connected layer of 64 nodes with dropout regularization [19] of 30%. Finally, there is a fully connected layer with the softmax activation function giving a binary output. The structure of the CNN used is shown in Figure 4.
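To make the layer sizes concrete, the convolutional part of such a network can be sketched in plain NumPy. This is only an illustration under stated assumptions: the number of filters (4 here) is hypothetical since it is not given in the text, no padding is applied, the pooling is applied once after the second convolution, and the random filters stand in for trained weights. The dense layers, dropout and L2 term are omitted.

```python
import numpy as np

def conv2d_valid(x, kernels):
    """x: (H, W, C_in); kernels: (kh, kw, C_in, C_out); stride 1, no padding."""
    kh, kw, _, c_out = kernels.shape
    h_out, w_out = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h_out, w_out, c_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i:i + kh, j:j + kw, :]
            # Sum over patch height, width and input channels for each filter.
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return out

def relu(x):
    return np.maximum(x, 0.0)

def maxpool2x2(x):
    """Non-overlapping 2 x 2 max pooling; an odd trailing row/column is dropped."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    trimmed = x[:2 * h, :2 * w, :]
    return trimmed.reshape(h, 2, w, 2, -1).max(axis=(1, 3))

def feature_extractor(image, k1, k2):
    x = image[:, :, None]            # add a single input channel
    x = relu(conv2d_valid(x, k1))    # first 3 x 3 convolution
    x = relu(conv2d_valid(x, k2))    # second 3 x 3 convolution
    return maxpool2x2(x)

rng = np.random.default_rng(0)
n_filters = 4                        # hypothetical value
k1 = rng.normal(size=(3, 3, 1, n_filters))
k2 = rng.normal(size=(3, 3, n_filters, n_filters))
features_11 = feature_extractor(rng.random((11, 11)), k1, k2)
features_25 = feature_extractor(rng.random((25, 25)), k1, k2)
print(features_11.shape, features_25.shape)
```

Under these assumptions an $11 \times 11$ image shrinks to $9 \times 9$, then $7 \times 7$, and is pooled to $3 \times 3$ per filter, while a $25 \times 25$ image ends at $10 \times 10$ per filter before being flattened for the fully connected layer.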
- Artificial Neural Network (ANN):
For the photon-beam halo separation, an ANN with one hidden layer of 32 nodes with RELU activation and an output layer of 2 nodes with softmax activation is used. Optimal classification is already obtained with this simple network with a single hidden layer. The network architecture is shown in Figure 5, where only one hidden layer is considered.
- Deep Neural Network (DNN):
For photon-$\pi^0$ separation, an ANN with two hidden layers is used, as the problem is more difficult and a deeper network is found to perform better. The first layer has 64 nodes and the second layer has 32 nodes, both with RELU activation, followed by a dropout of 30%, and the output layer has 2 nodes with softmax activation. The structure of the ANN used is shown in Figure 5.
All the networks use the cross-entropy loss function with the ADADELTA optimizer [20].
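For reference, the softmax output and the cross-entropy loss for a two-node output layer can be written in a few lines. This is a generic sketch of the standard definitions, not the training code of the study; the label convention (0 for background, 1 for signal) is an assumption made for the example.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; the row maximum is subtracted for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probabilities, labels):
    """Mean negative log-probability assigned to the true class.

    labels: integer class indices (assumed here: 0 = background, 1 = signal).
    """
    picked = probabilities[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))

# Two toy events: one confidently classified, one undecided.
logits = np.array([[2.0, 0.0], [0.0, 0.0]])
labels = np.array([0, 1])
probs = softmax(logits)
loss = cross_entropy(probs, labels)
print(probs, loss)
```

The loss is small when the network assigns a high probability to the correct class and grows without bound as that probability approaches zero, which is what the optimizer minimizes during training.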
Receiver Operating Characteristic (ROC) curves of signal efficiency vs. background rejection have been plotted to evaluate the performance of each classifier. The area under the ROC curve (AUC) has also been used as a quality criterion for the classification.
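The two figures of merit quoted in this paper, background rejection at a fixed signal efficiency and the ROC AUC, can be computed from classifier scores as in the sketch below. The function names and the convention that a higher score means more signal-like are assumptions for the illustration; the AUC is evaluated with the rank-based (Mann-Whitney) form, which equals the area under the efficiency vs. rejection curve.

```python
import numpy as np

def rejection_at_efficiency(signal_scores, background_scores, efficiency):
    """Fraction of background rejected by the cut that keeps `efficiency` of the signal."""
    # Events with score >= threshold are accepted as signal.
    threshold = np.quantile(signal_scores, 1.0 - efficiency)
    return float(np.mean(np.asarray(background_scores) < threshold))

def roc_auc(signal_scores, background_scores):
    """Probability that a random signal event scores above a random background event."""
    s = np.asarray(signal_scores)
    b = np.sort(np.asarray(background_scores))
    below = np.searchsorted(b, s, side="left")            # background strictly below
    ties = np.searchsorted(b, s, side="right") - below    # background with equal score
    return float((below + 0.5 * ties).sum() / (len(s) * len(b)))

# Toy scores for four signal and four background events.
sig = np.array([0.9, 0.8, 0.7, 0.6])
bkg = np.array([0.1, 0.2, 0.3, 0.65])
rejection = rejection_at_efficiency(sig, bkg, 0.75)
auc = roc_auc(sig, bkg)
print(rejection, auc)
```

In the toy example one background event outscores one signal event, so 15 of the 16 signal-background pairs are correctly ordered and the AUC is 0.9375.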
51,000 images each of the signal (prompt photons) and background (non-prompt photons) classes have been generated. For prompt photon - beam halo separation, images of dimensions $11 \times 11$ have been used, while for prompt photon - $\pi^0$ separation, images of dimensions $25 \times 25$ have been used to efficiently capture the conversion of photons.
Out of the total sample set, 80% has been used for training and 20% for testing. Out of the training set, 30% has been used for cross validation.
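The resulting sample sizes can be checked with a short sketch. It assumes that the split is applied to the combined set of 2 x 51,000 images of one classification problem and that events are shuffled at random; the helper name `split_indices` and the random seed are hypothetical.

```python
import numpy as np

def split_indices(n_events, test_fraction=0.2, validation_fraction=0.3, seed=0):
    """Shuffle event indices and split them into fit, validation and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_events)
    n_test = int(round(n_events * test_fraction))
    test, training = order[:n_test], order[n_test:]
    # The validation set is carved out of the training set, not out of the total.
    n_val = int(round(len(training) * validation_fraction))
    validation, fit = training[:n_val], training[n_val:]
    return fit, validation, test

fit_idx, val_idx, test_idx = split_indices(2 * 51_000)
print(len(fit_idx), len(val_idx), len(test_idx))
```

Under these assumptions 20,400 images are held out for testing, 24,480 are used for validation and 57,120 remain for fitting the network weights.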
4.3 Beam halo - prompt photon separation
10 GeV samples of prompt photons and beam halo photons have been generated, and the CNN and the ANN have been run on the $11 \times 11$ images.
The CNN and the ANN have been observed to perform remarkably well and comparable to one another with high background rejection for a high signal efficiency (see Figure 7 and Table 1).
Another approach to distinguishing prompt photons from fake photons is to use shower shape variables that capture the differences in the energy spread in the $\eta$-$\phi$ space, as shown in Figure 1. The following set of shower shape variables, based on the energy in a $5 \times 5$ cluster centered on the maximum energy (seed) crystal, has been fed to the ANN as input:
s is the ratio of the energy contained within the $3 \times 3$ matrix of crystals centered on the seed crystal to the total energy contained in the $5 \times 5$ matrix around the seed crystal and,
where and are the energy and index of the crystal within the $5 \times 5$ cluster, and is the index of the seed crystal. Similarly for we have,
With the beam halo photons having an elongated spread in $\eta$ and the prompt photons having a more circular spread in the $\eta$-$\phi$ space, these variables have been chosen to distinguish between the two classes by exploiting these differences.
The distributions of these variables are shown in Figure 6.
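As an illustration of this kind of variable, the sketch below computes an energy ratio and two shower widths from a $5 \times 5$ cluster. The exact definitions used in the study are not reproduced here: the widths are assumed to be simple energy-weighted second moments of the crystal index about the seed index, and the function name, the return order and the row = $\eta$, column = $\phi$ layout are choices made for the example.

```python
import numpy as np

def shower_shapes(image):
    """Return (E3x3 / E5x5, width in eta, width in phi) for a seed-centred image.

    image: square array with odd side >= 5, seed crystal at the centre,
    rows indexed by eta and columns by phi (assumed layout).
    """
    c = image.shape[0] // 2
    e3x3 = image[c - 1:c + 2, c - 1:c + 2].sum()
    cluster = image[c - 2:c + 3, c - 2:c + 3]
    e5x5 = cluster.sum()
    offsets = np.arange(-2, 3)       # crystal index minus seed index
    d_eta, d_phi = np.meshgrid(offsets, offsets, indexing="ij")
    # Energy-weighted second moments about the seed (assumed weighting).
    width_eta = np.sqrt((cluster * d_eta**2).sum() / e5x5)
    width_phi = np.sqrt((cluster * d_phi**2).sum() / e5x5)
    return float(e3x3 / e5x5), float(width_eta), float(width_phi)

# Toy halo-like deposit: a single column of crystals elongated along eta.
halo = np.zeros((11, 11))
halo[3:8, 5] = [0.5, 0.8, 1.0, 0.8, 0.5]
ratio, width_eta, width_phi = shower_shapes(halo)
print(ratio, width_eta, width_phi)
```

For this toy deposit the width in $\eta$ is larger than one crystal while the width in $\phi$ vanishes, which is the asymmetry these variables are meant to pick up for beam halo photons.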
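Table 1. Background rejection, signal efficiency and ROC AUC for beam halo - prompt photon separation.
|Classifier||Background rejection (%)||Signal efficiency (%)||ROC AUC|
|CNN on image||99.96||99.00||0.9997|
|ANN on image||99.89||99.00||0.9990|
|ANN on 3 variables||71.31||99.00||0.9748|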
4.4 $\pi^0$ - prompt photon separation
Each photon in the prompt photon sample has about a 38% chance of converting into an $e^+e^-$ pair, leading to a different signature (see Figure 1). Each event in the $\pi^0$ sample has two photons, and there is about a 61% chance that at least one of them will convert.
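The two percentages are consistent with each other if the two decay photons are assumed to convert independently with the same single-photon probability, as the following short check shows. The function name is hypothetical.

```python
def prob_any_converted(p_single, n_photons=2):
    """Probability that at least one of n independent photons converts."""
    return 1.0 - (1.0 - p_single) ** n_photons

p_at_least_one = prob_any_converted(0.38)
print(f"{p_at_least_one:.4f}")  # prints 0.6156
```

With a 38% conversion probability per photon, the chance that neither photon converts is $0.62^2 \approx 0.38$, which leaves about 61% for at least one conversion.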
The samples have been grouped into different sets as follows:
Set A - converted and unconverted photons vs. converted and unconverted $\pi^0$s
Set B - unconverted photons vs. converted and unconverted $\pi^0$s
Set C - unconverted photons vs. unconverted $\pi^0$s
The CNN and the ANN have been run on these sets of $25 \times 25$ images, and from the ROCs in Figure 9 it is evident that the CNN outperforms the ANN.
The ANN is fed with the following set of shower shape variables, computed in a $5 \times 5$ cluster centered on the maximum energy (seed) crystal, as the input:
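Table 2. Background rejection, signal efficiency and ROC AUC for $\pi^0$ - prompt photon separation.
|Classifier||Background rejection (%)||Signal efficiency (%)||ROC AUC|
|CNN on image set A||76.4||90.0||0.9030|
|ANN on image set A||73.4||90.0||0.8825|
|ANN on 9 variables for set A||46.8||90.0||0.8196|
|CNN on image set B||92.5||90.0||0.9567|
|CNN on image set C||97.7||90.0||0.9848|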
Along with the variables defined in section 4.3, is the energy of the seed crystal, is the maximum energy contained within a $2 \times 2$ matrix centered on the seed crystal, is the maximum energy contained within a $4 \times 4$ matrix centered on the seed crystal, is the energy of the crystal with the next highest energy adjacent to the seed crystal, is the ratio of s with the energy in the $25 \times 25$ supercluster, and
(a) , (b) , (c) , (d ) , (e) , (f) , (g) , (h) , and (i) .
The distributions of each of these variables for set A of photons and $\pi^0$s are shown in Figure 8. They do not have much discriminating power, and the ROCs in Figure 9 indicate how poorly they perform compared to the CNN on the images. The background rejection, signal efficiency and area under the curve of the ROCs are listed in Table 2.
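Some of the additional cluster variables described above can be sketched as follows. The definitions are interpretations rather than the ones used in the study: since an even-sized window cannot be centred on a crystal, the maximum $2 \times 2$ and $4 \times 4$ energies are taken over all windows of that size that contain the seed, and "adjacent" is assumed to include the diagonal neighbours. The function names are hypothetical.

```python
import numpy as np

def max_window_sum(image, size):
    """Largest energy sum over all size x size windows that contain the central seed."""
    c = image.shape[0] // 2
    best = -np.inf
    for top in range(c - size + 1, c + 1):
        for left in range(c - size + 1, c + 1):
            best = max(best, image[top:top + size, left:left + size].sum())
    return float(best)

def highest_neighbour(image):
    """Energy of the most energetic of the eight crystals around the seed."""
    c = image.shape[0] // 2
    block = image[c - 1:c + 2, c - 1:c + 2].copy()
    block[1, 1] = -np.inf            # exclude the seed itself
    return float(block.max())

# Toy image normalized to the seed energy, with three neighbouring deposits.
toy = np.zeros((11, 11))
toy[5, 5] = 1.0
toy[5, 6] = 0.4
toy[4, 5] = 0.2
toy[4, 6] = 0.1
e_seed = float(toy[5, 5])
e_2x2 = max_window_sum(toy, 2)
e_4x4 = max_window_sum(toy, 4)
e_second = highest_neighbour(toy)
print(e_seed, e_2x2, e_4x4, e_second)
```

Variables of this kind summarize the image with a handful of numbers, which is one way to see why they can carry less information than the full $25 \times 25$ image given to the CNN.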
5 Conclusion
A study has been presented using deep learning techniques for separating prompt photons from neutral pions and beam halo. Different network types have been compared on simulated data from an approximate CMS detector geometry consisting of a tracker and a calorimeter.
For separating photons from beam halo, the CNN based on images gives the maximum background rejection of 99.96% for 99.00% signal efficiency. For separating neutral pions from prompt photons, the CNN based on images gives the maximum background rejection of 97.7% for 90.0% signal efficiency in the best case scenario. In both cases, it is evident from the ROC AUCs that the CNN outperforms the ANN with the same input image as well as the ANN using topological variables.
For the beam halo separation, the spatial patterns of the prompt photon and the beam halo photon are so distinct that a simple ANN with a single hidden layer suffices for a high accuracy classification. The $\pi^0$ classification proved to be a more difficult problem owing to the very similar energy deposition patterns of the decay photons and a prompt photon. The CNN still gives a good performance on unconverted photons vs. unconverted $\pi^0$s. These image-based neural network techniques are generic and can be applied to any calorimeter.
The authors would like to thank Dr. Ananda Dasgupta for the useful discussions through the course of this project.
References
[1] CMS collaboration, S. Chatrchyan et al., Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC, Phys. Lett. B 716 (2012) 30.
[2] A. Aurisano, A. Radovic, D. Rocco, A. Himmel, M. D. Messier, E. Niner et al., A Convolutional Neural Network Neutrino Event Classifier, JINST 11 (2016) P09001, [1604.01444].
[3] MicroBooNE collaboration, R. Acciarri et al., Convolutional Neural Networks Applied to Neutrino Events in a Liquid Argon Time Projection Chamber, JINST 12 (2017) P03011, [1611.05531].
[4] L. de Oliveira, M. Kagan, L. Mackey, B. Nachman and A. Schwartzman, Jet-images - deep learning edition, JHEP 07 (2016) 069, [1511.05190].
[5] P. Baldi, K. Bauer, C. Eng, P. Sadowski and D. Whiteson, Jet Substructure Classification in High-Energy Physics with Deep Neural Networks, Phys. Rev. D93 (2016) 094034, [1603.09349].
[6] J. Barnard, E. N. Dawe, M. J. Dolan and N. Rajcic, Parton Shower Uncertainties in Jet Substructure Analyses with Deep Neural Networks, Phys. Rev. D95 (2017) 014018, [1609.00607].
[7] P. T. Komiske, E. M. Metodiev and M. D. Schwartz, Deep learning in color: towards automated quark/gluon jet discrimination, JHEP 01 (2017) 110, [1612.01551].
[8] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (November, 1998) 2278–2324.
[9] M. Pieri, S. Bhattacharya, I. Fisk, J. Letts, V. A. Litvine and J. G. Branson, Inclusive search for the Higgs boson in the $H \to \gamma\gamma$ channel.
[10] A. R. Sahasransu and R. Singh, “Opencmsg4v0.1.” https://github.com/ars12ms062/OpenCMSG4, 2018.
[11] GEANT4 collaboration, S. Agostinelli et al., GEANT4: A Simulation toolkit, Nucl. Instrum. Meth. A506 (2003) 250–303.
[12] CMS collaboration, The CMS tracker: addendum to the Technical Design Report. Technical Design Report CMS. CERN, Geneva, 2000.
[13] S. Banerjee and CMS Experiment, Validation of physics models of Geant4 using data from CMS experiment, Journal of Physics: Conference Series 898 (2017) 042005.
[14] CMS collaboration, Photon reconstruction and identification at sqrt(s) = 7 TeV, Tech. Rep. CMS-PAS-EGM-10-005, CERN, Geneva, 2010.
[15] CMS collaboration, Isolated Photon Reconstruction and Identification at sqrt(s) = 7 TeV, Tech. Rep. CMS-PAS-EGM-10-006, CERN, Geneva, 2011.
[16] “Keras.” https://keras.io/, 2018.
[17] “Tensorflow.” https://www.tensorflow.org/, 2018.
[18] V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proceedings of the 27th International Conference on Machine Learning, ICML’10, (USA), pp. 807–814, Omnipress, 2010.
[19] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research 15 (2014) 1929–1958.
[20] M. D. Zeiler, ADADELTA: an adaptive learning rate method, CoRR abs/1212.5701 (2012), [1212.5701].