Continuous simulation of hypothetical physics processes with multiple free parameters
We propose a new approach to simulate hypothetical physics processes which are defined by multiple free parameters. Compared to the conventional grid-scan approach, the new method can produce accurate estimations of the detector acceptance and signal event yields continuously over the parameter space with fewer simulation events. The performance of this method is illustrated with two realistic cases.
keywords:Continuous simulation; Bayesian Neural Network; Multivariate;
[cor1]Corresponding author. Jiahang.Zhong@cern.ch, (+41)0227674478. B104 2-C07 CERN, CH-1211, Switzerland.
As data collected by experiments at the Large Hadron Collider (LHC) is rapidly increasing, many efforts have been devoted to searches for various Beyond the Standard Model (BSM) physics at the energy frontier. Many BSM models are defined by multiple free parameters, such as the masses of hypothetical particles and their coupling constants, which are to be constrained by data. The various cross sections and branching ratios, as well as detector acceptance and signal event yields, depend on the parameters. Consequently, the search results for BSM physics are usually expressed in terms of excluded regions in the multi-dimensional parameter space.
In most BSM searches, the hypothetical signal processes are studied in the “grid-scan” approach, i.e. Monte-Carlo (MC) samples are simulated with specific values of the parameters, corresponding to discrete points on a grid in the parameter space. Each MC sample generally contains a considerable number of events for adequate statistical precision. The signal yields are estimated from MC results of each sample and interpolated between these grid points. However, to produce many samples with full-detector simulation is computationally expensive, especially if sufficient granularity in the multivariate parameter space is desired. It suffers from the so-called “curse of dimensionality”, i.e. the number of grid points needed increases exponentially with the number of free parameters. Usually, the grid-scanned parameter space is limited to two dimensions at a time, keeping the other parameters fixed. Moreover, the grid is often constructed sparsely. As a result, the search limits derived are subject to systematic uncertainties from the interpolation which have to be estimated.
In this paper, we propose a new method to achieve continuous prediction of signal yields over the entire parameter space, under the assumption that the acceptance is a smooth function of the parameters. With such an assumption, the simulated events on neighboring space points are no longer treated independently, therefore fewer events are needed in total.
This paper is organized as follows. In section 2 we describe the technical procedure of this method. Then we illustrate the performance of the approach in section 3, with a 2-D example which simulates the production of the hypothetical right-handed W boson and heavy Majorana neutrino in the Left-Right Symmetric Model (LRSM). Another 4-D example of a simplified SuperSymmetric (SUSY) model is demonstrated in section 4, to highlight the advantage in higher-dimensional parameter space. In section 5, we discuss potential improvements that can be implemented in various special scenarios.
2 Continuous simulation
In order to estimate the signal yield, two parts of information are needed from MC simulation. One is the production rate of the considered process (denoted as ), which includes the cross-section and the branching ratio. This can normally be obtained at the event generation level and is relatively light-weight from a computational point of view. The more relevant part is the detector acceptance , namely the probability for the events to be successfully reconstructed and selected. This is related to the detector coverage and the reconstruction efficiency, for which many MC events with parton showers and full detector-simulation are needed for a reliable estimation. The signal yield (N) is the product of these two components and the integrated luminosity (),
Considering the fact that detector simulation is the dominant component of the computing requirement, it is crucial to reduce the statistics demand of the acceptance study. In the grid-scan method, this is achieved by limiting the grid size in the parameter space and correspondingly the number of MC samples. However, a considerable number of events are still required within each sample, in order to evaluate the acceptance with sufficient statistical accuracy.
Contrary to the grid-scan method, the continuous simulation technique simulates events over more parameter space points, either stochastically distributed or on a grid of very fine granularity. The overall number of MC events needed is reduced by generating only a few events at each selected parameter space point. If we treated each point independently, little statistical precision on the determination of could be achieved with those few events alone. However, under the assumption that changes smoothly with respect to the parameters , the MC events on neighboring points help to constrain . This is realized by fitting the function using a Bayesian Neural Network (BNN) MacKay (1995).
Neural Network (NN) is a very powerful technique for multivariate pattern recognition as it can handle multi-dimensional non-linear data without suffering much of the ”curse of dimensionality”. In the past decade, it has been extensively used in the field of Particle Physics. In most applications it was used to construct a multivariate discriminant, with the aim of separating signals from backgrounds Abazov et al. (2001); Chiappetta et al. (1994); Abramowicz et al. (1995). On the other hand, NN has long been recognized as a universal approximator for multivariate functions, as long as it has sufficient neurons Cybenko (1989); Hornik et al. (1989). This functionality as a non-parametric regression tool has also been exploited by physicists in recent years Forte et al. (2002); Rojo (2006); Graczyk et al. (2010).
Our usage of BNN to fit the acceptance distribution is also a regression application. It differs from previous regression applications in that the output of our function is a probability taking values between 0 and 1, and that our input samples are MC events with Boolean target values to indicate whether the event is selected or not. The grid-scan approach needs to accumulate many MC events at each single point before it can evaluate to a good precision. Instead, the BNN algorithm we used here Zhong et al. (2011) estimates the whole probability distribution in an unbinned manner. It can achieve similar precision to the grid-scan method with much fewer MC events.
This unbinned regression is carried out as follows. For each generated MC event, the corresponding parameters of the model are passed to the BNN as the input variables. In addition, a target value of 1 or 0 is assigned to each event, depending on whether it survived the selection cuts. Then the BNN is fitted (or ”trained”) to have its output approximating the probability distribution by minimizing the cross-entropy cost function
where loops over all simulated events. The cross-entropy function is derived from Bernoulli likelihood and naturally attributes the meaning of success probability to the output, .
The trained BNN can then be used to estimate . Given any point in the parameter space, the BNN will output the estimated value of the acceptance. Multiplying it to the production rate estimated from MC generators and the integrated luminosity, we obtain the corresponding signal yield at any .
The Bayesian implementation of the NN provides not only the fitted value but also the estimation of the uncertainty for each predicted Zhong et al. (2011). This uncertainty estimation reflects the statistical fluctuation of the input events as well as the goodness of BNN fitting. It is based on the same training sample so that no additional simulation and computing resources are needed.
3 Demo: 2-D Left-Right Symmetric Model
In the following we will demonstrate the application of the continuous MC technique by simulating the hypothetical production of a right-handed W boson () which subsequently decays into a heavy Majorana neutrino () as predicted in the Left-Right Symmetrical Models (LRSM). This is a typical BSM model searched for at the LHC Ferrari et al. (2000). Two leptons are expected in the final state, as shown in figure 1.
The masses of the two hypothetical particles, and , are generally considered as the free parameters of this model. These values affect not only the cross-section of the process, but also the kinematics used in the event selection. In this example, we considered the parameter space region with mass between 0.5 TeV and 1.5 TeV, which is being actively studied at the LHC experiments. The mass was limited to be smaller than the mass in order to accommodate the concerned decay mode.
The Pythia generator Sjostrand et al. (2006) is used to simulate this process. For simplicity, detector effects are not included in this demonstration. Only the acceptance at the generator level is studied. From the above-mentioned mass ranges we picked 100,000 points stochastically, which are dense enough to cover the 2D parameter space. Then 10 MC events were generated at each point so that we have a total of 1 million events simulated for this demonstration of continuous MC. The reason for generating multiple events at each point is to reduce the overhead of generator re-initialization and the computing load of BNN training. However, we still maintain a relatively low number of events at each point in order to spread over as many space points as possible.
The event selection we assumed requires at least two leptons (,) with transverse momentum greater than 20 GeV and pseudo-rapidity between -2.5 and 2.5. A 100% fiducial efficiency is assumed for lepton identification. In addition, a common isolation cut for background suppression is applied, requiring the transverse energy flow of all interactive particles inside a cone of size around the lepton to be less than 10% of the lepton’s transverse momentum.
The acceptance of this event selection is fitted by a BNN with forty neurons in the first hidden layer and ten neurons in the second hidden layer. This topology is arbitrarily chosen, merely to provide redundant complexity within the computing budget. The BNN can automatically constrain the complexity and avoid over-fitting by a regulator mechanism, which is tuned based on the Bayesian evidence framework MacKay (1995); Zhong et al. (2011). The fitted distribution of is given in figure 2.
To test the performance, we also simulated another set of discrete samples by the grid-scan approach. The values of are from 600 GeV to 1400 GeV with 100 GeV steps, and the values of are from 50 GeV to GeV with 50 GeV steps. A total of 171 samples have been simulated with these points, each containing 100,000 events for statistical accuracy. The acceptance function evaluated from these samples is shown in figure 2.
We then compared the values fitted by the BNN at these grid points to the reference values directly estimated from the discrete samples. The relative deviations are shown in figure 3. For most points, the deviation is well within a few percent of the reference value, which is negligible compared to the systematic uncertainties of simulation. Notably, the relative deviation becomes larger when approaches to 0. This is partly due to the smaller value of itself in this region. In addition, when approaching the edge of the parameter space of the training sample, the lack of neighboring points increases the uncertainty of the fitting.
As mentioned in section 2, the BNN also estimates the uncertainties of the fitted probabilities, denoted as . The uncertainties account for the deviation we observed, both in the central region with stable and at the edge region where approaches 0. By comparing the deviation between the fitted value and the reference value to the BNN estimated uncertainty (figure 4), we can see good consistency over the entire parameter space.
Only 1 million MC events were used by our example of continuous simulation. For the grid scan technique, a total of 17.1 million events at 171 discrete points were used to build a typical set of samples in the same parameter space region. To quantitatively compare the statistics demand, we calculated the number of events at each grid point that would yield a statistical uncertainty equal to the BNN uncertainty at the same point. It is worth mentioning that the BNN uncertainty includes not only the statistical fluctuation, but also the uncertainty on the goodness of fitting. This comparison shows, ignoring the interpolation errors, a total number of at least 6.3 million events for these 171 grid-scan points are needed in order to reach a similar precision as the 1 million continuous MC samples do.
As mentioned above, the BNN uncertainty includes the uncertainty on the goodness of fitting. The counterpart in the grid-scan approach is the interpolation uncertainty when estimating the acceptance at any point not on the grid. To evaluate this uncertainty for the grid-scan method, we assumed that the value at the interpolated point is uniformly distributed between the values at the neighboring grid points in each dimension. The uncertainty is estimated as the standard deviation of this distribution, i.e. of the difference between the two values. We averaged the upward and downward uncertainties (if they exist) to get an uncertainty for each grid point. This uncertainty of grid-scan is compared to the BNN uncertainty of continuous simulation, as shown in figure 5. We can see that the interpolation could often introduce a non-negligible uncertainty, especially in the low mass region where the acceptance changes steeply. To achieve similar interpolation precision as the BNN fitting, the grid-scan will need finer granularity and more statistics, as well as extra effort for optimization of the grid structure.
4 Demo: 4-D Simplified SuperSymmetric Model
In this section, we will illustrate the performance of the continuous simulation in a 4-D scenario. This is gluino pair production with a two-step cascade decay in one of the Simplified SuperSymmetric Models Alwall et al. (2009); Alves et al. (2011). In this model, the gluino () can decay to the Lightest Supersymmetric Particle (LSP), assumed to be the , through the decay chain of . The BSM parameters in this model are the masses of the four SUSY particles. For this demonstration, we studied the mass range of between 200 GeV and 1200 GeV. To accommodate the decay chain concerned, the mass of the was limited to between 200 GeV and M(), and the mass of the was limited to between 100 GeV and M(). Finally, the mass of the LSP was limited to between 0 GeV and M().
The Herwig++ generator Bahr et al. (2008) was used to simulate this process. For the continuous MC sample, a total number of 1 million events were generated at 100k space points, stochastically distributed in the parameter space region. For simplicity, the study is again limited to generator level acceptance. We looked for final states with at least two leptons (,) with transverse momentum greater than 20 GeV and with pseudo-rapidity between -2.5 and 2.5. A 100% fiducial efficiency was assumed for lepton identification. In addition, the transverse energy flow of all interactive particles around the lepton within a cone was required to be below 5 GeV. As a typical SUSY signature, we also required the visible energy imbalance in the transverse direction to be greater than 50 GeV. The acceptance of this di-lepton plus missing transverse energy selection was then parameterized by a BNN of the same topology as in the previous example.
For comparison, we also simulated discrete samples by grid-scan, with 100 GeV steps in all four directions. The values of M() are from 250 GeV up to 1150 GeV. The values of M() are from 200 GeV to M()-50 GeV. The values of M() are from 100 GeV to M()-100 GeV. And the values of M(LSP) are from 0 GeV to M()-100 GeV. These made up a 4-D grid with 715 points, with 30k events simulated at each point.
With these 715 grid-scan samples, we obtained reference values of ranging from 0.04 to 0.16. Using the BNN trained by the continuous MC sample, we observed similar distributions of , as shown in figure 6. Further comparison of the BNN estimations and the reference values at each point revealed that most of them are consistent, well within 10% relative deviation (fig. 7).
We also examined the uncertainty estimated by the BNN, comparing it to the actual difference between BNN estimation and the reference value. From figure 8, we can see that the uncertainties are certainly consistent with the actual deviation.
To demonstrate the statistical efficiency of continuous simulation, we again calculated the minimal number of events at each grid point, if statistical precision equivalent to the BNN uncertainty is desired. The comparison shows that, ignoring the interpolation errors, at least 3.2 million events on the 715 considered grid points are needed, which is already greater than the 1 million statistics we used for the continuous MC sample.
The grid-scan suffers more uncertainties of interpolation in a 4-D space. With the same measurements as in section 3, we estimated the interpolation uncertainties on each of the four parameters as well as their quadratic sum (fig. 9). Notably, in each direction, due to the constraint of the mass hierarchy, 55 out of the 715 points cannot have this interpolation uncertainty measured, because they have no neighboring points. Overall there are 189 points whose interpolation uncertainties are underestimated due to this. Nevertheless, comparing them with the BNN uncertainties, we can see that the interpolation errors are worse than the BNN uncertainties at a large fraction of the parameter space points (fig. 9). This indicates the necessity of a grid with finer granularity. However, increasing the number of points in each dimension by a mere factor of two would already mean 16 times larger statistics: a manifestation of the effect of the ”curse of dimensionality”.
Several potential improvements beyond what has been demonstrated can be made depending on the subject under study. For example, the choice of parameter space points can be more intelligently informed. In our examples, the parameters are chosen from the space with a flat probability distribution. However, after the first round of acceptance fitting, we may identify certain parameter space regions in which the fitted values have relatively larger uncertainties, either due to the smaller values of or due to the proximity to the edge of the parameter space of the training samples. A typical case is the low region in LRSM production, as shown in section 3. If enhancement of precision for these regions is desired, it can be achieved by simulating additional events in these regions. Fitting another BNN with the enlarged sample will then give more accurate result for such regions.
In case it is impractical to produce additional MC events, an alternative solution is to fit another BNN with the existing sample, but assign higher weights during the fitting to those events in the concerned regions. This is equivalent to repeatedly using these events as additional samples. Although the statistical fluctuation remains, it will drive the fitting to focus more on the numerical feature of these regions and rely less on the interpolation from other regions.
Another potential improvement is to split the parameter space according to different physics processes. The BNN fitting assumes smooth distribution over the parameter space. However, this assumption of smoothness may not hold at boundaries where the physics is different on each side. If such boundaries are known as a priori knowledge, independent BNNs should be applied on each sub-set of the parameter space within which the signal yield is smooth.
Besides fitting the acceptance , the production rate can also be fitted by the BNN. This might be necessary if a continuous function of is desired or if calculating cross-sections for a large number of parameter space points is impractical. The way of BNN fitting needs to be changed slightly for such a regression application. The NN output should be allowed to range over all real numbers, while the target value will be the for each event. The cost function will be the Mean-Square-Error between and , representing a Gaussian likelihood. As a result, the signal expectation can also be obtained continuously over the parameter space.
In this work, we introduced a new approach to simulate hypothetical processes with multiple free parameters and its use in estimating signal acceptance and yields continuously over the parameter space. In particular, the BNN technique is used for unbinned fitting of the acceptance, which is the key to providing continuous estimation and reducing the required number of simulation events. The two examples we showed both suggest that the acceptance can be well estimated with fewer MC events than needed in the grid-scan approach. In addition, the uncertainty of the fitted signal yield is estimated by the BNN simultaneously which is a natural consequence of the Bayesian inference.
We thank Song-Ming Wang, Rachid Mazini, Sarah Livermore for helpful discussions. Jiahang Zhong and Shih-Chang Lee are partially supported by the National Science Council, Taiwan under the contract number NSC99-2119-M-001-015. Jiahang Zhong is also supported by the UK Science and Technology Facilities Council.
- D. J. C. MacKay, Network: Computation in Neural Systems 6 (1995) 469–505.
- V. M. Abazov, et al., Phys. Lett. B517 (2001) 282–294.
- P. Chiappetta, P. Colangelo, P. De Felice, G. Nardulli, G. Pasquariello, Phys. Lett. B322 (1994) 219–223.
- H. Abramowicz, A. Caldwell, R. Sinkus, Nucl. Instrum. Meth. A365 (1995) 508–517.
- G. Cybenko, Mathematics of Control, Signals, and Systems (MCSS) 2 (1989) 303–314.
- K. Hornik, M. Stinchcombe, H. White, Neural Networks 2 (1989) 359 – 366.
- S. Forte, L. Garrido, J. I. Latorre, A. Piccione, JHEP 05 (2002) 062.
- J. Rojo, JHEP 05 (2006) 040.
- K. M. Graczyk, P. Plonski, R. Sulej, JHEP 09 (2010) 053.
- J. Zhong, R.-S. Huang, S.-C. Lee, Computer Physics Communications 182 (2011) 2655 – 2660.
- A. Ferrari, J. Collot, M.-L. Andrieux, B. Belhorma, P. de Saintignon, J.-Y. Hostachy, P. Martin, M. Wielers, Phys. Rev. D 62 (2000) 013001.
- T. Sjostrand, S. Mrenna, P. Z. Skands, JHEP 05 (2006) 026.
- J. Alwall, P. C. Schuster, N. Toro, Phys. Rev. D 79 (2009) 075020.
- D. Alves, et al., ArXiv e-prints hep-ph/1105.2838 (2011).
- M. Bahr, S. Gieseke, M. Gigg, D. Grellscheid, K. Hamilton, O. Latunde-Dada, S. Platzer, P. Richardson, M. Seymour, A. Sherstnev, B. Webber, The European Physical Journal C - Particles and Fields 58 (2008) 639–707.