Particle-level kinematic fingerprints and the multiplicity of neutral particles from low-energy strong interactions
Abstract: The contamination, or background, from uninteresting low-energy strong interactions is a major issue for data analysis at the Large Hadron Collider. In the light of the challenges associated with the upcoming higher-luminosity scenarios, methods of assigning weights to individual particles have recently started to be used with a view to rescaling the particle four-momentum vectors. We propose a different approach whereby the weights are instead employed to reshape the particle-level kinematic distributions in the data. We use this method to estimate the number of neutral particles originating from low-energy strong interactions in different kinematic regions inside individual collision events. Given the parallel nature of this technique, we anticipate the possibility of using it as part of particle-by-particle event filtering procedures at the reconstruction level at future high-luminosity hadron collider experiments.
Keywords: 29.85.Fj; High Energy Physics; Particle Physics; Large Hadron Collider; LHC; background discrimination; mixture models; latent variable models; sampling; Gibbs sampler; Markov Chain Monte Carlo; Expectation Maximisation
1 Introduction
The subtraction of contamination from low-energy physics processes described by Quantum Chromodynamics (QCD) is a critical task at the Large Hadron Collider (LHC). The impact of such a correction is going to become even more significant in the upcoming scenarios whereby the high-energy parton scattering of interest will be superimposed with a higher number of low-energy interactions associated with collisions between other protons, the so-called pileup events.
Pileup results in the presence of multiple vertices inside collision events, which often makes the study of rare processes particularly challenging. Subtraction techniques are well established, and typically combine tracking information for charged particles with estimates of the energy flow associated with neutral particles.
A number of pileup subtraction techniques have been proposed over the years and are part of the core reconstruction pipelines at hadron collider experiments. With a view to achieving improved performance at higher luminosity, techniques that work at the level of individual particles inside collision events have recently been proposed [2, 3, 4] and are being evaluated at the LHC. In particular, particle weighting methods have been presented whereby individual particles are assigned a probability for their origin in soft QCD interactions as opposed to the signal hard parton scattering. The weights are typically used either to rescale the particle four-momentum vectors [2] or in conjunction with multiple interpretations of the data [5].
In this article, we propose a different approach whereby the weights are instead employed to reshape the particle-level kinematic distributions inside individual collision events.
We build on a view of events as mixtures of particles originating from different physics processes, namely a signal hard parton scattering and background low-energy strong interactions.
Due to the quantum nature of the underlying physics, the kinematic distributions of particles originating from a given process, e.g. from low-energy strong interactions, normally exhibit a certain degree of variability across collisions. In other words, individual events can be associated with distinctive particle-level pileup kinematic patterns, or "fingerprints", as discussed in section 5.4.
We report on the use of this technique on simulated data and show that our algorithm produces reasonable estimates of the number of neutral pileup particles in different kinematic regions inside events regardless of whether or not particles originating from the hard parton scattering are present. To our knowledge, this is the first method of estimating how neutral pileup particles are distributed in different kinematic regions inside individual events thereby taking into account the inherent variability across collisions.
We expect this technique to improve further on the resolution of the missing transverse energy, $E_T^{miss}$.
Missing transverse energy plays an important role in a number of physics analysis scenarios at the LHC, particularly with regard to searches for new particles beyond the Standard Model, which is the currently-accepted model of particle interactions. Notable examples are the search for Dark Matter candidates, i.e. for new particles that could explain 85% of the mass of the universe currently not accounted for, as well as searches for new particles predicted by the theory of supersymmetry and by theories that postulate the existence of extra dimensions. Moreover, missing transverse energy is an essential ingredient in the study of a number of Standard Model processes, such as the decay of the recently-discovered Higgs-like boson to pairs of leptons, as well as processes that involve bosons and top quarks in the final state.
In our previous studies, we proposed the idea of filtering individual events particle by particle at the reconstruction level in order to improve on the rejection of contamination from low-energy strong interactions in high-luminosity hadron collider environments. The algorithm that we describe in this article is a simplified deterministic variant of the Markov Chain Monte Carlo technique that we used in [6, 7].
It is our opinion that the simplicity and parallelisation potential of this technique make it a promising candidate for inclusion in particle-by-particle event filtering procedures at the reconstruction level at future high-luminosity hadron collider experiments.
We see this algorithm as complementary to the particle weighting methods that have been recently proposed at the LHC. Since our technique is based on a different approach, we expect its combination with state-of-the-art algorithms to result in improved performance at higher pileup rates. As more particle weighting methods are proposed, we also envisage the possibility of combining the different weights, e.g. in the context of a multivariate framework, with a view to exploiting all the information available in the data at the level of individual particles.
2 Background
Historically, methods of subtracting contamination from soft QCD interactions have been developed with a view to correcting observables associated with hard jets, i.e. with collections of final-state particles produced by the showering and hadronisation of scattered high-energy partons.
The state of the art includes a number of techniques that are often based on different principles, and that are typically used in combination at the ATLAS and CMS experiments at the LHC.
In this context, an important role is played by correction procedures that relate to the concept of jet area [8], which provides a measure of the susceptibility of jets to contamination from low-energy particles. A core ingredient of jet area-based algorithms is the estimation of an event-level soft QCD transverse momentum density, $\rho$.
With the introduction of jet substructure techniques, soft QCD contamination started to be studied in terms of individual components inside hard jets, thereby exploiting the hierarchical structure of jets. This has resulted in an important suite of new tools for reconstruction and analysis at the LHC, particularly with regard to jet grooming [9, 10, 11, 12, 13] and jet cleansing [14].
In general terms, the algorithmic evolution outlined above has gradually moved toward more "local" estimates of soft QCD contamination that take into account the variability across collisions as well as inhomogeneity inside individual events. This has ultimately led to the development of methods working at the level of individual particles as the most fine-grained level of information available in the data. Notable examples are PUPPI [1, 2], SoftKiller [3] and the particle-level technique presented in [4].
For instance, PUPPI exploits the existence of collinear singularities in the physics that underlies the showering process. This makes it possible to assign individual particles weights that reflect the likelihood of them originating from the hard parton scattering as opposed to soft QCD interactions. Specifically, the weights rely on a measure of proximity between particles in a space defined in terms of particle transverse momentum, $p_T$, pseudorapidity, $\eta$, and azimuthal angle, $\phi$.
3 The approach
The probability density functions (PDFs) that describe the kinematics of particles originating from soft QCD interactions as opposed to a hard parton scattering reflect the properties of the underlying physics processes, and describe the expected shapes of the corresponding particle-level distributions. However, even when the processes involved are exactly the same, individual collision events contain independent, and therefore different, realisations of the underlying quantum processes, and the shapes of the corresponding particle-level distributions are for this reason generally different in different events. In other words, the shape of the kinematic distribution of particles originating from soft QCD interactions is generally event-specific, i.e. each event can in principle be associated with its own particle-level soft QCD kinematic "fingerprint".
A key aspect of our approach is the idea of using the particle weights to estimate the shape of the soft QCD kinematic distribution in terms of particle $p_T$ and $\eta$ inside individual events, thereby taking into account the inherent variability across collisions due to the presence of statistical fluctuations in the data. Given an estimate of the neutral soft QCD particle fraction in each event, this is equivalent to estimating the corresponding number of neutral soft QCD particles in different $(p_T, \eta)$ bins.
Although the actual numbers of particles originating from background soft QCD interactions as opposed to the signal hard scattering are not known, given a signal model, it is possible to estimate the expected number of signal particles, $\langle s \rangle$, in each region. On the other hand, the expected number of background particles normally cannot be estimated due to the non-perturbative nature of the underlying physics processes.
If $s$ and $b$ denote the unknown true numbers of signal and background soft QCD particles in each bin, then $n = s + b$, where $n$ is the corresponding number of particles in the data. In general, this approach applies whenever an event exhibits an excess of background over signal particles, i.e.

$\langle s \rangle \ll \langle b \rangle$,   (1)

where the average is taken over the $(p_T, \eta)$ space. This also implies that the statistical fluctuations on the number of particles in a given region are typically dominated by the fluctuations on the number of soft QCD particles, i.e. $\sigma(n) \simeq \sigma(b)$. Under such conditions, it is reasonable to express the estimated number of soft QCD particles, $\hat{b}$, in terms of

$\hat{b} = w\,n$,   (2)

where $w$ is the probability for a particle in the given bin to originate from soft QCD interactions.
In the following, we estimate the shape of the particle-level distribution of neutral soft QCD particles inside individual events using an event-level estimate of the neutral soft QCD particle fraction, as well as PDF templates obtained from high-statistics control samples. The procedure is outlined below:

1. Control samples are first used to estimate the shapes of the expected $(p_T, \eta)$ distributions of neutral final-state particles originating from soft QCD interactions and from the signal hard parton scattering. Such distributions reflect the properties of the underlying physics processes, and their shapes correspond to what is expected from an average over multiple events.

2. The overall fraction of neutral soft QCD particles in each event is estimated based on the corresponding charged particle fraction.

3. The above information is used to define weights that reflect the probability for individual particles to originate from soft QCD interactions as opposed to the hard scattering.

4. The weights are employed to reshape the particle-level $(p_T, \eta)$ distribution in the data, with a view to estimating the number of neutral soft QCD particles in different kinematic regions event by event.
4 The algorithm
For the purpose of this study, the particle-level $(p_T, \eta)$ space in each event has been subdivided into bins of widths $\Delta\eta = 0.5$ and $\Delta p_T = 0.05$ GeV/c. We focus on particles with $p_T < 0.5$ GeV/c, which are the majority of those produced by soft QCD interactions. The algorithm consists of the following steps, along the lines discussed in the previous section:
1. Obtain the shapes of the particle-level PDFs from the high-statistics control samples. In the following, $g_{PU}(p_T, \eta)$ and $g_{HS}(p_T, \eta)$ will denote the PDFs of neutral particles originating from soft QCD interactions and from the signal hard scattering, respectively.

2. In each event, estimate the overall fraction of neutral soft QCD particles, $\hat{f}_0$, in terms of the corresponding charged particle fraction, $f_\pm$:

$\hat{f}_0 = \min(\alpha f_\pm, 1)$.   (3)

The role of the correction factor $\alpha$, which is estimated from Monte Carlo as described in section 5, is to correct on average for the difference between neutral and charged particle kinematics. This includes a correction for the number of charged particles with $p_T$ below 500 MeV/c that do not reach the tracking detectors. Taking the minimum in (3) ensures that $\hat{f}_0$ never exceeds 1.

3. Combine the above information into particle weights:

$w = \dfrac{\hat{f}_0\, g_{PU}(p_T, \eta)}{\hat{f}_0\, g_{PU}(p_T, \eta) + (1 - \hat{f}_0)\, g_{HS}(p_T, \eta)}$,   (4)

with $0 \le w \le 1$. The quantity $w$ provides an estimate of the probability for individual particles in each $(p_T, \eta)$ bin to originate from soft QCD interactions as opposed to the hard parton scattering of interest.

4. Use $w$ to reshape the $(p_T, \eta)$ distribution of neutral particles in the data in order to estimate the distribution of neutral soft QCD particles in each event. The expected number of neutral soft QCD particles, $\hat{b}$, is estimated in terms of

$\hat{b} = w\,n$,   (5)

where $n$ is the corresponding number of neutral particles in the data. Given the expected number $\hat{b}$, the unknown number of neutral soft QCD particles in each bin can be treated as a random variable following a binomial distribution with mean given by (5) and standard deviation

$\hat{\sigma} = \sqrt{n\, w\, (1 - w)}$.   (6)

5. Estimate the number of neutral soft QCD particles in each bin in terms of:

$\hat{b} \pm \hat{\sigma}$.   (7)
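The per-bin computation in steps 2-5 can be sketched as follows. This is a minimal illustration, not the authors' implementation; the numerical values of the templates and counts are hypothetical stand-ins for the quantities defined above.

```python
import math

def soft_qcd_weight(f0, g_pu, g_hs):
    """Eq. (4): estimated probability that a particle in a given (pT, eta)
    bin originates from soft QCD interactions rather than the hard scattering."""
    return f0 * g_pu / (f0 * g_pu + (1.0 - f0) * g_hs)

def estimate_bin(n, f0, g_pu, g_hs):
    """Eqs. (5)-(6): expected number of neutral soft QCD particles in a bin,
    and the binomial standard deviation on that estimate."""
    w = soft_qcd_weight(f0, g_pu, g_hs)
    b_hat = w * n                        # eq. (5)
    sigma = math.sqrt(n * w * (1 - w))   # eq. (6)
    return b_hat, sigma

# Hypothetical bin: 23 neutral particles in the data, template densities
# g_pu and g_hs evaluated at the bin centre, neutral soft QCD fraction 0.9.
b_hat, sigma = estimate_bin(n=23, f0=0.9, g_pu=0.5, g_hs=0.1)
```

Different bins only share the event-level fraction $\hat{f}_0$, which is why the computation parallelises trivially across bins.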
It is worth noticing that the algorithm is inherently parallel, since different bins can be processed independently. It is our opinion that the simplicity and parallelisation potential of this technique make it a promising candidate for inclusion in future particle-level event filtering procedures upstream of jet reconstruction at high-luminosity hadron collider experiments.
In this article, we have used the weights defined in (4) to illustrate this approach. However, it should be emphasised that the idea of employing the weights to reshape the particle-level distribution inside individual events does not require this choice of weights, and can in principle be used in conjunction with any particle weighting procedure, as discussed in section 5.3.
5 Results
We discuss the results of a feasibility study of this approach on Monte Carlo data at the generator level. We used Pythia 8.176 [15, 16] to generate 1,000 events, each consisting of a hard parton scattering at LHC energies superimposed with 50 soft QCD interactions to simulate the presence of pileup.
5.1 Control sample PDF templates
We generated control sample data sets containing particles originating from the signal hard scattering and particles associated with background soft QCD interactions. These data sets were used to obtain the shapes of the PDFs of signal and background neutral particles. The latter reflect the particle-level kinematic signatures of the underlying physics processes, from which the corresponding distributions in the data generally deviate due to the presence of statistical fluctuations.
Figure 1 displays the corresponding $(p_T, \eta)$ distributions of neutral soft QCD particles (a) and of neutral particles from the hard scattering (b), each normalised to unit volume.
5.2 Event-by-event neutral particle fractions
One of the pieces of information required by the choice of weights that we have made for the purpose of this study is an event-by-event estimate of the overall fraction of neutral particles originating from soft QCD interactions.
We estimate the neutral pileup particle fraction in each event in terms of the corresponding charged fraction. We apply a correction factor, $\alpha$, that represents an average over multiple Monte Carlo events according to (3), i.e. $f_0 = \alpha f_\pm$, where $f_0$ and $f_\pm$ are the overall neutral and charged pileup particle fractions in the event, respectively. Specifically, the correction factor is given by $\alpha = \langle f_0 / f_\pm \rangle$, where $f_0$ ($f_\pm$) is the fraction of neutral (charged) pileup particles estimated from Monte Carlo, and the average is taken over the 1,000 events generated in this study.
Figure 2 displays the ratio between the fraction of neutral pileup particles and the corresponding quantity for charged particles in the events generated. The results shown in the following have been obtained using the mean of the distribution in figure 2 as the value of $\alpha$. Multiple runs of the algorithm were performed whereby $\alpha$ was varied within 5% of its nominal value, and produced consistent results.
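The estimation of $\alpha$ and its use in eq. (3) can be sketched as follows; the per-event Monte Carlo fractions below are hypothetical values, not those of this study.

```python
def correction_factor(neutral_fracs, charged_fracs):
    """alpha = <f0 / f+-> averaged over Monte Carlo events (section 5.2)."""
    ratios = [f0 / fpm for f0, fpm in zip(neutral_fracs, charged_fracs)]
    return sum(ratios) / len(ratios)

def neutral_fraction(alpha, f_charged):
    """Eq. (3): estimate the neutral soft QCD fraction from the charged one,
    capped so that it never exceeds 1."""
    return min(alpha * f_charged, 1.0)

# Hypothetical Monte Carlo neutral/charged pileup fractions for three events.
alpha = correction_factor([0.42, 0.40, 0.44], [0.58, 0.60, 0.56])
f0 = neutral_fraction(alpha, f_charged=0.6)
```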
5.3 Particle weights
For the purpose of this study, the particle weights have been defined as a function of the fraction of neutral soft QCD particles in each event as well as of the control sample PDF templates, according to (4). To our knowledge, this is the first particle weighting method that directly exploits the particle-level kinematic signatures of the underlying physics processes in the $(p_T, \eta)$ space. We see this choice of weights as complementary to those adopted in the recently-proposed algorithms that use measures of proximity between particles defined in terms of particle $p_T$, $\eta$ and $\phi$.
According to the above choice of weights, all particles in the same $(p_T, \eta)$ bin are assigned the same weight. However, as previously noticed, the idea of employing the weights to reshape the distribution in the data is more general and can in principle be used in conjunction with any particle weighting method. In fact, if $P$ denotes the set of particles in a given bin in the data and $n$ the corresponding particle multiplicity, setting the bin contents to $\sum_{k \in P} w_k$, where $w_k$ is the weight of particle $k$, reduces to (5) when $w_k = w$ for all particles in the bin.
In other words, rescaling the distribution in the data in order to estimate the number of soft QCD particles across the particle kinematic space is equivalent to setting the bin contents to a function of the data that is given by $\sum_{k \in P} w_k$. If it were known which particles in the event originate from the hard scattering and which from soft QCD interactions, the weights would be either 0 or 1. With reference to those bins that contain no signal particles, i.e. where all particles in the bin originate from soft QCD interactions, $w = 1$ and $\hat{b} = n$. With regard to those bins, the problem of estimating the number of soft QCD particles becomes trivial, and $\hat{b} = b$. Correspondingly, the uncertainty associated with the estimated number of soft QCD particles, according to (6), becomes zero.
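The generalisation to arbitrary per-particle weights can be sketched as follows; the weight values are hypothetical.

```python
def bin_estimate_from_particle_weights(weights):
    """General form of the bin estimate: sum of the per-particle soft QCD
    weights in the bin. Reduces to eq. (5), w * n, when all particles in
    the bin share the same weight."""
    return sum(weights)

# With a shared weight w for all n particles, the two forms coincide.
w, n = 0.9, 5
assert abs(bin_estimate_from_particle_weights([w] * n) - w * n) < 1e-9

# With particle-specific weights from an arbitrary weighting method:
b_hat = bin_estimate_from_particle_weights([0.95, 0.9, 0.8, 1.0])
```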
In reality, it is only possible to estimate a probability for individual particles to originate from either process, also in the light of the role played by colour connection, and $0 < w < 1$. The results presented in the following show that our method produces reasonable estimates of the number of neutral soft QCD particles in different bins regardless of whether or not particles originating from the signal hard scattering are present. In general, the accuracy of $\hat{b}$ is expected to increase with increasing accuracy of the weights.
It is worth noticing that the use of $w$ is associated with a relatively coarse-grained decomposition of the $(p_T, \eta)$ space, particularly along the $\eta$ axis. However, as mentioned above, we are not proposing to use these weights in isolation, but rather in combination with other particle-level metrics, such as those presented in [2].
It should also be emphasised that $w$ encodes properties of the underlying physics processes that are not employed by other methods, and we expect the combined use of different weights to be beneficial. For instance, some of the results presented in [2] seem to suggest oversubtraction of soft QCD particles, whereby particles originating from the hard scattering are erroneously identified as pileup-related. We foresee the possibility of implementing optimised particle weighting algorithms that make use of all the particle-level information available in the data, thereby further improving on the performance of pileup subtraction in high-luminosity environments.
5.4 Soft QCD kinematic fingerprints
A key aspect of our approach is the use of bin-by-bin estimates of the neutral soft QCD particle fractions in order to estimate how the corresponding particles are distributed across the $(p_T, \eta)$ space in each event. This method therefore takes into account the variability of the shape of the soft QCD particle distribution across collisions, i.e. it enables the estimation of event-specific soft QCD kinematic "fingerprints" at the particle level.
The performance of the algorithm is illustrated in the following with regard to one of the Monte Carlo events generated in this study, chosen as a reference. Consistent results were obtained on all events analysed.
Figure 3 (a) displays the true particle-level $(p_T, \eta)$ distribution of neutral soft QCD particles in the reference event. As shown by a comparison with the corresponding high-statistics control sample distribution in figure 1 (a), the particle-level soft QCD distribution inside individual events generally exhibits local features that are washed out when multiple events are lumped together.
5.5 Soft QCD particle counting
We illustrate here the proposed use of particle weights to reshape the particle-level distribution in the data, using the weights defined in section 5.3 as an example. The objective is to estimate the number of neutral soft QCD particles in each bin, $\hat{b}$, which describes how soft QCD neutral particles are distributed across the $(p_T, \eta)$ space inside individual events. A tentative estimate could in principle be obtained using the control sample PDF templates in terms of

$\hat{b}_{avg} = \hat{f}_0\, N\, g_{PU}(p_T, \eta)\, \Delta p_T\, \Delta\eta$,   (8)

where $N$ is the total number of neutral particles in the event, and $\Delta p_T$ and $\Delta\eta$ are the bin widths. However, although this relies on an event-by-event estimate of the neutral soft QCD particle fraction, it cannot account for the inhomogeneity of the distribution of soft QCD particles that is typically observed inside individual events.
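The template-only estimate of eq. (8) can be sketched as follows, with hypothetical values; since it uses only the average template shape, it cannot track event-specific fluctuations.

```python
def template_only_estimate(f0, n_total, g_pu, d_pt, d_eta):
    """Eq. (8): expected soft QCD count in a bin from the average template
    shape alone: f0 * N * g_pu(pT, eta) * dpT * deta."""
    return f0 * n_total * g_pu * d_pt * d_eta

# Hypothetical event: 1000 neutral particles, f0 = 0.9, template density
# g_pu = 0.5 at the bin centre, bin widths 0.05 GeV/c x 0.5 in eta.
b_avg = template_only_estimate(0.9, 1000, 0.5, 0.05, 0.5)
```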
If the unknown true probability densities for a particle with transverse momentum $p_T$ and pseudorapidity $\eta$ to originate from the signal hard scattering and from background soft QCD interactions are denoted by $\tilde{g}_{HS}(p_T, \eta)$ and $\tilde{g}_{PU}(p_T, \eta)$, the true fractions of background and signal particles inside each bin are $\tilde{g}_{PU}\,\Delta p_T\,\Delta\eta$ and $\tilde{g}_{HS}\,\Delta p_T\,\Delta\eta$, respectively, where $\Delta p_T$ and $\Delta\eta$ are the widths of the bins along the $p_T$ and $\eta$ axes.
In reality, $\tilde{g}_{PU}$ and $\tilde{g}_{HS}$ are not known, and we use the corresponding control sample PDF templates in (4) to estimate the local fractions of soft QCD particles.
If the unknown true number of soft QCD particles, $b$, is higher than the corresponding average number due to fluctuations in the data, a fraction $w$ of the excess will add to the estimated number of neutral soft QCD particles in that bin. At the same time, a fraction $1 - w$ will be associated with the hard parton scattering. A similar line of reasoning can be applied to a situation whereby fluctuations lead to a depletion in terms of the number of soft QCD particles with respect to the average. In other words, $\hat{b}$ is generally expected to reflect the true distribution more accurately than (8).
While $\hat{b}$ represents the expected number of neutral soft QCD particles, the unknown actual number can be treated as a random variable following a binomial distribution with mean $\hat{b}$ and standard deviation given by (6). The estimated $(p_T, \eta)$ distribution of neutral soft QCD particles in the event chosen to illustrate our method is shown in figure 3 (b). As can be seen, the local features of the distribution due to the presence of fluctuations in the data are reasonably described, e.g. the localised excess visible in the figure.
The same results are shown in figure 4, where the estimated number of soft QCD particles across the $(p_T, \eta)$ space is superimposed with a heatmap corresponding to the relative uncertainty $\hat{\sigma}/\hat{b}$ on $\hat{b}$. As expected, the uncertainty is higher in lower-statistics bins, i.e. $w$ is a more reliable estimate of the local fraction of soft QCD particles in more highly-populated bins.
Based on these results, we expect the use of this technique in conjunction with the recently-proposed particle weighting methods to be particularly beneficial in the region $p_T < 500$ MeV/c, which contains most of the particles originating from soft QCD interactions.
In order to verify the agreement between the shapes of the estimated and of the true distributions of neutral soft QCD particles, we compared the estimated distribution to the true one using Monte Carlo truth information in different $(p_T, \eta)$ bins. Figure 5 displays the ratio between the control sample distribution of neutral soft QCD particles and the corresponding true distribution in the reference event as a function of particle $p_T$ in bins of width 0.05 GeV/c between 0 and 0.5 GeV/c. The error bars correspond to one Poisson standard deviation on the number of particles in the control sample. The plots highlight the effect of statistical fluctuations in the data, which are responsible for the observed discrepancies between the shapes of the particle-level distributions inside individual events and the "average" shape that corresponds to the high-statistics control sample.
The corresponding ratio between the estimated and the true distribution in the reference event is displayed in figure 6, which shows a significantly-improved agreement. The error bars are calculated based on (6).
$\eta$  $p_T$ (GeV/c)  $s$  $b$  $w$  $\hat{b}$  $\hat{\sigma}$  $\Delta w$  $\varepsilon$ (GeV/c)
2.25  0.025  3  20  0.946  21.7  1.1  0.022  0.0005 
2.25  0.075  3  16  0.946  18  1  0.039  0.0029 
2.25  0.125  2  24  0.945  24.6  1.2  0.011  0.0014 
1.75  0.025  4  25  0.938  27.2  1.3  0.019  0.0005 
1.75  0.075  3  34  0.945  35  1.4  0.009  0.0007 
1.75  0.125  3  18  0.939  19.7  1.1  0.023  0.0029 
1.25  0.025  4  12  0.938  15  1  0.093  0.0023 
1.25  0.075  2  20  0.94  20.7  1.1  0.008  0.0006 
1.25  0.125  2  22  0.937  22.5  1.2  0.016  0.002 
0.75  0.025  3  28  0.937  29  1.4  0.007  0.0002 
0.75  0.075  3  29  0.935  29.9  1.4  0.011  0.0008 
0.75  0.125  5  16  0.933  19.6  1.1  0.08  0.01 
0.25  0.025  2  32  0.937  31.9  1.4  0.03  0.0007 
0.25  0.075  3  26  0.935  27.1  1.3  0.005  0.0004 
0.25  0.125  1  28  0.934  27.1  1.3  0.047  0.0059 
-0.25  0.025  4  22  0.935  24.3  1.3  0.026  0.0006 
-0.25  0.075  3  29  0.933  29.9  1.4  0.013  0.001 
-0.25  0.125  2  14  0.932  14.9  1  0.005  0.0006 
-0.75  0.025  4  17  0.933  19.6  1.1  0.048  0.0012 
-0.75  0.075  4  30  0.94  31.9  1.4  0.008  0.0006 
-0.75  0.125  2  27  0.936  27.2  1.3  0.024  0.0031 
-1.25  0.025  1  15  0.939  15  1  0.026  0.0006 
-1.25  0.075  4  12  0.943  15.1  0.9  0.098  0.0073 
-1.25  0.125  5  23  0.941  26.3  1.2  0.048  0.006 
-1.75  0.025  3  18  0.939  19.7  1.1  0.023  0.0006 
-1.75  0.075  3  40  0.943  40.6  1.5  0.017  0.0013 
-1.75  0.125  2  24  0.944  24.6  1.2  0.012  0.0015 
-2.25  0.025  2  20  0.941  20.7  1.1  0.006  0.0002 
-2.25  0.075  2  34  0.945  34  1.4  0.023  0.0017 
-2.25  0.125  2  28  0.944  28.3  1.3  0.019  0.0023 
Table 1 further illustrates the performance of this technique on the same event presented in the plots. The figures in the table correspond to bins with $p_T < 0.15$ GeV/c and with at least two particles in the data, i.e. $n \ge 2$. The columns correspond to the centres of the $\eta$ and $p_T$ bins, to the true numbers of signal and background particles, $s$ and $b$, to the weight $w$, to the estimates $\hat{b}$ and $\hat{\sigma}$, to the deviation $\Delta w$ of the weight from its true value, and to $\varepsilon = \Delta w\, p_T$.
As pointed out in section 5.3, although this approach is being presented with reference to scenarios where the number of signal particles is on average much lower than the number of soft QCD particles, the task of estimating the latter becomes trivial in the limit where the number of signal particles is zero, since in that case $w = 1$ and $\hat{b} = n$. However, in practice, it is not known which bins contain particles originating from the signal hard scattering and which do not.
With a view to verifying that our results are more accurate than those that would be obtained if the presence of signal particles in the data was neglected, the deviation of the estimated number of neutral soft QCD particles from the corresponding true value, $\hat{b} - b$, was compared to the true number of signal particles, $s$. Figure 7 displays the comparison in those bins that contain more than 1 particle in the data, at least one of which originates from the signal hard scattering. The absolute difference between the estimated number of neutral soft QCD particles and the unknown true number, $|\hat{b} - b|$, was averaged first over those bins that contain at least 1 background particle and then over the events analysed in this study. The average absolute error on $\hat{b}$ on the data set analysed was thereby found to be lower than 1 particle.
It should also be emphasised that, although we are not explicitly proposing our algorithm with reference to pileup subtraction inside jets, we envisage the possibility of combining this method with state-of-the-art jet calibration techniques, e.g. using local estimates of neutral pileup particle multiplicity as constraints in jet substructure algorithms.
5.6 Missing transverse energy resolution
In this section, we employ a similar approach to [2] whereby the weights are used to rescale the particle four-momentum vectors, with a view to assessing the impact of the weights defined in section 5.3 on the resolution of the missing transverse energy, $E_T^{miss}$.
A full analysis of the impact on $E_T^{miss}$ resolution at the LHC is outside the scope of this article. We here provide a preliminary estimate concentrating on the effect of pileup, assuming 50 vertices per event.
It is worth recalling that the measurement of $E_T^{miss}$ relies on information that is provided by independent sources, such as the calorimeters, the trackers and the muon subdetectors, and that it is sensitive both to pileup contamination and to beam-induced effects [17].
In particular, $E_T^{miss}$ is one of the observables that are most significantly affected by contamination from soft QCD interactions at hadron colliders. It is estimated that each additional pileup interaction at the LHC adds a contribution in quadrature to the Particle Flow $E_T^{miss}$ resolution and, although existing methods have been shown to be extremely useful below 35 vertices per event [1], there is still a margin for improvement. Moreover, it is not clear what the performance of the existing techniques is going to be like as the number of vertices per event increases.
The pileup-related contribution to the $E_T^{miss}$ resolution associated with the use of the weights defined in section 5.3 is a function of the deviation of the weights from the corresponding true values. Specifically, when $w$ is used to rescale the four-momentum of a particle with transverse momentum $p_T$, a deviation of the weight from its true value results in a fraction of the particle $p_T$ being assigned to the wrong physics process. If $\Delta w$ denotes the deviation of the particle weight from its true value, the amount of incorrectly-assigned $p_T$ for that particle is given by $\varepsilon = \Delta w\, p_T$.
Figure 8 (a) displays the estimated number of neutral soft QCD particles across the $(p_T, \eta)$ space, superimposed with a heatmap corresponding to $\varepsilon$ in MeV/c. A comparison with figure 4 (b) shows that those bins that exhibit a lower relative uncertainty on the estimated number of soft QCD particles, $\hat{\sigma}/\hat{b}$, also correspond to lower $\varepsilon$, as expected. It is worth noticing that $\hat{\sigma}/\hat{b}$ does not depend on Monte Carlo truth information and can be estimated directly from the data. This makes it possible to restrict the use of this technique to bins that are associated with a value of $\hat{\sigma}/\hat{b}$ lower than a predefined threshold.
The quantity $\varepsilon$ provides an estimate of the contribution of individual particles in the event to the $E_T^{miss}$ resolution with reference to pileup contamination. A preliminary estimate of the impact of the weights on the $E_T^{miss}$ resolution can be given in terms of $\sqrt{N_{tot}}\,\sigma_\varepsilon$, where $N_{tot}$ is the total number of particles in the event and $\sigma_\varepsilon$ is the standard deviation of $\varepsilon$ over the $(p_T, \eta)$ space. For the sake of an approximate calculation, we assume the contributions of individual particles to be uncorrelated.
Figure 8 (b) shows the distribution of $\sigma_\varepsilon$ over the Monte Carlo events analysed in this study. In each event, the average of $\varepsilon$ is weighted with the number of particles in the data, $n$, and is taken over the $(p_T, \eta)$ space. Given an assumption on the total number of particles in the event, this yields a preliminary estimate of the corresponding pileup-related contribution to the $E_T^{miss}$ resolution.
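The estimate above can be sketched as follows; the per-bin values of $\Delta w$ and $p_T$, and the particle count, are hypothetical, and the uncorrelated-contributions assumption is made explicit in the code.

```python
import math

def misassigned_pt(delta_w, pt):
    """Amount of incorrectly-assigned transverse momentum for one particle
    whose weight deviates from its true value by delta_w (section 5.6)."""
    return delta_w * pt

def met_resolution_estimate(n_total, eps_values):
    """Preliminary pileup-related contribution to the MET resolution:
    sqrt(N_tot) times the spread of the per-particle contributions eps,
    assuming the individual contributions are uncorrelated."""
    mean = sum(eps_values) / len(eps_values)
    var = sum((e - mean) ** 2 for e in eps_values) / len(eps_values)
    return math.sqrt(n_total) * math.sqrt(var)

# Hypothetical per-bin values of eps = delta_w * pT, in GeV/c.
eps = [misassigned_pt(dw, pt) for dw, pt in
       [(0.02, 0.025), (0.04, 0.075), (0.01, 0.125)]]
sigma_met = met_resolution_estimate(1000, eps)
```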
It should be emphasised that the above remarks relate to a generator-level study and exclusively concentrate on the effect of pileup. While the latter is a major source of concern in the upcoming higher-luminosity regimes at the LHC, a proper investigation will also have to take into account other factors, such as detector effects, misreconstruction, beam-related events, and contamination from cosmic muons. Moreover, a full investigation of this technique in a proper analysis framework will need to be performed.
6 Conclusions and outlook
We have presented a proof of concept of a new approach to the use of particle weights at high-luminosity hadron collider experiments, a distinctive feature of which is the idea of employing the weights to reshape the particle-level kinematic distributions in the data. We have applied this method to the task of estimating the number of neutral particles associated with pileup, i.e. with low-energy strong interactions from other proton-proton collisions, in different kinematic regions inside collision events. Pileup is a major source of contamination at the Large Hadron Collider, and its impact on physics analysis is expected to become even more significant in the upcoming higher-luminosity regimes.
We build on a view of collision events as mixtures of particles originating from different physics processes, whereby the use of particle weights helps resolve the conceptual issues associated with colour connection.
Because of the quantum nature of the underlying physics processes, the kinematic patterns of pileup particles are typically different in different events, i.e. individual events can be associated with distinctive pileup kinematic “fingerprints” at the particle level. We have shown that our approach makes it possible to estimate the number of neutral pileup particles in different kinematic regions inside events with reasonable accuracy, regardless of whether or not particles originating from the signal hard scattering are present. Since the estimates do not correspond to average numbers of particles, but rather to the actual numbers in each event, our approach takes into account the inherent variability across collisions due to the presence of statistical fluctuations in the data.
With regard to the reconstruction pipelines of the experiments, we concentrate on a stage whereby individual particles have not yet been assigned to jets, i.e. to collections of final-state particles originating from the same scattered hard parton.
We expect the combined use of this technique with existing methods to result in further-improved performance in terms of pileup subtraction in higher-luminosity scenarios at the Large Hadron Collider, particularly with reference to the contamination from low-energy neutral particles. From a broader perspective, as more particle weighting methods are proposed, we envisage the possibility of combining the different weights, e.g. using multivariate techniques, with a view to making use of all the information available in the data with regard to which process individual particles originated from. It is also our opinion that the simplicity and parallelisation potential of this algorithm make it a promising candidate for inclusion in particle-level event filtering procedures upstream of jet reconstruction at future high-luminosity hadron collider experiments.
We intend to investigate possible ways of improving the performance of this method, as well as to study in more detail the relation between this algorithm and the Markov Chain Monte Carlo technique that we used in a previous study, where we proposed the idea of filtering individual collision events on a particle-by-particle basis at high-luminosity hadron colliders.
Acknowledgments
The author wishes to thank the High Energy Physics Group at Brunel University London for a stimulating environment, and particularly Prof. Akram Khan, Prof. Peter Hobson and Dr. Paul Kyberd for fruitful conversations, as well as Dr. Ivan Reid for help with technical issues. Particular gratitude also goes to the High Energy Physics Group at University College London, especially to Prof. Jonathan Butterworth for his valuable comments. The author also wishes to thank Prof. Trevor Sweeting at the UCL Department of Statistical Science, as well as Dr. Alexandros Beskos at the same department for fruitful discussions. Finally, particular gratitude goes to Prof. Carsten Peterson and to Prof. Leif Lönnblad at the Department of Theoretical Physics, Lund University.
Footnotes
 Whenever neutral particles are referred to in the text, neutrinos are not considered.
 It is worth noticing that, while the idea of assigning individual particles a single process of origin is per se conceptually flawed in hadron collider environments due to the presence of colour connection, the use of particle weights provides the required flexibility in interpreting the origin of individual particles.
 Missing transverse energy is the event-level energy imbalance measured on a plane perpendicular to the direction of the colliding particle beams.
 The transverse momentum, p_T, of a particle is defined as the magnitude of the projection of the particle momentum vector onto a plane perpendicular to the direction of the colliding beams.
 Particle pseudorapidity, η, is a kinematic quantity expressed in terms of the particle polar angle θ in the laboratory frame by η = −ln tan(θ/2).
 A similar line of reasoning applies to a depletion in the number of particles.
 Both charged and neutral particles.
References
 The CMS Collaboration 2014 PAS JME-14-001
 Bertolini D, Harris P, Low M and Tran N 2014 J. High Energy Phys. 1410:059
 Cacciari M, Salam G P and Soyez G 2014 (Preprint arXiv:1407.0408 [hep-ph])
 Berta P, Spousta M, Miller D W and Leitner R 2014 J. High Energy Phys. 1406:092
 Kahawala D, Krohn D and Schwartz M D 2013 J. High Energy Phys. 1306:006
 Colecchia F 2013 J. Phys.: Conf. Ser. 410 012028
 Colecchia F 2012 J. Phys.: Conf. Ser. 368 012031
 Cacciari M and Salam G P 2008 Phys. Lett. B 659:119-26
 Butterworth J M, Davison A R, Rubin M and Salam G P 2008 Phys. Rev. Lett. 100 242001
 Krohn D, Thaler J and Wang L-T 2010 J. High Energy Phys. 1002:084
 Ellis S D, Vermilion C K and Walsh J R 2009 Phys. Rev. D 80 051501
 Ellis S D, Vermilion C K and Walsh J R 2010 Phys. Rev. D 81 094023
 Larkoski A J, Marzani S, Soyez G and Thaler J 2014 J. High Energy Phys. 1405:146
 Krohn D, Schwartz M D, Low M and Wang L-T 2014 Phys. Rev. D 90 065020
 Sjöstrand T, Mrenna S and Skands P 2006 J. High Energy Phys. 0605:026
 Sjöstrand T, Mrenna S and Skands P 2008 Comput. Phys. Comm. 178 852
 The CMS Collaboration 2013 EPJC 73 2568