Particle-flow reconstruction and global event description with the CMS detector

September 16, 2019
Abstract

The CMS apparatus was identified, a few years before the start of the LHC operation at CERN, to feature properties well suited to particle-flow (PF) reconstruction: a highly-segmented tracker, a fine-grained electromagnetic calorimeter, a hermetic hadron calorimeter, a strong magnetic field, and an excellent muon spectrometer. A fully-fledged PF reconstruction algorithm tuned to the CMS detector was therefore developed and has been consistently used in physics analyses for the first time at a hadron collider. For each collision, the comprehensive list of final-state particles identified and reconstructed by the algorithm provides a global event description that leads to unprecedented CMS performance for jet and hadronic $\tau$ decay reconstruction, missing transverse momentum determination, and electron and muon identification. This approach also allows particles from pileup interactions to be identified and enables efficient pileup mitigation methods. The data collected by CMS at a centre-of-mass energy of 8 TeV show excellent agreement with the simulation and confirm the superior PF performance at least up to an average of 20 pileup interactions.

PRF-14-001

0.1 Introduction

Modern general-purpose detectors at high-energy colliders are based on the concept of cylindrical detection layers, nested around the beam axis. Starting from the beam interaction region, particles first enter a tracker, in which charged-particle trajectories (tracks) and origins (vertices) are reconstructed from signals (hits) in the sensitive layers. The tracker is immersed in a magnetic field that bends the trajectories and allows the electric charges and momenta of charged particles to be measured. Electrons and photons are then absorbed in an electromagnetic calorimeter (ECAL). The corresponding electromagnetic showers are detected as clusters of energy recorded in neighbouring cells, from which the energy and direction of the particles can be determined. Charged and neutral hadrons may initiate a hadronic shower in the ECAL as well, which is subsequently fully absorbed in the hadron calorimeter (HCAL). The corresponding clusters are used to estimate their energies and directions. Muons and neutrinos traverse the calorimeters with little or no interaction. While neutrinos escape undetected, muons produce hits in additional tracking layers called muon detectors, located outside the calorimeters. This simplified view is graphically summarized in Fig. 1, which displays a sketch of a transverse slice of the CMS detector [1].

Figure 1: A sketch of the specific particle interactions in a transverse slice of the CMS detector, from the beam interaction region to the muon detector. The muon and the charged pion are positively charged, and the electron is negatively charged.

This apparent simplicity has led to a tradition at hadron colliders of reconstructing physics objects based—at least to a large extent—on the signals collected by a given detector as follows:

  • Jets consist of hadrons and photons, the energy of which can be inclusively measured by the calorimeters without any attempt to separate individual jet particles. Jet reconstruction can therefore be performed without any contribution from the tracker and the muon detectors. The same argument applies to the reconstruction of the missing transverse momentum ($\vec{p}_\mathrm{T}^{\,\mathrm{miss}}$).¹
    ¹ The CMS coordinate system is oriented such that the $x$ axis points to the centre of the LHC ring, the $y$ axis points vertically upward, and the $z$ axis is in the direction of the counterclockwise proton beam, when looking at the LHC from above. The origin is centred at the nominal collision point inside the experiment. The azimuthal angle $\phi$ (expressed in radians in this paper) is measured from the $x$ axis in the $(x, y)$ plane, and the radial coordinate in this plane is denoted $r$. The polar angle $\theta$ is defined in the $(r, z)$ plane with respect to the $z$ axis, and the pseudorapidity is defined as $\eta = -\ln\tan(\theta/2)$. The component of the momentum transverse to the $z$ axis is denoted $p_\mathrm{T}$. The missing transverse momentum $\vec{p}_\mathrm{T}^{\,\mathrm{miss}}$ is the vectorial sum of the transverse momenta of the undetectable particles. The transverse energy is defined as $E_\mathrm{T} = E\sin\theta$.

  • The reconstruction of isolated photons and electrons primarily concerns the ECAL.

  • The tagging of jets originating from hadronic $\tau$ decays and from b quark hadronization is based on the properties of the pertaining charged-particle tracks, and thus mostly involves the tracker.

  • The identification of muons is principally based on the information from the muon detectors.
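The coordinate conventions defined in the footnote to the first item above can be made concrete with a few lines of Python. This is an illustrative sketch, not CMS software: the function names `pseudorapidity_from_theta` and `kinematics` are hypothetical, and energies and momenta are assumed to be given in GeV.

```python
import math


def pseudorapidity_from_theta(theta: float) -> float:
    """eta = -ln tan(theta / 2), theta being the polar angle from the z axis (rad).

    Undefined exactly along the beam axis (theta = 0 or pi).
    """
    return -math.log(math.tan(theta / 2.0))


def kinematics(px: float, py: float, pz: float, energy: float) -> dict:
    """Derive pT, phi, theta, eta and E_T from a four-momentum given in GeV."""
    pt = math.hypot(px, py)              # momentum transverse to the z (beam) axis
    theta = math.atan2(pt, pz)           # polar angle with respect to the z axis
    return {
        "pt": pt,
        "phi": math.atan2(py, px),       # azimuth measured from the x axis, in radians
        "theta": theta,
        "eta": pseudorapidity_from_theta(theta),
        "et": energy * math.sin(theta),  # transverse energy E_T = E sin(theta)
    }


# Massless particle with px = 3 GeV and pz = 4 GeV, hence E = 5 GeV
k = kinematics(3.0, 0.0, 4.0, 5.0)
print(f"pT = {k['pt']:.2f} GeV, eta = {k['eta']:.3f}, E_T = {k['et']:.2f} GeV")
```

For this massless example, $\tan(\theta/2) = 1/3$, so $\eta = \ln 3 \approx 1.10$, and $E_\mathrm{T} = E\sin\theta = 3$ GeV, equal to $p_\mathrm{T}$ as expected for a massless particle.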

A significantly improved event description can be achieved by correlating the basic elements from all detector layers (tracks and clusters) to identify each final-state particle, and by combining the corresponding measurements to reconstruct the particle properties on the basis of this identification. This holistic approach is called particle-flow (PF) reconstruction. Figure 2 provides a foretaste of the benefits from this approach. This figure shows a jet simulated in the CMS detector with a transverse momentum of 65 GeV. For illustrative purposes, this jet is made of only five particles: two charged hadrons (a $\pi^-$ and a $\pi^+$), two photons (from the decay of a $\pi^0$), and one neutral hadron (a $\mathrm{K}^0_\mathrm{L}$). The charged hadrons are identified by a geometrical connection (link) in the $(\eta, \phi)$ views between one track and one or more calorimeter clusters, and by the absence of signal in the muon detectors. The combination of the measurements in the tracker and in the calorimeters provides an improved determination of the energy and direction of each charged hadron, dominated by the superior tracker resolution in that particular event. The photons and neutral hadrons are in general identified by ECAL and HCAL clusters with no track link. This identification allows the cluster energies to be calibrated more accurately under either the photon or the hadron hypothesis. No attempt is made to distinguish the various species of neutral and charged hadrons in the PF reconstruction. Electrons and muons are not present in this jet. Electrons would be identified by a track and an ECAL cluster, with a momentum-to-energy ratio compatible with unity, and not connected to an HCAL cluster. Muons would be identified by a track in the inner tracker connected to a track in the muon detectors.
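The identification rules listed in the previous paragraph can be summarized as a short decision sequence. The sketch below is a deliberately simplified toy, not the CMS PF algorithm: it assumes that linking has already grouped the elements into a hypothetical `Block`, and the 20% tolerance used for "compatible with unity" is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Hypothetical group of linked PF elements (energies and momenta in GeV)."""
    track_p: Optional[float] = None  # inner-tracker momentum, None if no track
    ecal_e: float = 0.0              # energy of the linked ECAL cluster(s)
    hcal_e: float = 0.0              # energy of the linked HCAL cluster(s)
    muon_link: bool = False          # inner track linked to a muon-detector track


def identify(block: Block, p_over_e_tol: float = 0.2) -> str:
    """Toy version of the identification rules stated in the text."""
    if block.track_p is not None:
        if block.muon_link:
            return "muon"
        if (block.ecal_e > 0 and block.hcal_e == 0
                and abs(block.track_p / block.ecal_e - 1.0) < p_over_e_tol):
            return "electron"        # p/E compatible with unity, no HCAL cluster
        return "charged hadron"      # any other track
    if block.hcal_e > 0:
        return "neutral hadron"      # no track; the hadron may start showering in the ECAL
    if block.ecal_e > 0:
        return "photon"              # ECAL cluster with no track link
    return "unidentified"


for b in (Block(track_p=10.0, ecal_e=9.8), Block(track_p=12.0, ecal_e=2.0, hcal_e=9.0),
          Block(track_p=20.0, muon_link=True), Block(ecal_e=5.0), Block(hcal_e=7.0)):
    print(identify(b))
```

Checking for a muon-detector link first, and for the electron hypothesis before falling back to the charged-hadron one, mirrors the order in which the paragraph above distinguishes the particle types; the actual linking and identification, presented in Section 0.4, are considerably more elaborate.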

Figure 2: Event display of an illustrative jet made of only five particles, in the $(x, y)$ view (upper panel), and in the $(\eta, \phi)$ view on the ECAL surface (lower left) and the HCAL surface (lower right). In the top view, these two surfaces are represented as circles centred around the interaction point. The $\mathrm{K}^0_\mathrm{L}$, the $\pi^-$, and the two photons from the $\pi^0$ decay are detected as four well-separated ECAL clusters. The $\pi^+$ does not create a cluster in the ECAL. The two charged pions are reconstructed as charged-particle tracks, appearing as vertical solid lines in the $(\eta, \phi)$ views and circular arcs in the $(x, y)$ view. These tracks point towards two HCAL clusters. In the bottom views, the ECAL and HCAL cells are represented as squares, with an inner area proportional to the logarithm of the cell energy. Cells with an energy larger than those of the neighbouring cells are shown in dark grey. In all three views, the cluster positions are represented by dots, the simulated particles by dashed lines, and the positions of their impacts on the calorimeter surfaces by various open markers.

The PF concept was developed and used for the first time by the ALEPH experiment at LEP [2] and is now driving the design of detectors for possible future colliders, ILC and CLIC [3, 4], FCC-ee [5], and CEPC [6]. Until then, attempts to repeat the experience at hadron colliders had not met with success. A key ingredient in this approach is the fine spatial granularity of the detector layers. Coarse-grained detectors may cause the signals from different particles to merge, especially within jets, thereby reducing the particle identification and reconstruction capabilities. Even in that case, however, the tracker resolution can be partially exploited by locally subtracting from the calorimeter energy either the energy expected from charged hadrons or the energy measured within a specific angle from the charged hadron trajectories. Such energy-flow algorithms [7, 8, 9, 10, 11, 12, 13, 14] are generally used to improve the determination of selected hadronic jets or hadronic tau decays. If, on the other hand, the subdetectors are sufficiently segmented to provide good separation between individual particles, as shown for CMS in Fig. 2, a global event description becomes possible, in which all particles are identified. From the list of identified particles, optimally reconstructed from a combined fit of all pertaining measurements, the physics objects can be determined with superior efficiencies and resolutions.

Prior to the LHC startup, however, it was commonly feared that the intricacy of the final states arising from proton-proton or heavy ion collisions would dramatically curb the advantages of the PF paradigm. The capacity to individually identify the particles from the hard scatter was indeed expected to be seriously downgraded by the proton or ion debris, the particles from pileup interactions (proton-proton interactions concurrent with the hard scatter in the same or different bunch crossings), the particle proximity inside high-energy jets, the secondary interactions in the tracker material, etc. Detailed Monte Carlo (MC) simulations performed in 2009, and the commissioning of the algorithm in the first weeks of LHC data taking at centre-of-mass energies of 0.9 and 2.36 TeV in December 2009, and of 7 TeV in March 2010, demonstrated the adequacy of the CMS detector design for PF reconstruction of proton-proton collisions, with benefits similar to those observed in $\mathrm{e}^+\mathrm{e}^-$ collisions. The holistic approach also provided ways to quickly cross-calibrate the various subdetectors, to validate their measurements, and to identify and mask detector backgrounds. The PF reconstruction was ready for use in physics analyses in June 2010, and was implemented in the high level trigger and in heavy ion collision analyses in 2011. Since then, practically all CMS physics results have been based on PF reconstruction, and the future detector upgrade designs are routinely assessed by reference to it.

This paper is organized as follows. In Section 0.2, the properties of the CMS detector are summarized in view of its PF capabilities. The implementation of the PF concept for CMS is the subject of the following two sections. Section 0.3 describes the basic elements needed for a proper particle reconstruction through its specific signals in the various subdetectors. The algorithm that links the basic elements together and the subsequent particle identification are presented in Section 0.4. The expected performance of the resulting physics objects is compared to that of the traditional methods in Section 0.5, in the absence of pileup interactions. Finally, the physics object performance observed in data, and the mitigation of the effects of pileup interactions—for which the final state particles, also exclusively reconstructed by the PF approach, provide precious additional handles—are underlined in Section 0.6.

0.2 The CMS detector

The CMS detector [1] turns out to be well-suited to PF, with:

  • a large magnetic field, to separate the calorimeter energy deposits of charged and neutral particles in jets;

  • a fine-grained tracker, providing a pure and efficient charged-particle trajectory reconstruction in jets with $p_\mathrm{T}$ up to around 1 TeV, and therefore an excellent measurement of 65% of the jet energy;

  • a highly-segmented ECAL, allowing energy deposits from particles in jets (charged hadrons, neutral hadrons, and photons) to be clearly separated from each other up to a jet $p_\mathrm{T}$ of the order of 1 TeV. The resulting efficient photon identification, coupled with the high ECAL energy resolution, allows for an excellent measurement of another 25% of the jet energy;

  • a hermetic HCAL with a coarse segmentation, still sufficient to separate charged and neutral hadron energy deposits in jets up to a jet $p_\mathrm{T}$ of 200–300 GeV, allowing the remaining 10% of the jet energy to be reconstructed, although with a modest resolution;

  • an excellent muon tracking system, delivering an efficient and pure muon identification, irrespective of the surrounding particles.

The characteristics of the magnet and of the CMS subdetectors relevant to PF are described in this section.

0.2.1 The magnet

The central feature of the CMS design is a large superconducting solenoid magnet [15]. It delivers an axial and uniform magnetic field of 3.8 T over a length of 12.5 m and a free-bore radius of 3.15 m. This radius is large enough to accommodate the tracker and both the ECAL and HCAL, thereby minimizing the amount of material in front of the calorimeters. This feature is an advantage for PF reconstruction, as it eliminates the energy losses before the calorimeters caused by particles showering in the coil material and facilitates the link between tracks and calorimeter clusters. At normal incidence, the bending power of 4.9 T m to the inner surface of the calorimeter system provides strong separation between charged- and neutral-particle energy deposits. For example, a charged particle with is deviated in the transverse plane by 5 cm at the ECAL surface, a distance large enough to resolve its energy deposit from that of a photon emitted in the same direction.
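The size of this separation can be estimated from the numbers quoted above. The sketch below assumes a uniform 3.8 T field, a track starting at the origin, and a calorimeter inner radius inferred as bending power divided by field, $4.9/3.8 \approx 1.29$ m; it uses the standard relation $R\,[\mathrm{m}] = p_\mathrm{T}\,[\mathrm{GeV}] / (0.2998\,B\,[\mathrm{T}])$ for the radius of curvature. The function names are hypothetical.

```python
import math

B_TESLA = 3.8                          # solenoid field quoted in the text
BENDING_POWER_TM = 4.9                 # B x r at the calorimeter inner surface
R_CALO_M = BENDING_POWER_TM / B_TESLA  # implied inner radius, about 1.29 m
C_GEV_PER_T_M = 0.299792458            # pT[GeV] = 0.2998 * B[T] * R[m]


def curvature_radius_m(pt_gev: float, b_tesla: float = B_TESLA) -> float:
    """Radius of the helix projected on the transverse plane."""
    return pt_gev / (C_GEV_PER_T_M * b_tesla)


def min_pt_to_reach_gev(r_m: float = R_CALO_M, b_tesla: float = B_TESLA) -> float:
    """A track from the origin reaches radius r only if its diameter 2R exceeds r."""
    return C_GEV_PER_T_M * b_tesla * r_m / 2.0


def deviation_m(pt_gev: float, r_m: float = R_CALO_M) -> float:
    """Distance, along the cylinder of radius r, between the impact point of the
    bent track and that of a straight line along its initial direction."""
    big_r = curvature_radius_m(pt_gev)
    if r_m > 2.0 * big_r:
        raise ValueError("track curls up before reaching this radius")
    return r_m * math.asin(r_m / (2.0 * big_r))  # r times the azimuthal offset


for pt in (1.0, 5.0, 10.0, 20.0, 50.0):
    print(f"pT = {pt:4.0f} GeV -> deviation = {100 * deviation_m(pt):5.1f} cm")
```

Under these simplifying assumptions, a track needs $p_\mathrm{T} \gtrsim 0.73$ GeV to reach the barrel calorimeter radius at normal incidence, in line with the statement in Section 0.3.1 that particles with $p_\mathrm{T}$ between 200 and 700 MeV never reach the calorimeter barrel, and the deviation falls to about 5 cm for a $p_\mathrm{T}$ of roughly 20 GeV.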

0.2.2 The silicon inner tracker

The full-silicon inner tracking system [16, 17] is a cylinder-shaped detector with an outer radius of 1.20 m and a length of 5.6 m. The barrel (each of the two endcaps) comprises three (two) layers of pixel detectors, surrounded by ten (twelve) layers of micro-strip detectors. The 16 588 silicon sensor modules are finely segmented into 66 million pixels and 9.6 million -to--wide strips. This fine granularity offers separation of closely-spaced particle trajectories in jets.

As displayed in Fig. 3, these layers and the pertaining services (cables, support, cooling) represent a substantial amount of material in front of the calorimeters, up to 0.5 interaction lengths or 1.8 radiation lengths. At , the probability for a photon to convert or for an electron to emit a bremsstrahlung photon by interacting with this material is about 85%. Similarly, a hadron has a 20% probability to experience a nuclear interaction before reaching the ECAL surface. The large number of emerging secondary particles turned out to be a major source of complication in the PF reconstruction algorithm. Overcoming this complication eventually required harnessing the full granularity and redundancy of the silicon tracker measurements.

Figure 3: Total thickness of the inner tracker material expressed in units of interaction lengths (left) and radiation lengths (right), as a function of the pseudorapidity . The acronyms TIB, TID, TOB, and TEC stand for “tracker inner barrel”, “tracker inner disks”, “tracker outer barrel”, and “tracker endcaps”, respectively. The two figures are taken from Ref. [18].

The tracker measures the $p_\mathrm{T}$ of charged hadrons at normal incidence with a resolution of 1% for . The relative resolution then degrades with increasing $p_\mathrm{T}$ to reach the calorimeter energy resolution for track momenta of several hundred GeV. Because the fragmentation of high-$p_\mathrm{T}$ partons typically produces many charged hadrons at a lower $p_\mathrm{T}$, the tracker is expected to contribute significantly to the measurement of the momentum of jets with a $p_\mathrm{T}$ up to a few TeV.

0.2.3 The electromagnetic calorimeter

The ECAL [19, 20] is a hermetic homogeneous calorimeter made of lead tungstate (PbWO$_4$) crystals. The barrel covers $|\eta| < 1.479$ and the two endcap disks $1.479 < |\eta| < 3.0$. The barrel (endcap) crystal length of 23 (22) cm corresponds to 25.8 (24.7) radiation lengths, sufficient to contain more than 98% of the energy of electrons and photons up to 1 TeV. The crystal material also amounts to about one interaction length, causing about two thirds of the hadrons to start showering in the ECAL before entering the HCAL.

The crystal transverse size matches the small Molière radius of PbWO$_4$, 2.2 cm. This fine transverse granularity makes it possible to fully resolve hadron and photon energy deposits as close as 5 cm from one another, for the benefit of exclusive particle identification in jets. More specifically, the front face of the barrel crystals has an area of $2.2 \times 2.2$ cm$^2$, equivalent to $0.0174 \times 0.0174$ in the $(\eta, \phi)$ plane. In the endcaps, the crystals are arranged instead in a rectangular $(x, y)$ grid, with a front-face area of $2.9 \times 2.9$ cm$^2$. The intrinsic energy resolution of the ECAL barrel was measured with an ECAL supermodule directly exposed to an electron beam, without any attempt to reproduce the material of the tracker in front of the ECAL [21]. The relative energy resolution is parameterized as a function of the electron energy as

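$$\left(\frac{\sigma}{E}\right)^2 = \left(\frac{2.8\%}{\sqrt{E/\mathrm{GeV}}}\right)^2 + \left(\frac{12\%}{E/\mathrm{GeV}}\right)^2 + \left(0.3\%\right)^2. \qquad (1)$$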

Because of the very small stochastic term inherent to homogeneous calorimeters, the photon energy resolution is excellent in the 1–50 GeV range typical of photons in jets.
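To see how the three terms of such a parameterization trade off across the 1–50 GeV range, the sketch below evaluates $\sigma/E = S/\sqrt{E} \oplus N/E \oplus C$, with the terms added in quadrature. The default values $S = 2.8\%$, $N = 0.12$ GeV, and $C = 0.3\%$ are the barrel test-beam values commonly quoted for the CMS ECAL; treat them as an assumption of this illustration. The function name is hypothetical.

```python
import math


def ecal_relative_resolution(e_gev: float, s: float = 0.028,
                             n_gev: float = 0.12, c: float = 0.003) -> float:
    """sigma_E / E with stochastic (s), noise (n_gev, in GeV) and constant (c)
    terms added in quadrature; e_gev is the electron or photon energy in GeV."""
    stochastic = s / math.sqrt(e_gev)
    noise = n_gev / e_gev
    return math.sqrt(stochastic ** 2 + noise ** 2 + c ** 2)


for e in (1.0, 5.0, 10.0, 50.0):
    print(f"E = {e:5.1f} GeV -> sigma/E = {100 * ecal_relative_resolution(e):.2f}%")
```

With these values, the noise term dominates at 1 GeV (about 12%), the resolution is about 1.5% at 10 GeV, and at 50 GeV it reaches about 0.55%, where the three terms are of comparable size.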

The ECAL electronics noise is measured to be about per crystal in the barrel (endcaps). Another important source of spurious signals arises from particles directly ionizing the avalanche photodiodes (APDs) that collect the crystal scintillation light [22]. This effect gives rise to single-crystal spikes with a relative amplitude about times larger than the scintillation light. Such spikes would be misidentified by the PF algorithm as photons with an energy up to 1 TeV. Since these spikes mostly affect a single crystal, and more rarely two neighbouring crystals, they are rejected by requiring the energy deposits to be compatible with a particle shower: the ratios $E_4/E_1$ and $E_6/E_2$ should exceed 5% and 10%, respectively, where $E_1$ ($E_2$) is the energy collected in the considered crystal (crystal pair) and $E_4$ ($E_6$) is the energy collected in the four (six) adjacent crystals. The timing of the energy deposits in excess of 1 GeV is also required to be compatible with the beam crossing time to better than 2 ns.
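The topological and timing requirements above translate into a simple predicate. In this sketch, $E_1$ ($E_2$) is the energy in the considered crystal (crystal pair) and $E_4$ ($E_6$) the energy in the four (six) adjacent crystals; the function and argument names are hypothetical, and applying the 1 GeV timing threshold to $E_1$ rather than to the full deposit is a simplifying assumption.

```python
def passes_spike_cleaning(e1: float, e4: float, e2: float, e6: float, time_ns: float,
                          min_ratio_single: float = 0.05, min_ratio_pair: float = 0.10,
                          max_abs_time_ns: float = 2.0,
                          timing_threshold_gev: float = 1.0) -> bool:
    """Return True if an ECAL deposit looks like a genuine particle shower.

    e1: energy in the considered crystal; e4: energy in its four adjacent crystals.
    e2: energy in the considered crystal pair; e6: energy in the six adjacent crystals.
    time_ns: deposit time relative to the beam crossing.
    """
    if e1 <= 0.0 or e2 <= 0.0:
        return False
    # Single-crystal spikes have almost no energy in the neighbours (small E4/E1);
    # two-crystal spikes are caught by the pair ratio E6/E2.
    shower_like = e4 / e1 > min_ratio_single and e6 / e2 > min_ratio_pair
    # The timing requirement only applies to deposits above the threshold.
    in_time = e1 <= timing_threshold_gev or abs(time_ns) < max_abs_time_ns
    return shower_like and in_time


print(passes_spike_cleaning(e1=50.0, e4=0.4, e2=50.5, e6=0.3, time_ns=-8.0))  # False (spike)
print(passes_spike_cleaning(e1=20.0, e4=8.0, e2=26.0, e6=5.0, time_ns=0.4))   # True (shower)
```

The pair ratio matters because a spike shared by two crystals gives a large $E_4/E_1$ (the partner crystal counts as a neighbour) but a tiny $E_6/E_2$.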

A much finer-grained detector, known as the preshower, is installed in front of each endcap disk. It consists of two layers, each comprising a lead radiator followed by a plane of silicon strip sensors. The two lead radiators represent approximately two and one radiation lengths, respectively. The two planes of silicon sensors have orthogonal strips with a pitch of 1.9 mm. When either a photon or an electron passes through the lead, it initiates an electromagnetic shower. The granularity of the detector and the small radius of the initiating shower provide an accurate measurement of the shower position. Originally, the aim of the superior granularity of the preshower was twofold: (i) to resolve the photons from $\pi^0$ decays so as to discriminate them from prompt photons; and (ii) to indicate the presence of a photon or an electron in the ECAL by requiring an associated signal in the preshower. Parasitic signals, however, are generated by the large number of neutral pions produced by hadron interactions in the tracker material, followed by photon conversions and electron bremsstrahlung. These signals substantially affect the preshower identification and separation capabilities. In the PF algorithm, these capabilities therefore cannot be fully exploited, and the energy deposited in the preshower is simply added to that of the closest associated ECAL cluster, if any, and discarded otherwise.

0.2.4 The hadron calorimeter

The HCAL [23] is a hermetic sampling calorimeter consisting of several layers of brass absorber and plastic scintillator tiles. It surrounds the ECAL, with a barrel ($|\eta| < 1.3$) and two endcap disks ($1.3 < |\eta| < 3.0$). In the barrel, the HCAL absorber thickness amounts to almost six interaction lengths at normal incidence, and increases to over ten interaction lengths at larger pseudorapidities. It is complemented by a tail catcher (HO), installed outside the solenoid coil. The HO material (1.4 interaction lengths at normal incidence) is used as an additional absorber. At small pseudorapidities, this thickness is enhanced to a total of three interaction lengths by a 20 cm-thick layer of steel. The total depth of the calorimeter system (including ECAL) is thus extended to a minimum of twelve interaction lengths in the barrel. In the endcaps, the thickness amounts to about ten interaction lengths.

The HCAL is read out in individual towers with a cross section $\Delta\eta \times \Delta\phi = 0.087 \times 0.087$ for $|\eta| < 1.6$ and $0.17 \times 0.17$ at larger pseudorapidities. The combined (ECAL+HCAL) calorimeter energy resolution was measured in a pion test beam [24] to be

$$\frac{\sigma}{E} = \frac{110\%}{\sqrt{E}} \oplus 9\%, \qquad (2)$$

where $E$ is expressed in GeV.

The typical HCAL electronics noise is measured to be per tower. Additionally, rare occurrences of high-amplitude, coherent noise were observed in the HCAL barrel [25]. This coherent noise was understood as follows. The barrel is made of two half-barrels covering positive and negative $\eta$, respectively. Each half-barrel is made of 18 identical azimuthal wedges, each of which contains four rows of 18 towers with the same $\phi$ value. All towers in a row are read out by a single pixelated hybrid photodiode (HPD). The four HPDs serving a wedge are installed in a readout box (RBX). Discharges in the HPD affect blocks of up to 18 cells at the same $\phi$ value in a half-barrel, while a drift of the global pedestal in an RBX may affect all 72 towers in the wedge. Since this coherent HCAL noise would be misinterpreted as high-energy neutral hadrons by the PF algorithm, the affected events are identified by their characteristic topological features and rejected at the analysis level.
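The readout hierarchy described above (18 towers per HPD row, four HPDs per RBX, 72 towers per wedge) suggests how coherent noise can be spotted: an unusually large number of towers firing behind the same HPD or the same RBX. The sketch below only counts such coincidences; the tower indexing and both thresholds are hypothetical, and the topological criteria actually used by CMS are not reproduced here.

```python
from collections import Counter
from typing import Iterable, Tuple

# Readout hierarchy of one HCAL half-barrel, as described in the text
WEDGES_PER_HALF = 18   # azimuthal wedges, each served by one readout box (RBX)
ROWS_PER_WEDGE = 4     # rows of towers at fixed phi, each read out by one HPD
TOWERS_PER_ROW = 18    # towers sharing the same HPD

# Hypothetical tower index: (half-barrel 0 or 1, phi row 0..71, position in row 0..17)
Tower = Tuple[int, int, int]


def hpd_id(tower: Tower) -> Tuple[int, int]:
    half, phi_row, _ = tower
    return half, phi_row                    # one HPD per phi row and half-barrel


def rbx_id(tower: Tower) -> Tuple[int, int]:
    half, phi_row, _ = tower
    return half, phi_row // ROWS_PER_WEDGE  # four consecutive phi rows form a wedge


def looks_like_coherent_noise(hit_towers: Iterable[Tower],
                              hpd_min: int = 15, rbx_min: int = 50) -> bool:
    """Flag many simultaneous hits behind one HPD or one RBX.

    hpd_min and rbx_min are placeholder thresholds, not CMS values.
    """
    towers = list(hit_towers)
    per_hpd = Counter(hpd_id(t) for t in towers)
    per_rbx = Counter(rbx_id(t) for t in towers)
    return (max(per_hpd.values(), default=0) >= hpd_min
            or max(per_rbx.values(), default=0) >= rbx_min)


discharge = [(0, 5, i) for i in range(TOWERS_PER_ROW)]     # one full HPD row
physics = [(0, 4 * w, 0) for w in range(WEDGES_PER_HALF)]  # spread-out deposits
print(looks_like_coherent_noise(discharge), looks_like_coherent_noise(physics))  # True False
```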

The HCAL is complemented by hadron forward (HF) calorimeters situated at $|z| = 11.2$ m from the interaction point that extend the angular coverage on both sides up to $|\eta| \approx 5$. The HF consists of a steel absorber composed of grooved plates. Radiation-hard quartz fibres are inserted in the grooves along the beam direction and are read out by photomultipliers. The fibres alternate between long fibres running over the full thickness of the absorber (about 165 cm, corresponding to typically ten interaction lengths), and short fibres covering the back of the absorber and starting at a depth of 22 cm from the front face. The signals from short and long fibres are grouped so as to define calorimeter towers with a cross section $\Delta\eta \times \Delta\phi = 0.175 \times 0.175$ over most of the pseudorapidity range. In each calorimeter tower, the signals from the short and long fibres are used to estimate the electromagnetic and hadronic components of the shower. If $E_\mathrm{L}$ ($E_\mathrm{S}$) denotes the energy measured in the long (short) fibres, the energy of the electromagnetic component, concentrated in the first part of the absorber, can be approximated by $E_\mathrm{L} - E_\mathrm{S}$, and the energy of the hadronic component is the complement, i.e. $2E_\mathrm{S}$. Spurious signals in the HF, caused for example by high-energy beam-halo muons directly hitting the photomultiplier windows, are reduced by rejecting (i) high-energy deposits not backed up by a deposit in the same tower; (ii) out-of-time or deposits of more than 30 GeV; (iii) deposits larger than 120 GeV with in the same tower; and (iv) isolated deposits larger than 80 GeV, with small and deposits in the four neighbouring towers.
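The long/short fibre decomposition fits in a two-line function. The reasoning it encodes, assumed here as a simplification, is that electromagnetic showers deposit their energy almost entirely within the first 22 cm and are therefore seen only by the long fibres, whereas hadronic showers deposit roughly equal amounts in the long and short fibres. Then $E_\mathrm{L} \approx E_\mathrm{em} + E_\mathrm{S}$, which gives $E_\mathrm{em} \approx E_\mathrm{L} - E_\mathrm{S}$ and $E_\mathrm{had} \approx 2E_\mathrm{S}$. The function name is hypothetical.

```python
from typing import Tuple


def hf_em_had_split(e_long: float, e_short: float) -> Tuple[float, float]:
    """Approximate electromagnetic and hadronic energies (GeV) of one HF tower.

    Assumes the EM shower is seen by the long fibres only and the hadronic shower
    deposits about the same energy in the long and short fibres. The EM estimate
    can become negative for purely hadronic fluctuations; no clamping is applied.
    """
    e_em = e_long - e_short
    e_had = 2.0 * e_short
    return e_em, e_had


em, had = hf_em_had_split(100.0, 10.0)
print(f"EM = {em} GeV, hadronic = {had} GeV")
```

By construction the two components always add up to the total measured energy $E_\mathrm{L} + E_\mathrm{S}$.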

0.2.5 The muon detectors

Outside the solenoid coil, the magnetic flux is returned through a yoke consisting of three layers of steel interleaved with four muon detector planes [26, 27]. Drift tube (DT) chambers and cathode strip chambers (CSC) detect muons in the regions $|\eta| < 1.2$ and $0.9 < |\eta| < 2.4$, respectively, and are complemented by a system of resistive plate chambers (RPC) covering the range $|\eta| < 1.6$. The reconstruction, described in Section 0.3.3, involves a global trajectory fit across the muon detectors and the inner tracker. The calorimeters and the solenoid coil represent a large amount of material before the muon detectors and thus induce multiple scattering. For this reason, the inner tracker dominates the momentum measurement up to a $p_\mathrm{T}$ of about 200 GeV.

0.3 Reconstruction of the particle-flow elements

This section describes the advanced algorithms specifically set up for the reconstruction of the basic PF elements: the reconstruction of the trajectories of charged particles in the inner tracker is discussed first; the specificities of electron and muon track reconstruction are then introduced; finally, the reconstruction and the calibration of calorimeter clusters in the preshower, the ECAL, and the HCAL, are presented.

0.3.1 Charged-particle tracks and vertices

Charged-particle track reconstruction was originally aimed [28] at measuring the momentum of energetic and isolated muons, at identifying energetic and isolated hadronic $\tau$ decays, and at tagging b quark jets. Tracking was therefore primarily targeting energetic particles and was limited to well-measured tracks. A combinatorial track finder based on Kalman filtering (KF) [29] was used to reconstruct these tracks in three stages: initial seed generation with a few hits compatible with a charged-particle trajectory; trajectory building (or pattern recognition) to gather hits from all tracker layers along this charged-particle trajectory; and final fitting to determine the charged-particle properties: origin, transverse momentum, and direction. To be kept for further analysis, the tracks had to be seeded with two hits in consecutive layers of the pixel detector, and were required to be reconstructed with at least eight hits in total (each contributing less than 30% of the overall track goodness-of-fit $\chi^2$) and with at most one missing hit along the way. In addition, all tracks were required to originate from within a cylinder of a few mm radius centred around the beam axis and to have larger than .
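The track quality requirements of the original track finder can be expressed as a single predicate. This is a sketch under stated assumptions: the `TrackCandidate` fields are hypothetical, the $p_\mathrm{T}$ threshold is left as a required argument, and a 3 mm radius stands in for the "few mm" cylinder around the beam axis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrackCandidate:
    """Hypothetical summary of a fitted track (illustrative field names)."""
    hit_chi2: List[float]            # chi2 contribution of each hit to the track fit
    n_missing_hits: int              # expected but missing hits along the trajectory
    seeded_in_pixel_pair: bool       # seed made of two hits in consecutive pixel layers
    dist_to_beam_axis_mm: float      # transverse distance of closest approach
    pt_gev: float


def passes_legacy_selection(trk: TrackCandidate, min_pt_gev: float,
                            max_dist_mm: float = 3.0) -> bool:
    """Quality criteria of the original single-pass combinatorial track finder."""
    total_chi2 = sum(trk.hit_chi2)
    # No single hit may contribute 30% or more of the overall chi2
    no_dominant_hit = total_chi2 == 0 or max(trk.hit_chi2) < 0.3 * total_chi2
    return (trk.seeded_in_pixel_pair
            and len(trk.hit_chi2) >= 8
            and no_dominant_hit
            and trk.n_missing_hits <= 1
            and trk.dist_to_beam_axis_mm < max_dist_mm
            and trk.pt_gev > min_pt_gev)


good = TrackCandidate([1.0] * 10, 0, True, 0.5, 5.0)
print(passes_legacy_selection(good, min_pt_gev=1.0))
```

Each of these requirements removes genuine tracks as well as misreconstructed ones, which is why the efficiency discussed next stays well below that of isolated muons for hadrons.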

The performance in terms of reconstruction efficiency and misreconstruction rate of this global combinatorial track finder can be found in Ref. [28] for muons and charged pions within jets, and is shown in Fig. 4 for charged hadrons in a sample of simulated QCD multijet events as a function of the reconstructed track $p_\mathrm{T}$. The efficiency is defined as the fraction of simulated tracks reconstructed with at least 50% of the associated simulated hits, and with less than 50% of unassociated simulated hits. The misreconstruction rate is the fraction of reconstructed tracks that cannot be associated with a simulated track. The stringent track quality criteria are instrumental in keeping the misreconstructed track rate at the level of a few per cent, but limit the reconstruction efficiency to only 70–80% for charged pions with above , compared to 99% for isolated muons. Below a few tens of GeV, the difference between pions and muons is almost entirely accounted for by the possibility for pions to undergo a nuclear interaction within the tracker material. For a charged particle to accumulate eight hits along its trajectory, it must traverse the beam pipe, the pixel detector, the inner tracker, and the first layers of the outer tracker before the first significant nuclear interaction. The probability for a hadron to interact within the tracker material before reaching the eight-hit threshold, causing the track to be missed, can be inferred from Fig. 3 (left) and ranges between 10 and 30%. The tracking efficiency is further reduced for $p_\mathrm{T}$ values above 10 GeV: these high-$p_\mathrm{T}$ particles are found mostly in collimated jets, in which the tracking efficiency is limited by the silicon detector pitch, i.e. by the capacity to disentangle hits from overlapping particles.
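The efficiency and misreconstruction-rate definitions can be sketched with tracks represented as sets of hit identifiers. The matching rule is one plausible reading of the definition above and is an assumption: a reconstructed track matches a simulated one if it contains at least half of the simulated hits and fewer than half of its own hits are foreign to that particle, and a reconstructed track with no match counts as misreconstructed. All names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

HitSet = FrozenSet[int]


def is_match(reco: HitSet, sim: HitSet) -> bool:
    """Assumed association rule between a reconstructed and a simulated track."""
    shared = len(reco & sim)
    return shared >= 0.5 * len(sim) and (len(reco) - shared) < 0.5 * len(reco)


def efficiency_and_fake_rate(sim_tracks: List[HitSet],
                             reco_tracks: List[HitSet]) -> Tuple[float, float]:
    found = sum(any(is_match(r, s) for r in reco_tracks) for s in sim_tracks)
    fakes = sum(not any(is_match(r, s) for s in sim_tracks) for r in reco_tracks)
    efficiency = found / len(sim_tracks) if sim_tracks else 0.0
    fake_rate = fakes / len(reco_tracks) if reco_tracks else 0.0
    return efficiency, fake_rate


sim = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7, 8})]
reco = [frozenset({1, 2, 3, 9}), frozenset({10, 11, 12, 13})]
print(efficiency_and_fake_rate(sim, reco))  # (0.5, 0.5)
```

Here the first simulated track is found (three of its four hits, one foreign hit), the second is missed, and the second reconstructed track, built only from unrelated hits, counts as misreconstructed.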

Figure 4: Efficiency (left) and misreconstruction rate (right) of the global combinatorial track finder (black squares); and of the iterative tracking method (green triangles: prompt iterations based on seeds with at least one hit in the pixel detector; red circles: all iterations, including those with displaced seeds), as a function of the track $p_\mathrm{T}$, for charged hadrons in multijet events without pileup interactions. Only tracks with are considered in the efficiency and misreconstruction rate determination. The efficiency is displayed for tracks originating from within 3.5 cm of the beam axis and cm of the nominal centre of CMS along the beam axis.

Each charged hadron missed by the tracking algorithm would be solely (if at all) detected by the calorimeters as a neutral hadron, with reduced efficiency, largely degraded energy resolution, and biased direction due to the bending of its trajectory in the magnetic field. As two thirds of the energy in a jet are on average carried by charged hadrons, a 20% tracking inefficiency would double the energy fraction of identified neutral hadrons in a jet from 10% to over 20%, and would therefore degrade the jet energy and angular resolutions (which, in PF reconstruction, are expected to be dominated by the modest neutral-hadron energy resolution) by about 50%. Increasing the track reconstruction efficiency while keeping the misreconstruction rate unchanged is therefore critical for PF event reconstruction.
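The numbers in the previous paragraph follow from simple arithmetic. The sketch assumes charged-hadron and neutral-hadron energy fractions of 65% and 10% (Section 0.2), that every missed charged hadron is measured as a neutral hadron, and that the jet resolution is dominated by a stochastic neutral-hadron term scaling as $\sqrt{E_\mathrm{NH}}$, so that it grows with the square root of the neutral-hadron energy fraction. The function names are hypothetical.

```python
import math

CHARGED_FRACTION = 0.65         # jet energy carried by charged hadrons (Section 0.2)
NEUTRAL_HADRON_FRACTION = 0.10  # jet energy carried by neutral hadrons (Section 0.2)


def neutral_fraction_with_inefficiency(ineff: float) -> float:
    """Missed charged hadrons are assumed to be measured as neutral hadrons."""
    return NEUTRAL_HADRON_FRACTION + ineff * CHARGED_FRACTION


def resolution_degradation(ineff: float) -> float:
    """Ratio of jet resolutions, assuming sigma ~ sqrt(neutral-hadron energy)."""
    return math.sqrt(neutral_fraction_with_inefficiency(ineff) / NEUTRAL_HADRON_FRACTION)


for ineff in (0.0, 0.1, 0.2):
    print(f"inefficiency {ineff:.0%}: neutral fraction "
          f"{neutral_fraction_with_inefficiency(ineff):.2f}, "
          f"resolution x{resolution_degradation(ineff):.2f}")
```

With a 20% inefficiency, the neutral-hadron fraction becomes $0.10 + 0.2 \times 0.65 = 0.23$, and $\sqrt{0.23/0.10} \approx 1.52$, which corresponds to the degradation of about 50% quoted above.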

The tracking inefficiency can be substantially reduced by accepting tracks with a smaller $p_\mathrm{T}$ (to recover charged particles with little probability of depositing any measurable energy in the calorimeters) and with fewer hits (to catch particles interacting with the material of the tracker inner layers). This large improvement, however, comes at the expense of an exponential increase of the combinatorial rate of misreconstructed tracks [30]: the misreconstruction rate is multiplied by a factor of five when the $p_\mathrm{T}$ threshold is loosened to 300 MeV, and increases by another order of magnitude when the total number of hits required to make a track is reduced to five. It reaches a value of up to 80% when the two criteria are loosened together. These misreconstructed tracks, made of randomly associated hits, have randomly distributed momenta and thus would cause large energy excesses in PF reconstruction.

Iterative tracking

To increase the tracking efficiency while keeping the misreconstructed track rate at a similar level, the combinatorial track finder was applied in several successive iterations [18], each with moderate efficiency but with as high a purity as possible. At each step, the reduction of the misreconstruction rate is accomplished with quality criteria on the track seeds, on the track fit $\chi^2$, and on the compatibility of the track with originating from one of the reconstructed primary vertices, adapted to the track $p_\mathrm{T}$, $\eta$, and number of hits. In practice, no quality criteria are applied to tracks reconstructed with at least eight hits, as the misreconstruction rate is already small enough for these tracks. The hits associated with the selected tracks are masked in order to reduce the probability of random hit-to-seed association in the next iteration. The remaining hits may thus be used in the next iteration to form new seeds and tracks with relaxed quality criteria, increasing in turn the total tracking efficiency without degrading the purity. The same operation is repeated several times with progressively more complex and time-consuming seeding, filtering, and tracking algorithms.
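The masking logic of iterative tracking can be illustrated on a toy problem. In the sketch below, each iteration is reduced to a hypothetical candidate finder and a minimum quality score; the real seeding, pattern recognition, and selection are replaced by a fixed candidate pool, so only the bookkeeping of masked hits is meaningful.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Candidate:
    hits: FrozenSet[int]
    quality: float  # hypothetical score standing in for the seed, chi2, and vertex criteria


@dataclass
class Iteration:
    name: str
    find: Callable[[FrozenSet[int]], Iterable[Candidate]]  # seeding + pattern recognition
    min_quality: float


def iterative_tracking(all_hits: Set[int],
                       iterations: List[Iteration]) -> Tuple[list, Set[int]]:
    available = set(all_hits)
    tracks = []
    for it in iterations:
        accepted = [c for c in it.find(frozenset(available)) if c.quality >= it.min_quality]
        tracks.extend((it.name, c) for c in accepted)
        for c in accepted:
            available -= c.hits  # mask hits of selected tracks for later iterations
    return tracks, available


def pool_finder(pool: List[Candidate]):
    """Toy finder: returns pool candidates whose hits are all still available."""
    return lambda available: [c for c in pool if c.hits <= available]


pool = [Candidate(frozenset({1, 2, 3}), 0.9), Candidate(frozenset({3, 4, 5}), 0.5),
        Candidate(frozenset({4, 5, 6}), 0.4), Candidate(frozenset({7, 8}), 0.2)]
steps = [Iteration("step1", pool_finder(pool), 0.8),
         Iteration("step2", pool_finder(pool), 0.3),
         Iteration("step3", pool_finder(pool), 0.1)]
tracks, leftover = iterative_tracking(set(range(1, 11)), steps)
for name, cand in tracks:
    print(name, sorted(cand.hits))
print("leftover hits:", sorted(leftover))
```

In this example, the candidate built from hits {3, 4, 5} is never kept even though its score would pass the second iteration, because hit 3 was already masked by the first iteration; the leftover hits {9, 10} remain available to any later iteration.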

Table 0.3.1: Seeding configuration and targeted tracks of the ten tracking iterations. In the last column, $R$ is the targeted distance between the track production position and the beam axis.

  Iteration  Name               Seeding                 Targeted tracks
  1          InitialStep        pixel triplets          prompt, high $p_\mathrm{T}$
  2          DetachedTriplet    pixel triplets          from b hadron decays
  3          LowPtTriplet       pixel triplets          prompt, low $p_\mathrm{T}$
  4          PixelPair          pixel pairs             recover high $p_\mathrm{T}$
  5          MixedTriplet       pixel+strip triplets    displaced
  6          PixelLess          strip triplets/pairs    very displaced
  7          TobTec             strip triplets/pairs    very displaced
  8          JetCoreRegional    pixel+strip pairs       inside high $p_\mathrm{T}$ jets
  9          MuonSeededInOut    muon-tagged tracks      muons
  10         MuonSeededOutIn    muon detectors          muons

The seeding configuration and the targeted tracks of each of the ten iterations are summarized in Table 0.3.1. The tracks from the first three iterations are seeded with triplets of pixel hits, with additional criteria on their distance of closest approach to the beam axis. The resulting high purity allows the requirements on the number of hits and on the track $p_\mathrm{T}$ to be loosened to typically three and 200 MeV, respectively. With an overall efficiency of 80%, the fractions of hits masked for the next iterations amount to 40% (20%) in the pixel (strip) detector. The fourth and fifth iterations aim at recovering tracks with one or two missing hits in the pixel detector. They address mostly detector inefficiencies, but also particle interactions and decays within the pixel detector volume. The next two iterations are designed to reconstruct very displaced tracks. Without pixel hits to seed the tracks, they can only be processed after the first five iterations, which offer an adequate reduction of the number of leftover hits in the strip detector. The eighth iteration specifically addresses the dense core of high-$p_\mathrm{T}$ jets. In these jets, hits from nearby tracks may merge and be associated with only one track, or even with none because of their poorly determined position, causing the tracking efficiency to decrease severely. Merged pixel hit clusters, found in narrow regions compatible with the direction of high-energy deposits in the calorimeters, are split into several hits. Each of these hits is paired with one of the remaining hits in the strip detector to form a seed for this iteration. The last two iterations are specifically designed to increase the muon-tracking reconstruction efficiency with the use of the muon detector information in the seeding step.
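Encoding the iteration table as data makes it easy to check which iterations use pixel hits in their seeds. This is only an illustrative transcription of the table; the tuple layout and the helper names are hypothetical.

```python
# Illustrative transcription of the iteration table: (number, name, seeding, targeted tracks)
ITERATIONS = [
    (1, "InitialStep", "pixel triplets", "prompt, high pT"),
    (2, "DetachedTriplet", "pixel triplets", "from b hadron decays"),
    (3, "LowPtTriplet", "pixel triplets", "prompt, low pT"),
    (4, "PixelPair", "pixel pairs", "recover high pT"),
    (5, "MixedTriplet", "pixel+strip triplets", "displaced"),
    (6, "PixelLess", "strip triplets/pairs", "very displaced"),
    (7, "TobTec", "strip triplets/pairs", "very displaced"),
    (8, "JetCoreRegional", "pixel+strip pairs", "inside high pT jets"),
    (9, "MuonSeededInOut", "muon-tagged tracks", "muons"),
    (10, "MuonSeededOutIn", "muon detectors", "muons"),
]


def seeded_with_pixel_hits(iterations):
    """Iterations whose seeds contain at least one pixel hit."""
    return [num for num, _name, seeding, _target in iterations if "pixel" in seeding]


def seeded_with_strips_only(iterations):
    """Iterations seeded exclusively with strip hits (very displaced tracks)."""
    return [num for num, _name, seeding, _target in iterations if seeding.startswith("strip")]


print(seeded_with_pixel_hits(ITERATIONS))   # [1, 2, 3, 4, 5, 8]
print(seeded_with_strips_only(ITERATIONS))  # [6, 7]
```

Under this reading of the table, iterations 1 to 5 and 8 are seeded with at least one pixel hit, while iterations 6 and 7 rely on strip seeds only.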

As shown in Fig. 4, the prompt iterations, which address tracks seeded with at least one hit in the pixel detector (iterations 1, 2, 3, 4, 5, and 8), recover about half of the tracks with pT above 1 GeV missed by the global combinatorial track finder, with a slightly smaller misreconstruction rate. These iterations also extend the acceptance to the numerous particles with pT as small as 200 MeV, typically below the calorimeter thresholds. (Particles with a pT between 200 and 700 MeV never reach the calorimeter barrel, but follow a helical trajectory to one of the calorimeter endcaps.) Given this performance, and because track reconstruction was found to be twice as fast with several iterations as with a single step (owing to the much smaller number of seeds identified at each step), iterative tracking quickly became the default method for CMS. Despite this significant improvement, the tracking efficiency at high pT remains limited. The consequences for the jet energy and angular resolutions are small, as the calorimeter resolutions are already excellent at these energies. The significant increase of the misreconstructed track rate at high pT is dealt with once the information from the calorimeters and the muon system becomes available, as described in Section 0.4.
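The 700 MeV figure quoted above can be checked with the transverse radius of curvature, R = pT / (0.3 B), with R in m, pT in GeV, and B in T. A track emitted from the beam axis gets at most a distance 2R away from it. The sketch below assumes a uniform 3.8 T field and an ECAL barrel inner radius of 1.29 m, nominal CMS values that are not quoted in this section, and ignores energy loss and field non-uniformities.

```python
# Assumed nominal values (not quoted in this section)
B_FIELD_T = 3.8          # solenoid field in tesla
R_ECAL_BARREL_M = 1.29   # ECAL barrel inner radius in metres
K = 0.299792458          # GeV per (T m) for a unit-charge particle


def curvature_radius_m(pt_gev, b_tesla=B_FIELD_T):
    """Radius of the track's circular projection in the transverse plane."""
    return pt_gev / (K * b_tesla)


def reaches_radius(pt_gev, r_target_m, b_tesla=B_FIELD_T):
    """A track starting on the beam axis gets at most 2R away from it."""
    return 2.0 * curvature_radius_m(pt_gev, b_tesla) >= r_target_m


def min_pt_to_reach(r_target_m, b_tesla=B_FIELD_T):
    """Smallest pT (GeV) whose transverse circle reaches r_target_m."""
    return K * b_tesla * r_target_m / 2.0


print(round(min_pt_to_reach(R_ECAL_BARREL_M), 3))  # prints 0.735
```

With these assumed values the threshold comes out at about 0.73 GeV, consistent with the 700 MeV quoted in the text.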

Nuclear interactions in the tracker material

Nuclear interactions in the tracker material may lead either to a kink in the original hadron trajectory or to the production of a number of secondary particles. On average, two thirds of these secondary particles are charged. Their reconstruction efficiency is enhanced by the sixth and seventh iterations of the iterative tracking. The tracking efficiency and misreconstruction rate with all iterations included are displayed in Fig. 4. While the displaced-track iterations typically add 5% to the tracking efficiency, they also increase the total misreconstruction rate by 1% for tracks with pT between 1 and 20 GeV. The relative misreconstruction rate of these iterations is therefore at the level of 20%.
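The 20% figure follows from comparing the two increments, treating both as fractions of a comparable number of tracks (an approximation):

```latex
\frac{\Delta(\text{misreconstruction rate})}{\Delta(\text{efficiency})}
  \approx \frac{1\%}{5\%} = 0.2 ,
```

that is, roughly one misreconstructed track for every five genuine tracks recovered by the displaced-track iterations.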

A dedicated algorithm was thus developed to identify tracks linked to a common secondary displaced vertex within the tracker volume [31, 32]. Figure 5 shows the positions of these reconstructed nuclear interaction vertices in the inner part of the tracker. The observed pattern closely matches the tracker layer structure and material. The misreconstruction rate is further reduced with a specific treatment of these tracks in the PF algorithm, described in Section 0.4.

Figure 5: Maps of nuclear interaction vertices for data collected by CMS in 2011, corresponding to an integrated luminosity of 1 nb⁻¹, in the longitudinal (left) and transverse (right) cross sections of the inner part of the tracker, exhibiting its structure in concentric layers around the beam axis.

0.3.2 Tracking for electrons

Electron reconstruction, originally aimed at characterizing energetic, well-isolated electrons, was naturally based on the ECAL measurements, without emphasis on the tracking capabilities. More specifically, the traditional electron seeding strategy (hereafter called the ECAL-based approach) [33] makes use of energetic ECAL clusters (E_T > 4 GeV). The cluster energy and position are used to infer the position of the hits expected in the innermost tracker layers, under the two hypotheses that the cluster is produced by an electron or by a positron. Because of the significant tracker thickness (Fig. 3 right), most electrons emit a sizeable fraction of their energy in the form of bremsstrahlung photons before reaching the ECAL. The performance of the method therefore depends on the ability to gather all the radiated energy, and only that energy. The energy of the electron and of possible bremsstrahlung photons is collected by grouping into a supercluster the ECAL clusters reconstructed in a small window in η and an extended window in φ around the electron direction (to account for the azimuthal bending of the electron in the magnetic field).

For electrons in jets, however, the energy and position of the associated supercluster are often biased by the overlapping contributions from other particle deposits, leading to large inefficiencies. In addition, the backward propagation from the supercluster to the interaction region is likely to be compatible with many hits from other charged particles in the innermost tracker layers, causing a substantial misreconstruction rate. To keep the latter under control, the ECAL-based electron seeding efficiency has to be further limited, e.g. by strict isolation requirements, to values that are unacceptably small in jets when a global event description is to be achieved. Similarly, for electrons with small pT, whose tracks are significantly bent by the magnetic field, the radiated energy is spread over such an extended region that the supercluster cannot include all deposits. The missed deposits bias the position of the supercluster and prevent it from being matched with the proper hits in the innermost tracker layers.

To reconstruct the electrons missed by the ECAL-based approach, a tracker-based electron seeding method was developed in the context of PF reconstruction. The iterative tracking (Section 0.3.1) is designed to have a large efficiency for these electrons: nonradiating electrons can be tracked as efficiently as muons, and radiating electrons produce either shorter or lower-pT tracks, largely recovered by the loose requirements on the number of hits and on the pT to form a track. All the tracks from the iterative tracking are therefore used as potential seeds for electrons, if their pT exceeds 2 GeV.

The large probability for electrons to radiate in the tracker material is exploited to disentangle electrons from charged hadrons. When the energy radiated by the electron is small, the corresponding track can be reconstructed across the whole tracker with a well-behaved χ² and be safely propagated to the ECAL inner surface, where it can be matched with the closest ECAL cluster. (Calorimeter clustering and track-cluster matching in PF are described in Sections 0.3.4 and 0.4.1, respectively.) For these tracks to form an electron seed, the ratio of the cluster energy to the track momentum is required to be compatible with unity. In the case of soft photon emission, the pattern recognition may still succeed in collecting most hits along the electron trajectory, but the track fit generally leads to a large χ² value. When energetic photons are radiated, the pattern recognition may be unable to accommodate the change in electron momentum, causing the track to be reconstructed with a small number of hits. A preselection based on the number of hits and the fit χ² is therefore applied, and the selected tracks are fit again with a Gaussian-sum filter (GSF) [34]. The GSF fitting is better suited to electrons than the KF used in the iterative tracking, as it allows for sudden and substantial energy losses along the trajectory. At this stage, a GSF with only five components is used, in order to keep the computing time under control. A final requirement is applied to the score of a boosted-decision-tree (BDT) classifier that combines the discriminating power of the number of hits, the χ² of the GSF track fit and its ratio to that of the KF track fit, the energy lost along the GSF track, and the distance between the extrapolation of the track to the ECAL inner surface and the closest ECAL cluster.
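The two paths to a tracker-based electron seed described above can be summarized in a simplified decision function. This is a hypothetical sketch: the GSF refit and the BDT are not reproduced, only the 2 GeV pT requirement comes from the text, and the χ², hit-count, and E/p values below are illustrative placeholders rather than CMS working points.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative placeholders, NOT CMS working points (except MIN_PT).
MIN_PT = 2.0               # GeV, from the text
MAX_GOOD_CHI2 = 2.0        # hypothetical: "well-behaved" KF fit
MIN_HITS_FULL = 10         # hypothetical: track spans most of the tracker
EOP_RANGE = (0.8, 1.2)     # hypothetical: E/p "compatible with unity"
MIN_BAD_CHI2 = 5.0         # hypothetical: large chi2 hints at soft bremsstrahlung


@dataclass
class KfTrack:
    pt: float                          # GeV
    p_at_ecal: float                   # GeV, momentum at the ECAL inner surface
    n_hits: int
    chi2_ndof: float                   # normalized chi2 of the KF fit
    cluster_energy: Optional[float]    # GeV, closest ECAL cluster (None if none)


def tracker_seed_decision(t: KfTrack) -> str:
    """Return 'seed', 'gsf_refit' (then GSF fit + BDT, not shown) or 'reject'."""
    if t.pt <= MIN_PT:
        return "reject"
    if t.chi2_ndof < MAX_GOOD_CHI2 and t.n_hits >= MIN_HITS_FULL:
        # Little radiation: require E/p compatible with unity
        if t.cluster_energy is None:
            return "reject"
        lo, hi = EOP_RANGE
        return "seed" if lo <= t.cluster_energy / t.p_at_ecal <= hi else "reject"
    # Radiating candidates: few hits (hard photon) or poor chi2 (soft photons)
    if t.n_hits < MIN_HITS_FULL or t.chi2_ndof >= MIN_BAD_CHI2:
        return "gsf_refit"
    return "reject"


print(tracker_seed_decision(KfTrack(10.0, 10.0, 15, 1.0, 9.5)))   # seed
print(tracker_seed_decision(KfTrack(10.0, 10.0, 15, 1.0, 3.0)))   # reject
```

The second example mimics a charged hadron: a clean track whose associated ECAL deposit is far smaller than its momentum, so E/p is incompatible with unity.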

The electron seeds obtained with the tracker- and ECAL-based procedures are merged into a single collection and are submitted to the full electron tracking with twelve GSF components. The significant increase of seeding efficiency brought by the tracker-based approach is shown in the left panel of Fig. 6 for electrons in b quark jets. The probability for a charged hadron to give rise to an electron seed is displayed in the same figure. At this preselection stage, the addition of the tracker-based seeding almost doubles the electron efficiency and extends the electron reconstruction down to a pT of 2 GeV. These improvements come with an increase of the misidentification rate, dealt with at a later stage of the PF reconstruction, when more information becomes available (Section 0.4.3). Here, the misidentification rate is only a concern for the computing time of the electron track reconstruction, kept within reasonable limits by the preselection. For isolated electrons, the ECAL-based seeding is already quite effective, but the tracker-based seeding improves the overall efficiency by several per cent, as shown in the right panel of Fig. 6, and makes it possible to reconstruct electrons with a pT below 4 GeV.

Figure 6: Left: Electron seeding efficiency for electrons (triangles) and pions (circles) as a function of pT, from a simulated event sample enriched in b quark jets with pT between 80 and 170 GeV, and with at least one semileptonic b hadron decay. Both the efficiencies for ECAL-based seeding only (hollow symbols) and with the tracker-based seeding added (solid symbols) are displayed. Right: Absolute efficiency gain from the tracker-based seeding for electrons from Z boson decays as a function of pT. The shaded bands indicate the pT bin size and the statistical uncertainties on the efficiency.

The tracker-based seeding is also effective at selecting electrons and positrons from conversions in the tracker material, for both prompt and bremsstrahlung photons. The recovery of the converted photons of the latter category and their association with their parent electrons is instrumental in minimizing energy double counting in the course of the PF reconstruction.

0.3.3 Tracking for muons

Muon tracking [28, 27] is not specific to PF reconstruction. The muon spectrometer allows muons to be identified with high efficiency over the full detector acceptance. High purity is ensured by the upstream calorimeters, which absorb other particles (except neutrinos). The inner tracker provides a precise measurement of the momentum of these muons. The high-level muon physics objects are reconstructed in a multifaceted way, with the final collection being composed of three different muon types:

  • standalone muon. Hits within each DT or CSC detector are clustered to form track segments, used as seeds for the pattern recognition in the muon spectrometer, to gather all DT, CSC, and RPC hits along the muon trajectory. The result of the final fitting is called a standalone-muon track.

  • global muon. Each standalone-muon track is matched to a track in the inner tracker (hereafter referred to as an inner track) if the parameters of the two tracks propagated onto a common surface are compatible. The hits from the inner track and from the standalone-muon track are combined and fit to form a global-muon track. At large transverse momenta, the global-muon fit improves the momentum resolution with respect to the tracker-only fit.

  • tracker muon. Each inner track with pT larger than 0.5 GeV and a total momentum in excess of 2.5 GeV is extrapolated to the muon system. If at least one muon segment matches the extrapolated track, the inner track qualifies as a tracker muon track. The track-to-segment matching is performed in a local coordinate system defined in a plane transverse to the beam axis, in which one coordinate is better measured than the other. The extrapolated track and the segment are matched if the absolute value of the difference between their positions in this better-measured coordinate is smaller than 3 cm, or if the ratio of this distance to its uncertainty (pull) is smaller than 4.
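The tracker-muon criteria above translate directly into code. In this sketch the kinematic thresholds (0.5 GeV, 2.5 GeV) and the matching cuts (3 cm, pull of 4) come from the text; the way the distance uncertainty is formed (track-extrapolation and segment uncertainties added in quadrature) and the function names are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class InnerTrack:
    pt: float  # GeV
    p: float   # GeV


def passes_kinematics(track: InnerTrack) -> bool:
    """Preselection from the text: pT > 0.5 GeV and p > 2.5 GeV."""
    return track.pt > 0.5 and track.p > 2.5


def segment_matches(x_track, x_segment, sigma_track, sigma_segment,
                    max_dist_cm=3.0, max_pull=4.0) -> bool:
    """Match in the better-measured local coordinate (all lengths in cm)."""
    dist = abs(x_track - x_segment)
    sigma = math.hypot(sigma_track, sigma_segment)  # assumption: quadrature sum
    return dist < max_dist_cm or (sigma > 0 and dist / sigma < max_pull)


def is_tracker_muon(track: InnerTrack,
                    comparisons: Iterable[Tuple[float, float, float, float]]) -> bool:
    """comparisons: (x_track, x_segment, sigma_track, sigma_segment) per segment."""
    if not passes_kinematics(track):
        return False
    return any(segment_matches(*c) for c in comparisons)


# A 5 cm residual fails the distance cut but passes the pull cut if uncertainties are large
print(segment_matches(10.0, 15.0, 1.0, 1.0))  # True: pull = 5 / 1.414 < 4
```

The OR of the two criteria means that a large residual can still be accepted when it is not significant, for instance at low momentum where multiple scattering inflates the extrapolation uncertainty.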

Global-muon reconstruction is designed to have high efficiency for muons penetrating through more than one muon detector plane. It typically requires segments to be associated in at least two muon detector planes. For momenta below about 10 GeV, this requirement fails more often because of the larger multiple scattering in the steel of the return yoke. For these muons, the tracker muon reconstruction is therefore more efficient, as it requires only one segment in the muon system [35].

Owing to the high efficiency of the inner track and muon segment reconstruction, about 99% of the muons produced within the geometrical acceptance of the muon system are reconstructed either as a global muon or a tracker muon and very often as both. Global muons and tracker muons that share the same inner track are merged into a single candidate. Muons reconstructed only as standalone-muon tracks have worse momentum resolution and a higher admixture of cosmic muons than global and tracker muons.

Charged hadrons may be misreconstructed as muons, e.g. if some of the hadron shower remnants reach the muon system (punch-through). Different identification criteria can be applied to the muon tracks in order to obtain the desired balance between identification efficiency and purity. In the PF muon identification algorithm (Section 0.4.2), muon energy deposits in ECAL, HCAL, and HO are associated with the muon track and this information is used to improve the muon identification performance.

0.3.4 Calorimeter clusters

The purpose of the clustering algorithm in the calorimeters is fourfold: (i) detect and measure the energy and direction of stable neutral particles such as photons and neutral hadrons; (ii) separate these neutral particles from charged hadron energy deposits; (iii) reconstruct and identify electrons and all accompanying bremsstrahlung photons; and (iv) help measure the energy of charged hadrons whose track parameters were not determined accurately, which is the case for low-quality and high-pT tracks.

A specific clustering algorithm was developed for the PF event reconstruction, with the aims of a high detection efficiency even for low-energy particles and of separating close energy deposits, as illustrated in Fig. 2. The clustering is performed separately in each subdetector: ECAL barrel and endcaps, HCAL barrel and endcaps, and the two preshower layers. In the HF, no clustering is performed: the electromagnetic or hadronic components of each cell directly give rise to an HF EM cluster and an HF HAD cluster. All parameters of the clustering algorithm are described in turn below. Their values are summarized in Table 0.3.4.

First, cluster seeds are identified as cells with an energy larger than a given seed threshold, and larger than the energy of the neighbouring cells. The cells considered as neighbours are either the four closest cells, which share a side with the seed candidate, or the eight closest cells, including cells that only share a corner with the seed candidate. Second, topological clusters are grown from the seeds by aggregating cells with at least a corner in common with a cell already in the cluster and with an energy in excess of a cell threshold set to twice the noise level. In the ECAL endcaps, because the noise level increases as a function of |η|, seeds are additionally required to satisfy a threshold requirement on E_T.
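The seeding and topological-cluster growth can be sketched on a simple grid of cells indexed by integer pairs (i, j). The thresholds in the usage example are the ECAL barrel values of Table 0.3.4, converted to GeV. The grid representation, the absence of azimuthal wrap-around, and the omission of the endcap E_T seed requirement are simplifications made for illustration.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def neighbours(cell: Cell, n_closest: int) -> List[Cell]:
    """The 4 side-sharing cells, plus the 4 corner-sharing ones if n_closest == 8."""
    i, j = cell
    side = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    if n_closest == 4:
        return side
    return side + [(i - 1, j - 1), (i - 1, j + 1), (i + 1, j - 1), (i + 1, j + 1)]


def find_seeds(energies: Dict[Cell, float], seed_threshold: float,
               n_closest: int) -> List[Cell]:
    """Cells above the seed threshold and above all their considered neighbours."""
    return [c for c, e in energies.items()
            if e > seed_threshold
            and all(e > energies.get(nb, 0.0) for nb in neighbours(c, n_closest))]


def topological_clusters(energies, seeds, cell_threshold):
    """Grow clusters from seeds through corner-sharing cells above cell_threshold."""
    clusters, assigned = [], set()
    for seed in seeds:
        if seed in assigned:
            continue  # seed already absorbed by a previous topological cluster
        cells, stack = set(), [seed]
        while stack:
            c = stack.pop()
            if c in cells:
                continue
            cells.add(c)
            stack.extend(nb for nb in neighbours(c, 8)
                         if nb not in cells and energies.get(nb, 0.0) > cell_threshold)
        assigned |= cells
        clusters.append({"cells": cells, "seeds": [s for s in seeds if s in cells]})
    return clusters


# ECAL-barrel-like thresholds from Table 0.3.4, in GeV
energies = {(0, 0): 1.0, (0, 1): 0.3, (0, 2): 0.9, (1, 1): 0.1,
            (5, 5): 0.5, (5, 6): 0.05}
seeds = find_seeds(energies, seed_threshold=0.23, n_closest=8)
clusters = topological_clusters(energies, seeds, cell_threshold=0.08)
```

In this toy example, the first topological cluster contains two seeds, (0, 0) and (0, 2), which is exactly the situation that the Gaussian-mixture step is designed to resolve, while the cell at (5, 6) stays below the cell threshold and is not aggregated.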

An expectation-maximization algorithm based on a Gaussian-mixture model is then used to reconstruct the clusters within a topological cluster. The Gaussian-mixture model postulates that the energy deposits in the N individual cells of the topological cluster arise from M Gaussian energy deposits, where M is the number of seeds. The parameters of the model are the amplitude A_k and the mean position μ_k of each Gaussian, while the width σ is fixed to different values depending on the considered calorimeter. The expectation-maximization algorithm is an iterative algorithm with two steps at each iteration. During the first step, the parameters of the model are kept constant, and the expected fraction f_jk of the energy E_j measured in the cell at position c_j arising from the k-th Gaussian energy deposit is calculated as

\[ f_{jk} = \frac{A_k \exp\!\left[-(\vec{c}_j - \vec{\mu}_k)^2 / (2\sigma^2)\right]}{\sum_{\ell=1}^{M} A_\ell \exp\!\left[-(\vec{c}_j - \vec{\mu}_\ell)^2 / (2\sigma^2)\right]}. \tag{3} \]

The parameters of the model are determined during the second step in an analytical maximum-likelihood fit, yielding

\[ A_k = \sum_{j=1}^{N} f_{jk} E_j, \qquad \vec{\mu}_k = \frac{\sum_{j=1}^{N} f_{jk} E_j \vec{c}_j}{\sum_{j=1}^{N} f_{jk} E_j}. \tag{4} \]

The energy and position of the seeds are used as initial values for the parameters of the corresponding Gaussian functions, and the expectation-maximization cycle is repeated until convergence. To stabilize the algorithm, the seed energy is entirely attributed to the corresponding Gaussian function at each iteration. After convergence, the positions and energies of the Gaussian functions are taken as cluster parameters.
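A minimal sketch of the expectation-maximization loop of Eqs. (3) and (4) is given below, for cells with two-dimensional positions and a common fixed width σ. As in the text, the seeds initialize the amplitudes and means, and each seed cell's energy is attributed entirely to its own Gaussian at every iteration. The convergence criterion (largest shift of a mean below `tol`) and the function name are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def em_clusters(cells: List[Tuple[Point, float]], seeds: List[int], sigma: float,
                tol: float = 1e-6, max_iter: int = 100) -> List[Tuple[float, Point]]:
    """Gaussian-mixture EM with fixed width; seeds are indices into `cells`."""
    amps = [cells[s][1] for s in seeds]      # initial amplitudes: seed energies
    mus = [cells[s][0] for s in seeds]       # initial means: seed positions
    seed_of = {s: k for k, s in enumerate(seeds)}
    for _ in range(max_iter):
        # E-step (Eq. 3): fraction of cell j's energy given to Gaussian k
        fracs = []
        for j, ((x, y), _) in enumerate(cells):
            if j in seed_of:                  # seed energy stays with its Gaussian
                row = [0.0] * len(seeds)
                row[seed_of[j]] = 1.0
            else:
                w = [a * math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2 * sigma ** 2))
                     for a, (mx, my) in zip(amps, mus)]
                total = sum(w)
                row = [wk / total if total > 0 else 0.0 for wk in w]
            fracs.append(row)
        # M-step (Eq. 4): amplitudes and energy-weighted means
        new_amps, new_mus = [], []
        for k in range(len(seeds)):
            a = sum(fracs[j][k] * e for j, (_, e) in enumerate(cells))
            mx = sum(fracs[j][k] * e * p[0] for j, (p, e) in enumerate(cells)) / a
            my = sum(fracs[j][k] * e * p[1] for j, (p, e) in enumerate(cells)) / a
            new_amps.append(a)
            new_mus.append((mx, my))
        shift = max(math.hypot(n[0] - o[0], n[1] - o[1]) for n, o in zip(new_mus, mus))
        amps, mus = new_amps, new_mus
        if shift < tol:
            break
    return list(zip(amps, mus))


# Four cells along a line, with two seeds at the ends (energies in GeV)
cells = [((0.0, 0.0), 10.0), ((1.0, 0.0), 2.0), ((2.0, 0.0), 2.0), ((3.0, 0.0), 8.0)]
clusters = em_clusters(cells, seeds=[0, 3], sigma=1.0)
```

Because the fractions of each cell sum to one, the cluster energies always add up to the topological-cluster energy; the algorithm only redistributes the energy of shared cells between the Gaussian components.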

In the lower-right panel of Fig. 2, for example, two cluster seeds (dark grey) are identified in the HCAL within one topological cluster formed of nine cells. The two seeds give rise to two HCAL clusters, the final positions of which are indicated by two red dots. These reconstructed positions match the two charged-pion track extrapolations to the HCAL. Similarly, the bottom-left ECAL topological cluster in the lower-left panel of Fig. 2, arising from the π⁰, is split into two clusters corresponding to the two photons from the π⁰ decay.

Table 0.3.4: Clustering parameters for the ECAL, the HCAL, and the preshower. All values result from optimizations based on the simulation of single photons, …, and jets.

| Parameter | ECAL barrel | ECAL endcaps | HCAL barrel | HCAL endcaps | Preshower |
|---|---|---|---|---|---|
| Cell E threshold (MeV) | 80 | 300 | 800 | 800 | 0.06 |
| Seed: number of closest cells | 8 | 8 | 4 | 4 | 8 |
| Seed E threshold (MeV) | 230 | 600 | 800 | 1100 | 0.12 |
| Seed E_T threshold (MeV) | 0 | 150 | 0 | 0 | 0 |
| Gaussian width (cm) | 1.5 | 1.5 | 10.0 | 10.0 | 0.2 |

0.3.5 Calorimeter cluster calibration

In the PF reconstruction algorithm, photons and neutral hadrons are reconstructed from calorimeter clusters. Calorimeter clusters separated from the extrapolated position of any charged-particle track in the calorimeters constitute a clear signature of neutral particles. On the other hand, neutral-particle energy deposits overlapping with charged-particle clusters can only be detected as calorimeter energy excesses with respect to the sum of the associated charged-particle momenta. An accurate calibration of the calorimeter response to photons and hadrons is instrumental in maximizing the probability to identify these neutral particles while minimizing the rate of misreconstructed energy excesses, and in obtaining the correct energy scale for all neutral particles. The calibration of electromagnetic and hadron clusters is described in turn below.

Electromagnetic deposits

A first estimate of the absolute calibration of the ECAL response to electrons and photons, as well as of the cell-to-cell relative calibration, was determined with test beam data, radioactive sources, and cosmic ray measurements, all of which were collected before the start of collision data taking. The ECAL calibration was then refined with collision data recorded at two different centre-of-mass energies [36].

The clustering algorithm described in Section 0.3.4 applies several thresholds to the ECAL cell energies. Consequently, the energy measured in clusters of ECAL cells is expected to be somewhat smaller than that of the incoming photons, especially at low energy, and smaller than that of the superclusters used for the absolute ECAL calibration. A residual energy calibration, required to account for the effects of these thresholds, is determined from simulated single photons. This generic calibration is applied to all ECAL clusters before the hadron cluster calibration discussed in the next section and before the particle identification step described in Section 0.4. Specific additional electron and photon energy corrections, on the other hand, are applied after the electron and photon reconstruction described in Section 0.4.3. Large samples of single photons with energies varying from 0.25 to 100 GeV were processed through a Geant4 simulation [37] of the CMS detector. Only the photons that do not convert before entering the ECAL are considered in the analysis, so that the calibration deals with single clusters.

In the ECAL barrel, an analytical function f(E, η), where E is the energy and η the pseudorapidity of the cluster, is fitted to the two-dimensional distribution of the average ratio E_true/E in the (E, η) plane, where E_true is the true photon energy. By construction, this function is the residual correction to be applied to the measured cluster energy. It is close to unity at high energy, where threshold effects progressively vanish, and the correction can be as large as 20% at low energy.

In the ECAL endcaps, the crystals are partly shadowed by the preshower. The calibrated cluster energy is therefore expressed as a function of the energies measured in the ECAL (E_ECAL) and in the two preshower layers (E_PS1 and E_PS2) as

\[ E_{\text{calib}} = \alpha\,E_{\text{ECAL}} + \beta\,E_{\text{PS1}} + \gamma\,E_{\text{PS2}}. \tag{5} \]

The calibration parameters α, β, and γ depend on the energy E and the pseudorapidity η of the generated photon and are chosen in each (E, η) bin to minimize the following χ²:

\[ \chi^2 = \sum_{i=1}^{N} \frac{\left(E_{\text{calib},i} - E_i\right)^2}{\sigma_i^2}. \tag{6} \]

In this expression, E_i is the true energy of the i-th photon and σ_i is an estimate of its energy measurement uncertainty, with a dependence on E similar to that displayed in Eq. (1), but with stochastic and noise terms typically four times larger than in the barrel. Analytical functions of E and η, analogous to those used in the barrel, are used to fit the three calibration parameters for the endcaps. A similar minimization, with only two parameters, is performed for the photons that leave energy in only one of the two preshower layers. The case where no energy is measured in the preshower, which includes the endcap region outside the preshower acceptance, is handled with the same method as that used for the ECAL barrel.
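Because the calibrated energy is linear in the calibration parameters, the minimization has a closed-form solution. Assuming Eq. (5) reads E_calib = α E_ECAL + β E_PS1 + γ E_PS2 and Eq. (6) is χ² = Σ_i (E_calib,i − E_i)²/σ_i², write x_i = (E_ECAL,i, E_PS1,i, E_PS2,i)ᵀ, θ = (α, β, γ)ᵀ, and w_i = 1/σ_i². Treating σ_i as independent of the parameters, setting the gradient of the χ² to zero gives the weighted normal equations:

```latex
\frac{\partial \chi^2}{\partial \theta}
  = 2\sum_i w_i \left(\theta^{\mathsf T} x_i - E_i\right) x_i = 0
\quad\Longrightarrow\quad
\Bigl(\sum_i w_i\, x_i x_i^{\mathsf T}\Bigr)\,\theta = \sum_i w_i\, E_i\, x_i .
```

The same structure, reduced to a 2×2 system, applies to the two-parameter case for photons seen in a single preshower layer.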

When it comes to evaluating the calibration parameters for actual clusters in the preshower fiducial region, η is estimated from the ECAL cluster pseudorapidity, and E is approximated by a linear combination of E_ECAL, E_PS1, and E_PS2 with fixed coefficients. These calibration parameters correct the ECAL energy upward by about 5% at the largest photon energies (meaning that an energetic photon loses on average 5% of its energy in the preshower material), and by larger amounts at the smallest photon energies. In all ECAL regions and for all energies, the calibrated energy agrees on average with the true photon energy to within ….

Both the absolute photon energy calibration and the uniformity of the response can be checked with the abundant π⁰ samples produced in pp collisions. To reconstruct these neutral pions, all ECAL clusters with a calibrated energy in excess of 400 MeV and identified as photons as described in Section 0.4.4 are paired. The total energy of the photon pair is required to be larger than 1.5 GeV. The resulting photon pair invariant mass distribution is displayed in Fig. 7, for simulated events and for the first LHC data recorded in 2010. The per-cent level agreement of the fitted mass resolutions in data and simulation, and that of the fitted mass values with the nominal π⁰ mass, demonstrate the adequacy of the simulation-based ECAL cluster calibration for low-energy photons.
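The π⁰ selection above can be sketched as follows, using the invariant mass of two massless particles, m² = 2E₁E₂(1 − cos θ₁₂). Building the photon directions from the cluster (η, φ) as seen from the nominal interaction point is an assumption of this sketch; the 400 MeV and 1.5 GeV thresholds are those quoted in the text.

```python
import math
from itertools import combinations


def unit_vector(eta, phi):
    """Direction from the nominal interaction point for given (eta, phi)."""
    return (math.cos(phi) / math.cosh(eta),
            math.sin(phi) / math.cosh(eta),
            math.tanh(eta))


def diphoton_mass(e1, eta1, phi1, e2, eta2, phi2):
    """Invariant mass of two massless particles: m^2 = 2 E1 E2 (1 - cos(theta12))."""
    cos12 = sum(a * b for a, b in zip(unit_vector(eta1, phi1), unit_vector(eta2, phi2)))
    return math.sqrt(max(0.0, 2.0 * e1 * e2 * (1.0 - cos12)))


def pi0_masses(photons, min_e=0.4, min_pair_e=1.5):
    """photons: (E [GeV], eta, phi); apply the text's thresholds, pair all survivors."""
    selected = [g for g in photons if g[0] > min_e]
    return [diphoton_mass(*g1, *g2)
            for g1, g2 in combinations(selected, 2)
            if g1[0] + g2[0] > min_pair_e]


# Toy input: opening angle chosen so that two 1 GeV photons give m = 135 MeV
dphi = math.acos(1.0 - 0.135 ** 2 / 2.0)
masses = pi0_masses([(1.0, 0.0, 0.0), (1.0, 0.0, dphi), (0.3, 0.0, 1.0)])
```

The third photon falls below the 400 MeV threshold, so only one pair survives; in data, the resulting mass distribution is then fitted with a Gaussian signal plus an exponential background, as in Fig. 7.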

Figure 7: Photon pair invariant mass distribution in the barrel (|η| < 1.0) for the simulation (left) and the data (right). The signal is modelled by a Gaussian (red curve) and the background by an exponential function (blue curve). The Gaussian mean value is shown by the vertical dashed line; the fitted mean value and standard deviation are quoted in the figure.

Hadron deposits

Hadrons generally deposit energy in both ECAL and HCAL. The ECAL is already calibrated for photons as described in the previous section, but has a substantially different response to hadrons. The initial calibration of the HCAL was performed with test beam data for 50 GeV charged pions not interacting in the ECAL, but the calorimeter response depends on the fraction of the shower energy deposited in the ECAL, and is not linear with energy. The ECAL and HCAL cluster energies therefore need to be substantially recalibrated to obtain an estimate of the true hadron energy.

The calibrated calorimetric energy associated with a hadron is expressed as

\[ E_{\text{calib}} = a + b(E)\,f(\eta)\,E_{\text{ECAL}} + c(E)\,g(\eta)\,E_{\text{HCAL}}, \tag{7} \]

where E_ECAL and E_HCAL are the energies measured in the ECAL (calibrated as described in Section 0.3.5) and the HCAL, and where E and η are the true energy and pseudorapidity of the hadron. The coefficient a (in GeV) accounts for the energy lost because of the energy thresholds of the clustering algorithm and is taken to be independent of E. As for photons (Section 0.3.5), a large sample of simulated single neutral hadrons is used to determine the calibration coefficients a, b, and c, as well as the functions f and g. Hadrons that interact with the tracker material are rejected. In a first pass, the functions f and g are fixed to unity. For a given value of a and in each bin of E, the χ² defined as

\[ \chi^2 = \sum_{i=1}^{N} \frac{\left(E_{\text{calib},i} - E_i\right)^2}{\sigma_i^2}, \tag{8} \]

where E_i and σ_i are the true energy and the expected calorimetric energy resolution of the i-th single hadron, is minimized with respect to the coefficients b and c. The energy dependence of the energy resolution σ, as displayed in Eq. (2), is determined iteratively. Prior to the first iteration of the minimization, a Gaussian is fitted to the distribution of the raw calorimetric energy, E_raw, in each bin of true energy. The coefficients of Eq. (2) are then fitted to the evolution of the Gaussian standard deviation as a function of E. These two operations are repeated in the subsequent iterations, for which the calibrated energy, E_calib, is substituted for the raw energy, E_raw. The procedure converges at the second iteration.
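Assuming the calibrated energy has the form a + b f(η) E_ECAL + c g(η) E_HCAL, then for fixed a and with f = g = 1 the χ² of a given energy bin is quadratic in b and c, so the first-pass minimization reduces to a weighted linear least-squares problem. The sketch below solves the resulting 2×2 normal equations and falls back to determining c alone for the HCAL-only sample. The function names are hypothetical, and the binning in E and the iterative update of σ described above are not shown.

```python
def calibrated_energy(e_ecal, e_hcal, a, b, c, f_eta=1.0, g_eta=1.0):
    """Calibrated hadron energy, assuming the form a + b f E_ECAL + c g E_HCAL."""
    return a + b * f_eta * e_ecal + c * g_eta * e_hcal


def fit_b_c(samples, a):
    """Minimize sum w (a + b E_ECAL + c E_HCAL - E_true)^2 over b, c (f = g = 1).

    samples: iterable of (e_ecal, e_hcal, e_true, sigma) for one bin of true energy.
    """
    s_ee = s_eh = s_hh = s_ey = s_hy = 0.0
    for e_ecal, e_hcal, e_true, sigma in samples:
        w = 1.0 / sigma ** 2
        y = e_true - a                      # target once the offset a is removed
        s_ee += w * e_ecal * e_ecal
        s_eh += w * e_ecal * e_hcal
        s_hh += w * e_hcal * e_hcal
        s_ey += w * e_ecal * y
        s_hy += w * e_hcal * y
    if s_ee == 0.0:                         # HCAL-only sample: only c is determined
        return 0.0, s_hy / s_hh
    det = s_ee * s_hh - s_eh ** 2
    if det == 0.0:
        raise ValueError("degenerate sample: E_ECAL and E_HCAL are proportional")
    b = (s_ey * s_hh - s_eh * s_hy) / det   # Cramer's rule on the 2x2 system
    c = (s_ee * s_hy - s_eh * s_ey) / det
    return b, c


# Synthetic check: noise-free data generated with a = 3.5 GeV, b = 1.2, c = 1.1
a_true, b_true, c_true = 3.5, 1.2, 1.1
pairs = [(2.0, 8.0), (5.0, 3.0), (1.0, 10.0), (4.0, 4.0)]
samples = [(e, h, calibrated_energy(e, h, a_true, b_true, c_true), 1.0) for e, h in pairs]
b_fit, c_fit = fit_b_c(samples, a=a_true)
```

Solving the normal equations directly keeps the per-bin fit cheap and deterministic, which matters when it is repeated for every energy bin, region, and iteration.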

The barrel and endcap regions are treated separately to account for different thresholds and cell sizes. In each region, the determination of b and c is performed separately for hadrons leaving energy solely in the HCAL (in which case only c is determined) and for those depositing energy in both ECAL and HCAL. No attempt is made to calibrate the hadrons leaving energy only in the ECAL, as such clusters are identified as photon or electron clusters by the PF algorithm. For each of the four samples, the relatively small residual dependence of the calibrated energy on the particle pseudorapidity is corrected for in a third iteration of the minimization, with second-order polynomials for f(η) and g(η), and with b and c taken from the result of the second iteration.

To avoid the need for an accurate estimate of the true hadron energy (which might not be available in real data), the constant a is chosen to minimize the dependence on E of the coefficients b and c, for E in excess of 10 GeV. It is estimated to amount to 2.5 GeV for hadrons showering in the HCAL only, and 3.5 GeV for hadrons interacting in both ECAL and HCAL. The left panel of Fig. 8 shows the coefficients b and c, determined for each energy bin in the barrel region, as a function of the true hadron energy. The residual dependence of these coefficients on E is finally fitted with adequate continuous functional forms b(E) and c(E), for later use in the PF reconstruction. As expected, the coefficient c is close to unity for 50 GeV hadrons leaving energy only in the HCAL. The larger values of the coefficient c for the hadrons that also leave energy in the ECAL make up for the energy lost in the dead material between ECAL and HCAL, which amounts to about half an interaction length. The fact that the coefficients b and c depend on the true energy up to very large values is a consequence of the nonlinear calorimeter response to hadrons.

Figure 8: Left: Calibration coefficients obtained from single hadrons in the barrel as a function of their true energy E, for hadrons depositing energy only in the HCAL (blue triangles), and for hadrons depositing energy in both the ECAL and HCAL, for the ECAL (red circles) and for the HCAL (green squares) clusters. Right: Relative raw (blue) and calibrated (red) energy response (dashed curves and triangles) and resolution (full curves and circles) for single hadrons in the barrel, as a function of their true energy E. Here the raw (calibrated) response and resolution are obtained by a Gaussian fit to the distribution of the relative difference between the raw (calibrated) calorimetric energy and the true hadron energy.

The right panel of Fig. 8 shows that the calibrated response, defined as the mean relative difference between the calibrated energy and the true energy, is much closer to zero than the raw response, which underestimates hadron energies by up to 40% at low energy. The calibration procedure therefore restores the linearity of the calorimeter response. The relative calibrated energy resolution, displayed in the same figure, also exhibits a sizeable improvement with respect to the raw resolution at all energies. For hadrons with an energy below 10 GeV, the resolution rapidly improves as the energy decreases. This remarkable behaviour is an effect of the convergence of the b and c coefficients to zero in this energy range, which is itself an artefact of the presence of the constant a in the calibration procedure. The explanation is as follows. Hadrons with energy below 10 GeV often leave too little energy in the calorimeters to exceed the thresholds of the clustering algorithm. As a consequence, those that do leave energy do so because of an upward fluctuation in the showering process. Such fluctuations are calibrated away by the small b and c values. The procedure effectively replaces the energy of soft hadrons, measured with large fluctuations, with the constant a, which is de facto closer to the actual hadron energy.

Isolated charged hadrons selected from early data recorded at three different centre-of-mass energies have been used to check that the calibration coefficients determined from the simulation are adequate for real data. Section 0.4.4 describes how the calibration is applied for the identification and reconstruction of nonisolated particles. Finally, it is worth stressing that this calibration affects only 10% of the measured event energy. The latter is therefore expected to be modified, on average, by only a few per cent by the calibration procedure.

0.4 Particle identification and reconstruction

0.4.1 Link algorithm

A given particle is, in general, expected to give rise to several PF elements in the various CMS subdetectors. The reconstruction of a particle therefore first proceeds with a link algorithm that connects the PF elements from different subdetectors. The event display of Fig. 2 illustrates most of the possible configurations for charged hadrons, neutral hadrons, and photons. The probability for the algorithm to link elements from one particle only is limited by the granularity of the various subdetectors and by the number of particles to resolve per unit of solid angle. The probability to link all elements of a given particle is mostly limited by the amount of material encountered upstream of the calorimeters and the muon detector, which may lead to trajectory kinks and to the creation of secondary particles.

The link algorithm can test any pair of elements in the event. In order to prevent the computing time of the link algorithm from growing quadratically with the number of particles, the pairs of elements considered by the link procedure are restricted to the nearest neighbours in the (η, φ) plane, as obtained with a k-dimensional tree [38]. The specific conditions required to link two elements depend on their nature and are listed in the next paragraphs. If two elements are found to be linked, the algorithm defines a distance between these two elements, aimed at quantifying the quality of the link. The link algorithm then produces PF blocks of elements associated either by a direct link or by an indirect link through common elements.
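Forming PF blocks from the pairwise links amounts to finding the connected components of a graph whose nodes are PF elements and whose edges are links. The sketch below uses a union-find structure for this step. The k-d tree neighbour search and the element-specific link tests are assumed to have already produced the list of linked pairs, and the integer element indices are illustrative.

```python
from typing import Iterable, List, Tuple


def build_blocks(n_elements: int, links: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Group PF elements into blocks: connected components of the link graph."""
    parent = list(range(n_elements))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i, j in links:                      # each direct link merges two components
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    blocks = {}
    for i in range(n_elements):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())


# Elements: 0 track, 1 ECAL cluster, 2 HCAL cluster, 3 ECAL cluster, 4 track, 5 HCAL cluster
blocks = build_blocks(6, [(0, 1), (1, 2), (4, 5)])
# Track 0 and HCAL cluster 2 share a block through the indirect link via ECAL cluster 1
```

Union-find makes block formation nearly linear in the number of links, so the overall cost stays dominated by the neighbour search rather than by the grouping step.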

More specifically, a link between a track in the central tracker and a calorimeter cluster is established as follows. The track is first extrapolated from its last measured hit in the tracker to the two layers of the preshower, to the ECAL at a depth corresponding to the expected maximum of a typical longitudinal electron shower profile, and to the HCAL at a depth corresponding to one interaction length, each within the corresponding angular acceptance. The track is linked to a cluster if its extrapolated position is within the cluster area, defined by the union of the areas of all its cells in the (η, φ) plane for the HCAL and the ECAL barrel, or in the (x, y) plane for the ECAL endcaps and the preshower. This area is enlarged by up to the size of a cell in each direction, to account for the presence of gaps between calorimeter cells or cracks between calorimeter modules, for the uncertainty in the position of the shower maximum, and for the effect of multiple scattering on low-momentum charged particles. The link distance is defined as the distance between the extrapolated track position and the cluster position in the (η, φ) plane. If several HCAL clusters are linked to the same track, or if several tracks are linked to the same ECAL cluster, only the link with the smallest distance is kept.
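The track-cluster link test for the (η, φ) case can be sketched as follows. Each cluster is represented as a list of rectangular cells plus a cluster position, a hypothetical data layout; the `enlarge` parameter (fraction of a cell size added on each side, between 0 and 1) stands in for the area enlargement described above; and only the closest linked HCAL cluster is kept, as in the text. The link distance is computed in (η, φ) with azimuthal wrap-around.

```python
import math


def delta_phi(a, b):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (a - b + math.pi) % (2.0 * math.pi) - math.pi


def in_cluster_area(eta, phi, cells, enlarge=1.0):
    """cells: (eta_c, phi_c, d_eta, d_phi); each cell grown by `enlarge` cell sizes per side."""
    return any(abs(eta - ec) <= d_eta * (0.5 + enlarge) and
               abs(delta_phi(phi, pc)) <= d_phi * (0.5 + enlarge)
               for ec, pc, d_eta, d_phi in cells)


def closest_linked_cluster(track_eta, track_phi, clusters, enlarge=1.0):
    """clusters: list of dicts with 'cells' and 'position' (eta, phi).

    Returns the index of the linked cluster with the smallest (eta, phi)
    distance to the extrapolated track, or None if no cluster is linked.
    """
    best, best_dist = None, math.inf
    for k, cl in enumerate(clusters):
        if not in_cluster_area(track_eta, track_phi, cl["cells"], enlarge):
            continue
        ceta, cphi = cl["position"]
        dist = math.hypot(track_eta - ceta, delta_phi(track_phi, cphi))
        if dist < best_dist:
            best, best_dist = k, dist
    return best


# Two single-cell HCAL-like clusters with 0.087 x 0.087 cells (illustrative)
hcal = [{"cells": [(0.0, 0.0, 0.087, 0.087)], "position": (0.0, 0.0)},
        {"cells": [(0.2, 0.0, 0.087, 0.087)], "position": (0.2, 0.0)}]
print(closest_linked_cluster(0.12, 0.0, hcal, enlarge=1.0))  # 1
```

With full enlargement, the track at η = 0.12 falls within both cluster areas, and the smallest-distance rule resolves the ambiguity in favour of the second cluster.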

To collect the energy of photons emitted by electron bremsstrahlung, tangents to the GSF tracks are extrapolated to the ECAL from the intersection points between the track and each of the tracker layers. A cluster is linked to the track as a potential bremsstrahlung photon if the extrapolated tangent position is within the boundaries of the cluster, as defined above, provided that the distance between the cluster and the GSF track extrapolation in $\eta$ is smaller than 0.05. These bremsstrahlung photons, as well as prompt photons, have a significant probability to convert to an $e^+e^-$ pair in the tracker material. A dedicated conversion finder [39] was therefore developed to create links between any two tracks compatible with originating from a photon conversion. If the converted-photon direction, obtained from the sum of the two track momenta, is found to be compatible with one of the aforementioned track tangents, a link is created between each of these two tracks and the original track.
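Both conditions in this paragraph fit in a few lines. The first is the $\eta$ requirement of 0.05 on a bremsstrahlung cluster. The second is the comparison of the converted-photon direction, computed from the summed track momenta, with a GSF track tangent. The angular tolerance `max_dr` used for the conversion comparison is a hypothetical parameter, because the text does not state the compatibility criterion. Whether the tangent hits the cluster area is passed in as a boolean.

```python
import math


def eta_phi(px, py, pz):
    """Pseudorapidity and azimuth of a momentum vector (requires pT > 0)."""
    return math.asinh(pz / math.hypot(px, py)), math.atan2(py, px)


def is_brem_cluster(tangent_in_cluster, gsf_eta, cluster_eta):
    """Brem-photon link: the tangent must hit the cluster area and the
    cluster must lie within 0.05 in eta of the GSF track extrapolation."""
    return tangent_in_cluster and abs(gsf_eta - cluster_eta) < 0.05


def conversion_matches_tangent(p1, p2, tangent_eta, tangent_phi, max_dr=0.05):
    """Compare the converted-photon direction (sum of the two track momenta)
    with a GSF track tangent; `max_dr` is a hypothetical tolerance."""
    eta, phi = eta_phi(*(a + b for a, b in zip(p1, p2)))
    dphi = (phi - tangent_phi + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta - tangent_eta, dphi) < max_dr
```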

Calorimeter cluster-to-cluster links are sought between HCAL and ECAL clusters, and between ECAL and preshower clusters in the preshower acceptance. A link is established when the cluster position in the more granular calorimeter (preshower or ECAL) is within the cluster envelope in the less granular calorimeter (ECAL or HCAL). The link distance is again defined as the distance between the two cluster positions, in the $(\eta,\phi)$ plane for an HCAL-ECAL link, or in the $(x,y)$ plane for an ECAL-preshower link. When multiple HCAL clusters are linked to the same ECAL cluster, or when multiple ECAL clusters are linked to the same preshower cluster, only the link with the smallest distance is kept. A trivial link between an ECAL cluster and an ECAL supercluster is established when they share at least one ECAL cell.
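The rule "only the link with the smallest distance is kept" is the same for track-cluster and cluster-to-cluster links. It can be sketched as a reduction over candidate links, keyed on the element that must keep a single partner. In the sketch, links are `(coarse_id, fine_id, distance)` tuples. For HCAL-ECAL links, grouping by the ECAL cluster (`side=1`) keeps one HCAL partner per ECAL cluster. The function name and tuple layout are illustrative assumptions.

```python
def keep_closest(links, side):
    """links: iterable of (coarse_id, fine_id, distance).
    For each element on `side` (0 = coarse, 1 = fine), keep only its link
    with the smallest distance; return the surviving links, sorted."""
    best = {}
    for link in links:
        key = link[side]
        if key not in best or link[2] < best[key][2]:
            best[key] = link
    return sorted(best.values())


hcal_ecal = [("H1", "E1", 0.05), ("H2", "E1", 0.02), ("H2", "E2", 0.04)]
print(keep_closest(hcal_ecal, side=1))  # [('H2', 'E1', 0.02), ('H2', 'E2', 0.04)]
```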

Charged-particle tracks may also be linked together through a common secondary vertex, for nuclear-interaction reconstruction (Section 0.3.1). The relevant displaced vertices are retained if they feature at least three tracks, of which at most one is an incoming track, reconstructed with tracker hits between the primary vertex and the displaced vertex. The invariant mass formed by the outgoing tracks must exceed 0.2\GeV. All the tracks sharing a selected nuclear-interaction vertex are linked together.
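The selection of nuclear-interaction vertices can be written down directly: at least three tracks, at most one incoming track, and an invariant mass of the outgoing tracks above 0.2\GeV. The text does not say which mass hypothesis is assigned to the outgoing tracks. This sketch therefore treats them as massless, which is an assumption. With a pion-mass hypothesis, any two or three tracks would already exceed 0.2\GeV.

```python
import math


def invariant_mass(momenta, mass=0.0):
    """Invariant mass of tracks given as (px, py, pz) in GeV, each assigned
    the same `mass` hypothesis (massless by default, an assumption)."""
    e = px = py = pz = 0.0
    for x, y, z in momenta:
        e += math.sqrt(x * x + y * y + z * z + mass * mass)
        px += x
        py += y
        pz += z
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def is_nuclear_interaction(tracks):
    """tracks: list of ((px, py, pz), is_incoming) attached to a displaced vertex."""
    n_incoming = sum(1 for _, incoming in tracks if incoming)
    outgoing = [p for p, incoming in tracks if not incoming]
    if len(tracks) < 3 or n_incoming > 1:
        return False
    return invariant_mass(outgoing) > 0.2
```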

Finally, a link between a track in the central tracker and information in the muon detector is established as explained in Section 0.3.3 to form global and tracker muons.

In the event shown in Fig. 2, the track T$_1$ is linked to the ECAL cluster E$_1$ and to the HCAL clusters H$_1$ (with a smaller link distance) and H$_2$ (with a larger link distance), while the track T$_2$ is linked only to the HCAL clusters H$_1$ and H$_2$. These two tracks form a first PF block with five PF elements: T$_1$, E$_1$, and H$_1$ (corresponding to the generated $\pi^-$); and T$_2$ and H$_2$ (corresponding to the generated $\pi^+$). The other three ECAL clusters are not linked to any track or cluster and thus form three PF blocks on their own, corresponding to the generated pair of photons from the $\pi^0$ decay and to the neutral kaon. Owing to the granularity of the CMS subdetectors, most PF blocks contain only a handful of elements originating from one or a few particles. The logic of the subsequent PF algorithm is therefore not affected by the particle multiplicity in the event, and the computing time increases only linearly with multiplicity.

In each PF block, the identification and reconstruction sequence proceeds in the following order. First, muon candidates are identified and reconstructed as described in Section 0.4.2, and the corresponding PF elements (tracks and clusters) are removed from the PF block. The electron identification and reconstruction follows, as explained in Section 0.4.3, with the aim of collecting the energy of all bremsstrahlung photons. Energetic and isolated photons, converted or unconverted, are identified in the same step. The corresponding tracks and ECAL or preshower clusters are excluded from further consideration.

At this level, tracks with a \pt uncertainty in excess of the calorimetric energy resolution expected for charged hadrons (Fig. 8) are masked, which adequately reduces the rate of misreconstructed tracks at large \pt (Fig. 4). In multijet events, 0.2% of the tracks are rejected by this requirement, on average. About 10% of these rejected tracks originate from genuine high-\pt charged hadrons whose \pt estimate is incompatible with the true \pt value. In that case, their energies are measured more accurately in the calorimeters than in the tracker. The remaining elements in the block are then subject to a cross-identification of charged hadrons, neutral hadrons, and photons arising from parton fragmentation, hadronization, and decays in jets. This step is described in Section 0.4.4.

Hadrons experiencing a nuclear interaction in the tracker material create secondary particles. These hadrons are identified and reconstructed as summarized in Section 0.4.5. When an incoming track is identified, it is used to refine the reconstruction outcome, but is otherwise ignored in the track-cluster link algorithm as well as in the particle reconstruction algorithms described in Sections 0.4.2 to 0.4.4.

Finally, when the global event description becomes available, \ie when all blocks have been processed and all particles have been identified, the reconstructed event is revisited by a post-processing step described in Section 0.4.6.

0.4.2 Muons

In the PF algorithm, muon identification proceeds by a set of selections based on the global and tracker muon properties. Isolated global muons are selected first, by considering additional inner tracks and calorimeter energy deposits with a distance to the muon direction in the $(\eta,\phi)$ plane smaller than 0.3. The sum of the \pt of the tracks and of the $E_\mathrm{T}$ of the deposits is required not to exceed 10% of the muon \pt. This isolation criterion alone is sufficient to adequately reject hadrons that would otherwise be misidentified as muons, hence no further selection is applied to these muon candidates.
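The isolation requirement translates into a cone sum compared with a fraction of the muon \pt. In the sketch below, each other track or deposit is given as `(pt_or_et, eta, phi)`. The muon's own track and deposits are assumed to be already excluded from `others`. Function names are illustrative.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the (eta, phi) plane, with phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def is_isolated_muon(muon, others, cone=0.3, max_fraction=0.10):
    """muon: (pt, eta, phi). others: (pt or ET, eta, phi) of the additional
    inner tracks and calorimeter deposits, excluding the muon's own ones."""
    pt, eta, phi = muon
    activity = sum(o[0] for o in others if delta_r(eta, phi, o[1], o[2]) < cone)
    return activity <= max_fraction * pt
```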

Muons inside jets, for example those from semileptonic heavy-flavour decays or from charged-hadron decays in flight, require more stringent identification criteria. Indeed, for charged hadrons misidentified as muons, \eg because of punch-through, the PF algorithm tends to create additional spurious neutral particles from the calorimeter deposits. Unidentified muons, on the other hand, are considered as charged hadrons and tend to absorb the energy deposits of nearby neutral particles.

For nonisolated global muons, the tight-muon selection [35] is applied. In addition, it is required either that at least three matching track segments be found in the muon detectors, or that the calorimeter deposits associated with the track be compatible with the muon hypothesis. This selection removes the majority of high-\pt hadrons misidentified as muons because of punch-through, as well as accidental associations of tracker and standalone muon tracks.

Muons that fail the tight-muon selection because of a poorly reconstructed inner track, for example because of hit confusion with other nearby tracks, are salvaged if the standalone muon track fit is of high quality and is associated with a large number of hits in the muon detectors (at least 23 DT or 15 CSC hits, out of 32 and 24, respectively). Muons may also fail the tight-muon selection because of a poor global fit. In this case, the muon is selected if a high-quality fit is obtained with at least 13 hits in the tracker, provided that the associated calorimeter clusters are compatible with the muon hypothesis.

The muon momentum is taken to be that of the inner track if its \pt is smaller than 200\GeV. Above this value, the momentum is chosen according to the smallest probability from the different track fits: tracker only, tracker and first muon detector plane, global, and global without the muon detector planes featuring a high occupancy [35].
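The momentum assignment reduces to a threshold test followed by a choice among the available fits. The exact probability-based criterion of [35] is not reproduced here. The sketch takes a per-fit `score` supplied by the caller and picks the fit with the smallest one, which is an assumption about how the criterion would be plugged in. The fit names simply mirror the list above.

```python
def choose_muon_pt(inner_pt, fits, threshold=200.0):
    """Return (fit name, pt). Below `threshold` (GeV) the inner-track pt is
    used; above it, the fit with the smallest selection score is chosen.
    fits: dict name -> (pt, score); `score` is a placeholder for the
    probability-based criterion of [35]."""
    if inner_pt < threshold:
        return "tracker only", inner_pt
    name = min(fits, key=lambda n: fits[n][1])
    return name, fits[name][0]


fits = {"tracker only": (300.0, 0.5),
        "tracker + first muon plane": (310.0, 0.3),
        "global": (320.0, 0.1)}
print(choose_muon_pt(300.0, fits))  # ('global', 320.0)
```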

The PF elements that make up these identified muons are masked against further processing in the corresponding PF block, \ie they are not used as building elements for other particles. As discussed in Sections 0.4.4 and 0.4.6, muon identification and reconstruction are not complete at this point. For example, charged-hadron candidates are checked for the compatibility of the measurements of their momenta in the tracker and their energies in the calorimeters. If the track momentum is found to be significantly larger than the calibrated sum of the linked calorimeter clusters, the muon identification criteria are revisited, with somewhat looser selections on the fit quality and on the hit or segment associations.

0.4.3 Electrons and isolated photons

Electron reconstruction is based on combined information from the inner tracker and the calorimeters. Owing to the large amount of material in the tracker, electrons often emit bremsstrahlung photons, and photons often convert to $e^+e^-$ pairs, which in turn emit bremsstrahlung photons, and so on. For this reason, the basic properties of electrons and photons, and the technical issues to be solved for their tracking and energy deposition patterns, are similar. Isolated-photon reconstruction is therefore conducted together with electron reconstruction. In a given PF block, an electron candidate is seeded from a GSF track, as described in Section 0.3.2, provided that the corresponding ECAL cluster is not linked to three or more additional tracks. A photon candidate is seeded from an ECAL supercluster with $E_\mathrm{T}$ larger than 10\GeV and with no link to a GSF track.

For ECAL-based electron candidates and for photon candidates, the sum of the energies measured in the HCAL cells within a distance of 0.15 from the supercluster position in the $(\eta,\phi)$ plane must not exceed 10% of the supercluster energy. To ensure optimal energy containment, all ECAL clusters in the PF block linked either to the supercluster or to one of the GSF track tangents are associated with the candidate. Tracks linked to these ECAL clusters are associated in turn if the track momentum and the energy of the HCAL cluster linked to the track are compatible with the electron hypothesis. The tracks and ECAL clusters belonging to identified photon conversions linked to the GSF track tangents are associated as well.
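The photon seeding requirement and the HCAL energy requirement can be combined into a single predicate. In this sketch, the HCAL energy within 0.15 of the supercluster is assumed to be precomputed. Energies are in GeV. The function names are illustrative.

```python
def passes_hcal_requirement(hcal_energy_in_cone, supercluster_energy):
    """HCAL energy within 0.15 in (eta, phi) of the supercluster must not
    exceed 10% of the supercluster energy."""
    return hcal_energy_in_cone <= 0.10 * supercluster_energy


def is_photon_seed(sc_et, sc_energy, hcal_energy_in_cone, linked_to_gsf):
    """Photon candidate seeding: supercluster ET > 10 GeV, no GSF-track link,
    and the HCAL energy requirement above."""
    return (not linked_to_gsf and sc_et > 10.0
            and passes_hcal_requirement(hcal_energy_in_cone, sc_energy))
```

The same HCAL requirement also applies to ECAL-based electron candidates. This is why it is written as a separate function.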

The total energy of the collected ECAL clusters is corrected for the energy missed in the association process, with analytical functions of $E$ and $\eta$. These corrections can be as large as 25% at $|\eta| \approx 1.5$, where the tracker thickness is largest, and at low $E$. This corrected energy is assigned to the photons, and the photon direction is taken to be that of the supercluster. The final energy assignment for electrons is obtained from a combination of the corrected ECAL energy with the momentum of the GSF track, and the electron direction is chosen to be that of the GSF track [40].

Electron candidates must satisfy additional identification criteria. Specifically, up to fourteen variables are combined in BDTs trained separately in the ECAL barrel and endcap acceptances, and for isolated and nonisolated electrons. These variables include the amount of energy radiated off the GSF track, the distance between the GSF track extrapolation to the ECAL entrance and the position of the ECAL seeding cluster, the ratio between the energies gathered in the HCAL and ECAL by the track-cluster association process, and the KF and GSF track $\chi^2$ and numbers of hits.

Photon candidates are retained if they are isolated from other tracks and calorimeter clusters in the event, and if the ECAL cell energy distribution and the ratio between the HCAL and ECAL energies are compatible with those expected from a photon shower. The PF selection is looser than the requirements typically applied at analysis level to select isolated photons. The reconstruction of less energetic or nonisolated photons is discussed in Section 0.4.4.

All tracks and clusters in the PF block used to reconstruct electrons and photons are masked against further processing. Tracks identified as originating from a photon conversion but not used in the process are masked as well, as they are typically poorly measured and likely to be misreconstructed tracks. The distinction between electrons and photons in the PF global event description can be different from a selection optimized for a specialized analysis. To deal with this complication, the complete history of the electron and photon reconstruction is tracked and saved, to allow a different event interpretation to be made without running the complete PF algorithm again.

0.4.4 Hadrons and nonisolated photons

Once muons, electrons, and isolated photons are identified and removed from the PF blocks, the remaining particles to be identified are hadrons from jet fragmentation and hadronization. These particles may be detected as charged hadrons ($\pi^\pm$, $\mathrm{K}^\pm$, or protons), neutral hadrons (\eg $\mathrm{K}^0_\mathrm{L}$ or neutrons), nonisolated photons (\eg from $\pi^0$ decays), and, more rarely, additional muons (\eg from early decays of charged hadrons).

The ECAL and HCAL clusters not linked to any track give rise to photons and neutral hadrons. Within the tracker acceptance ($|\eta| < 2.5$), all these ECAL clusters are turned into photons and all these HCAL clusters are turned into neutral hadrons. The precedence given in the ECAL to photons over neutral hadrons is justified by the observation that, in hadronic jets, 25% of the jet energy is carried by photons, while neutral hadrons leave only 3% of the jet energy in the ECAL. (This fraction is reduced by one order of magnitude for tau leptons, for which decays to final states with neutral hadrons are Cabibbo-suppressed to a branching fraction of about 1%.) Beyond the tracker acceptance, however, charged and neutral hadrons cannot be distinguished, and together they leave 25% of the jet energy in the ECAL. The systematic precedence given to photons for the ECAL energy is therefore no longer justified. For this reason, ECAL clusters linked to a given HCAL cluster are assumed to arise from the same (charged- or neutral-) hadron shower, while ECAL clusters without such a link are classified as photons. These identified photons and hadrons are calibrated as described in Section 0.3.5. The estimated true energy of each identified particle, needed to determine the calibration coefficients, is taken to be the raw calorimetric energy, \ie $E_\mathrm{ECAL}$ for photons, $E_\mathrm{HCAL}$ for hadrons inside the tracker acceptance, and $E_\mathrm{ECAL} + E_\mathrm{HCAL}$ for hadrons outside the tracker acceptance. The HF EM and HF HAD clusters are added to the particle list as HF photons and HF hadrons without any further calibration.
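The treatment of calorimeter clusters without any track link can be sketched as follows. Inside the tracker acceptance, every ECAL cluster becomes a photon and every HCAL cluster a hadron. Outside it, ECAL clusters linked to an HCAL cluster are merged into that hadron. The sketch returns raw energies only, so calibration is not included. The dictionary layout and function name are illustrative assumptions, and HF clusters are not handled.

```python
TRACKER_ETA_MAX = 2.5  # tracker acceptance |eta| < 2.5


def classify_unlinked(ecal, hcal, ecal_to_hcal):
    """ecal, hcal: dict id -> (eta, raw_energy) for clusters with no track link.
    ecal_to_hcal: dict ECAL id -> linked HCAL id.
    Returns a list of (particle type, cluster ids, raw energy)."""
    particles = []
    shared = {}  # HCAL id -> ECAL ids assumed to belong to the same hadron shower
    for e_id, (eta, energy) in ecal.items():
        h_id = ecal_to_hcal.get(e_id)
        if abs(eta) >= TRACKER_ETA_MAX and h_id is not None:
            shared.setdefault(h_id, []).append(e_id)
        else:
            particles.append(("photon", [e_id], energy))  # E_ECAL
    for h_id, (_, energy) in hcal.items():
        e_ids = shared.get(h_id, [])
        # E_HCAL inside the tracker acceptance, E_ECAL + E_HCAL outside
        total = energy + sum(ecal[e][1] for e in e_ids)
        particles.append(("hadron", [h_id] + e_ids, total))
    return particles
```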

Each of the remaining HCAL clusters of the PF block is linked to one or several tracks (not linked to any other HCAL cluster), and these tracks may in turn be linked to some of the remaining ECAL clusters (each linked to only one of the tracks). The calibrated calorimetric energy is determined with the procedure described in Section 0.3.5 from the energy of the HCAL cluster and the total energy of the ECAL clusters, under the single charged-hadron hypothesis. The true energy, needed to determine the calibration coefficients $b$ and $c$, is estimated to be either the sum of the track momenta or the sum of the raw ECAL and HCAL energies, whichever is larger. The sum of the track momenta is then compared with the calibrated calorimetric energy to determine the particle content, as described below.

If the calibrated calorimetric energy is in excess of the sum of the track momenta by an amount larger than the expected calorimetric energy resolution for hadrons, the excess may be interpreted as the presence of photons and neutral hadrons. Specifically, if the excess is smaller than the total ECAL energy and larger than 500\MeV, it is identified as a photon with an energy corresponding to this excess after recalibration under the photon hypothesis, as described in Section 0.3.5. Otherwise, the recalibrated ECAL energy still gives rise to a photon, and the remaining part of the excess, if larger than 1\GeV, is identified as a neutral hadron. Each track gives rise to a charged hadron, the momentum and energy of which are directly taken from the corresponding track momentum, under the charged-pion mass hypothesis.
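The decision logic for a calorimetric excess can be sketched as below. It follows one reading of the text: when the excess is smaller than the total ECAL energy, it becomes a single photon if it is above 500\MeV. Otherwise, the ECAL energy gives a photon and any remainder above 1\GeV gives a neutral hadron. The recalibration under the photon hypothesis is omitted, so the energies are used as is. `sigma_calo` stands for the expected calorimetric resolution and is supplied by the caller.

```python
def interpret_excess(e_calib, track_p_sum, ecal_energy, sigma_calo):
    """Neutral particles implied by a calorimetric excess, as (type, energy).
    All energies in GeV; photon-hypothesis recalibration is omitted."""
    excess = e_calib - track_p_sum
    if excess <= sigma_calo:
        return []  # no significant excess: no neutral particle from this step
    if excess < ecal_energy:
        # Excess attributable to the ECAL: a photon, if above 500 MeV
        return [("photon", excess)] if excess > 0.5 else []
    neutrals = [("photon", ecal_energy)] if ecal_energy > 0 else []
    if excess - ecal_energy > 1.0:
        neutrals.append(("neutral hadron", excess - ecal_energy))
    return neutrals
```

Each track in the block still yields a charged hadron with the track momentum and the charged-pion mass. The sketch returns only the additional neutral particles.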

If the calibrated calorimetric energy is compatible with the sum of the track momenta, no neutral particle is identified. The charged-hadron momenta are redefined by a fit of the measurements in the tracker and the calorimeters, which reduces to a weighted average if only one track is linked to the HCAL cluster. This combination is particularly relevant when the track parameters are measured with degraded resolution, \eg at very high energies or at large pseudorapidities. It ensures a smooth transition between the low-energy regime, dominated by the tracker measurements, and the high-energy regime, dominated by the calorimetric measurements. The resulting energy resolution is always better than that of the calorimetric energy measurement alone, even at the highest energies.
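For the single-track case, the weighted average can be sketched as an inverse-variance combination of the track momentum and the calibrated calorimetric energy. The text only says "weighted average", so inverse-variance weighting is the standard choice assumed here. Both uncertainties are inputs supplied by the caller.

```python
import math


def combine(p_track, sigma_track, e_calo, sigma_calo):
    """Inverse-variance weighted mean of two measurements and its uncertainty."""
    w_track = 1.0 / sigma_track ** 2
    w_calo = 1.0 / sigma_calo ** 2
    value = (w_track * p_track + w_calo * e_calo) / (w_track + w_calo)
    return value, math.sqrt(1.0 / (w_track + w_calo))


value, sigma = combine(100.0, 3.0, 104.0, 4.0)
print(round(value, 2), round(sigma, 2))  # 101.44 2.4
```

Since the combined weight is the sum of the two weights, the combined uncertainty is smaller than either input uncertainty. This matches the statement that the result is always better than the calorimetric measurement alone. When the track is precise, the result follows the tracker; when the track resolution degrades, the calorimeter dominates.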

In rare cases, the calibrated calorimetric energy is significantly smaller than the sum of the track momenta. When the difference is larger than three standard deviations, a relaxed search for muons, which deposit little energy in the calorimeters, is performed. All global muons remaining after the selection described in Section 0.4.2, and for which an estimate of the momentum exists with a relative precision better than 25%, are identified as PF muons, and the corresponding tracks are masked. The redundancy of the measurements in the tracker and the calorimeters thus allows a few more muons to be found without increasing the misidentified-muon rate. If the track momentum sum is still significantly larger than the calorimetric energy, the excess in momentum is often found to arise from residual misreconstructed tracks with a \pt uncertainty in excess of 1\GeV. These tracks are sorted in decreasing order of their \pt uncertainty and are sequentially masked until either no such tracks remain in the PF block or the momentum excess disappears, whichever comes first. Fewer than 0.3 per mil of the tracks in multijet events are affected by this procedure. In general, after these two steps, either the compatibility of the total calibrated calorimetric energy with the reduced sum of the track momenta is restored, or a calorimetric energy excess appears. These cases are treated as described above.
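The sequential masking of poorly measured tracks can be sketched as below. Tracks with a \pt uncertainty above 1\GeV are visited in decreasing order of that uncertainty. Masking stops as soon as the momentum sum no longer exceeds the calibrated calorimetric energy. The text does not define when the excess "disappears", so the `tolerance` parameter is a hypothetical stand-in for that criterion.

```python
def mask_poor_tracks(tracks, e_calib, tolerance=0.0, max_sigma=1.0):
    """tracks: list of (p, sigma_pt) in GeV. Returns (kept, masked)."""
    kept = list(tracks)
    masked = []
    candidates = sorted((t for t in tracks if t[1] > max_sigma),
                        key=lambda t: t[1], reverse=True)
    for track in candidates:
        if sum(p for p, _ in kept) <= e_calib + tolerance:
            break  # the momentum excess has disappeared
        kept.remove(track)
        masked.append(track)
    return kept, masked


tracks = [(50.0, 0.5), (40.0, 8.0), (30.0, 2.0)]
print(mask_poor_tracks(tracks, 85.0))  # ([(50.0, 0.5), (30.0, 2.0)], [(40.0, 8.0)])
```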

The event of Fig. 2 is interpreted by the PF algorithm as follows. The three ECAL clusters E$_2$, E$_3$, and E$_4$ are within the tracker acceptance, and thus no link with any HCAL cluster is created. As they are not linked to any track either, the three corresponding PF blocks give rise to one photon each. The first two correspond to the photons from the generated $\pi^0$ decay, and the third one to the energy deposited in the ECAL by the generated $\mathrm{K}^0_\mathrm{L}$, which is therefore misidentified by the algorithm and calibrated as a photon. The fourth PF block consists of the two tracks T$_1$ and T$_2$, the ECAL cluster E$_1$, and the two HCAL clusters H$_1$ and H$_2$. The track T$_1$ is initially linked to E$_1$, as well as to the two HCAL clusters. Only the link to the closest HCAL cluster, H$_1$, is kept. Similarly, only the link of T$_2$ to H$_2$ is kept. The clusters H$_1$ and E$_1$ and the track T$_1$ give rise to a charged hadron corresponding to the generated $\pi^-$, whose direction is that of T$_1$. The calibrated calorimetric energy is obtained under the charged-hadron hypothesis from the E$_1$ and H$_1$ raw energies, with an estimate of the true hadron energy given by the momentum of T$_1$. As the calibrated energy is found to be compatible with the momentum of T$_1$, no neutral particle is identified, and the charged-hadron energy is obtained from the weighted average of the track momentum and the calibrated calorimetric energy. Similarly, the cluster H$_2$ and the track T$_2$ give rise to a second charged hadron, corresponding to the generated $\pi^+$.

0.4.5 Nuclear interactions in the tracker material

A hadron interaction in the tracker material often results in the creation of a number of charged and neutral secondary particles originating from a secondary interaction vertex. One such secondary vertex is reconstructed (Section 0.3.1) and identified (Section 0.4.1) on average in a typical top-quark pair event. The secondary particles, whether or not the secondary vertex is identified, are reconstructed as charged particles (mostly charged hadrons, but also muons and electrons), photons, and neutral hadrons by the PF algorithm, as explained in Sections 0.4.2 to 0.4.4.

When the secondary charged-particle tracks are linked together by an identified nuclear-interaction vertex, the secondary charged particles are replaced in the reconstructed particle list by a single primary charged hadron. Its direction is obtained from the vectorial sum of the momenta of the secondary charged particles, its energy is given by the sum of their energies (denoted $E_\mathrm{sec}$), and its mass is set to the charged-pion mass. The nuclear-interaction vertex may also include an incoming track, not used so far in the PF reconstruction. In that case, the direction of the primary charged hadron is taken to be that of the incoming track. If, in addition, the momentum of the incoming track is well measured, it is used to estimate the energy of undetected secondary particles, reconstructed neither as secondary charged particles nor as neutral particles. The energy of the primary charged hadron is then estimated as

(9)

The small fraction of undetected energy in this expression is obtained from the simulation of single charged-hadron events.
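The replacement of the secondary charged particles by one primary charged hadron can be sketched as follows. The direction is taken from the vector sum of the secondary momenta, or from the incoming track when one is given. The energy is the sum of the secondary energies, and the mass is set to the charged-pion mass. The refined energy estimate of Eq. (9), which uses the incoming-track momentum, is not included in this sketch.

```python
import math

PION_MASS = 0.13957  # GeV, charged-pion mass


def primary_hadron(secondaries, incoming_direction=None):
    """secondaries: list of (E, px, py, pz) of the secondary charged particles.
    Returns (E_sec, px, py, pz) of the primary charged hadron, with the
    charged-pion mass; the Eq. (9) energy estimate is not applied."""
    e_sec = sum(s[0] for s in secondaries)
    if incoming_direction is not None:
        vx, vy, vz = incoming_direction
    else:
        vx = sum(s[1] for s in secondaries)
        vy = sum(s[2] for s in secondaries)
        vz = sum(s[3] for s in secondaries)
    norm = math.sqrt(vx * vx + vy * vy + vz * vz)
    p = math.sqrt(max(e_sec * e_sec - PION_MASS * PION_MASS, 0.0))
    return e_sec, p * vx / norm, p * vy / norm, p * vz / norm
```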

0.4.6 Event post-processing

Although the particles reconstructed and identified by the algorithms presented in Sections 0.4.1 to 0.4.5 result from an optimized combination of the information from all subdetectors, a small but nonzero probability of particle misidentification and misreconstruction cannot be avoided. In general, these individual mishaps tend to average out and are hardly noticeable when global event quantities are evaluated. In some rare cases, however, an artificially large missing transverse momentum, \ptmiss, is reconstructed in the event. This large \ptmiss, most often caused by a misidentified or misreconstructed high-\pt muon, may cause the event to be wrongly selected by many searches for new physics, and therefore needs to be understood and corrected. The post-processing algorithm proceeds in three steps. First, the high-\pt particles that may lead to a large artificial \ptmiss are selected. Second, the correlation of the particle transverse momentum and direction with the \ptmiss magnitude and direction is quantified. Third, the identification and reconstruction of these particles are modified a posteriori if this change reduces the \ptmiss by at least one half.

The first cause of muon-related artificial \ptmiss is the presence of genuine muons from cosmic rays traversing CMS in coincidence with an LHC beam crossing. These cosmic muons are identified when their trajectories are more than 1\unit{cm} away from the beam axis, and are removed from the particle list if the measured \ptmiss is consequently reduced. Muons from semileptonic decays of b hadrons can also, albeit rarely, be reconstructed more than 1\unit{cm} away from the beam axis and therefore be considered by this rejection algorithm. In these semileptonic decays, however, the direction of the missing momentum caused by the accompanying neutrino is strongly correlated with the muon direction, and removing the muon would further increase this missing momentum instead of reducing it. As the direction of any remaining \ptmiss in these rare events is uncorrelated with that of the b hadron, such muons are in practice always kept in the particle list.
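The acceptance rule shared by the post-processing steps can be sketched as follows. A proposed modification is kept only if it reduces \ptmiss by at least one half. In this sketch, \ptmiss is computed as the magnitude of the negative vector sum of the particle transverse momenta, and a removal is tried on one particle. Particles are given as `(pt, phi)`. The function names are illustrative.

```python
import math


def ptmiss(particles):
    """Magnitude of the negative vector pT sum; particles are (pt, phi)."""
    mx = -sum(pt * math.cos(phi) for pt, phi in particles)
    my = -sum(pt * math.sin(phi) for pt, phi in particles)
    return math.hypot(mx, my)


def remove_if_ptmiss_halved(particles, index):
    """Drop particles[index] only if this reduces ptmiss by at least one half."""
    trial = particles[:index] + particles[index + 1:]
    if ptmiss(trial) <= 0.5 * ptmiss(particles):
        return trial
    return particles


# Balanced dijet plus a spurious (e.g. cosmic) muon: removal is accepted
cosmic_event = [(100.0, 0.0), (100.0, math.pi), (300.0, 0.5)]
# Muon roughly aligned with the genuine missing momentum: removal is rejected
b_event = [(100.0, math.pi), (60.0, 0.0), (30.0, 0.0)]
```

In the second toy event, removing the muon raises \ptmiss from 10 to 40\GeV, so the muon is kept. This mirrors the behaviour described for semileptonic b-hadron decays.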

The second cause of muon-related artificial \ptmiss, still from genuine muons, is a severe misreconstruction of the muon momentum. Such a misreconstruction is identified by significant differences between the available estimates of the muon momentum (Section 0.4.2). Large differences may be caused by a wrong inner-track association, an interaction in the steel yoke, a decay in flight, or substantial synchrotron radiation. In this case, the momentum choice made by the PF algorithm is reviewed for muons with \GeV. If the \ptmiss is reduced by at least half, the momentum estimate that leads to the smallest \ptmiss value is taken.

The third cause of muon-related artificial \ptmiss is particle misidentification. For example, a punch-through charged hadron can be misidentified as a muon. In that case, an energetic neutral hadron, resulting from the energy deposited by the charged hadron in the calorimeters, is wrongly added to the particle list and leads to significant \ptmiss in the opposite direction. If both the muon momentum and the neutral-hadron energy are larger than 100\GeV, the neutral hadron is removed from the particle list, the muon is changed to a charged hadron, and the charged-hadron momentum is taken to be that of the inner track, provided that this allows the \ptmiss to be reduced by at least one half.

An energetic tracker or global muon (\GeV) can also fail the strict identification criteria of Section 0.4.2 and still be missed by the recovery algorithm of Section 0.4.4, because it overlaps with an energetic neutral hadron of similar energy. In that case, the muon candidate is misidentified as a charged hadron in the course of the PF reconstruction, and the neutral hadron disappears in the process, leading to significant \ptmiss in the same direction. These charged hadrons are turned into muons, and a neutral hadron is added to the particle list with the associated calorimetric energy, if the \ptmiss is reduced by at least half in the operation.

These criteria were originally designed to reduce the fraction of standard model multijet events with large \ptmiss, in data and simulated samples, in the context of a search for new physics in hadronic events with large \ptmiss at $\sqrt{s} = 7\TeV$. A systematic visual inspection of the events observed with unexpectedly large \ptmiss values in these early data proved particularly instrumental in identifying undesired features in the software producing inputs to the PF algorithm, in the PF algorithm itself, or even in the detector hardware. These shortcomings were addressed immediately with software fixes or workarounds (either in the PF algorithm itself or in the post-processing step described above), which consequently improved the core response and resolution of the physics objects described in Section 0.5. Physics events with genuine \ptmiss, such as semileptonic \ttbar events in data, or simulated processes predicted by new physics theories (supersymmetry, heavy gauge bosons, etc.), were checked to be essentially unaffected by the post-processing algorithm. The reason is twofold. On the one hand, the fraction of misreconstructed or misidentified muons is minute (typically smaller than 0.1 per mil). On the other hand, the presence of genuine \ptmiss, uncorrelated with these reconstruction shortcomings, generally prevents the already rare reassignments proposed by the post-processing algorithm from reducing the observed \ptmiss value.

0.5 Performance in simulation

The particles identified and reconstructed by the PF algorithm, described in Section 0.4, can be used straightforwardly in physics analyses. In the absence of pileup interactions—the case studied in this section—these particles are meant to match the stable particles of the final state of the collision.

In this section, the performance of the PF reconstruction is assessed with pp collision events generated with \PYTHIA 8.205 [41, 42] at a centre-of-mass energy of 13\TeV. All events are processed by the CMS \GEANTfour simulation, without any pileup effects, and by the CMS reconstruction algorithms. The reconstructed particles are used to build the physics objects, namely jets, the missing transverse momentum \ptmiss, muons, electrons, photons, and taus. They are also used to compute other quantities related to these physics objects, such as particle isolation. These physics objects and observables are compared with those obtained from the stable particles produced by the event generator, to evaluate the response, resolution, efficiency, and purity of the PF reconstruction. To quantify the improvements from PF, these quantities are also evaluated for the physics objects reconstructed with the techniques used prior to the PF development. An example of such a comparison is given in Fig. 9, which displays a simulated dijet event. In this event, the jets of reconstructed particles are closer in energy and direction to the jets of generated particles than the calorimeter jets are.

The comparison with the data recorded by CMS at a centre-of-mass energy of 8\TeV and the influence of pileup interactions on the PF reconstruction performance are presented in Section 0.6.


Figure 9: Jet reconstruction in a simulated dijet event. The particles clustered in the two PF jets are displayed with a thicker line. For clarity, particles with low \pt are not shown. The PF jet \pt, indicated as a radial line, is compared with the \pt of the corresponding generated (Ref) and calorimeter (Calo) jets. In all cases, the four-momentum of the jet is obtained by summing the four-momenta of its constituents, and no jet energy correction is applied.

0.5.1 Jets

The jet performance is quantified with a sample of QCD multijet events. Jets are reconstructed with the anti-\kt algorithm with a radius parameter $R = 0.4$ [43, 44]. The algorithm clusters either all particles reconstructed by the PF algorithm (PF jets), the sum of the ECAL and HCAL energies deposited in the calorimeter towers (Calo jets), or all stable particles produced by the event generator, excluding neutrinos (Ref jets). A calorimeter tower is composed of an HCAL tower and the 25 underlying ECAL crystals. Particle-flow jets are studied down to lower \pt values, whereas Calo jets with a \pt lower than 20\GeV are deemed unreliable and are rejected.

Each PF (Calo) jet is matched to the closest Ref jet in the $(\eta,\phi)$ plane, with $\Delta R < 0.1$ (0.2). The tighter limit of 0.1 for PF jets is justified by the jet direction resolution being twice as good for PF jets as for Calo jets, as can be seen in Fig. 10. This choice results in a similar matching efficiency for PF and Calo jets. The improved angular resolution of PF jets is mainly due to the precise determination of the charged-hadron directions and momenta. In Calo jets, the energy deposits of charged hadrons are spread along the $\phi$ direction by the magnetic field, which further degrades the azimuthal angular resolution.
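The matching procedure can be sketched as a nearest-neighbour search in $(\eta,\phi)$ with a maximum $\Delta R$. The maximum is 0.1 for PF jets and 0.2 for Calo jets. Jets are given as `(eta, phi)` pairs. The sketch does not prevent two reconstructed jets from matching the same Ref jet, because the text does not say how such cases are handled.

```python
import math


def delta_r(a, b):
    """(eta, phi) distance between two jets, with phi wrapped into [-pi, pi)."""
    dphi = (a[1] - b[1] + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(a[0] - b[0], dphi)


def match_jets(reco_jets, ref_jets, max_dr):
    """Map each reconstructed jet index to the index of its closest Ref jet,
    keeping the match only if it lies within max_dr."""
    matches = {}
    if not ref_jets:
        return matches
    for i, jet in enumerate(reco_jets):
        j = min(range(len(ref_jets)), key=lambda k: delta_r(jet, ref_jets[k]))
        if delta_r(jet, ref_jets[j]) < max_dr:
            matches[i] = j
    return matches


ref = [(0.0, 0.0), (1.0, 2.0)]
reco = [(0.05, 0.02), (1.15, 2.0), (3.0, -1.0)]
print(match_jets(reco, ref, 0.1))  # {0: 0}
print(match_jets(reco, ref, 0.2))  # {0: 0, 1: 1}
```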

Figure 10: Jet angular resolution in the barrel (left) and endcap (right) regions, as a function of the \pt of the reference jet. The resolution is expressed in radians.

On average, 65% of the jet energy is carried by charged hadrons, 25% by photons, and 10% by neutral hadrons. The ability of the PF algorithm to identify these particles within jets is studied by comparing the jet energy fractions measured in PF jets with those of the corresponding Ref jet. The distribution of the ratio between the reconstructed and reference energy fractions is shown in Fig. 11 for charged hadrons, photons, and neutral hadrons in barrel jets. An important part of the \pt carried by neutral hadrons is reconstructed as photons, because the energy deposits of neutral hadrons in the ECAL are systematically identified as photons for the reasons given in Section 0.4.4. However, around 80% of the neutral-hadron energy is recovered, as demonstrated by summing the energies of reconstructed photons and neutral hadrons for Ref jets without photons. The remaining 20% of the energy is lost because the energy deposited by neutral hadrons in the ECAL, being identified as originating from photons, is calibrated under the electromagnetic hypothesis. This calibration underestimates it by 20 to 40%, as indicated by the value of the calibration coefficient $b$ in Fig. 8 that would have been used under the hadron hypothesis.

Figure 11: Distribution of the ratio between the reconstructed and reference transverse momenta for charged hadrons (top left), photons (top right), neutral hadrons (bottom left), and all neutral particles in Ref jets with no photon (bottom right). The Ref jet is required to have at least 10% of its \pt carried by particles of the type considered, and to be located in the barrel.

The raw jet energy response, defined as the mean ratio of the reconstructed jet energy to the reference jet energy, is shown in Fig. 12. The PF jet response is almost constant as a function of the jet \pt and close to unity across the whole detector acceptance. A jet energy correction procedure brings the jet energy response to unity and removes any dependence on \pt and $\eta$ [45]. The jet energy resolution after this correction, defined as the Gaussian width of the distribution of the ratio between the corrected and reference jet energies, is shown in Fig. 13.
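The two definitions above can be illustrated with a toy sample. This is a minimal sketch under simplifying assumptions: a single flat correction factor $1/\langle\text{response}\rangle$ stands in for the \pt- and $\eta$-dependent jet energy corrections, and the sample standard deviation stands in for the Gaussian width obtained from a fit.

```python
import statistics


def response_and_resolution(reco_energies, ref_energies):
    """Mean response <E_reco/E_ref> and width of the corrected ratio.

    Assumption: one flat correction factor 1/<response>; the sample standard
    deviation replaces the width of a Gaussian fit.
    """
    ratios = [r / t for r, t in zip(reco_energies, ref_energies)]
    response = statistics.fmean(ratios)
    corrected = [x / response for x in ratios]
    resolution = statistics.stdev(corrected)
    return response, resolution


resp, res = response_and_resolution([8.1, 9.0, 9.9], [10.0, 10.0, 10.0])
print(f"response = {resp:.2f}, resolution = {res:.3f}")  # response = 0.90, resolution = 0.100
```

Because the resolution is computed after correction, a jet type with a low raw response is not artificially favoured by a narrow, but shifted, distribution.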

Figure 12: Jet response as a function of $\eta$ in a given \pt range (top), and as a function of \pt in the barrel (left) and endcap (right) regions.
Figure 13: Jet energy resolution as a function of \pt in the barrel (left) and endcap (right) regions. The lines, added to guide the eye, correspond to fits with ad hoc parametrizations.

The improvements in angular resolution, energy response, and energy resolution result mostly from the more precise and accurate measurement of the charged-hadron momenta in jets by the PF algorithm. In Calo jets, the charged-hadron energy is measured by the ECAL and HCAL with the comparatively coarse calorimeter resolution, and is underestimated for three reasons. First, low-\pt charged hadrons are swept away by the magnetic field, so their energy deposits typically remain unclustered or end up in a different jet. Second, hadrons with an energy below 10\GeV have a low probability of being detected in the HCAL, because of shower fluctuations and early showers in the ECAL. Third, without the PF algorithm the deposits of charged and neutral hadrons in the ECAL cannot be separated from electromagnetic deposits, so they remain calibrated at the electromagnetic scale, for the reasons given above. With the PF algorithm, on the other hand, charged hadrons are reconstructed with the correct direction and energy scale, and with a much better resolution in angle and momentum.

The particle content of jets in terms of particle type and energy distribution is described by the fragmentation functions and depends on the flavour of the parton that initiated the jet. Gluon jets, in particular, contain on average more low-energy particles than quark jets [46], which results in a lower jet energy response. Because the flavour of the initiating parton cannot be determined with sufficient confidence in most physics analyses, the same jet energy correction is applied to all jets, and the difference in response between quark and gluon jets is treated as a source of systematic uncertainty. The relative difference in response is shown in Fig. 14 for Calo and PF jets. For the reasons detailed above, the low-energy particles in gluon jets are more likely to be captured in PF jets, and the difference between the quark and gluon jet energy responses is therefore smaller than for Calo jets.

Figure 14: Absolute difference in jet energy response between quark and gluon jets as a function of \pt for Calo jets (left) and PF jets (right).

0.5.2 Missing transverse momentum

The presence of particles that do not interact with the detector material, \eg neutrinos, is indirectly revealed by missing transverse momentum, often referred to as missing transverse energy [47]. The raw missing transverse momentum vector is defined so as to balance the vector sum of the transverse momenta of all reconstructed particles:

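$$\vec{p}_\mathrm{T}^{\,\text{miss,PF,raw}} = -\sum_{i=1}^{N_\text{particles}} \vec{p}_{\mathrm{T},i} \tag{10}$$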

The jet-energy-corrected missing transverse momentum,

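$$\vec{p}_\mathrm{T}^{\,\text{miss,PF}} = -\sum_{i=1}^{N_\text{particles}} \vec{p}_{\mathrm{T},i} - \sum_{j=1}^{N_\text{PF jets}} \left(\vec{p}_{\mathrm{T},j}^{\,\text{corr}} - \vec{p}_{\mathrm{T},j}\right) \tag{11}$$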

includes a term that replaces the raw momentum $\vec{p}_{\mathrm{T},j}$ of each PF jet above a \pt threshold by its corrected value $\vec{p}_{\mathrm{T},j}^{\,\text{corr}}$. As can be seen from Fig. 12, the PF jet response is close to unity, which makes this correction term small.
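Equations (10) and (11) can be written out component by component, as in the minimal sketch below. Particles are assumed to be given as (\pt, $\phi$) pairs and jets as (raw \pt, corrected \pt, $\phi$) triplets; the jet threshold `jet_pt_min` is a hypothetical parameter, applied here to the corrected \pt by assumption.

```python
import math


def raw_met(particles):
    """Eq. (10): minus the vector sum of all particle pT; particles are (pt, phi)."""
    px = -sum(pt * math.cos(phi) for pt, phi in particles)
    py = -sum(pt * math.sin(phi) for pt, phi in particles)
    return px, py


def corrected_met(particles, jets, jet_pt_min):
    """Eq. (11): raw MET minus (corrected - raw) pT of each selected jet.

    jets are (raw_pt, corr_pt, phi); jet_pt_min is a hypothetical threshold
    applied to the corrected jet pT.
    """
    px, py = raw_met(particles)
    for raw_pt, corr_pt, phi in jets:
        if corr_pt > jet_pt_min:
            delta = corr_pt - raw_pt
            px -= delta * math.cos(phi)
            py -= delta * math.sin(phi)
    return px, py


# One 30 GeV particle at phi = 0 forming a jet whose pT is corrected to 33 GeV.
particles = [(30.0, 0.0)]
jets = [(30.0, 33.0, 0.0)]
print(raw_met(particles))
print(corrected_met(particles, jets, jet_pt_min=20.0))
```

The jet term only adds the difference between corrected and raw jet momenta, so for a response close to unity, as for PF jets, it is a small adjustment.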

Prior to the deployment of PF reconstruction, the missing transverse momentum was evaluated as

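$$\vec{p}_\mathrm{T}^{\,\text{miss,Calo}} = -\sum_{i=1}^{N_\text{cells}} \vec{p}_{\mathrm{T},i} - \sum_{j=1}^{N_\text{Calo jets}} \left(\vec{p}_{\mathrm{T},j}^{\,\text{corr}} - \vec{p}_{\mathrm{T},j}\right) - \sum_{k=1}^{N_\mu} \vec{p}_{\mathrm{T},k} \tag{12}$$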

The first term, which corresponds to the raw calorimeter missing transverse momentum, balances the total transverse momentum vector measured by the calorimeters. In this term, the transverse momentum of each cell is calculated under the assumption that the energy measured in the cell is deposited by a massless particle coming from the origin of the CMS coordinate system. The jet momentum correction term, computed with all Calo jets above a \pt threshold, is substantial given the relatively low response of Calo jets. The second correction term accounts for identified muons above a \pt threshold; it is necessary because muons deposit little energy in the calorimeters.
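The massless-particle assumption for calorimeter cells and the three terms of Eq. (12) can be sketched as follows. For a massless particle from the origin, $p_\mathrm{T} = E\sin\theta = E/\cosh\eta$. The tuple layouts are illustrative assumptions, and the selection of jets and muons (their \pt thresholds) is assumed to have been applied beforehand.

```python
import math


def cell_pt_vector(energy, eta, phi):
    """pT vector of a calorimeter cell for a massless particle from the origin.

    With theta the polar angle of the cell, pT = E sin(theta) = E / cosh(eta).
    """
    pt = energy / math.cosh(eta)
    return pt * math.cos(phi), pt * math.sin(phi)


def calo_met(cells, jet_corrections, muons):
    """Eq. (12). cells: (E, eta, phi); jet_corrections: (raw_pt, corr_pt, phi) for
    the selected Calo jets; muons: (pt, phi) for the selected muons."""
    px = py = 0.0
    for energy, eta, phi in cells:
        cx, cy = cell_pt_vector(energy, eta, phi)
        px -= cx
        py -= cy
    for raw_pt, corr_pt, phi in jet_corrections:
        px -= (corr_pt - raw_pt) * math.cos(phi)
        py -= (corr_pt - raw_pt) * math.sin(phi)
    for pt, phi in muons:
        px -= pt * math.cos(phi)
        py -= pt * math.sin(phi)
    return px, py


# A 10 GeV deposit at eta = 0 recoiling against a 10 GeV muon: no missing pT.
print(calo_met([(10.0, 0.0, 0.0)], [], [(10.0, math.pi)]))
```

Without the muon term, the same event would show 10\GeV of spurious missing transverse momentum pointing along the muon direction.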

The performance improvement brought by PF reconstruction is quantified with a sample of \ttbar events by comparing $\vec{p}_\mathrm{T}^{\,\text{miss,PF}}$ and $\vec{p}_\mathrm{T}^{\,\text{miss,Calo}}$ to the reference $\vec{p}_\mathrm{T}^{\,\text{miss,Ref}}$, calculated with all stable particles from the event generator, excluding neutrinos. The \ptmiss resolution must be studied in events for which the \ptmiss response has been calibrated to unity. The reference \ptmiss is therefore required to exceed a threshold above which the jet energy corrections are found to be sufficient to calibrate the PF and Calo \ptmiss responses adequately. Figure 15 shows the relative \ptmiss resolution and the angular resolution, obtained with a Gaussian fit in each bin of the reference \ptmiss.

Figure 15: Relative \ptmiss resolution and resolution on the \ptmiss direction as a function of the reference \ptmiss, for a simulated \ttbar sample.

0.5.3 Electrons

The electron seeding and the subsequent reconstruction steps are described in Sections 0.3.2 and 0.4.3. In the reconstruction, electron candidates are only required to satisfy loose identification criteria so as to ensure high identification efficiency for genuine electrons, with the potential drawback of a large misidentification probability for charged hadrons interacting mostly in the ECAL. In this section, as is typically done in physics analyses, the electron identification is tightened with a threshold on the classifier score of a BDT trained for electrons selected without any trigger requirement [33].

The gain brought by the use of tracker-based seeding in addition to ECAL-based seeding is quantified in Fig. 16, for electrons in jets and for isolated electrons produced in the decay of heavy resonances. The left plot shows the reconstruction and identification efficiency for electrons in jets as a function of the hadron misidentification probability. Electrons and hadrons are selected from the same simulated sample of multijet events, with the same \pt and $|\eta|$ requirements. Electrons are additionally required to come from the decay of b hadrons. The electron efficiency is significantly improved, paving the way for b quark jet identification algorithms based on the presence of electrons in jets.

The absolute gain in efficiency for isolated electrons is quantified in the right plot for electrons from Z boson decays in a simulated Drell–Yan sample, for two working points. The first working point, used in the search for $\mathrm{H}\to\mathrm{ZZ}\to4\ell$ [48, 49], provides a very high electron efficiency in order to maximize the selection efficiency for events with four electrons. At this working point, tracker-based seeding adds almost 20% to the identification efficiency of low-\pt electrons. In the context of this analysis, in which all four electrons are required to pass a low \pt threshold, tracker-based seeding adds 7% to the selection efficiency of signal events. The second working point, typical of single-electron analyses, aims at reducing the large multijet background. In these analyses, which only consider electrons above a higher \pt threshold because of triggering requirements, the gain in signal efficiency is about 1%. For both working points, tracker-based seeding increases the hadron misidentification probability by less than a factor of 1.2 at high \pt, and by less than a factor of 2 at low \pt, down to 5\GeV.

Figure 16: Left: Efficiency to reconstruct electrons from b hadron decays (signal) versus the probability to misidentify a hadron as an electron (background). The solid, long-dashed, and short-dashed lines refer to electrons and hadrons in three \pt ranges: above 15\GeV, an intermediate range, and the lowest range, respectively. The curves correspond to a threshold scan on the BDT classifier score, for ECAL-based seeded electrons and for tracker- or ECAL-based seeded electrons. Right: Absolute gain in reconstruction and identification efficiency provided by the tracker-based seeding for two working points (WP) corresponding to different thresholds on the BDT classifier score. The solid line corresponds to the value used in the $\mathrm{H}\to\mathrm{ZZ}\to4\ell$ analyses and the dashed line to the value typically used in analyses of single-electron final states. In all cases, the classifier score of the BDT trained for electrons selected without any trigger requirement is used.
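The efficiency-versus-misidentification curves are obtained by scanning a threshold on a classifier score. The sketch below shows that procedure on toy scores (not CMS values); the function name is illustrative, and a candidate is assumed to pass when its score is strictly above the threshold.

```python
def threshold_scan(signal_scores, background_scores, thresholds):
    """Efficiency and misidentification probability for each classifier threshold.

    A candidate passes when its score is strictly above the threshold.
    """
    curve = []
    for cut in thresholds:
        eff = sum(s > cut for s in signal_scores) / len(signal_scores)
        mis = sum(b > cut for b in background_scores) / len(background_scores)
        curve.append((cut, eff, mis))
    return curve


# Toy scores: raising the threshold trades signal efficiency for background rejection.
sig = [0.9, 0.8, 0.3, 0.7]
bkg = [0.1, 0.2, 0.85, 0.4]
for cut, eff, mis in threshold_scan(sig, bkg, [0.0, 0.5, 0.95]):
    print(f"cut={cut:.2f}  eff={eff:.2f}  misid={mis:.2f}")
```

A working point is then a single point on this curve, chosen according to the efficiency and background needs of the analysis.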

0.5.4 Muons

The PF muon identification, described in Section 0.4.2, is designed to retain, with the highest possible efficiency, prompt muons (from \eg decays of W and Z bosons or quarkonia states), muons from heavy hadrons (from decays of beauty or charm hadrons), and muons from light hadrons (from decays in flight of $\pi$ or K mesons). On the other hand, it has to minimize the probability to misidentify a charged hadron as a muon, \eg because of punch-through.

A Drell–Yan event sample is used to evaluate the prompt muon identification efficiency, while a muon-enriched multijet QCD sample is used for the other three types of muon candidates. Figure 17 compares the muon identification efficiency obtained with the PF algorithm to the efficiency of other algorithms available prior to the developments carried out for PF identification:

  • The soft muon identification aims to achieve efficient identification of muons from decays of quarkonia states. This selection requires a tracker muon with a tighter matching to the muon segment: a pull below 3 in both local directions, instead of a pull below 4 in one direction only as in the tracker-muon selection. Additionally, the inner track must be reconstructed from at least five inner-tracker layers, including one pixel detector layer.

  • The tight muon identification specifically targets muons from W and Z boson decays. This selection requires a global-muon track with a $\chi^2$ per degree of freedom lower than 10 and at least one hit in the muon detectors. In addition, the candidate must be a tracker muon with at least two matched muon segments in different muon stations and an inner track reconstructed from at least five inner-tracker layers, including one pixel detector layer.
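The two selections listed above can be written as simple predicates. The container below is hypothetical (its field names are illustrative, not the CMS software names), and the impact-parameter requirement is deliberately omitted, as in the comparison described in the text.

```python
from dataclasses import dataclass, replace


@dataclass
class MuonCandidate:
    """Hypothetical container; field names are illustrative only."""
    is_tracker_muon: bool
    is_global_muon: bool
    segment_pulls: tuple      # track-segment matching pulls in the two local directions
    n_tracker_layers: int
    n_pixel_layers: int
    global_chi2_ndof: float
    n_muon_hits: int
    n_matched_stations: int   # stations with a matched muon segment


def passes_soft_id(mu: MuonCandidate) -> bool:
    return (
        mu.is_tracker_muon
        and all(abs(p) < 3.0 for p in mu.segment_pulls)
        and mu.n_tracker_layers >= 5
        and mu.n_pixel_layers >= 1
    )


def passes_tight_id(mu: MuonCandidate) -> bool:
    return (
        mu.is_global_muon
        and mu.global_chi2_ndof < 10.0
        and mu.n_muon_hits >= 1
        and mu.is_tracker_muon
        and mu.n_matched_stations >= 2
        and mu.n_tracker_layers >= 5
        and mu.n_pixel_layers >= 1
    )


good = MuonCandidate(True, True, (1.2, -0.8), 10, 2, 1.5, 20, 3)
loose_match = replace(good, segment_pulls=(1.2, 3.5))
print(passes_soft_id(good), passes_tight_id(good))                # True True
print(passes_soft_id(loose_match), passes_tight_id(loose_match))  # False True
```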

The regular soft and tight ID criteria also feature an upper threshold on the muon-track impact parameter, aimed at rejecting muons from charged-hadron decays in flight. This requirement would defeat the purpose of PF identification, which aims at being as inclusive as possible for a truly global description of the event. As it also reduces the efficiency of the soft and tight ID criteria, it is not applied here for a fairer comparison. Because these two algorithms require the selected tracks to be tracker muons, the muon identification efficiency is displayed in Fig. 17 for tracker muons only. Muons reconstructed as global muons but not tracker muons are considered only by the PF muon identification, increasing the number of identified muons by about 2% over the whole spectrum (1% in the heavy-flavour category, 5% in the light-hadron category, and 5% in the misidentified-hadron category).

Figure 17: Efficiency for different algorithms (PF, soft, and tight) to identify a simulated muon track that has been reconstructed as a tracker muon, as a function of the \pt of the reconstructed track. From top left to bottom right, the efficiency of the three identification algorithms is shown for prompt muons, for muons from heavy-flavour decays, for muons from light-flavour decays, and for misidentified hadrons.

The PF identification is the most efficient one for prompt muons. The soft identification is 0.5% more efficient on muons from semileptonic decays of heavy hadrons, but its much higher hadron misidentification rate (30% instead of 2%) makes this selection unusable for PF. The calorimeter deposits from a charged hadron misidentified as a muon are automatically identified as (spurious) neutral particles in the PF algorithm, leading to a potentially large overestimation of the corresponding jet energy. The PF muon identification, in this respect, strikes a balance between efficiency and misidentification rate for PF reconstruction and global event description.

0.5.5 Lepton isolation

Lepton isolation is the main handle for selecting prompt muons and electrons produced in the electroweak decay of massive particles such as W or Z bosons, and for rejecting the large number of leptons produced in jets through the decay of heavy-flavour hadrons or the decay in flight of charged pions and kaons. The isolation is quantified by estimating the total \pt of the particles emitted around the direction of the lepton. The particle-based isolation relative to the lepton \pt is defined as

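$$I^\text{PF} = \frac{1}{p_\mathrm{T}^{\ell}}\left(\sum_{h^\pm} p_\mathrm{T}^{h^\pm} + \sum_{\gamma} p_\mathrm{T}^{\gamma} + \sum_{h^0} p_\mathrm{T}^{h^0}\right) \tag{13}$$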

where the sums run over the charged hadrons ($h^\pm$), photons ($\gamma$), and neutral hadrons ($h^0$) within a distance $\Delta R$ of the lepton smaller than either 0.3 or 0.5 in the $(\eta,\phi)$ plane.

The performance of the particle-based isolation is studied for muons identified in simulated events. Figure 18 shows the efficiency to select signal prompt muons as a function of the probability to select background secondary muons. The performance of the particle-based isolation is compared to that of the detector-based isolation, computed from the \pt of the neighbouring inner tracks and the energy of the neighbouring calorimeter deposits, as

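$$I^\text{det} = \frac{1}{p_\mathrm{T}^{\ell}}\left(\sum_\text{tracks} p_\mathrm{T}^\text{track} + \sum_\text{ECAL} E_\mathrm{T} + \sum_\text{HCAL} E_\mathrm{T}\right) \tag{14}$$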

The performance of the detector-based isolation is worse mainly because the \ptcarried by charged hadrons is counted twice, through the tracks and through the calorimeter deposits.
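The double counting can be made concrete with a toy event: one charged hadron near a muon, whose track and fully absorbed calorimeter deposit both enter the detector-based sum, while PF reconstruction produces a single charged hadron. The particle-type labels and tuple layouts below are illustrative assumptions.

```python
import math


def _delta_r(eta1, phi1, eta2, phi2):
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def pf_isolation(lepton, particles, cone=0.3):
    """Eq. (13). lepton: (pt, eta, phi); particles: (kind, pt, eta, phi) with
    kind in {"h+-", "gamma", "h0"} (labels are illustrative)."""
    lpt, leta, lphi = lepton
    total = sum(pt for kind, pt, eta, phi in particles
                if kind in ("h+-", "gamma", "h0")
                and _delta_r(leta, lphi, eta, phi) < cone)
    return total / lpt


def detector_isolation(lepton, tracks, calo_deposits, cone=0.3):
    """Eq. (14). tracks: (pt, eta, phi); calo_deposits: (E_T, eta, phi)."""
    lpt, leta, lphi = lepton
    total = sum(v for v, eta, phi in list(tracks) + list(calo_deposits)
                if _delta_r(leta, lphi, eta, phi) < cone)
    return total / lpt


# A 20 GeV muon with one 5 GeV charged hadron nearby, fully absorbed in the calorimeters.
muon = (20.0, 0.0, 0.0)
print(pf_isolation(muon, [("h+-", 5.0, 0.1, 0.0)]))                      # 0.25
print(detector_isolation(muon, [(5.0, 0.1, 0.0)], [(5.0, 0.1, 0.05)]))  # 0.5
```

The same hadron thus weighs twice as much in the detector-based isolation, which degrades the separation between isolated and non-isolated muons.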

Figure 18: Isolation efficiency for muons from boson decays versus isolation efficiency for muons from secondary decays, as a function of the threshold on the isolation, for the detector- and particle-based methods. All muons come from simulated events and are required to have \pt larger than 15\GeV. The efficiencies are shown for two choices of the maximum $\Delta R$ (isolation cone size): 0.3 and 0.5.

0.5.6 Hadronic decays

The $\tau$ lepton decays either into a charged lepton (e or $\mu$) and two neutrinos, or into a few hadrons and one neutrino, with the branching fractions given in Table 0.5.6. Hadronic $\tau$ decays, denoted $\tau_\mathrm{h}$, can be distinguished from quark and gluon jets by the multiplicity, collimation, and isolation of their decay products.

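Branching fractions of the main $\tau^-$ decay modes [50]. The generic symbol $h^-$ represents a charged hadron, pion or kaon. In some cases, the decay products arise from an intermediate mesonic resonance.

Decay mode | Meson resonance | $\mathcal{B}$ (%)
$\tau^-\to \mathrm{e}^-\,\bar{\nu}_\mathrm{e}\,\nu_\tau$ | | 17.8
$\tau^-\to \mu^-\,\bar{\nu}_\mu\,\nu_\tau$ | | 17.4
$\tau^-\to h^-\,\nu_\tau$ | | 11.5
$\tau^-\to h^-\,\pi^0\,\nu_\tau$ | $\rho(770)$ | 26.0
$\tau^-\to h^-\,\pi^0\,\pi^0\,\nu_\tau$ | $\mathrm{a}_1(1260)$ | 10.8
$\tau^-\to h^-\,h^+\,h^-\,\nu_\tau$ | $\mathrm{a}_1(1260)$ | 9.8
$\tau^-\to h^-\,h^+\,h^-\,\pi^0\,\nu_\tau$ | | 4.8
Other modes with hadrons | | 1.8
All modes containing hadrons | | 64.8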

The PF algorithm is able to resolve the particles arising from the $\tau_\mathrm{h}$ decay and to reconstruct the surrounding particles to determine its isolation, thereby providing valuable information for $\tau_\mathrm{h}$ identification. The particles are used as input to the hadrons-plus-strips (HPS) algorithm [51] to reconstruct and identify PF $\tau_\mathrm{h}$ candidates. This algorithm, presented in detail in Ref. [52], is seeded by jets above a \pt threshold, reconstructed with the anti-$k_\mathrm{T}$ algorithm. The jet constituent particles are combined into $\tau_\mathrm{h}$ candidates compatible with one of the main $\tau$ decay modes, $h^\pm$, $h^\pm\pi^0$, $h^\pm\pi^0\pi^0$, and $h^\pm h^\mp h^\pm$. The $h^\pm h^\mp h^\pm\pi^0$ decay mode is not considered, owing to its relatively small branching fraction and high contamination from quark and gluon jets. Because of the large amount of material in the inner tracker (Fig. 3), photons from $\pi^0$ decays often convert before reaching the ECAL. The resulting electrons and positrons can be identified as such by the PF algorithm or, if their track is not reconstructed, as photons displaced along the $\phi$ direction because of the bending in the 3.8\unitT magnetic field. Neutral pions are therefore obtained by gathering the reconstructed photons and electrons located in a small window of the $(\eta,\phi)$ plane (a strip), elongated along $\phi$. Each $\tau_\mathrm{h}$ candidate is then required to have a mass compatible with its decay mode and unit charge. Collimated $\tau_\mathrm{h}$ candidates are selected by requiring all charged hadrons and neutral pions to lie within a circle in the $(\eta,\phi)$ plane, called the signal cone, whose radius decreases with the $\tau_\mathrm{h}$ \pt to account for the boost of the decay products. The radius is, however, not allowed to exceed 0.1 at low \pt, nor to fall below 0.05 at high \pt. Finally, the highest-\pt selected $\tau_\mathrm{h}$ candidate in the jet is retained. The four-momentum of the $\tau_\mathrm{h}$ candidate is determined by summing the four-momenta of its constituent particles.
Its absolute isolation is quantified as explained in Section 0.5.5, with all particles within $\Delta R < 0.5$ of the $\tau_\mathrm{h}$ except those used to reconstruct the $\tau_\mathrm{h}$ itself, and without normalizing by the $\tau_\mathrm{h}$ \pt. The loose and medium isolation working points are defined by requiring the absolute isolation to be smaller than 2.0 and 1.0\GeV, respectively, and the tight working point by a still lower threshold.
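The collimation and isolation steps can be sketched as follows. Only the 0.05 and 0.1 bounds of the signal-cone radius and the 2.0 and 1.0\GeV loose and medium thresholds come from the text; the $1/p_\mathrm{T}$ form of the radius, its 3.0\GeV scale, and the placeholder tight threshold used in the example are assumptions for illustration.

```python
import math


def signal_cone_radius(tau_pt, scale_gev=3.0, r_min=0.05, r_max=0.1):
    """Signal-cone radius shrinking as scale_gev / pT, clamped to [r_min, r_max].

    The 1/pT form and scale_gev = 3.0 are assumptions; only the clamp values
    come from the text.
    """
    return min(r_max, max(r_min, scale_gev / tau_pt))


def is_collimated(axis, constituents, tau_pt):
    """True if every constituent (eta, phi) lies inside the signal cone around axis."""
    radius = signal_cone_radius(tau_pt)
    for eta, phi in constituents:
        dphi = (phi - axis[1] + math.pi) % (2.0 * math.pi) - math.pi
        if math.hypot(eta - axis[0], dphi) >= radius:
            return False
    return True


def isolation_working_points(abs_iso_gev, tight_max_gev):
    """Working points passed by a candidate; the tight threshold is a parameter."""
    passed = []
    for name, limit in (("loose", 2.0), ("medium", 1.0), ("tight", tight_max_gev)):
        if abs_iso_gev < limit:
            passed.append(name)
    return passed


print([signal_cone_radius(pt) for pt in (20.0, 40.0, 100.0)])  # clamped at 0.1 and 0.05
print(is_collimated((0.0, 0.0), [(0.03, 0.0), (0.0, 0.04)], tau_pt=40.0))
print(isolation_working_points(1.5, tight_max_gev=0.5))  # 0.5 GeV is a placeholder
```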

Before the advent of PF reconstruction, $\tau_\mathrm{h}$ candidates were reconstructed as collimated and isolated calorimeter jets, called Calo $\tau_\mathrm{h}$ [53]. Their reconstruction is seeded by Calo jets reconstructed with the anti-$k_\mathrm{T}$ algorithm and matched to at least one track above a \pt threshold. A region of given radius around the jet axis is chosen as the signal cone, which is expected to contain the charged hadrons and neutral pions from the $\tau_\mathrm{h}$ decay. The signal cone must contain either one or three tracks, with a total electric charge of $\pm1$. Isolated $\tau_\mathrm{h}$ candidates are selected by requiring that no track above a \pt threshold be found within an isolation annulus centred on the highest-\pt track, and that the energy measured in the ECAL within an annulus around the jet axis be below a threshold.

The performances of the HPS (PF) and Calo algorithms are compared in terms of identification efficiency, jet misidentification rate, and momentum reconstruction. Genuine $\tau_\mathrm{h}$ with \pt between 20\GeV and 2\TeV are obtained in the simulation from the Drell–Yan process and from the decay of a hypothetical heavy particle with a mass of 3.2\TeV. For the jet misidentification rate, a simulated QCD multijet sample covering the same \pt range is used.

The probability for the HPS (PF) algorithm to assign the correct decay mode to reconstructed and identified $\tau_\mathrm{h}$ is shown in Table 0.5.6. The generated decay mode is correctly found for most $\tau_\mathrm{h}$, with probabilities between 83 and 97% depending on the mode. The largest decay-mode migrations, of the order of 10–15%, affect candidates with a single charged hadron and are due to the reconstruction of an incorrect number of $\pi^0$.

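Correlation between the reconstructed and generated $\tau_\mathrm{h}$ decay modes, for $\tau_\mathrm{h}$ produced in simulated events. Reconstructed $\tau_\mathrm{h}$ candidates are required to be matched to a generated $\tau_\mathrm{h}$, to be reconstructed above a \pt threshold and under one of the HPS decay modes, and to satisfy the loose isolation working point. Each column gives, for one generated decay mode, the fractions reconstructed in each mode.

Reconstructed \ Generated | $h^\pm$ | $h^\pm\pi^0$'s | $h^\pm h^\mp h^\pm$
$h^\pm$ | 0.89 | 0.16 | 0.01
$h^\pm\pi^0$'s | 0.11 | 0.83 | 0.02
$h^\pm h^\mp h^\pm$ | 0.00 | 0.01 | 0.97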
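The decay-mode table can be read column by column: each generated mode is distributed over the reconstructed modes, so each column sums to one. The short sketch below checks this normalization and extracts the correct and migrated fractions; the mode labels are shorthand introduced here for illustration.

```python
MODES = ["h", "h pi0s", "h h h"]  # generated modes (columns) = reconstructed modes (rows)
MIGRATION = [
    [0.89, 0.16, 0.01],
    [0.11, 0.83, 0.02],
    [0.00, 0.01, 0.97],
]


def column_summary(matrix):
    """For each generated mode: (fraction correct, fraction migrated, column sum)."""
    summary = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        total = sum(column)
        summary.append((column[j], total - column[j], total))
    return summary


for mode, (correct, migrated, total) in zip(MODES, column_summary(MIGRATION)):
    print(f"generated {mode:7s}: correct {correct:.2f}, migrated {migrated:.2f}, sum {total:.2f}")
```

The largest migrated fractions (0.11 and 0.17) concern the single-charged-hadron modes, in line with the migrations between $h^\pm$ and $h^\pm\pi^0$'s discussed in the text.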

The performance of the $\tau_\mathrm{h}$ momentum reconstruction with the HPS (PF) and Calo algorithms is illustrated in Fig. 19. The left side of the figure shows the distribution of the ratio between the reconstructed and generated $\tau_\mathrm{h}$ \pt. Over a wide range of generated $\tau_\mathrm{h}$ \pt, the HPS (PF) algorithm reconstructs the momentum with much better accuracy and precision than the calorimeters. The asymmetry of the distribution is due to cases in which some of the particles produced in the $\tau$ decay are left out because including them would cause the $\tau_\mathrm{h}$ candidate to fail the collimation or mass requirements. The $\tau_\mathrm{h}$ is then reconstructed in a different decay mode and with a reduced momentum. When all reconstructed particles in the jet matching the $\tau_\mathrm{h}$ are considered, the distribution is more symmetric but the resolution degrades, as some of the jet particles do not come from the $\tau$ decay. In these events, simulated without pileup interactions, the additional particles come from the underlying event and contribute less than 1\GeV on average to the jet energy. As a consequence, the mean response is slightly shifted above unity for a generated \pt below 100\GeV. At larger \pt, the absolute contribution from the underlying event becomes negligible and no shift is observed. As the generated \pt increases, the energy resolution of the HPS (PF) algorithm converges to that of the Calo algorithm, because the calorimeters start to dominate the measurement of the charged-hadron momenta. This effect occurs at a lower \pt for $\tau_\mathrm{h}$ than for jets because, for a typical $\tau_\mathrm{h}$ and jet of the same \pt, the jet \pt is shared among many more charged hadrons, each with a lower \pt, than in the $\tau_\mathrm{h}$ case.

The right side of Fig. 19 shows the distributions obtained for quark or gluon jets misidentified as $\tau_\mathrm{h}$. In this case, the $\tau_\mathrm{h}$ candidate is reconstructed with only a fraction of the jet \pt, as only a few jet particles can be selected by the HPS (PF) algorithm. For this reason, while genuine $\tau_\mathrm{h}$ are reconstructed at the correct momentum scale, misidentified candidates tend to be shifted to lower \pt. The HPS (PF) algorithm therefore reduces the probability for jets to pass the \pt thresholds applied at analysis level, which leads to a lower multijet background than with the calorimeter-based reconstruction.

Figure 19: Ratio of reconstructed to generator-level $\tau_\mathrm{h}$ \pt for genuine $\tau_\mathrm{h}$ (left), and for quark and gluon jets passing the $\tau_\mathrm{h}$ identification criteria (right), in different intervals of generator-level \pt. In the PF case, the $\tau_\mathrm{h}$ candidates are reconstructed by the HPS algorithm and required to pass the loose isolation working point. In the Calo case, they are reconstructed solely with the calorimeters and required to pass the Calo $\tau_\mathrm{h}$ identification criteria. The generator-level \pt is taken to be either that of the $\tau_\mathrm{h}$ or that of the jet. For comparison, the ratio is also shown for the closest PF jet in the $(\eta,\phi)$ plane.

The identification efficiency is defined as the probability to reconstruct and identify a $\tau_\mathrm{h}$ matching a generated $\tau_\mathrm{h}$ within a given $\Delta R$. As a baseline, both the reconstructed and generated $\tau_\mathrm{h}$ are required to satisfy minimum \pt and maximum $|\eta|$ requirements. With the same selection, the jet misidentification rate is defined as the probability to reconstruct and identify a quark or gluon jet from the multijet sample as a $\tau_\mathrm{h}$. Figure 20 shows the efficiency as a function of the jet misidentification probability, for a varying threshold on the absolute isolation. With respect to Calo $\tau_\mathrm{h}$ identification, the HPS (PF) algorithm substantially reduces the jet misidentification probability for a given identification efficiency. For a given jet misidentification probability, the gain in efficiency ranges from 4 to 10%. The improvement in identification performance has three causes. First, the decay-mode selection reduces the momentum of jets misidentified as $\tau_\mathrm{h}$. Second, with the PF reconstruction of the decay products, mass and collimation criteria can be used in addition to isolation criteria. Third, all the particles remaining after $\tau_\mathrm{h}$ reconstruction are used to evaluate the particle-based isolation, whereas the detector-based isolation is computed without the tracks and calorimeter energy deposits in the signal cone. Finally, the \pt dependence of the identification efficiency and of the jet misidentification probability is shown in Fig. 21. As \pt increases, the HPS (PF) algorithm maintains a constant efficiency together with a sharp decrease of the jet misidentification probability.

In summary, the PF reconstruction of the $\tau_\mathrm{h}$ decay products and of the neighbouring particles has led to a sizeable improvement of the $\tau_\mathrm{h}$ reconstruction and identification performance. This performance has been further refined for the data-taking period that started in 2015, for example with identification techniques based on machine learning that use additional information, such as the impact parameter of charged hadrons and the energy profile of neutral pions within the strip [54].

Figure 20: Efficiency of the $\tau_\mathrm{h}$ identification versus misidentification probability for quark and gluon jets. The efficiency is measured for $\tau_\mathrm{h}$ produced at low \pt in simulated Drell–Yan events (left), and at high \pt in simulated decays of a heavy particle (right). The misidentification probability is measured for quark and gluon jets in simulated multijet events. The line is obtained by varying the threshold on the absolute isolation for PF $\tau_\mathrm{h}$ identified with the HPS algorithm. On this curve, the three points indicate the loose, medium, and tight isolation working points. The performance of the calorimeter-based identification is shown as a single square away from the line.
Figure 21: Identification efficiency for genuine $\tau_\mathrm{h}$ (left), and misidentification probability for quark and gluon jets (right), as a function of \pt. Low-\pt $\tau_\mathrm{h}$ are obtained from simulated Drell–Yan events and high-\pt $\tau_\mathrm{h}$ from simulated decays of a heavy particle. Quark and gluon jets are obtained from simulated QCD multijet events. The $\tau_\mathrm{h}$ are required to be reconstructed by the HPS (PF) algorithm, to satisfy the \pt and $|\eta|$ requirements, and to satisfy the loose identification criteria.

0.5.7 Particle flow in the high-level trigger

The first level of the CMS trigger system [55], composed of custom hardware processors, uses information from the calorimeters and muon detectors to select the most interesting events within a fixed time interval of less than 4\mus. The high-level trigger (HLT) computer farm further decreases the event rate from around 100\unitkHz to about 1\unitkHz, before data storage for later offline reconstruction. The HLT event selection imposes requirements on the number of physics objects with \pt above a given threshold. The reconstruction of these objects at the HLT must be kept as close as possible to the offline reconstruction, to limit the triggering inefficiency and the false trigger rate. As shown in Sections 0.5.1 to 0.5.6, PF reconstruction provides physics objects with better resolution, efficiency, and purity than traditional reconstruction methods. For this reason, PF reconstruction is used in the vast majority of physics analyses in CMS, and has also been used at the HLT for optimal performance.

However, to cope with the incoming event rate, the online reconstruction of a single event at the HLT has to be done one hundred times faster than offline, within 140\unitms on average. The reconstruction therefore has to be simplified at the HLT. Offline, most of the processing time is spent reconstructing the inner tracks for the PF algorithm, as explained in Section 0.3.1. At the HLT, the tracking is reduced to three iterations, dropping the time-consuming reconstruction of tracks with low \pt or arising from nuclear interactions in the tracker material. These modifications preserve the reconstruction efficiency for tracks above a \pt threshold originating from the primary vertex or from the decay of a heavy-flavour hadron. After track reconstruction, a specific instance of the particle identification and reconstruction algorithm runs online, with only two minor differences from the offline algorithm described in Section 0.4: the electron identification and reconstruction are not integrated in the PF algorithm, and nuclear interactions in the tracker are not reconstructed. These simplifications lead to a slightly higher jet energy scale for jets containing an electron or a nuclear interaction. For QCD multijet events enriched with high-\pt jets and simulated without pileup, the average time needed to perform the tracking is 0.6\units (52%) offline and 0.06\units (44%) at the HLT, where the percentages are given with respect to the total time spent in offline reconstruction and in HLT reconstruction, respectively, under the assumption that the HLT PF reconstruction is performed for every event. Under the same conditions, the average time needed for PF reconstruction is 0.07\units (6%) offline and 0.03\units (24%) at the HLT. Up to an average of 45 pileup interactions, the time spent for tracking and PF at the HLT is kept below 20% and 10% of the total HLT computing time, respectively.
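The quoted timings and fractions can be cross-checked with simple arithmetic: dividing each tracking time by its fraction gives the implied total reconstruction time per event, from which the PF share follows. This is only a consistency check on the rounded numbers quoted in the text.

```python
# Times (s) and fractions quoted in the text for QCD multijet events without pileup.
offline_tracking_s, offline_tracking_frac = 0.6, 0.52
hlt_tracking_s, hlt_tracking_frac = 0.06, 0.44
offline_pf_s, hlt_pf_s = 0.07, 0.03

# Implied total reconstruction time per event.
offline_total_s = offline_tracking_s / offline_tracking_frac
hlt_total_s = hlt_tracking_s / hlt_tracking_frac

print(f"implied totals: offline {offline_total_s:.2f} s, HLT {hlt_total_s * 1e3:.0f} ms")
print(f"PF share: offline {offline_pf_s / offline_total_s:.0%}, "
      f"HLT {hlt_pf_s / hlt_total_s:.0%}")
```

The offline PF share comes out at 6%, as quoted; the HLT share comes out at about 22% rather than 24%, a difference consistent with the rounding of the quoted times (0.06 and 0.03\units). The implied HLT total of about 136\unitms is also compatible with the 140\unitms average budget.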

The ability of the HLT PF reconstruction to reproduce the offline results is tested with jets and $\tau_\mathrm{h}$ built from the reconstructed HLT particles, in a QCD multijet and a Drell–Yan sample, respectively. While HLT jets are reconstructed in the same way as offline, the $\tau_\mathrm{h}$ reconstruction and identification proceed differently, without decay-mode reconstruction. The $\tau_\mathrm{h}$ reconstruction is seeded by an HLT jet containing at least one charged hadron. The direction of the highest-\pt charged hadron in the jet is used as the axis of a signal cone, in which all neutral pions and up to two additional charged hadrons are collected to build the $\tau_\mathrm{h}$ four-momentum. The charged particles in an annulus around the signal cone are used to quantify the isolation of the $\tau_\mathrm{h}$ candidate. The $\tau_\mathrm{h}$ selection at the HLT is looser than the one usually applied offline, in order to preserve the overall selection efficiency in the analysis. For typical analyses based on a $\mu\tau_\mathrm{h}$ final state, requiring a loosely isolated $\tau_\mathrm{h}$ at the HLT in addition to an isolated muon reduces the background rate by a factor of about 20.
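The HLT $\tau_\mathrm{h}$ construction described above can be sketched as follows. Particles are assumed to be (\pt, $\eta$, $\phi$) tuples treated as massless; the cone size of 0.1 and the choice of the two highest-\pt additional charged hadrons are assumptions, and the isolation annulus is not included.

```python
import math


def _delta_r(a, b):
    dphi = (a[2] - b[2] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a[1] - b[1], dphi)


def _p4(p):
    """Massless four-momentum (px, py, pz, E) from (pt, eta, phi)."""
    pt, eta, phi = p
    return (pt * math.cos(phi), pt * math.sin(phi),
            pt * math.sinh(eta), pt * math.cosh(eta))


def build_hlt_tau(charged, neutral_pions, cone=0.1):
    """Leading charged hadron as axis; add all neutral pions and up to two more
    charged hadrons inside the cone. Cone size and ordering are assumptions."""
    if not charged:
        return None
    lead = max(charged, key=lambda p: p[0])
    others = sorted((c for c in charged if c is not lead and _delta_r(c, lead) < cone),
                    key=lambda p: p[0], reverse=True)[:2]
    pions = [n for n in neutral_pions if _delta_r(n, lead) < cone]
    constituents = [lead] + others + pions
    p4 = tuple(sum(comp) for comp in zip(*(_p4(c) for c in constituents)))
    return constituents, p4


charged = [(20.0, 0.0, 0.0), (5.0, 0.05, 0.0), (5.0, 1.0, 1.0)]
pions = [(10.0, 0.0, 0.05)]
constituents, p4 = build_hlt_tau(charged, pions)
print(len(constituents), math.hypot(p4[0], p4[1]))  # the distant hadron is left out
```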

For offline jets and $\tau_\mathrm{h}$ of various \pt, Fig. 22 shows the probability to find a matching physics object at the HLT within a given $\Delta R$ and with a \pt above typical HLT thresholds, 40\GeV for jets and 20\GeV for $\tau_\mathrm{h}$. For jets, this probability is compared with that obtained for HLT calorimeter jets. The consistent use of PF jets at the HLT yields a sharper jet triggering efficiency curve than with calorimeter jets. The offline $\tau_\mathrm{h}$ is required to satisfy the loose isolation working point. At the HLT, the absence of decay-mode identification and the use of a loose isolation working point ensure a high triggering efficiency. The sharp rise of the $\tau_\mathrm{h}$ triggering efficiency curve at the threshold demonstrates the excellent agreement between the $\tau_\mathrm{h}$ \pt reconstructed online and offline.

Figure 22: Left: Probability to find at the HLT a jet with \pt above 40\GeV matching the jet reconstructed offline, as a function of the offline jet \pt. At the threshold, the curve is steeper for HLT PF jets (circles) than for HLT calorimeter jets (squares). Right: Probability to find at the HLT a $\tau_\mathrm{h}$ with \pt above 20\GeV matching the $\tau_\mathrm{h}$ reconstructed and identified offline with the loose isolation working point.

0.6 Validation with data and pileup mitigation

The previous section describes how PF improves the performance of physics object reconstruction in simulated events. In this section, it is shown that the PF algorithm performs as well with events recorded during Run 1, the first data-taking period of the LHC. The performance of reconstruction, identification, and isolation algorithms is compared for events simulated and recorded under Run 1 pileup conditions. The PF algorithm was designed without taking pileup into account. This section describes how the performance of object reconstruction and identification is affected by pileup, and how the collection of reconstructed particles can be used to mitigate the effects of pileup.

The results in this section are based on LHC Run 1 data recorded in 2012 at a centre-of-mass energy of 8\TeV and corresponding to an integrated luminosity of \fbinv. During this data-taking period, about 20 pileup interactions occurred on average per bunch crossing. These interactions are spread along the beam axis around the centre of the CMS coordinate system, following a normal distribution with a standard deviation of about 5\unitcm. The number of pileup interactions can be estimated either from the number of interaction vertices reconstructed with charged-particle tracks as input, with a vertex reconstruction efficiency of about 70% for pileup interactions [45], or from a determination of the instantaneous luminosity of the given bunch crossing with dedicated detectors and, as additional input, the inelastic proton-proton cross section [56].

In the PF reconstruction, the particles produced in pileup interactions give rise to additional charged hadrons, photons, and neutral hadrons. These result in an average additional \pt of about 1\GeV per pileup interaction and per unit area in the $(\eta,\phi)$ plane. As a consequence, reconstructed particles from pileup affect jets, \ETmiss, the isolation of leptons, and the identification of hadronic $\tau$ decays. The measured energy deposits in the calorimeters used as input for particle reconstruction may also be directly affected by pileup interactions, including interactions from other bunch crossings. The impact of these contributions is small under the pileup conditions considered.

The primary vertices, which are separated spatially along the beam axis, are ordered by the sum of the squared \pt of their tracks, $\sum \pt^2$. The primary vertex with the highest $\sum \pt^2$ is identified as the hard-scatter vertex, whereas the other vertices are considered pileup vertices. Charged hadrons reconstructed within the tracker acceptance can be identified as coming from pileup by associating their track with a pileup vertex. Charged hadrons identified in this way are removed from the list of reconstructed particles used to form physics objects. This widely used algorithm is called pileup charged-hadron subtraction (CHS).
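The CHS logic described above can be sketched in a few lines. The data classes and field names are hypothetical simplifications of the CMS event content: each particle carries at most one associated vertex index, and charged hadrons without a vertex association are kept, since they cannot be attributed to pileup.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Vertex:
    track_pts: List[float] = field(default_factory=list)  # track pT values (GeV)

    def sum_pt2(self) -> float:
        """Sum of squared track pT, used to rank the vertices."""
        return sum(pt * pt for pt in self.track_pts)


@dataclass
class Particle:
    kind: str                     # "charged_hadron", "photon", "neutral_hadron", ...
    pt: float                     # GeV
    vertex: Optional[int] = None  # index of the associated vertex, if any


def hard_scatter_index(vertices: List[Vertex]) -> int:
    """Index of the vertex with the highest sum of squared track pT."""
    return max(range(len(vertices)), key=lambda i: vertices[i].sum_pt2())


def apply_chs(particles: List[Particle], vertices: List[Vertex]) -> List[Particle]:
    """Drop charged hadrons whose track is associated with a pileup vertex."""
    hs = hard_scatter_index(vertices)
    return [
        p for p in particles
        if not (p.kind == "charged_hadron" and p.vertex is not None and p.vertex != hs)
    ]
```

Ranking by $\sum\pt^2$ rather than by the plain \pt sum favours vertices with a few hard tracks, as expected from a hard scatter, over vertices with many soft tracks.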

Photons and neutral hadrons, as well as all reconstructed particles outside the tracker acceptance, cannot be associated with one of the reconstructed primary vertices with this technique. To mitigate the impact of these particles on jets, lepton isolation, and identification, the average \pt contributions expected from pileup can be subtracted, which is possible because the \pt density of pileup interactions is uniform in the $(\eta,\phi)$ plane. The \pt density from pileup interactions can be calculated with jet clustering techniques [57, 58, 45], with the list of all reconstructed particles as input. Alternatively, this contribution can be estimated locally, \eg around a given lepton, from the expected ratio of the neutral to the charged energy from pileup, typically 0.5. After the end of Run 1, more advanced pileup mitigation techniques have been explored [59, 60]. While not used extensively in analyses based on Run 1 data, these techniques become increasingly important with the larger number of pileup interactions observed during LHC Run 2.
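The two neutral-pileup estimates can be sketched as below. For the jet-clustering approach we assume the median estimator of Cacciari and Salam, in which the event-wide density is the median of $\pt/A$ over jets clustered from all particles; the jet list with precomputed areas is taken as given, and the function names are hypothetical. The local estimate simply scales the charged pileup \pt by the typical neutral-to-charged ratio of 0.5.

```python
import statistics
from typing import Iterable, Tuple


def rho_median(jets: Iterable[Tuple[float, float]]) -> float:
    """Median pT density (GeV per unit area) over (pt, area) pairs of jets."""
    densities = [pt / area for pt, area in jets if area > 0.0]
    return statistics.median(densities) if densities else 0.0


def local_neutral_pileup(charged_pileup_pt_sum: float,
                         neutral_to_charged: float = 0.5) -> float:
    """Expected neutral pT from pileup, given the charged pileup pT nearby."""
    return neutral_to_charged * charged_pileup_pt_sum
```

The median rather than the mean is used so that the few hard jets of the event do not bias the density estimate.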

Since the results in this section are based on data taken in 2012 and the corresponding simulated events, a few details of the physics object reconstruction differ from the choices discussed in the previous section, \eg the value of the radius parameter for jet clustering. As in Section 0.5, these results are derived for the objects and algorithms used in most CMS analyses, \ie jets, \MET, muons, lepton isolation, and reconstructed hadronic $\tau$ decays. Results on electron reconstruction and identification can be found in Ref. [33].

0.6.1 Jets

Jets are reconstructed either from all reconstructed particles (PF jets) or from all reconstructed particles except charged hadrons associated with pileup vertices (PF+CHS jets). Unless noted otherwise, jets are reconstructed with the anti-\kt algorithm with a radius parameter . The corrections for the difference in response between reconstructed and generated particle jets (Ref jets) are determined separately for PF jets and PF+CHS jets. The expected average contribution from pileup is estimated from the pileup \pt density and the jet area [58], and is subtracted from the reconstructed jet. This correction is about three times smaller for PF+CHS jets, since CHS removes most of the charged hadrons from pileup, which account for roughly two thirds of the pileup contribution. Additional corrections are applied to the observed events to account for residual differences between data and simulation [45].
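A simplified version of the area-based pileup offset correction is sketched below, assuming a purely additive offset $\rho A$, where $\rho$ is the pileup \pt density and $A$ the jet area; the actual corrections may depend on further variables. The function name and the numbers in the example are illustrative. Because CHS removes roughly two thirds of the pileup contribution, the density seen by PF+CHS jets is taken here as one third of the PF value.

```python
def area_offset_corrected_pt(raw_pt: float, rho: float, area: float) -> float:
    """Subtract the expected pileup pT (rho * area), never going below zero."""
    return max(raw_pt - rho * area, 0.0)


# Illustrative numbers: rho = 10 GeV per unit area for PF jets, jet area 0.5.
rho_pf = 10.0
rho_chs = rho_pf / 3.0  # assumption: CHS leaves about one third of the pileup pT
print(area_offset_corrected_pt(50.0, rho_pf, 0.5))   # 50 - 5 = 45 GeV
print(area_offset_corrected_pt(50.0, rho_chs, 0.5))  # about 48.3 GeV
```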

Figure 23: Jet energy composition in observed and simulated events as a function of \pt (top left), $\eta$ (top right), and number of pileup interactions (bottom). In each plot, the upper frame shows the measured and simulated energy fractions stacked, whereas the lower frame shows the difference between observed and simulated events. Charged hadrons associated with pileup vertices are denoted as charged PU hadrons.

The jet energy contributions from different types of particles are measured with the tag-and-probe technique [61] in back-to-back dijet events recorded by requiring at least one jet at the HLT. The two jets with highest \pt in a given event must be separated by an angle larger than 2.8\unit{rad} in the plane transverse to the beam axis. Events with additional jets with \GeV and are rejected to avoid biases from large parton radiation. The tag jet is required to be in the barrel region and to correspond to the jet that triggered the data acquisition. The energy contributions are measured from the probe jet, whereas the value of the jet \pt is taken from the tag jet. This procedure ensures that correlations of the jet energy fractions, \eg with upward fluctuations of the observed jet \pt, do not bias the measurement of these fractions. Figure 23 compares the dependence of the PF jet composition on jet \pt, jet $\eta$, and the estimated number of pileup interactions between events observed in data and events simulated with \PYTHIA 6.4 [41]. The number of pileup interactions is estimated from the number of clusters reconstructed in the silicon pixel detectors [62]. The composition as a function of jet \pt is given for central jets (). Unlike the simulation results without pileup presented in Section 0.5, the measured jets have a significant energy contribution from pileup. As described in Section 0.3.1, the tracking efficiency drops within the densely populated core of high-\pt jets, which reduces the fraction of charged hadrons at high \pt. The observed and simulated energy fractions agree within 1% for \GeV, and within 2% above. The relative contribution from charged hadrons associated with pileup vertices is largest for low-\pt jets and becomes negligible in the \TeV range: the pileup contribution is uncorrelated with the hard scatter, so its absolute size does not grow with the jet \pt.
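The back-to-back dijet requirement can be sketched as follows. The jet representation as `(pt, eta, phi)` tuples and the function names are hypothetical. Because the thresholds of the third-jet veto are not reproduced in this text, they are left as required arguments, and the direction of each cut (veto jets above a \pt threshold and within an $|\eta|$ range) is an assumption.

```python
import math
from typing import List, Tuple

Jet = Tuple[float, float, float]  # (pt [GeV], eta, phi [rad])


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal separation folded into [0, pi]."""
    d = abs(phi1 - phi2) % (2.0 * math.pi)
    return 2.0 * math.pi - d if d > math.pi else d


def passes_dijet_selection(jets: List[Jet], veto_pt: float, veto_abs_eta: float,
                           min_dphi: float = 2.8) -> bool:
    """Leading two jets back to back, and no additional jet passing the veto cuts."""
    if len(jets) < 2:
        return False
    ordered = sorted(jets, key=lambda j: j[0], reverse=True)
    (_, _, phi1), (_, _, phi2) = ordered[0], ordered[1]
    if delta_phi(phi1, phi2) <= min_dphi:
        return False
    return not any(pt > veto_pt and abs(eta) < veto_abs_eta
                   for pt, eta, _ in ordered[2:])
```

Folding $\Delta\phi$ into $[0,\pi]$ matters: two jets at $\phi = 3.0$ and $\phi = -3.0$ are only about 0.28 rad apart and must fail the back-to-back requirement.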
The composition as a function of $\eta$ is shown for jets with \pt between 56 and 74\GeV. The simulated and observed fractions agree at the level of 1% within the tracker acceptance and at the level of 2% outside it.

The energy fractions as a function of the number of pileup interactions for central jets () with \pt between 56 and 84\GeV show a steady increase in the contribution of charged hadrons from pileup vertices. The relative contributions from photons, from neutral hadrons, and from the sum of charged hadrons and charged hadrons from pileup vertices remain constant with increasing pileup. This behaviour arises because QCD jets in the given \pt range and pileup have similar compositions in terms of the energy fractions from charged hadrons, neutral hadrons, and photons, which together constitute about 99% of the jet energy on average. More details on the measurements of the jet composition are given in Ref. [45].

Figure 24: Jet \pt resolution for PF+CHS jets (open markers) and PF jets (full markers) under three different pileup conditions (left), and jet energy resolution parameters (right). The jet \pt resolution is shown as a function of jet \pt. The jet energy resolution parameters (Eq. (15)) are shown as a function of the number of pileup interactions times the jet area for PF jets and PF+CHS jets. The three resolution parameters are determined in bins of for various radius parameters $R$, and then averaged in bins of the number of pileup interactions times the jet area.

To investigate the impact of pileup on the jet energy resolution, the resolution for central jets in simulated events is displayed in the left panel of Fig. 24 as a function of jet \pt under three different pileup conditions. The resolution is defined as the width of a normal distribution fitted to the ratio of the reconstructed and Ref jet \pt. While the impact of pileup on the resolution for jets with \pt larger than \GeV is small, the relative \pt resolution degrades significantly at lower \pt. The application of CHS improves the jet energy resolution for these lower-\pt jets, and the improvement grows with the number of pileup interactions. As expected, the jet energy resolution is nearly identical for PF and PF+CHS jets in the absence of pileup. The small difference (1% at low \pt) can be attributed to the jet energy corrections, which were obtained under the assumption that some amount of pileup is present. Within this difference, this observation confirms that CHS does not remove charged hadrons from the hard interaction, which would degrade the jet energy resolution even without pileup.
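The resolution definition above can be sketched with the Python standard library. As a simplification, the Gaussian parameters are estimated with `statistics.NormalDist.from_samples`, that is, from the sample mean and standard deviation of the response $\pt^\text{reco}/\pt^\text{Ref}$, rather than with a binned fit; in practice such a fit may be restricted to the core of the distribution to limit the influence of non-Gaussian tails. The function name is hypothetical.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def response_resolution(reco_pts: Sequence[float],
                        ref_pts: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, width) of a normal distribution describing reco/Ref pT."""
    if len(reco_pts) != len(ref_pts):
        raise ValueError("reco and Ref jet lists must be matched one to one")
    ratios = [reco / ref for reco, ref in zip(reco_pts, ref_pts)]
    dist = NormalDist.from_samples(ratios)  # requires at least two matched jets
    return dist.mean, dist.stdev
```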

To understand the jet energy resolution in more detail, the relative jet energy resolution is parameterized as the quadratic sum of a pileup and noise term, a stochastic term, and a constant term,

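$$\frac{\sigma(\pt)}{\pt} = \sqrt{\frac{N^{2}}{\pt^{2}} + \frac{S^{2}}{\pt} + C^{2}}, \qquad (15)$$

where $N$, $S$, and $C$ denote the pileup and noise, stochastic, and constant terms, respectively.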

The absolute contribution from pileup does not depend on the jet \pt and is hence only expected to affect the pileup and noise term of the relative energy resolution. Because pileup particles are distributed uniformly in the $(\eta,\phi)$ plane, the pileup contribution to the jet energy is proportional to the product of the number of pileup interactions $\mu$ and the jet area $A$, $\mu A$, which implies that the contribution to the jet energy resolution scales with $\sqrt{\mu A}$ in the limit of a large number of particles from pileup. The resolution parameters are fitted in bins of for jets clustered with various radius parameters $R$, covering different areas in the $(\eta,\phi)$ plane, and then averaged over bins of $\mu A$. The resulting parameters are shown in the right panel of Fig. 24 as a function of $\mu A$. Both the constant and stochastic terms remain roughly constant as a function of $\mu A$ and, as expected if CHS only removes charged hadrons from pileup, are of similar magnitude for PF and PF+CHS jets. The combined pileup and noise term is parameterized as a function of $\mu A$ that includes an additional empirical noise term; allowing this noise term to become negative improves the description of the resolution for small numbers of pileup interactions. The application of CHS reduces the pileup and noise term by almost a factor of two. This is consistent with the removal of two thirds of the particles from pileup in the tracker volume: since the pileup fluctuations scale with the square root of the number of pileup particles, keeping one third of them reduces this term by roughly $\sqrt{3}\approx 1.7$. More details on measurements of the jet energy resolution, including a detailed discussion of the jet energy resolution parameters and a validation with observed data, are given in Ref. [45].
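The quadratic-sum parameterization, $\sigma(\pt)/\pt = \sqrt{N^2/\pt^2 + S^2/\pt + C^2}$, can be evaluated as below. The pileup-dependent noise term is modelled here, as a hypothetical choice not taken from the text, as $N = \sqrt{k^2\mu A + N_0^2}$, with $\mu$ the number of pileup interactions and $A$ the jet area. The example shows that keeping one third of the pileup particles scales the pileup part of $N$ by $1/\sqrt{3}$.

```python
import math


def relative_resolution(pt: float, n: float, s: float, c: float) -> float:
    """sigma(pT)/pT = sqrt(N^2/pT^2 + S^2/pT + C^2), with pT in GeV."""
    return math.sqrt(n ** 2 / pt ** 2 + s ** 2 / pt + c ** 2)


def pileup_noise_term(mu_times_area: float, k: float, n0: float) -> float:
    """Hypothetical form: pileup fluctuations ~ sqrt(mu*A), plus noise N0 in quadrature."""
    return math.sqrt(k ** 2 * mu_times_area + n0 ** 2)


# If CHS keeps one third of the pileup particles, k^2 scales by 1/3.
n_pf = pileup_noise_term(10.0, k=1.0, n0=0.0)
n_chs = pileup_noise_term(10.0, k=1.0 / math.sqrt(3.0), n0=0.0)
print(f"N(PF)/N(PF+CHS) = {n_pf / n_chs:.2f}")  # prints N(PF)/N(PF+CHS) = 1.73
```

Because $N$ enters divided by \pt, the pileup term dominates at low \pt and becomes negligible for high-\pt jets, in line with the behaviour described for Fig. 24.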

Figure 25: Ratio of PF jet multiplicity with and without application of CHS, for hard jets, pileup jets, and soft jets, as a function of the reconstructed jet pseudorapidity. The uncertainty bands include both statistical uncertainties and uncertainties in the jet energy corrections.

Pileup not only degrades the jet energy resolution, but can also lead to additional jets with a \pt of a few tens of \GeV, denoted in the following as pileup jets. These jets result from the overlap of two or more low-\pt jets from different pileup interactions; hence their \pt spectrum falls more steeply than that of regular QCD jets [63]. The effect of CHS on the rate of pileup jets is studied in simulated QCD multijet events for reconstructed jets with \GeV. Only events in which the \pt sum of the two highest-\pt jets $j_1$ and $j_2$ is between 200 and 300\GeV are considered. All reconstructed jets are tentatively matched to a Ref jet built from the generated particles from the hard scatter, with \GeV and a distance in the $(\eta,\phi)$ plane smaller than 0.25. Jets that cannot be matched to a Ref jet are classified as pileup jets. If $j_1$ and $j_2$ are matched, they are classified as hard jets. All other jets are classified as soft jets. The ratio of the numbers of PF+CHS and PF jets with \GeV is shown in Fig. 25 as a function of jet $\eta$ for these three classes of jets. In the tracker acceptance, CHS reduces the number of pileup jets by 85% without affecting the multiplicity of either hard or soft jets. Further information on the use of PF reconstruction for pileup mitigation can be found in Ref. [60].
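The jet classification used for Fig. 25 can be sketched as below. Jets are `(pt, eta, phi)` tuples; the reconstructed jets are assumed to be sorted by decreasing \pt and already above the reconstruction threshold, and the Ref jets already above their own threshold. Matching is taken to succeed if any Ref jet lies within $\Delta R < 0.25$, and each of the two leading jets is classified independently. These are interpretive simplifications, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Jet = Tuple[float, float, float]  # (pt [GeV], eta, phi [rad])


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Distance in the (eta, phi) plane with the azimuthal difference wrapped."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(eta1 - eta2, dphi)


def classify_jets(reco_jets: List[Jet], ref_jets: List[Jet],
                  max_dr: float = 0.25) -> List[str]:
    """Label each reconstructed jet as 'hard', 'soft', or 'pileup'."""
    labels = []
    for i, (_, eta, phi) in enumerate(reco_jets):
        matched = any(delta_r(eta, phi, r_eta, r_phi) < max_dr
                      for _, r_eta, r_phi in ref_jets)
        if not matched:
            labels.append("pileup")
        elif i < 2:  # one of the two leading jets, j1 or j2
            labels.append("hard")
        else:
            labels.append("soft")
    return labels
```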

0.6.2 Missing transverse momentum

The performance of \ptmiss reconstruction is assessed with a sample of observed events selected in the dimuon final state, dominated by events with a Z boson decaying to two muons [47]. The data set is collected with a trigger requiring the presence of two muons passing \pt thresholds of 17 and 8\GeV, respectively. The two reconstructed muons must fulfil \GeV and , satisfy isolation requirements, and have opposite charge. Events in which the invariant mass of the dimuon system lies outside the \GeV window are rejected.
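The dimuon invariant mass used in the selection can be computed from the muon \pt, $\eta$, and $\phi$. The sketch below neglects the muon mass, which is tiny compared with the Z boson mass, and uses $m^2 = 2\,\pt^{(1)}\pt^{(2)}[\cosh(\Delta\eta) - \cos(\Delta\phi)]$. The mass-window bounds are not reproduced in this text, so they are required arguments; the values used in the check are illustrative only, the kinematic and isolation requirements are omitted, and the function names are hypothetical.

```python
import math
from typing import Tuple

Muon = Tuple[float, float, float, int]  # (pt [GeV], eta, phi [rad], charge)


def dimuon_mass(mu1: Muon, mu2: Muon) -> float:
    """Invariant mass of two muons in the massless approximation (GeV)."""
    pt1, eta1, phi1, _ = mu1
    pt2, eta2, phi2, _ = mu2
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))


def passes_z_selection(mu1: Muon, mu2: Muon, mass_low: float, mass_high: float) -> bool:
    """Opposite charge and dimuon mass inside [mass_low, mass_high] (GeV)."""
    if mu1[3] * mu2[3] >= 0:
        return False
    return mass_low <= dimuon_mass(mu1, mu2) <= mass_high
```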

The expression of PF \ptmiss, defined in Section 0.5.2, includes a correction term that accounts for the response of the jets in the final state and also takes into account the expected contributions from pileup discussed in the previous section. Here, two additional terms are introduced: the first corrects for the presence of many low-energy particles from pileup interactions, and the second for an observed asymmetry in the azimuthal distribution of the reconstructed PF \ptmiss due to a shift in PF \ptmiss along the detector $x$ and $y$ axes. This asymmetry is caused, amongst other reasons, by a shift between the centre of the CMS coordinate system and the beam axis. Figure 26 shows the spectrum of PF \ptmiss in the $\mathrm{Z}\to\mu\mu$ event sample. The simulation describes the observed distribution over more than four orders of magnitude. The systematic uncertainty in the prediction includes contributions from uncertainties in the muon energy scale, the jet energy scale, the jet energy resolution, and the energy scale of low-energy particles. A more detailed discussion of the uncertainties is given in Ref. [47].
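For reference, the raw \ptmiss is the negative vector sum of the transverse momenta of all reconstructed particles; the sketch below computes it and then applies an $x$/$y$ shift correction. The linear dependence of the shift on the number of reconstructed vertices and its coefficients are hypothetical placeholders for parameters that would be derived from data, and the jet-response and low-energy-particle terms of PF \ptmiss are omitted.

```python
import math
from typing import Iterable, Tuple


def raw_missing_pt(particles: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """(px, py) of the missing pT: minus the vector sum of particle (pt, phi)."""
    parts = list(particles)  # materialize so the input can be iterated twice
    px = -sum(pt * math.cos(phi) for pt, phi in parts)
    py = -sum(pt * math.sin(phi) for pt, phi in parts)
    return px, py


def xy_shift_corrected(px: float, py: float, n_vertices: int,
                       cx: Tuple[float, float],
                       cy: Tuple[float, float]) -> Tuple[float, float]:
    """Subtract a shift modelled (hypothetically) as offset + slope * n_vertices."""
    return (px - (cx[0] + cx[1] * n_vertices),
            py - (cy[0] + cy[1] * n_vertices))
```

Removing a constant shift along $x$ and $y$ leaves the magnitude spectrum nearly unchanged but restores a flat azimuthal distribution of \ptmiss.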

Figure 26: Spectrum of PF \ptmiss in a $\mathrm{Z}\to\mu\mu$ data set [47]. The observed data are compared to simulated $\mathrm{Z}\to\mu\mu$, diboson (VV), and $\mathrm{t}\bar{\mathrm{t}}$ plus single top quark events. The lower panel shows the ratio of data to simulation, with the uncertainty bars of the points including the statistical uncertainties of both observed and simulated events and the grey uncertainty band displaying the systematic uncertainty in the simulation. The last bin contains the overflow.

The hadronic recoil $\vec{u}_\mathrm{T}$, defined as the vector sum of the transverse momenta of all reconstructed particles excluding the two muons from the Z boson decay, is used as a probe of the \ptmiss determination. With the Z boson transverse momentum denoted as $\vec{q}_\mathrm{T}$, momentum conservation in the transverse plane implies $\vec{q}_\mathrm{T} + \vec{u}_\mathrm{T} + \vec{p}_\mathrm{T}^{\,\text{miss}} = 0$. Muons are reconstructed with considerably higher precision than the hadronic recoil, so the precision of the \ptmiss reconstruction is dominated by the precision with which the hadronic recoil is reconstructed. This precision is also representative of the resolution with which \ptmiss is reconstructed in events with prompt neutrinos, \eg in W boson decays. The precision of the hadronic recoil reconstruction can be measured directly in $\mathrm{Z}\to\mu\mu$ events, under the assumption that there is no true source of missing transverse momentum. The parallel ($u_\parallel$) and perpendicular ($u_\perp$) components of the hadronic recoil are defined with respect to $\vec{q}_\mathrm{T}$ in the transverse plane. At high $q_\mathrm{T}$, the resolution of $u_\parallel$ is dominated by that of the jets recoiling against the Z boson, whereas $u_\perp$ is more affected by random detector noise and by fluctuations of the underlying event.
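The recoil decomposition can be sketched as follows, with the hadronic recoil $\vec{u}_\mathrm{T}$ and the Z boson transverse momentum $\vec{q}_\mathrm{T}$ given as $(x, y)$ pairs. Here $u_\parallel$ is the projection of $\vec{u}_\mathrm{T}$ onto the direction of $\vec{q}_\mathrm{T}$, so that a perfectly balanced recoil gives $u_\parallel = -q_\mathrm{T}$, and the sign of $u_\perp$ follows the right-handed convention $(\hat{q}_\mathrm{T}\times\vec{u}_\mathrm{T})_z$; both sign conventions and the function names are assumptions of this sketch. The second function applies the momentum balance to recover $\vec{p}_\mathrm{T}^{\,\text{miss}}$.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def recoil_components(u: Vec2, q: Vec2) -> Tuple[float, float]:
    """Return (u_par, u_perp) of the hadronic recoil u relative to q_T."""
    qt = math.hypot(q[0], q[1])
    if qt == 0.0:
        raise ValueError("q_T must be non-zero to define the axes")
    u_par = (u[0] * q[0] + u[1] * q[1]) / qt
    u_perp = (q[0] * u[1] - q[1] * u[0]) / qt
    return u_par, u_perp


def missing_pt_from_balance(u: Vec2, q: Vec2) -> Vec2:
    """p_T^miss = -(q_T + u_T) from transverse momentum conservation."""
    return (-(q[0] + u[0]), -(q[1] + u[1]))
```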

Several algorithms were developed to mitigate the deterioration of the \ptmiss resolution with increasing pileup [47]. Among them, the so-called No-PU PF \ptmiss algorithm calculates \ptmiss as a weighted sum of the different contributions to the event: charged and neutral particles within jets identified as originating from the primary interaction vertex; charged and neutral particles within jets identified as originating from pileup vertices; other charged particles associated with the primary interaction vertex; other charged particles not associated with the primary interaction vertex; and other neutral particles. The weights that optimize the \ptmiss resolution are found to be 1.0, except for a weight of 0.6 for isolated neutral particles. The MVA PF \ptmiss algorithm combines the same inputs using a multivariate (MVA) regression technique to correct both the direction and the magnitude of the hadronic recoil.
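The weighted sum of the No-PU PF \ptmiss algorithm can be sketched as below, following the category list and weights quoted in the text. The category keys, the representation of each category by its summed $(p_x, p_y)$, and the function name are hypothetical; this is not the CMS implementation.

```python
from typing import Dict, Tuple

# Category weights as quoted in the text: 1.0 everywhere except 0.6 for
# other (isolated) neutral particles. The keys are hypothetical labels.
NOPU_WEIGHTS: Dict[str, float] = {
    "jets_from_primary_vertex": 1.0,
    "jets_from_pileup": 1.0,
    "charged_from_primary_vertex": 1.0,
    "charged_not_from_primary_vertex": 1.0,
    "other_neutral": 0.6,
}


def nopu_missing_pt(category_sums: Dict[str, Tuple[float, float]],
                    weights: Dict[str, float] = NOPU_WEIGHTS) -> Tuple[float, float]:
    """Missing pT as minus the weighted vector sum of per-category (px, py)."""
    px = -sum(weights[name] * v[0] for name, v in category_sums.items())
    py = -sum(weights[name] * v[1] for name, v in category_sums.items())
    return px, py
```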

Figure 27: Comparison of the average response of the parallel recoil component, $-\langle u_\parallel\rangle/q_\mathrm{T}$, for the PF \ptmiss, No-PU PF \ptmiss, and MVA PF \ptmiss (denoted as MVA Unity PF \ETslash) algorithms as a function of $q_\mathrm{T}$, as determined in $\mathrm{Z}\to\mu\mu$ events.
Figure 28: Comparison of the resolutions of the parallel (left) and perpendicular (right) recoil components for the PF \ptmiss, No-PU PF \ptmiss, and MVA PF \ptmiss algorithms as a function of the number of reconstructed vertices in $\mathrm{Z}\to\mu\mu$ events [47]. The upper frame of each figure shows the resolution in observed events; the lower frame shows the ratio of data to simulation.

The response of the algorithms is defined as the ratio of the average magnitude of the parallel recoil component to the magnitude of the Z boson transverse momentum, $-\langle u_\parallel\rangle/q_\mathrm{T}$, and is displayed in Fig. 27 as a function of $q_\mathrm{T}$. For \GeV, the response agrees with unity within 5% for the PF \ptmiss and MVA PF \ptmiss algorithms, whereas for the No-PU PF \ptmiss algorithm a response near unity is only reached at \GeV. The resolution of the hadronic recoil is assessed by parametrizing the $u_\parallel$ or $u_\perp$ distributions with a Voigtian function, defined as the convolution of a Breit–Wigner and a Gaussian function. The resolution of each recoil component is obtained from the full width at half maximum of the Voigtian function divided by 2.35, the ratio of the full width at half maximum to the standard deviation of a Gaussian distribution. The event sample is divided according to vertex multiplicity, and a fit to a Voigtian function is performed in each bin. The resulting resolution curves of $u_\parallel$ and $u_\perp$ are shown in Fig. 28 as a function of the number of reconstructed vertices in the event. The resolutions for both No-PU PF \ptmiss and MVA PF \ptmiss show a considerably reduced dependence on the number of reconstructed vertices with respect to PF \ptmiss, with an improvement of almost a factor of two in the resolution of each recoil component for 20 reconstructed vertices.
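The conversion from the fitted Voigtian to a resolution can be illustrated with a closed-form approximation of the Voigtian full width at half maximum. We use the Olivero–Longbothum approximation $f_V \approx 0.5346 f_L + \sqrt{0.2166 f_L^2 + f_G^2}$, accurate to about 0.02%, where $f_G$ and $f_L$ are the full widths of the Gaussian and Breit–Wigner components; this is our substitution for evaluating the fitted function numerically, and the function names are hypothetical. The factor 2.35 is $2\sqrt{2\ln 2}\approx 2.3548$, so for a purely Gaussian shape the result reduces to the standard deviation.

```python
import math

GAUSS_FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # about 2.3548


def voigt_fwhm(sigma: float, gamma: float) -> float:
    """Approximate FWHM of a Voigt profile (Olivero-Longbothum formula).

    sigma: standard deviation of the Gaussian component.
    gamma: half width at half maximum of the Breit-Wigner component.
    """
    f_g = GAUSS_FWHM_PER_SIGMA * sigma
    f_l = 2.0 * gamma
    return 0.5346 * f_l + math.sqrt(0.2166 * f_l ** 2 + f_g ** 2)


def recoil_resolution(sigma: float, gamma: float) -> float:
    """Resolution estimate: Voigtian FWHM divided by 2.35."""
    return voigt_fwhm(sigma, gamma) / 2.35
```

Using the FWHM rather than the Gaussian width alone makes the quoted resolution sensitive to the Breit–Wigner tails, which capture non-Gaussian contributions to the recoil.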

0.6.3 Muons

The performance of the PF muon identification is probed in samples of prompt muons from Z boson decays with a tag-and-probe technique. Events are recorded with triggers requiring a single muon with \pt thresholds that depend on the instantaneous luminosity. The tag muons are well-identified muons matched to the muons identified at trigger level, whereas the probes are muon candidates reconstructed with only the inner tracker, to avoid any potential bias of the measurement from the muon subdetectors [35]. This procedure measures the combined efficiency to reconstruct a muon track in the muon detectors, to link it with the inner track, and for the resulting muon to be identified by the PF algorithm.

Figure 29 (top left) compares the identification efficiencies measured in data and simulation as a function of muon \pt for muons with \GeV from Z boson decays. Only muons in the central barrel region with are considered. Overall, the observed and simulated efficiencies agree very well, and the data confirm that prompt muons are identified by the PF algorithm with an efficiency close to 100%. The efficiencies in data and simulation agree well within % for \GeV. A similar agreement is displayed in Fig. 29 (top right) as a function of $\eta$. The muon identification efficiency is only marginally affected by pileup, as shown in Fig. 29 (bottom), which displays the efficiency as a function of . Hence, no dedicated pileup mitigation strategies are deployed for muon identification.
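In its simplest counting form, a tag-and-probe efficiency is the fraction of probes that pass the tested criterion, with a binomial uncertainty. The sketch below is that simplification: the measurement described here instead extracts signal yields from fits to the dimuon mass distribution, and the binomial formula understates the uncertainty for efficiencies very close to 100%, where an interval such as Clopper–Pearson is more appropriate. The function name is hypothetical.

```python
import math
from typing import Tuple


def tag_and_probe_efficiency(n_pass: int, n_probes: int) -> Tuple[float, float]:
    """Efficiency and its binomial standard error from probe counts."""
    if n_probes <= 0 or not 0 <= n_pass <= n_probes:
        raise ValueError("need 0 <= n_pass <= n_probes and n_probes > 0")
    eff = n_pass / n_probes
    err = math.sqrt(eff * (1.0 - eff) / n_probes)
    return eff, err
```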

Figure 29: Efficiency of the PF muon identification for muons from Z boson decays as a function of \pt (top left), $\eta$ (top right), and (bottom). The efficiency is measured for data and simulation with a tag-and-probe technique. The uncertainty band includes the dominant source of systematic uncertainty, which comes from imperfections in the parametrization of the signal and background dimuon mass distributions.

0.6.4 Lepton isolation

Since the calculation of lepton isolation involves summing the \pt values of charged hadrons, photons, and neutral hadrons, lepton isolation is sensitive to pileup interactions, which give rise to additional reconstructed particles inside the isolation cone. For simplicity, this section focuses on muon isolation. Electron isolation is calculated and verified with similar techniques.

To mitigate the deterioration of the isolation efficiency due to pileup, the isolation defined in Eq. (13) is modified in two ways. First, only charged hadrons associated with the hard-scatter (HS) vertex are considered. Second, the expected contributions from pileup are subtracted from the \pt sums of neutral hadrons and photons. The pileup-mitigated absolute isolation for muons is defined as

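$$I^{\mu}_{\text{abs}} = \sum_{\text{charged, HS}} \pt + \max\Big(0,\ \sum_{\text{neutral hadrons}} \pt + \sum_{\text{photons}} \pt - \Delta\beta \sum_{\text{charged, PU}} \pt\Big). \qquad (16)$$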

The expected contribution of photons and neutral hadrons from pileup is estimated from the scalar sum of the transverse momenta of charged hadrons in the cone that are identified as coming from pileup vertices, $\sum_{\text{charged, PU}} \pt$. This sum is multiplied by the factor $\Delta\beta = 0.5$, which corresponds approximately to the ratio of neutral particle to charged hadron production in inelastic proton-proton collisions, as estimated from simulation. The relative lepton isolation is defined as $I^{\mu}_{\text{rel}} = I^{\mu}_{\text{abs}}/\pt^{\mu}$.
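The pileup-mitigated muon isolation and the relative isolation can be evaluated as sketched below, assuming the particles have already been selected inside the isolation cone around the muon and split into four categories; the function and argument names are hypothetical, and the default factor of 0.5 follows the typical neutral-to-charged pileup ratio quoted in this section. The `max(0, ...)` clamp, a common convention, prevents the pileup-corrected neutral sum from becoming negative.

```python
from typing import Iterable, Tuple


def pileup_mitigated_isolation(muon_pt: float,
                               charged_hs_pts: Iterable[float],
                               neutral_hadron_pts: Iterable[float],
                               photon_pts: Iterable[float],
                               charged_pu_pts: Iterable[float],
                               delta_beta: float = 0.5) -> Tuple[float, float]:
    """Return (absolute, relative) isolation of a muon, with pT values in GeV.

    All inputs are pT values of particles already inside the isolation cone.
    """
    neutral = sum(neutral_hadron_pts) + sum(photon_pts) - delta_beta * sum(charged_pu_pts)
    i_abs = sum(charged_hs_pts) + max(0.0, neutral)
    return i_abs, i_abs / muon_pt
```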

The efficiency of the muon isolation is measured in a sample of muons from Z boson decays with a tag-and-probe technique. Events are selected according to the same criteria as for the measurement of the muon identification efficiency discussed in Section 0.6.3. In addition, since the goal of lepton isolation is to identify prompt muons, the tight muon identification criteria described in Section