Measurement of $H\to\mu^+\mu^-$ production in association with a Z boson at the CEPC
Supported by National Natural Science Foundation of China (11475190) and National Natural Science Foundation of China (11575005)
The Circular Electron-Positron Collider (CEPC) is a future Higgs factory proposed by the Chinese high energy physics community. It will operate at a center-of-mass energy of 240-250 GeV and is expected to accumulate an integrated luminosity of 5 ab$^{-1}$ over ten years of operation. At the CEPC, Higgs bosons are dominantly produced through the ZH associated production process. The vast number of Higgs events collected will enable precise studies of the Higgs boson's properties, including its Yukawa couplings to massive particles. Using a GEANT4-based simulation of detector effects, we study the feasibility of measuring the Higgs boson decaying into a pair of muons at the CEPC. Results with and without information from the Z boson decay products are provided; they show that a signal significance of over 10 standard deviations can be achieved and that the H-$\mu$-$\mu$ coupling can be measured to about the 10% level.
Higgs, CEPC, Yukawa Coupling
13.66.Fg, 14.80.Bn, 13.66.Jn
The discovery of the Higgs-like boson completes the particle table of the Standard Model (SM) of particle physics. Up-to-date LHC measurements all indicate that the Higgs boson is indeed highly SM-like [1, 2, 3, 4, 5, 6]. In the SM, Higgs couplings to massive particles are proportional to their masses (for fermions) or masses squared (for gauge bosons). Hence the event rates involving Higgs couplings to first- and second-generation fermions can be very small, making them difficult to measure at the LHC. The Circular Electron-Positron Collider (CEPC), however, is designed to run at around $\sqrt{s}$ = 240-250 GeV with an instantaneous luminosity of $2\times10^{34}$ cm$^{-2}$s$^{-1}$, and will deliver 5 ab$^{-1}$ of integrated luminosity over ten years of running. This large data set will enable precise measurements of the Higgs branching ratios to light fermions and a determination of the associated Yukawa couplings, including H-$\mu$-$\mu$, which is crucial for validating the consistency of the SM Higgs mechanism, since any deviation would indicate the existence of new physics.
Searches for $H\to\mu^+\mu^-$ production have been performed by the ATLAS and CMS experiments with Run-I and Run-II data [9, 10, 11]. The most stringent observed (expected) upper limit on the cross-section times branching ratio is found to be 2.8 (2.9) times the SM prediction. Projections have also been made for the High-Luminosity LHC assuming an integrated luminosity of 3000 fb$^{-1}$ collected by the ATLAS or CMS detector, which can lead to a signal significance of about 7$\sigma$, with the H-$\mu$-$\mu$ coupling determined with an accuracy of around 20%. Studies have also been performed for the International Linear Collider (ILC). At a center-of-mass energy of 250 GeV with an integrated luminosity of 250 fb$^{-1}$, the signal is dominated by Higgs-strahlung from a Z boson, and the signal significances for the sub-processes with Z boson decays into $\nu\bar\nu$ and $q\bar q$ are found to be 1.8$\sigma$ and 1.1$\sigma$, respectively. Assuming polarized beams and collisions at a center-of-mass energy of 1 TeV with an integrated luminosity of 500 fb$^{-1}$, the signal is dominated by the WW-fusion process and a sensitivity of 2.75$\sigma$ can be achieved.
At the CEPC, the signal production is dominated by the Higgs-strahlung from a Z boson. We perform a feasibility study based on events generated at leading order accuracy with initial state radiation (ISR), parton shower, hadronization and detector effects simulated.
Considering that 70% of the Z bosons decay hadronically and 20% decay invisibly, we focus on two scenarios, one for inclusive Z boson decays and the other for hadronic decays. The first case maximally exploits the statistics of the produced events, and the second takes advantage of the major part of the decay kinematics. For both cases, we first perform a cut-based analysis and then improve the measurement using a Boosted Decision Tree (BDT) technique.
This paper is organized as follows. Section 2 describes event generation and simulation. Section 3 presents results for the inclusive measurement. Section 4 presents results for the $Z\to q\bar q$ decay channel. Section 5 summarizes the paper.
2 Monte Carlo Simulation
At the 250 GeV CEPC, Higgs bosons are mainly produced through Higgs-strahlung, i.e. $e^+e^-\to ZH$. With an integrated luminosity of 5000 fb$^{-1}$, about 230 signal events can be produced. The expected backgrounds include 2-fermion processes $e^+e^-\to f\bar f$, where $f$ can be any SM fermion other than the top quark, and 4-fermion processes, which can be mediated through ZZ, WW, ZZ or WW production and single Z boson production. All Monte Carlo (MC) events are generated at parton level with the Whizard V1.9.5 event generator, with ISR and interference effects included. The generated events are interfaced to Pythia 6 for parton shower and hadronization simulation. Detector effects are simulated with the CEPC detector implemented in Mokka/GEANT4 [18, 7, 19]. The detector is assumed to have a structure similar to that of the International Large Detector (ILD) [20, 21] at the ILC. At the CEPC, the muon identification efficiency is expected to be over 99.5% for muons above 10 GeV, with excellent momentum resolution. The fully simulated events are reconstructed with the particle-flow algorithm ArborPFA. More details about the CEPC sample set are documented elsewhere.
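The quoted yield of about 230 signal events can be cross-checked with a one-line estimate, $N = \sigma(ZH)\times{\rm BR}(H\to\mu^+\mu^-)\times L$. The sketch below is only a rough check: the cross section and branching ratio are approximate values assumed here for illustration and are not taken from this text.

```python
# Back-of-the-envelope signal yield: N = sigma(ZH) * BR(H -> mu mu) * L.
# sigma_zh_fb and br_h_mumu are ASSUMED approximate inputs, not values from the text.
sigma_zh_fb = 212.0    # assumed e+e- -> ZH cross section near 250 GeV, in fb
br_h_mumu = 2.18e-4    # assumed SM branching ratio for a 125 GeV Higgs boson
lumi_fb_inv = 5000.0   # integrated luminosity quoted in the text, in fb^-1

n_higgs = sigma_zh_fb * lumi_fb_inv   # number of ZH events produced
n_signal = n_higgs * br_h_mumu        # number of ZH, H -> mu mu events
print(f"ZH events: {n_higgs:.3g}, H->mumu signal events: {n_signal:.0f}")
```

With these assumed inputs the estimate lands close to the 230 events quoted above; different choices of cross section or Higgs mass would shift it by a few percent.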
The major SM backgrounds include all the 2-fermion processes ($e^+e^-\to f\bar f$, where $f\bar f$ refers to all lepton and quark pairs except $t\bar t$) and the 4-fermion processes (ZZ, WW, ZZ or WW, and single Z). Initial state radiation (ISR) and all possible interference effects are taken into account automatically in the generation. The classification of four-fermion production follows the LEP convention and depends crucially on the final state. For example, if the final state consists of two mutually charge-conjugated fermion pairs that could originate from both WW and ZZ intermediate states, the process is classified as a “ZZ or WW” process. If there is an electron or positron together with its partner neutrino and an on-shell W boson in the final state, the type is named “single W”. Meanwhile, if there is an electron-positron pair and an on-shell Z boson in the final state, the case is named “single Z”. Detailed information on the 2-fermion and 4-fermion samples used in our analyses is listed in the two tables of Appendix A.
3 Inclusive analysis
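A recoil mass method enables a measurement of $H\to\mu^+\mu^-$ production without measuring the associated Z boson decay. We define the recoil mass as

$$M_{\rm recoil}^{\mu\mu} = \sqrt{s + M_{\mu\mu}^2 - 2E_{\mu\mu}\sqrt{s}},$$

where $\sqrt{s}$ is the center-of-mass energy, and $M_{\mu\mu}$ and $E_{\mu\mu}$ correspond to the reconstructed mass and energy of the Higgs boson candidate, i.e. the di-muon system. The ZH ($H\to\mu^+\mu^-$) events form a peak in the $M_{\rm recoil}^{\mu\mu}$ distribution around the Z boson mass.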
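As a minimal sketch of the recoil mass method, the function below evaluates $\sqrt{s + M^2 - 2E\sqrt{s}}$ for a system of invariant mass $M$ and energy $E$. The function name and the nominal masses are assumptions made for illustration; the check uses an ideal two-body $e^+e^-\to ZH$ configuration without ISR or detector resolution.

```python
import math

def recoil_mass(sqrt_s, mass, energy):
    """Recoil mass (GeV) against a system with the given invariant mass and energy."""
    m2 = sqrt_s**2 + mass**2 - 2.0 * sqrt_s * energy
    return math.sqrt(max(m2, 0.0))  # clamp small negative values caused by rounding

SQRT_S = 250.0            # center-of-mass energy in GeV
M_H, M_Z = 125.0, 91.19   # assumed nominal Higgs and Z masses in GeV

# Higgs energy in ideal two-body e+e- -> ZH kinematics (no ISR, no smearing).
e_h = (SQRT_S**2 + M_H**2 - M_Z**2) / (2.0 * SQRT_S)
print(recoil_mass(SQRT_S, M_H, e_h))
```

In this idealized case the recoil mass reproduces the Z mass exactly; ISR and momentum resolution broaden the peak in practice, which is why a mass window rather than a single value is used in the selection.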
We select the two muons with the largest transverse momenta and consider selections on the following kinematic variables: the invariant mass of the di-muon system $M_{\mu\mu}$, the recoil mass of the di-muon system $M_{\rm recoil}^{\mu\mu}$, the transverse momentum of the di-muon system $p_{T\mu\mu}$, the third component of the di-muon momentum $p_{Z\mu\mu}$, the energy of the di-muon system $E_{\mu\mu}$, and five angular variables, including the angle between the Z boson and the muons.
Distributions of several kinematic variables in the inclusive analysis, after the preselection (two well-identified muons) and the $120 < M_{\mu\mu} < 130$ GeV requirement. All the distributions are normalized to 10.
3.1 Cut-count analysis
The event yields along the selection flow are summarized in Table 3.1. The two mass windows, on the di-muon invariant mass and on the recoil mass, are set in accordance with the signal signature. The selections on the di-muon transverse and longitudinal momenta are set to reduce the ZZ and Drell-Yan Z backgrounds. The Higgs and Z boson decays can lead to different angular distributions due to the spin dependence of the couplings and the parity violation of the weak interaction. The final angular selection is chosen to suppress the 2f background.
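Signal and background event yields under the selection flow for the inclusive analysis. The simulation corresponds to the CEPC at $\sqrt{s}$ = 250 GeV with an integrated luminosity of 5000 fb$^{-1}$.

| Category | signal | ZZ | WW | ZZorWW | SingleZ | 2f |
|---|---|---|---|---|---|---|
| Preselection | 207.3 | 311312 | 129869 | 501590 | 63658 | 1740371 |
| $120 < M_{\mu\mu} < 130$ GeV | 189.7 | 5479 | 17126 | 57405 | 1868 | 52525 |
| $90.8 < M_{\rm recoil}^{\mu\mu} < 93.4$ GeV | 118.4 | 1207 | 868 | 2115 | 164 | 1157 |
| $25 < p_{T\mu\mu} < 64$ GeV | 109.8 | 1009 | 725 | 1772 | 126 | 452 |
| $-56 < p_{Z\mu\mu} < 56$ GeV | 107.1 | 969 | 687 | 1726 | 120 | 420 |
| Angular selections at 0.38 and -0.38 | 65.2 | 464 | 49 | 196 | 53 | 159 |
| Angular selection at -0.996 | 65.0 | 462 | 46 | 196 | 52 | 99 |
| Signal efficiency | 31.3% | | | | | |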
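The bookkeeping behind the cut-flow table can be reproduced with a few lines. The sketch below copies the first and last rows of the table and computes the signal efficiency relative to preselection and the overall background rejection; the dictionary layout and helper name are illustrative choices, not part of the analysis software.

```python
# Yields copied from the first and last rows of the inclusive cut-flow table.
presel = {"signal": 207.3, "ZZ": 311312, "WW": 129869, "ZZorWW": 501590,
          "SingleZ": 63658, "2f": 1740371}
final = {"signal": 65.0, "ZZ": 462, "WW": 46, "ZZorWW": 196,
         "SingleZ": 52, "2f": 99}

def total_background(row):
    """Sum every column except the signal."""
    return sum(value for name, value in row.items() if name != "signal")

signal_eff = final["signal"] / presel["signal"]
bkg_rejection = 1.0 - total_background(final) / total_background(presel)

print(f"signal efficiency:    {signal_eff:.1%}")
print(f"remaining background: {total_background(final)} events")
print(f"background rejection: {bkg_rejection:.4%}")
```

Because the table entries are rounded, the efficiency recomputed this way can differ from the quoted 31.3% in the last digit.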
An unbinned maximum likelihood fit is performed on the $M_{\mu\mu}$ distribution. The signal is parameterized by a Crystal Ball function, with parameters fixed from simulated events. The background is parameterized by a second-order Chebyshev polynomial.
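The structure of such a fit can be sketched in plain Python: a Crystal Ball signal shape, a second-order Chebyshev background on the 120-130 GeV window, and an extended unbinned negative log-likelihood in the two yields. This is a simplified illustration and not the fitting code used for the results: all shape parameters and the toy masses below are hypothetical, and the background shape is held fixed here for brevity.

```python
import math

LO, HI = 120.0, 130.0  # fit window in GeV, matching the di-muon mass selection

def crystal_ball(x, mean, sigma, alpha, n):
    """Unnormalised Crystal Ball: Gaussian core with a power-law low-mass tail (alpha > 0)."""
    t = (x - mean) / sigma
    if t > -alpha:
        return math.exp(-0.5 * t * t)
    a = (n / alpha) ** n * math.exp(-0.5 * alpha * alpha)
    b = n / alpha - alpha
    return a * (b - t) ** (-n)

def chebyshev2(x, c1, c2):
    """Unnormalised second-order Chebyshev polynomial on the fit window."""
    u = 2.0 * (x - LO) / (HI - LO) - 1.0   # map the window onto [-1, 1]
    return 1.0 + c1 * u + c2 * (2.0 * u * u - 1.0)

def integral(f, steps=2000):
    """Midpoint-rule integral of f over the fit window."""
    h = (HI - LO) / steps
    return h * sum(f(LO + (i + 0.5) * h) for i in range(steps))

def make_nll(masses, sig_pars, bkg_pars):
    """Build the extended unbinned NLL as a function of (n_sig, n_bkg)."""
    sig_norm = integral(lambda x: crystal_ball(x, *sig_pars))
    bkg_norm = integral(lambda x: chebyshev2(x, *bkg_pars))
    fs = [crystal_ball(m, *sig_pars) / sig_norm for m in masses]
    fb = [chebyshev2(m, *bkg_pars) / bkg_norm for m in masses]

    def nll(n_sig, n_bkg):
        log_sum = sum(math.log(n_sig * s + n_bkg * b) for s, b in zip(fs, fb))
        return (n_sig + n_bkg) - log_sum
    return nll

# Hypothetical shape parameters and toy di-muon masses, for illustration only.
SIG_PARS = (125.0, 0.3, 1.5, 3.0)   # mean, sigma, alpha, n
BKG_PARS = (-0.1, 0.02)             # c1, c2
toy_masses = [124.8, 125.0, 125.1, 121.0, 128.5]
nll = make_nll(toy_masses, SIG_PARS, BKG_PARS)
print(nll(3.0, 2.0))
```

A real fit would minimise this function over the yields and the background coefficients with a dedicated minimiser; the sketch only shows how the two shapes enter the likelihood.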
The invariant mass spectrum of the di-muon system in the inclusive analysis. The dotted points with error bars represent data from CEPC simulation. The red-solid and green-dashed lines correspond to the signal and background contributions, and the solid-blue line represents the post-fit value of the total yield.
Figure 3.1 shows the post-fit result of the invariant mass distribution of the di-muon system, from which the signal yield is extracted. At 68% confidence level, an accuracy from -17% to 18% on the signal strength can be achieved based on a likelihood scan. The signal under the peak, 124-125 GeV, leads to a high significance of 8.8$\sigma$, estimated by simple counting from the signal and background yields.
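For orientation, a counting significance can be computed from a signal yield $s$ and a background yield $b$ in several common ways. The sketch below shows three standard estimators; which one corresponds to the number quoted above is not assumed here, and the yields in the example are hypothetical.

```python
import math

def z_s_over_sqrt_b(s, b):
    """Simplest estimate, valid when s is much smaller than b."""
    return s / math.sqrt(b)

def z_s_over_sqrt_s_plus_b(s, b):
    """Estimate that includes the signal's own statistical fluctuation."""
    return s / math.sqrt(s + b)

def z_asimov(s, b):
    """Profile-likelihood (Asimov) counting significance."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

# Hypothetical yields in a narrow mass window, for illustration only.
s, b = 40.0, 10.0
for estimator in (z_s_over_sqrt_b, z_s_over_sqrt_s_plus_b, z_asimov):
    print(estimator.__name__, round(estimator(s, b), 2))
```

The three estimators agree when $s \ll b$ and diverge when the signal is comparable to or larger than the background, so the choice matters in a high-purity window.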
3.2 BDT optimization
We have also exploited the Toolkit for Multivariate Analysis (TMVA) for further background rejection, adopting the Gradient Boosted Decision Trees (BDTG) method. After fixing the ranges of the invariant mass and the recoil mass as described above, five variables are taken as inputs to TMVA. The choice of these variables is based on many tests and on their importance ranking. The resulting BDT response distribution can be seen in Figure 3.2, where the agreement between the training and testing samples shows no obvious overtraining. We then take the final event selections as: BDTG response > 0.369, $20 < p_{T\mu\mu} < 64$ GeV and the angular selection at -0.996. A maximum likelihood fit is performed on the resulting invariant mass distribution of the di-muon system. The signal and background probability functions are parameterized in the same form as in the previous cut-count study.
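One common way to quantify the agreement between training and testing response distributions is a two-sample Kolmogorov-Smirnov comparison. The sketch below computes only the KS distance (the largest gap between the two empirical cumulative distributions) in plain Python; it illustrates the idea and is not the TMVA implementation, and the response values are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance between two lists of numbers."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    distance = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past x (this also handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        distance = max(distance, abs(i / len(a) - j / len(b)))
    return distance

# Hypothetical classifier responses for training and testing events.
train_response = [0.91, 0.85, 0.40, 0.77, 0.66, 0.95]
test_response = [0.89, 0.82, 0.45, 0.71, 0.60, 0.97]
print(ks_statistic(train_response, test_response))
```

A small distance for both the signal and the background samples is the quantitative counterpart of the visual agreement described above; a large distance would suggest that the classifier has learned fluctuations of the training sample.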
The BDT response distribution (top) and the post-fit result with BDT improvement (bottom).
Figure 3.2 shows the BDT response distribution and the post-fit result of $M_{\mu\mu}$, from which the signal yield is extracted. At 68% confidence level, an accuracy from -16% to 17% on the signal strength can be achieved based on a likelihood scan. The signal under the peak, 124-125 GeV, leads to a significance of 10.9$\sigma$.
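The asymmetric 68% interval on the signal strength comes from scanning the likelihood and finding where $-2\Delta\ln L = 1$. The sketch below does this for a single-bin Poisson counting model, which is far simpler than the shape fit used in the analysis; the yields are hypothetical and chosen only to show the mechanics.

```python
import math

def nll(mu, n_obs, s, b):
    """Poisson negative log-likelihood (constants dropped) for expectation mu*s + b."""
    lam = mu * s + b
    return lam - n_obs * math.log(lam)

def scan_interval(n_obs, s, b, tol=1e-7):
    """Return (mu_hat, low, high), with 2*(NLL - NLL_min) = 1 at low and high."""
    mu_hat = (n_obs - b) / s
    nll_min = nll(mu_hat, n_obs, s, b)

    def delta(mu):
        return 2.0 * (nll(mu, n_obs, s, b) - nll_min) - 1.0

    def crossing(lo, hi):
        # Bisection: delta changes sign exactly once between lo and hi.
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if (delta(mid) > 0) == (delta(lo) > 0):
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    width = 5.0 * math.sqrt(n_obs) / s   # generous search range on each side
    floor = -b / s + 1e-9                # keep mu*s + b positive
    low = crossing(max(mu_hat - width, floor), mu_hat)
    high = crossing(mu_hat, mu_hat + width)
    return mu_hat, low, high

# Hypothetical yields, chosen only for illustration.
mu_hat, low, high = scan_interval(n_obs=300.0, s=100.0, b=200.0)
print(f"mu = {mu_hat:.2f} {low - mu_hat:+.3f} / {high - mu_hat:+.3f}")
```

Even in this toy model the upper uncertainty comes out slightly larger than the lower one, the same kind of asymmetry seen in the intervals quoted above.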
4 $Z\to q\bar q$ analysis
Among all Z boson decay modes, the hadronic channel is the most promising due to its large branching fraction (about 70%). The exclusive $k_t$ algorithm for $e^+e^-$ collisions in FastJet is used to reconstruct two jets from all particles except the chosen $\mu^+$ and $\mu^-$, and the jets are sorted by energy. We perform an analysis of Z($q\bar q$)H production. Apart from the previously mentioned variables related to the di-muon system, we further exploit selections on the following jet variables: the third component of the di-jet momentum, the recoil mass of the di-jet system, the masses of the individual jets, and the invariant mass of the di-jet system.
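The exclusive two-jet reconstruction can be illustrated with a small stand-alone sketch. It assumes the $e^+e^-$ $k_t$ (Durham-type) distance $d_{ij} = 2\min(E_i^2, E_j^2)(1-\cos\theta_{ij})$ and four-vector (E-scheme) recombination, and merges the closest pair until two jets remain. This is a simplified illustration rather than the FastJet implementation, and the input four-vectors are hypothetical.

```python
import math

def ee_kt_distance(p, q):
    """d_ij = 2 min(E_i^2, E_j^2) (1 - cos theta_ij) for four-vectors (E, px, py, pz)."""
    mag_p = math.sqrt(p[1] ** 2 + p[2] ** 2 + p[3] ** 2)
    mag_q = math.sqrt(q[1] ** 2 + q[2] ** 2 + q[3] ** 2)
    cos_theta = (p[1] * q[1] + p[2] * q[2] + p[3] * q[3]) / (mag_p * mag_q)
    return 2.0 * min(p[0], q[0]) ** 2 * (1.0 - cos_theta)

def cluster_exclusive(particles, n_jets=2):
    """Merge the closest pair (four-vector sum) until n_jets remain; sort by energy."""
    jets = [tuple(p) for p in particles]
    while len(jets) > n_jets:
        pairs = ((a, b) for a in range(len(jets)) for b in range(a + 1, len(jets)))
        i, j = min(pairs, key=lambda ab: ee_kt_distance(jets[ab[0]], jets[ab[1]]))
        merged = tuple(x + y for x, y in zip(jets[i], jets[j]))
        jets = [jet for k, jet in enumerate(jets) if k not in (i, j)] + [merged]
    return sorted(jets, key=lambda jet: jet[0], reverse=True)

# Hypothetical particles (E, px, py, pz) in GeV, with the two muons already removed.
particles = [(30.0, 0.0, 0.0, 30.0), (15.0, 1.0, 0.0, 14.9),
             (25.0, 0.0, 0.0, -25.0), (10.0, -1.0, 0.0, -9.9)]
jets = cluster_exclusive(particles)
print(jets)
```

The sketch assumes every input has non-zero momentum; a production implementation would also guard against that case and use a faster pair search.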
Distributions of selected variables in the $Z\to q\bar q$ analysis. The distributions are normalized to 10.
4.1 Cut-count analysis
A cut-count analysis is performed for the exclusive analysis. The event flow under the selections is summarized in Table 4.1. Selections on the single-jet and di-jet masses eliminate most of the background without hard jets. The recoil mass cut further reduces the Z($\mu^+\mu^-$)Z($q\bar q$) background.
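The cut chain of the cut-based method in the $Z\to q\bar q$ analysis.

| Category | signal | ZZ | WW | ZZorWW | SingleZ | 2f |
|---|---|---|---|---|---|---|
| Preselection | 156.3 | 390775 | 183751 | 463361 | 101164 | 0 |
| $120 < M_{\mu\mu} < 130$ GeV | 141.6 | 3786 | 181 | 227 | 244 | 0 |
| Single-jet mass selections at 4.2 and 2.8 | 133.0 | 3216 | 111 | 0 | 9 | 0 |
| Di-jet mass selection at 76.0 | 127.5 | 2917 | 2 | 0 | 8 | 0 |
| $90.9 < M_{\rm recoil}^{\mu\mu} < 93.5$ GeV | 75.2 | 893 | 0 | 0 | 0 | 0 |
| $20 < p_{T\mu\mu} < 64$ GeV | 74.5 | 777 | 0 | 0 | 0 | 0 |
| $-58 < p_Z < 58$ GeV | 74.5 | 748 | 0 | 0 | 0 | 0 |
| Angular selection between -0.98 and 0.98 | 74.2 | 747 | 0 | 0 | 0 | 0 |
| Signal efficiency | 47.5% | | | | | |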
As in the inclusive channel, we perform a likelihood fit to extract the signal yield and strength parameter. The quality of the fit is demonstrated in Fig. 4.1. The signal strength can be determined with an uncertainty from -16% to 17% at 68% confidence level. The signal significance under the peak, 124-125 GeV, is found to be 10.8$\sigma$.
The invariant mass spectrum of the di-muon system in the $Z\to q\bar q$ analysis. The dotted points with error bars represent data from CEPC simulation. The red-solid and green-dashed lines correspond to the signal and background contributions, and the solid-blue line represents the post-fit value of the total yield.
4.2 BDT improvement
In order to achieve the highest significance, we perform a two-step multivariate analysis. The first step exploits an MLP (Multilayer Perceptron) method to suppress the fully leptonic WW and ZZ backgrounds. After applying a selection at 90 GeV, four variables are considered as inputs for the MLP. The effectiveness of this MLP is shown in Fig. 4.2. After requiring the MLP response to be greater than 0.71, we exploit BDTG to further reduce the backgrounds from semileptonic ZZ and WW. In this second step, several further variables are taken as inputs.
The MLP result and the overtraining test in the $Z\to q\bar q$ analysis.
The BDT response (top) and the final fit result (bottom) in the $Z\to q\bar q$ channel analysis.
After the two-step multivariate analysis, we require BDTG response > -0.13, $90.4 < M_{\rm recoil}^{\mu\mu} < 93$ GeV and $28 < p_{T\mu\mu} < 64$ GeV. Finally, we perform a likelihood fit to extract the signal yield and strength parameter, as shown in Fig. 4.2. Based on a likelihood scan, the signal strength can be determined with an uncertainty from -16% to 17% at 68% confidence level. The significance of the signal in the peak region, 124-125 GeV, is found to be 10.8$\sigma$.
The feasibility of measuring $H\to\mu^+\mu^-$ at the CEPC is studied for collisions at a center-of-mass energy of $\sqrt{s}$ = 250 GeV and an integrated luminosity of 5000 fb$^{-1}$. The measurement is performed in two complementary channels: ZH production without measuring the Z boson decay, and ZH production with the Z boson decaying hadronically. For each channel, a cut-count analysis is tested and then improved using multivariate techniques. Similar results are obtained from the two channels. A significance of over 10$\sigma$ can be reached for the signal process. The signal strength can be measured with 14% uncertainty and the associated H-$\mu$-$\mu$ coupling can be constrained at the 10% level. The results are comparable to those expected at the High-Luminosity LHC.
Acknowledgements. The authors would like to thank Xin Mo, Dan Yu and Yuqian Wei for useful discussions. This work is supported in part by the National Natural Science Foundation of China, under Grants No. 11475190 and No. 11575005, by the CAS Center for Excellence in Particle Physics (CCEPP), and by the CAS Hundred Talent Program (Y3515540U1).
The information of the two-fermion background samples.

| Final states | Cross section (fb) | Events expected |
|---|---|---|
| uu | 9995.35 | 50476527 |
| dd | 9808.71 | 49533965 |
| cc | 9974.20 | 50369725 |
| ss | 9805.39 | 49517234 |
| bb | 9803.04 | 49505372 |
| qq | 49561.30 | 250284565 |
| e2e2 | 4967.58 | 25086253 |
| e3e3 | 4374.94 | 22093447 |
| bhabha | 24992.21 | 126210660 |
The information of the four-fermion background samples.