# AI-optimized detector design for the future

Electron-Ion Collider: the dual-radiator RICH case

###### Abstract

Advanced detector R&D requires performing computationally intensive and detailed simulations as part of the detector-design optimization process. We propose a general approach to this process based on Bayesian optimization and machine learning that encodes detector requirements. As a case study, we focus on the design of the dual-radiator Ring Imaging Cherenkov (dRICH) detector under development as part of the particle-identification system at the future Electron-Ion Collider (EIC). The EIC is a US-led frontier accelerator project for nuclear physics, which has been proposed to further explore the structure and interactions of nuclear matter at the scale of sea quarks and gluons. We show that the detector design obtained with our automated and highly parallelized framework outperforms the baseline dRICH design. Our approach can be applied to any detector R&D, provided that realistic simulations are available.

1,2]E. Cisbani 3]A. Del Dotto 4,5,*]C.Fanelli\note[*]Corresponding author 5]M. Williams 6]M. Alfred 4]F. Barbosa 7]L. Barion 8]V. Berdnikov 9]W. Brooks 10]T. Cao 7]M. Contalbrigo 11]S. Danagoulian 12]A. Datta 13]M. Demarteau 14]A. Denisov 4]M. Diefenthaler 14]A. Durum 12]D. Fields 4]Y. Furletova 15]C. Gleason 16]M. Grosse-Perdekamp 17]M. Hattawy 18]X. He 19]H. van Hecke 4]D. Higinbotham 8]T. Horn 17]C. Hyde 15]Y. Ilieva 8]G. Kalicy 11]A. Kebede 20]B. Kim 19]M. Liu 4]J. McKisson 9]R. Mendez 21]P. Nadel-Turonski 8]I. Pegg 4]D. Romanov 18]M. Sarsour 19]C.L. da Silva 22]J. Stevens 18]X. Sun 18]S. Syed 23]R. Towell 24]J. Xie 25]Z.W. Zhao 4]B. Zihlmann 4]C. Zorn \affiliation[1]INFN, Sezione di Roma, 00185 Rome, Italy \affiliation[2]Istituto Superiore di Sanità, 00161 Rome, Italy \affiliation[3]Laboratori Nazionali di Frascati, Via Enrico Fermi 40, 00044 Frascati, Italy \affiliation[4]Jefferson Lab, Newport News, VA 23606 \affiliation[5]Laboratory for Nuclear Science, Massachusetts Institute of Technology, Cambridge, MA 02139 \affiliation[6]Howard University, Washington, DC 20059 \affiliation[7]INFN, Sezione di Ferrara, 44100 Ferrara, Italy \affiliation[8]Catholic University of America, Washington, DC 20064 \affiliation[9]Universidad Técnica Federico Santa María, Valparaíso, Chile \affiliation[10]University of New Hampshire, Durham, NH 03824 \affiliation[11]North Carolina A&T State University, Greensboro, NC 27411 \affiliation[12]University of New Mexico, Albuquerque, NM 87131 \affiliation[13]Oak Ridge National Lab, Oak Ridge, TN 37830 \affiliation[14]Institute for High Energy Physics, Protvino, Russia \affiliation[15]University of South Carolina, Columbia, SC 29208 \affiliation[16]University of Illinois, Urbana-Champaign, IL 61801 \affiliation[17]Old Dominion University, Norfolk, VA 23529 \affiliation[18]Georgia State University, Atlanta, GA 30303 \affiliation[19]Los Alamos National Lab, Los Alamos, NM 87545 \affiliation[20]City College of New York, New York, NY 10031 \affiliation[21]Stony Brook University, Stony Brook, NY 11794 \affiliation[22]College of William & Mary, Williamsburg, VA 2318 \affiliation[23]Abilene Christian University, Abilene, TX 79601 \affiliation[24]Argonne National Lab, Argonne, IL 60439 \affiliation[25]Duke University, Durham, NC 27708 \emailAddcfanelli@mit.edu \keywordsElectron Ion Collider, detector design, Bayesian optimization, machine learning.

## 1 Introduction

The future Electron-Ion Collider (EIC) [accardi2016electron, nas2018] will be a unique experimental facility that will open up new frontiers for exploring Quantum Chromodynamics (QCD), e.g. the 3D structure of nucleons and nuclei in terms of quarks and gluons, and the behaviour of nuclear matter at extreme densities. A key challenge for EIC detectors involves particle identification (PID)—especially hadron identification—which is required for studying many of the most important experimental processes at the EIC. For example, the simulated phase space for pions and kaons in deep inelastic scattering (DIS) processes shows the need of hadronic PID over a momentum range up to 50 GeV/c in the hadron endcap, up to 10 GeV/c in the electron endcap, and up to 5–7 GeV/c in the central barrel [eichandbook_2019].

The purpose of the EIC-PID consortium is to develop an integrated PID system that satisfies the requirements imposed by the EIC science program, and that is compatible with the detector concepts developed at the two candidate sites for the EIC, namely Brookhaven National Laboratory (BNL) and the Thomas Jefferson National Accelerator Facility (JLab). The current baseline design of the EIC-PID system, shown in Fig. 1, includes a dual-radiator RICH (dRICH) and a modular-aerogel RICH (mRICH) that uses a Fresnel lens placed in the electron endcap [del2017design, wong2017modular]. In addition, the Detection of Internally Reflected Cherenkov light (DIRC) detector [eicdirc] is located in the barrel region, where a fast time-of-flight (TOF) system is foreseen to provide PID for low-momenta particles.

In the following,
we propose a general approach to optimizing the dRICH design based on Bayesian optimization and machine learning that encodes detector requirements like mechanical constraints.
We show that the detector design obtained with our automated and highly parallelized framework outperforms the baseline^{1}^{1}1Throughout this article, we refer to the dRICH design prior to this work [del2017design], as the baseline design. dRICH design.
Furthermore, our approach can be applied to any detector R&D, provided that realistic simulations are available.

## 2 Dual-radiator RICH

The goal of the dRICH detector is to provide full hadron identification (/K/p better than apart) from a few GeV/c up to 50 GeV/c in the (outgoing) ion-side endcap of the EIC detector (see Fig. 1), covering polar angles up to 25 degrees; it will also offer e/ separation up to about 15 GeV/c as a byproduct [del2017design]. The dRICH concept was inspired by the HERMES [akopov2002hermes] and LHCb (RICH1 in Run 1) [lhcbdetector, adinolfi2013performance] dual-radiator RICH detectors. The baseline design consists of two radiators (aerogel and CF gas) sharing the same outward-focusing spherical mirror and highly segmented ( mm pixel size) photosensors located outside of the charged-particle acceptance. The baseline configuration is the result of simulation analyses taking into account the geometrical and physical constraints; it can be summarized by the following key parameters: i) maximum device length 1.65 m; ii) aerogel radiator refractive index n(400 nm) = 1.02 and thickness 4 cm; iii) CF gas tank length 1.6 m; iv) polar angle coverage ; v) mirror radius 2.9 m.

In this study, we benefited from the experience of several groups that have built similar devices in the past [akopov2002hermes, adinolfi2013performance, alves2008lhcb, nobrega2003lhcb], and also from the CLAS12 RICH work which is in progress [pereira2016test]. The dRICH baseline detector (see Fig. 2) has been simulated within a Geant4/GEMC framework [gemc]. The Geant4 simulation is based on realistic optical properties based on aerogels tested and characterized by the CLAS12 RICH collaboration [pereira2016test, clas12aerogel]. Absorption length and Rayleigh scattering are included in the simulation, the latter being one of the main sources of background, along with optical dispersion (the spectrum of the Rayleigh scattering is , hence this contribution becomes relevant for photon wavelengths below nm).

Photons produced in the aerogel with wavelengths below 300 nm are removed, imitating the effects of an acrylic filter that will separate the aerogel from the gas, for both filtering and avoiding chemical degradation of the aerogel. The mirror reflectivity is assumed to be 95% and uniform.
The dRICH is in a non-negligible magnetic field and the charged-particle tracks are bending as they pass through the Cherenkov radiators, providing an additional source of uncertainty in the Cherenkov ring reconstruction. The size of this effect is proportional to the path length within the Cherenkov radiators, and therefore, it is important for gas radiator. The magnetic field used in the simulation is that of the JLab detector design at 3T central field.^{2}^{2}2The solenoid is still under development and its central magnetic field is now expected to be smaller than 3 T. This value is, therefore, conservative for the dRICH simulation results. The impact of the magnetic field is confined to the gas angular resolution at large polar angle (right plot of Fig. 3). Our proposed optimization method and its applicability do not depend on the strength of the magnetic field.
The optical sensor area in such a mirror-focusing design can be rather compact, and can be placed in the shadow of a barrel calorimeter, outside of the radiator acceptance.
The pixel size of the photosensors has been assumed to be 3 mm; the quantum efficiency curve of the multi-anode PMT Hamamatsu-H12700-03 has also been assumed, which is similar to that of any potential photosensor suitable for sustaining both relatively high magnetic field and irradiation expected in the dRICH detection region.

The reconstruction of the Cherenkov angle is based on the inverse ray tracing algorithm used by the HERMES experiment (see Ref. [akopov2002hermes] for details on the algorithm). Figure 3 shows the main error contributions to the single photoelectron (p.e.) angular resolution of the two radiators as a function of the charged particle polar angle, see also Ref. [del2017design].

## 3 Methodology and Analysis

Determining how much space to allocate for various detector components, what kind of sensors to use, and which configuration provides the best performance, are often dilemmas that need to be resolved without increasing costs. In addition, for multipurpose experiments like the EIC, and optimizing the detector design requires performing large-scale simulations of many important physics processes. Therefore, learning algorithms (e.g., AI-based) can potentially produce better designs while using less resources. Nowadays there are many AI-based global optimization procedures (e.g. reinforcement learning, evolutionary algorithm). Among these learning algorithms, Bayesian Optimizers (BOs) [jones1998efficient, snoek2012practical] have become popular due to being agnostic to the black-box function to optimize, which can be noisy and non-differentiable. These features make BOs particularly well suited for optimizing the EIC detector design, and, in general, BOs could potentially be deployed for a variety of critical R&D efforts in the near future.

BOs search for the global optimum over a bounded domain of black-box functions, formally expressed as .^{3}^{3}3Here, we assume one is searching for a global minimum.
The aim of a BO is to keep the number of iterations required to identify the optimal value relatively small.
The BO approach has been applied to solve a wide range of problems in different areas from robotics to deep learning [brochu2010tutorial, eric2008active, brochu2010bayesian, lizotte2007automatic, srinivas2012information, hutter2011sequential, bergstra2013hyperopt].
For example, in experimental particle physics it has been used to tune the parameters of Monte Carlo generators [ilten2017event].
When applied to detector design, each point explored corresponds to a different detector configuration.
The function can be thought of as a figure of merit that has to be optimized (e.g., a proxy of the PID performance). Typically Gaussian processes [williams2006gaussian] are used to build a surrogate model of , but other regression methods, such as decision trees, can also be used. For example, Gradient Boosted Regression Trees (GBRT) is a flexible non-parametric statistical learning technique used to model very-expensive-to-evaluate functions. The model is improved by sequentially evaluating the expensive function at the next candidate detector configurations (design points), thereby finding the minimum with as few evaluations as possible.
A cheap utility function is considered, called the acquisition function, that guides the process of deciding the next points to evaluate.
The utility function should balance the trade-off of exploiting the regions near the current optimal solution, and exploring regions where the probabilistic model is highly uncertain.

We have explored the use of BOs in the optimization of the design of the dRICH.
Each point in the parameter space corresponds to a different detector design,
and consists of a vector of values for the parameters in the Table 1.
Running the Geant4 simulation
and subsequent analysis of a single detector design point takes about 10 minutes on a single physical core.^{4}^{4}4Model name: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz, two threads are run per core. A grid search with just 8 dimensions even run on 50 cores in parallel looks unfeasible due to the “curse of dimensionality” [bellman1957dynamic].
Several open-source tools are available for Bayesian optimization (see, e.g., Ref. [snoek2012practical]); the results shown in the following are based on the scikit-learn package [skopt].

In this work, eight parameters are considered when optimizing the dRICH design, inspired by the previous studies done in Ref. [del2017design]: the refractive index and thickness of the aerogel radiator; the focusing mirror radius, its longitudinal (which determines the effective thickness of the gas) and radial positions (corresponding to the axis going in the radial direction in each of the six mirror sectors, see Fig. 2); and the 3D shifts of the photon sensor tiles with respect to the mirror center on a spherical surface, which to some extent determines the sensor area and orientation relative to the mirror. These parameters, reported in Table 1, cover rather exhaustively the two major components of the dRICH: its radiators and optics. They have been chosen to improve the dRICH PID performance, under the constraint that it is possible to implement any values resulting from the optimization with (at worst) only minor hardware issues to solve. We assume 100 as the minimum feasible tolerance on each spatial alignment parameter, whereas for the aerogel, we assume 1 on the thickness and 0.2% on the refractive index. A relevant parameter has been essentially neglected in the optimization: the gas refractive index, whose tuning would require a pressurized detector making this choice hardly convenient. We also postpone the optimization of the TOF-aerogel transition region, since at the moment these are two separate detectors with different simulation frameworks. The parameter space can be extended once detailed results from prototyping and tests will be available.

parameter | description | range [units] | tolerance [units] |
---|---|---|---|

R | mirror radius | [290,300] [cm] | 100 [m] |

pos r | radial position of mirror center | [125,140] [cm] | 100 [m] |

pos l | longitudinal position of mirror center | [-305,-295] [cm] | 100 [m] |

tiles x | shift along x of tiles center | [-5,5] [cm] | 100 [m] |

tiles y | shift along y of tiles center | [-5,5] [cm] | 100 [m] |

tiles z | shift along z of tiles center | [-105,-95] [cm] | 100 [m] |

n | aerogel refractive index | [1.015,1.030] | 0.2% |

t | aerogel thickness | [3.0,6.0] [cm] | 1 [mm] |

Since the aim of the design optimization is to maximize the PID performance of the dRICH, it is natural to build the objective function using the separation power between pions and kaons, defined as

(1) |

where and are the mean Cherenkov angles for kaons and pions, respectively, obtained from the angular distributions reconstructed by Inverse Ray Tracing [akopov2002hermes]. Here, is the mean number of detected photo-electrons and the single photo-electron angular resolution; the reconstructed angles are approximately Gaussian distributed, hence corresponds to the averaged RMS of the above mean angles, i.e. the ring angular resolution.

In order to simultaneously optimize the combined PID performance of both the aerogel and gas parts in the dRICH, two working points have been chosen based on the performance of the baseline design. In particular, we chose one momentum value with separation each for the aerogel and the gas: 14 GeV/c and 60 GeV/c, each close to the end of the curves reported in Fig. 3 for the aerogel and gas, respectively. The goal here is to optimize the quantities () corresponding to the working points and . One choice of the figure of merit which proved to be effective is the harmonic mean

(2) |

The harmonic mean tends to favor the case where the PID performance (i.e. ) is high in both parts of the detector.
The optimization consists of finding the maximum of the figure of merit (FoM), as defined in (2).^{5}^{5}5BOs in skopt search for a minimum, hence we change the sign of (2).
Samples of pions and kaons have been produced with Geant4 [gemc] using particle guns with central values of the momentum corresponding to 15 and 60 GeV/c. In order to cover a larger region of the phase-space in the forward sector of the detector, the polar angles have been spanned uniformly from 5 to 15 degrees.

Figure 4 shows the posterior distribution after calls projected in 2-dimensional subspaces of the design parameter space. These plots illustrate the possible correlations among the parameters. The optimal point in each subspace is marked with a red dot. Notice that the black points, corresponding to the points evaluated by the BO in its ask-and-tell procedure, tend to form basins of attraction around the minimum. Recall that the black-box function we are optimizing is noisy in the Geant4 simulation (see also Appendix B for a detailed study on statistical uncertainties). Plots of the partial dependence of the objective function on each parameter are displayed in Fig. 4. Given a function of variables, the partial dependence of on the -th variable is defined as [friedman2001greedy]

(3) |

The above sum runs over a set of random points from the search space, and the partial dependence can be considered as a method for interpreting the influence of the input feature on after averaging out the influence of the other variables.

The search for the optimal FoM with the BO is compared to a standard random search (RS) in Fig. 6 (left).
A simple optimization technique like random search can be fast but as pointed out by Ref. [bergstra2012random] sequential model-based optimization methods—particularly Bayesian optimization methods—are more promising because they offer principled approaches to weighting the importance of each dimension.
Figure 6 (right) shows a comparison of the CPU time required for the two procedures.
In both cases, to make a comparison using the same computing resources, the exploration of the parameter space at each call has been distributed among (=20) physical cores^{6}^{6}6It should be obvious that the duration of each call for a random search is independent of the number of simulations running in parallel on different cores. each evaluating a different point of the design space.
As the well known main drawback of the BO is that it scales cubically with the number of explorations, which is mainly due to the regression of the objective function through GPs (in Sec. 4, we discuss an improved strategy). Despite this, it typically converges to the optimum in a smaller number of iterations (and as the number of observations increase, it remains in the basin of attraction). A list of hyperparameters used in this framework can be found in Table 3.

Table 2 summarizes the results of the optimization procedure based on the figure of merit in Eq. (2). Figure 7 shows the trade-off between the two regions in terms of PID performance found during the optimization process, where initially both aerogel and gas increase in separation power, and eventually after a certain number of calls a gain in the performance of the gas corresponds to a loss in the performance of the aerogel part. Another interesting feature suggested by the results in Fig. 4 and Table 4: we can increase the refractive index of the aerogel only if we increase its thickness and maintain a sufficiently high enough yield of Cherenkov photons.

FoM (h) | @ 14 GeV/c | @ 60 GeV/c | |
---|---|---|---|

BO | 3.23 | 3.16 | 3.30 |

legacy | 2.9 | 3.0 | 2.8 |

symbol | description | value |
---|---|---|

T | maximum number of calls | 100 |

M | points generated in parallel (GP) | 20 |

N | pions (and kaons) per sample | 200 |

kappa | controls variance in predicted values | 1.96 |

xi | controls improvement over previous best values | 0.01 |

noise | expected noise (relative) | 2.5% |

parameter | results | tolerance region |
---|---|---|

R | 300.0 | (296.2,300.0) [cm] |

pos r | 130.5 | (127.8,132.7) [cm] |

pos l | -299.8 | (-302.0,-298.0) [cm] |

tiles x | -2.8 | (-4.7,0.4) [cm] |

tiles y | 5.0 | (-5.0,5.0) [cm] |

tiles z | -95.00 | (-96.85,-95.00) [cm] |

n | 1.030 | (1.025, 1.030) |

t | 6.0 | (4.6, 6.0) [cm] |

Since BO provides a model of how the FoM depends on the parameters, it is possible to use the posterior to define a tolerance on the parameters. Domain knowledge is required to decide how large of a change in the FoM is irrelevant, then the FoM space can be scanned to define how much each parameter can vary before seeing a relevant change. To this end, we use the proxy , where and are the expected value and standard deviation on the FoM as a function of provided by the model, and, in particular, is the optimal value found by the BO for the considered parameter. The tolerance interval is defined as the region where , this condition meaning a variation which is not statistically significant. The values reported in Table 4 are obtained following the above strategy.

The results discussed in the previous sections for the dRICH are very promising: Fig. 8 summarizes the overall improvement in PID performance over a wide range of momentum obtained with the Bayesian optimization compared to the baseline curve based on [del2017design]. At least 3 separation is achieved in the aerogel region for P 13.5 GeV/c (compared to the baseline 12.5 GeV/c), whereas in the gas region the same separation is obtained for P 64 GeV/c (compared to 58 GeV/c). Both the baseline and the optimized curves have been calculated using charged tracks in the angular region (5,15) deg. As expected, once the optical properties of the aerogel are fixed (i.e. the refractive index dispersion curve, strictly connected with the single photon chromatic error), there is less room for a purely geometrical optimization of the system compared to the gas where the emission error is the dominant one and it is instead purely geometric.

## 4 Conclusions and Future Directions

We present for the first time an implementation of BO to improve the design of future EIC detectors, and we provide a detailed procedure to do this in a highly parallelized and automated way. We show in this study that the PID capabilities of the dRICH detector are substantially improved using our approach. In addition, it is possible to estimate the expected tolerances, within which any variation of the parameters does not alter the detector performance. More generally, real-world costs of the components could be included in the optimization process, either by extending the FoM or by exploring a Pareto optimization with multiple objective functions [ngatchou2005pareto].

Currently, there are many ongoing efforts to simulate and analyse EIC detector designs, and the approach developed here—being completely general—can be employed for any such study. There are a variety of ways in which this study could be improved, which will be investigated in the near future. For example, in Appendix C, we introduce criteria for determining when to stop the optimization procedure, to avoid wasting resources if a suitable optimum has already been found. This work does not take into account the possible interplay between the dRICH and the other detectors. There is work ongoing on designing the TOF detector at very good timing resolution, and this will have an impact on the PID performance at low momentum. Therefore, a multi-detector design optimization is a possible future direction of this work. Another important aspect involves the choice of optimizing all the parameters together versus in blocks. For example, in the Pythia BO tune [ilten2017event], better precision was obtained by optimizing the parameters in 3 blocks rather than all of them together. This worked because many observables were unaffected by several Pythia parameters. Once multi-detector optimizations are considered, it may be prudent to adopt a similar approach here. In addition, one can certainly test different figures of merit; however, the figure of merit defined in Eq. (2) is well suited to optimizing the dRICH design, see Appendix A. Novel Python frameworks for Bayesian optimizations like GPflowOpt [knudde2017gpflowopt] could improve the timing performance. This package is based on the popular GPflow library for Gaussian processes, leveraging the benefits of TensorFlow [abadi2016tensorflow] including parallelization. Recently, optimization packages have been developed to rely on accelerated CUDA/GPU implementations. For example, Deep Networks for Global Optimization (DNGO) [snoek2015scalable], where neural networks (NNs) are used as an alternative to GPs resulting in an improved scalability from cubic to linear as a function of the number of observations needed to explore the objective-function. The choice of hyperparameters of the NN is not defined a priori and in principle this could also be optimized with BO techniques to maximize the validation accuracy and inference time.

In conclusion, AI-based tools have been introduced for the optimization of the dRICH configuration; the preliminary results clearly show a substantial improvement in performance. These same tools can be extended and applied to other detectors and possibly to the entire experiment, making the EIC R&D one of the first programs to systematically exploit AI in the detector-design phase.

We thank R. Yoshida for encouraging this work. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Nuclear Physics under contracts DE-AC05-06OR23177, DE-FG02-94ER40818. For this work, CF was supported by the EIC fellowship award at Jefferson Lab.

## Appendix A Angular dispersion in RICH detectors

The error on the Cherenkov angle is given by several contributions, ; the error on the measured Cherenkov angle of a single photon is given by the quadratic sum of all the relevant contributions:

(4) |

The single terms of the sum can be defined once the radiator and the geometry of the RICH detector are defined (see [nappi2005ring, ypsilantis1994theory] for a compendium). Concerning the dRICH reflecting configuration we are considering the relevant contributions are those in Fig. 3 (upper panel). To the four contributions in Fig. 3 an uncertainty (in mrad) on the angle of the charged hadrons tracks can be considered, for conventional tracking systems this will add an additional uncertainty of about 0.5 mrad on .

### Chromatic error

The most important aspect in the choice of a Cherenkov radiator for a RICH detector is the index of refraction of the material and its characteristic optical dispersion. Namely, the first term in 4 is due to the variation of the refractive index of the materials traveled by the photon having is own energy (or wavelength ). In the case of a dual-radiator (let us assume aerogel and gas) this error consists of two contributions for the first radiator: the uncertainty on the photon emission wavelength and the uncertainty on the refraction between aerogel and gas surface. Therefore we have:

(5) |

where

(6) |

with the refractive index of the aerogel.

Finally we have for the emission contribution to the chromatic error

(7) |

and

(8) |

where the variation of the refractive index of the aerogel can be inferred by recent measurements on the latest generation aerogel (i.e. the one tested by the CLAS12 RICH collaboration [pereira2016test]) (the result is perfectly consistent with that in [nappi2005ring] using instead of ). The uncertainty due to the refraction of the photons between the two materials is in general one order of magnitude lower than the chromatic aberration; nevertheless it has to be considered and somehow corrected in any kind of geometric reconstruction algorithm. For the second radiator (the gas in this case) only the first contribution is present.

## Appendix B Noise Studies

We investigated and characterized the statistical noise present on the objective function during the optimization process. Notice that Bayesian optimization takes into account this noise as an external parameter, e.g. skopt [skopt] allows to deal with this noise in different ways, for example by adding it (assumed to be ideal gaussian) to the Matérn kernel [williams2006gaussian]. From simulations we have been able to determine the relative uncertainty on the figure of merit, which as expected scales (e.g. 2% if ).

The relative fluctuations of the terms and (Fig. 9) contribute roughly equally in Eq. (2) as one can expect from the statistical errors propagation; they are largely independent on their absolute values (as demonstrated by the left plot in Fig. 10) and therefore on the charged particle momenta (as long as they are above Cherenkov threshold). In the results shown in Fig. 9 and 10, we neglected the uncertainty on the difference between the mean angles, which is smaller than the uncertainty on and .

## Appendix C Early stopping criteria

Each call is characterized by design points (generated in parallel), each being a vector in the parameters space defined in Tab. 1, with -dimensions. At each call , we can calculate the average point (vector) from the vectors (where ), as:

(9) |

For a large number of calls (after a burn-in period chosen to be equal to 10 calls), the design point expressed by Eq. (9) tends to a global optimum steered by the BO.

In the following we will use the notation to express the element of the point, where and . At each call, we can also define the sample variance on each component of Eq. (9), in the following way:

(10) |

Pre-processing of data consists in masking for each component the outliers distant by more than 3 from the average component. They correspond to very large explorations in the optimization process.
Therefore, the effective number of points for each component and at each call becomes .
and we can redefine the masked average and variance expressed in Eq. (9) (10) after removing these outliers.
To simplify the notation, we will dub the masked mean and variance on every component as and ^{7}^{7}7Standard deviations smaller than tolerances are replaced with the latter in Eqs. (11),(13)..

This allows to define the standardized variable:

(11) |

Notice that Eq. (11) is defined comparing the values in the previous () and current () calls^{8}^{8}8Rigorously if 40 typically one refers to small sampling theory and a t-distibution should be considered. The stopping criteria have been implemented in a flexible way that allows to use either a standardized Z or t-distribution depending on the number of physical cores available..
Now at each call and for each component, we can test the null hypothesis that and belong to a population with same mean , that is they are converging in the component to the same value.
We do two-tailed p-value tests at 0.3% significance level (corresponding approximately to 3) for each component () in the parameter space.

(12) |

The -functions in Eq. (12) represent boolean operators, that is each is 1 when the null is confirmed on the component. In addition, it is possible to build a Fisher statistic,

(13) |

which can be used to determine whether or not the variance is significantly larger than at a certain significance level, e.g., 1%. This is done for all the components, similarly to what already discussed for (11), and enters as a multiplicative condition in (12). If all of them are 1 then the overall function becomes 1 too (in that case meaning the x stopper is activated). The stopping criteria on at the call corresponds to p-value tests all confirming the hypothesis as well. In order to stop the BO, the other condition (y stopper is activated) has to be satisfied along with Eq. (12).

(14) |

As for the value of the figure of merit , we consider at each call the best optimal values (e.g. ) obtained up to that call, and calculate the difference between the worst (largest) and the best (smallest)^{9}^{9}9In a minimization problem the lowest of the minima is considered the best value. as

(15) |

The statistic Eq. (15) vary typically rather smoothly if q5, and we can apply a cut on the relative variation between two successive calls, requiring

(16) |

where is the arithmetic mean between and . The threshold values used to activate the above booleans have been chosen based on a toy model with the same number of parameters to tune. In conclusion, when the two conditions (12) and (16) are both activated or the maximum number of calls reached then the search is stopped. It should be clear that the empirical stopping criteria expressed in Eq. (12) and (16) are not sufficient to exclude the presence of a local optimum. Nevertheless Bayesian optimization is one of the most suitable tools to find a global optimum in a bounded region and we made dedicated studies in Appendix B to check the consistency of the values of our cuts when the stoppers are triggered with the expectations (e.g.,we know that the relative statistical uncertainty on the harmonic mean for a number of tracks equal to 400 is about 2%). We can also compare the results obtained using a regressor other than GP (e.g. GBRT, extra trees (ET), random forest (RF) regressors etc. [skopt]), as explained in the text. A simple random search is fast enough to run and can provide useful hints if the candidate point is far from the optimum.

Notice that (11) could produce potential issues if the RMS in the denominator is large compared to numerator. The following combined requirements prevent from these issues: (a) minimum number of burn-in calls, (b) all the booleans are true in Eqs. (14) and (16), (c) additional request that the above conditions (a) and (c) are true for a call and check that this holds in the successive call before activate the early stopping.

Another parameter to define is the maximum number of calls that stops the search in case the other stopping criteria did not trigger.
We consider the heuristic formula determined by [ilten2017event]^{10}^{10}10An hand-waving estimate for the number of calls needed when running simulations in parallel is obtained by scaling that number by . as a possible lower bound.
Clearly the total number of calls can be set to any larger value provided enough computing/time resources.