Constraining the Parameters of High-Dimensional Models with Active Learning


Sascha Caron Institute for Mathematics, Astro- and Particle Physics IMAPP, Radboud Universiteit, Nijmegen, The Netherlands Nikhef, Amsterdam, The Netherlands    Tom Heskes Data Science Institute for Computing and Information Sciences (iCIS), Radboud University, Nijmegen, The Netherlands    Sydney Otten Institute for Mathematics, Astro- and Particle Physics IMAPP, Radboud Universiteit, Nijmegen, The Netherlands GRAPPA, University of Amsterdam, The Netherlands    Bob Stienen b.stienen@science.ru.nl Institute for Mathematics, Astro- and Particle Physics IMAPP, Radboud Universiteit, Nijmegen, The Netherlands
July 5, 2019
Abstract

Constraining the parameters of physical models with a large number of parameters is a widespread problem in fields like particle physics and astronomy. The generation of data to explore this parameter space often requires large amounts of computational resources. A reduction of the number of relevant physical parameters hampers the generality of the results. In this paper we show that this problem can be alleviated by the use of active learning. We illustrate this with examples from high energy physics, a field where computationally expensive simulations and large parameter spaces are common. We show that the active learning techniques query-by-committee and query-by-dropout-committee allow for the identification of model points in interesting regions of high-dimensional parameter spaces (e.g. around decision boundaries). This makes it possible to constrain model parameters more efficiently than is currently done with the most common sampling algorithms. Code implementing active learning can be found on GitHub.

I Introduction

With the rise of computational power seen over the last decades, science has gained the ability to evaluate predictions of new theories and models at unprecedented speeds. Determining the output or predictions of a model given a set of input parameters often boils down to running a program and waiting for it to finish. The same is however not true for the inverse problem: determining which (ranges of) input parameters a model can take to produce a certain output (e.g., finding which input parameters of a universe simulation yield a universe that looks like ours) is still a challenging problem. In fields like high energy physics and astronomy, where high-dimensional models are widespread, determining which model parameter sets are still allowed given experimental data is a time-consuming process that is currently often approached by looking only at lower-dimensional simplified models. Not only does this still require large amounts of computational resources, it also generally reduces the range of possible physics the model is able to explain.

Figure 1: With active learning, new data points can be sampled in regions of interest, such as a decision boundary in a classification problem. The figure shows how the initial estimation (dashed-dotted red line) of the true decision boundary (dashed black line) lies where the classification of new points is most uncertain. By iteratively sampling new points (crosses) in this most uncertain region and determining a new estimation of the decision boundary, the estimation of the boundary becomes increasingly more accurate, as can be seen in the picture for 3 iterations (solid blue line).

In this paper we approach this problem by exploring the use of active learning settles:2010 (); Seung:1992:QC:130385.130417 (); Cohn1994 (), an iterative method that applies machine learning to guide the sampling of new model points towards specific regions of the parameter space. Active learning reduces the time needed to run expensive simulations by evaluating only points that are expected to lie in regions of interest. As this is done iteratively, the resolution on the true boundary increases with each iteration. For classification problems this results in the selection (i.e. the sampling) of points around – and thereby a better resolution on – decision boundaries, as can be seen in Figure 1. In this paper we investigate a technique called query-by-committee Seung:1992:QC:130385.130417 (), which makes active learning feasible in high-dimensional parameter spaces.

The paper is structured as follows: in Section II we explain how active learning works. In Section III we show applications of active learning to determine decision bounds of a model in the context of high energy physics, working in model spaces of a 19-dimensional supersymmetry (SUSY) model. (Supersymmetry is a theory that extends the current theory of particles and particle interactions by adding another space-time symmetry. It predicts the existence of new particles which could be measured in particle physics experiments, if supersymmetry is realised in nature.) We conclude the paper in Section IV with a summary and future research directions.

II Active Learning

Simulations are nowadays widespread in science. However, as these can be computationally expensive to run, exploring the output space of these simulations can be a costly endeavour. Approximations of simulations can however be constructed in the form of machine learning estimators, which are typically quick to evaluate. Active learning leverages this speed, exploiting the ability to quickly estimate how much information can be gained by querying a specific point to the true (expensive) labeling procedure.

Active learning works as an iterative sampling technique. In this paper we specifically explore a technique called pool-based sampling settles:2010 (), of which a diagrammatic representation can be found in Figure 2. In this technique an initial data set is sampled from the parameter space and queried to the labeling procedure (also called the oracle). After retrieving the new labels, one or more machine learning estimators are trained on the available labeled data. This estimator (or set of estimators) can then provide an approximation of the boundary of the region of interest. We gather a set of candidate (unlabeled) data points, which can for example be sampled randomly or be generated through some simulation, and provide these to the trained estimator. The output of the estimator can then be used to identify which points should be queried to the oracle. For a classification problem this might for example entail finding out which of the candidate points the estimator is most uncertain about. As only these points are queried to the oracle, no time is spent on evaluating points which are not expected to yield significant information about our region of interest. The selected data points and their labels are then added to the total data set. This procedure of training an estimator, collecting candidate points, finding the most interesting points with respect to the region of interest, labeling them and adding them to the data set can be repeated to get an increasingly better estimation of the region of interest, and can be stopped when e.g. the collected data set reaches a certain size or when the performance increase between iterations becomes smaller than a predetermined threshold.
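As an illustration, a minimal sketch of such a pool-based loop is given below. The oracle (the expensive labeling procedure), the candidate sampler and the sizes used are hypothetical placeholders, not the configuration used in the experiments of this paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def active_learning_loop(oracle, sample_candidates, size_initial=1000,
                         size_sample=10000, size_select=100, n_iterations=20):
    """Pool-based active learning for a binary classification problem.

    `oracle` is the expensive labeling procedure and `sample_candidates`
    draws unlabeled points from the parameter space; both are placeholders.
    """
    X = sample_candidates(size_initial)              # initial data set
    y = oracle(X)                                    # query the oracle for labels
    for _ in range(n_iterations):
        model = RandomForestClassifier(n_estimators=100).fit(X, y)
        candidates = sample_candidates(size_sample)
        # Uncertainty of a binary classifier: closeness of P(class 1) to 0.5
        # (assumes both classes are present in the labeled set)
        proba = model.predict_proba(candidates)[:, 1]
        uncertainty = 1.0 - 2.0 * np.abs(proba - 0.5)
        # Query only the most uncertain candidates to the expensive oracle
        selected = candidates[np.argsort(uncertainty)[-size_select:]]
        X = np.vstack([X, selected])
        y = np.concatenate([y, oracle(selected)])
    return X, y, model
```

Here the stopping criterion is simply a fixed number of iterations; it could equally well be a target data set size or a minimum performance increase between iterations.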

It should be noted that the active learning procedure as described above has hyperparameters: the size of the initial dataset, the size of the pool of candidate data points and the number of candidate data points queried to the oracle in each iteration. Finding the optimal configuration for the active learning procedure requires a dedicated search. As we intend to show the added benefit of active learning, and not its absolute best performance, we did not perform an extensive grid search for the optimisation. Instead we performed a small random search on the hyperparameters of the experiments in Section III and selected the best configuration for all experiments. For completeness a discussion of the hyperparameters can be found in Appendix A. We do want to note that in every active learning configuration we experimented with, active learning performed at least as well as random sampling.

(Figure 2 flowchart nodes: Select initial data; Label data; Add to dataset; Train ML estimator(s); Sample candidate points; Select most interesting; Finish sampling.)

Figure 2: Diagrammatic representation of active learning. Data is sampled and used to create a data set. This data is used to train an ML estimator (or a committee of estimators), which is used to get an approximation of the labeling of newly sampled candidate data. From these candidates the points with the highest uncertainty in their labeling are selected, queried to the true labeling procedure and added to the data set. This process can be repeated until enough data is collected.

In Figure 2 arguably the most important step is to select those points that ought to be queried to the labeling procedure from a large set of candidate data points. As the problems we look at here are classification problems, the closeness to the boundary can be estimated by the uncertainty of the trained estimator on the classification of the model point.

This uncertainty can for example be obtained from an algorithm like Gaussian Processes gaussianprocesses (), which has already been successfully applied in high energy physics to steer sampling of new points around 2-dimensional exclusion boundaries algp (). Due to the computational complexity of this algorithm it is however limited to low-dimensional parameter spaces, as it scales at best with the number of data points squared 2018arXiv180911165G (). Because of this, we investigate specifically the query-by-committee and query-by-dropout-committee scheme.

II.1 Query-by-Committee (QBC)

By training multiple machine learning estimators on the same data set, one can use their disagreement on the prediction for a data point as a measure of uncertainty. Points with a high disagreement in their predictions are expected to provide the highest information gain. This method is called query-by-committee (QBC) Seung:1992:QC:130385.130417 (). To create and enhance the disagreement among the committee members in uncertain regions, the training set can be changed for each estimator (e.g. via bagging bagging ()), or the configuration of the estimators can be varied (e.g. when using a committee of neural networks, each of these could have a different architecture or different initial conditions), such that a reasonable amount of diversity in the ensemble is obtained.
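As a sketch of how such a committee could be constructed (the estimator type and committee size are illustrative choices, not those used in this paper), each member can be trained on a bootstrap resample of the labeled data:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

def build_committee(X_train, y_train, n_members=10):
    """Train a committee of classifiers on bootstrap resamples (bagging)."""
    committee = []
    for seed in range(n_members):
        # Bootstrap resample of the labeled data, different for each member
        X_b, y_b = resample(X_train, y_train, random_state=seed)
        committee.append(RandomForestClassifier(random_state=seed).fit(X_b, y_b))
    return committee
```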

The disagreement among the estimators can for example be quantified by the standard deviation of their predictions. For binary classification problems it can even be computed from the mean of the outputs of the set of estimators: if the classes are encoded as 0 and 1, a mean output of 0.5 would mean maximal uncertainty, so an uncertainty measure for a committee of $N$ estimators could for example be

$U(x) = 1 - 2\left|\frac{1}{N}\sum_{i=1}^{N}\hat{y}_i(x) - \frac{1}{2}\right|, \qquad (1)$

where $\hat{y}_i(x)$ denotes the output of committee member $i$ for model point $x$.

An uncertainty of 1.0 would indicate maximum uncertainty.
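For such a committee, the uncertainty measure of Equation (1) could be evaluated as follows (a sketch assuming scikit-learn-style classifiers with a `predict_proba` method):

```python
import numpy as np

def committee_uncertainty(committee, X):
    """Equation (1): 1 for a mean committee prediction of 0.5, 0 for 0 or 1."""
    mean_pred = np.mean([member.predict_proba(X)[:, 1] for member in committee],
                        axis=0)
    return 1.0 - 2.0 * np.abs(mean_pred - 0.5)
```

The candidates with the largest values of this measure are then the ones queried to the oracle.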

The advantage of the QBC approach is that it is not bound to a specific estimator. If one were to use estimators whose training scales linearly with the number of data points $n$, the active learning procedure would have a computational complexity of $\mathcal{O}(n)$ for each iteration. This allows for the use of large amounts of data, as is needed in high-dimensional parameter spaces.

II.2 Query-by-Dropout-Committee (QBDC)

The committee can also be built by using a technique called Monte Carlo dropout 2015arXiv150602142G (). This technique uses a neural network with dropout layers dropout () as the machine learning estimator. These dropout layers are normally used to prevent overtraining (i.e. increased performance on the training set at the cost of a reduction in performance on general data sets) by randomly disabling a fraction of the neurons in the preceding layer of the network at each evaluation of input data during training. In this way the network cannot learn to rely entirely on specific features and correlations in the input data, resulting in more robustness during inference. Dropout is then typically disabled when the network is actually used to create predictions on unseen data, so that the full network is used for inference.

In Monte Carlo dropout, on the other hand, these layers are left enabled during the evaluation of input data, even after training, so that the output of the network varies from evaluation to evaluation. Each of the repeated predictions for a fixed input can then be interpreted as the prediction of one member of the committee in a QBC approach. The advantage is that only a single network has to be trained dropout_based_al (); 2015arXiv151106412D (); 2018arXiv181103897P (). Due to the use of Monte Carlo dropout, this method is called Query-by-Dropout-Committee (QBDC).
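A minimal sketch of a dropout committee, written here with the tf.keras API, keeps the dropout layers active at prediction time by passing `training=True`. The architecture below is illustrative only; the networks actually used are listed in Appendix B.

```python
import numpy as np
import tensorflow as tf

# Illustrative binary classifier with dropout layers
model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation="relu", input_shape=(19,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

def qbdc_uncertainty(model, X, n_passes=25):
    """Monte Carlo dropout: keep dropout enabled at inference (training=True)
    and treat the stochastic forward passes as committee members."""
    preds = np.stack([model(X, training=True).numpy().ravel()
                      for _ in range(n_passes)])
    mean_pred = preds.mean(axis=0)
    return 1.0 - 2.0 * np.abs(mean_pred - 0.5)   # uncertainty as in Eq. (1)
```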

III Applications in HEP

In this section active learning as a method is investigated using data sets from high energy physics. The experiments investigated here are all classification problems, as these have a clear region of interest: the decision boundary. It should be noted that the methods explored here also hold for regression problems with a region of interest (e.g. when searching for an optimum). Although active learning can also be used to improve the performance of a regression algorithm over the entire parameter space, whether or not this works is highly problem and algorithm dependent, as can for example be seen in ref. Schein2007 ().

III.1 Increase resolution of exclusion boundary

As no significant experimental signals indicating the presence of unknown physics have been found in “beyond the standard model” searches, the obtained experimental data is used to find the region in the model parameter space that is excluded – or not yet excluded – by experiment. Sampling the region around this exclusion boundary in high-dimensional spaces is highly non-trivial with conventional methods due to the curse of dimensionality.

We test the application of active learning on a 19-dimensional model of new physics (the 19-dimensional pMSSM Martin:1997ns ()) as a method to tackle this problem. This test is related to earlier work on the generalisation of high-dimensional results, which resulted in SUSY-AI Caron:2016hib (). In that work the exclusion information on model points as determined by the ATLAS collaboration Aad2015 () was used; the same data is used in this study. We investigate three implementations of active learning: two Random Forest setups, one with a finite and the other with an infinite pool, and a setup with a QBDC. The performance of each of these is compared to the performance of random sampling, in order to evaluate the added value of active learning. This comparison is quantified using the following steps:

  1. Call max_performance the maximum performance reached by random sampling;

  2. Call $N_\text{random}$ the number of data points needed for random sampling to reach max_performance;

  3. Call $N_\text{active}$ the minimum number of data points needed for active learning to reach max_performance;

  4. Calculate the performance gain through

     $\text{gain} = N_\text{random} / N_\text{active}$.   (2)

The configurations of the experiments were explicitly made identical and were not optimized on their own. The results of the experiments are therefore not able to identify which setup works best and only serve to investigate whether, and if so by how much, each of these techniques outperforms random sampling in constraining parameters in high-dimensional models.
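Given recorded learning curves (accuracy as a function of training-set size) for both strategies, the gain of Equation 2 could be computed along the following lines. This is a sketch: the array names are hypothetical and it assumes the active learning curve actually reaches max_performance.

```python
import numpy as np

def performance_gain(train_sizes, acc_random, acc_active):
    """Equation 2: ratio of the data needed by random sampling to reach its
    maximum accuracy to the data active learning needs to reach that accuracy."""
    max_performance = acc_random.max()
    n_random = train_sizes[np.argmax(acc_random >= max_performance)]
    n_active = train_sizes[np.argmax(acc_active >= max_performance)]
    return n_random / n_active
```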

III.1.1 Random Forest with a finite pool

Just as for SUSY-AI we trained a Random Forest classifier on the public ATLAS exclusion data set Aad2015 () (details on the configuration of this experiment can be found in Appendix B). This data set was split into three parts: an initial training set of 1,000 model points, a test set of 100,000 model points and a pool of the remaining model points. As the labeling of the points is 0 for excluded points and 1 for allowed points, after each training iteration the 1,000 new points with their Random Forest prediction closest to 0.5 (following the QBC scheme outlined in Section II.1) are selected from the pool and added to the training set. Using this now expanded dataset a new estimator is trained from scratch. The performance of this algorithm is determined using the test set.
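A sketch of a single iteration of this procedure is given below, assuming arrays `X_train`, `y_train` for the current training set and `X_pool`, `y_pool` for the remaining (pre-labeled) ATLAS pool; the variable names are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def finite_pool_iteration(X_train, y_train, X_pool, y_pool, n_select=1000):
    """Retrain from scratch and move the n_select pool points whose prediction
    is closest to 0.5 (together with their precomputed labels) into the
    training set."""
    model = RandomForestClassifier().fit(X_train, y_train)
    proba = model.predict_proba(X_pool)[:, 1]
    idx = np.argsort(np.abs(proba - 0.5))[:n_select]   # closest to 0.5 first
    X_train = np.vstack([X_train, X_pool[idx]])
    y_train = np.concatenate([y_train, y_pool[idx]])
    keep = np.ones(len(X_pool), dtype=bool)
    keep[idx] = False                                   # deplete the finite pool
    return X_train, y_train, X_pool[keep], y_pool[keep], model
```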

This experiment is also performed with all points selected from the pool at random, so that a comparison of the performance of active learning and random sampling becomes possible. The results of both experiments are shown in Figure 3. The bands around the curves in this figure indicate the range in which the curves of 7 independent runs of the experiment lie. The figure shows that active learning initially outperforms random sampling, but that random sampling catches up after a while. The decrease in accuracy of the active learning method is caused by the limited size of the pool: after approximately 70,000 points have been selected via active learning, the pool is depleted of points near the decision boundary and new data points are selected further away from it, causing a relative decrease of the weight of the points around the decision boundary and degrading the generalisation performance.

Based on Figure 3, the performance gain of active learning over random sampling in the early stages of learning, up to a training size of 50,000 (as described by Equation 2), lies in the range 3.5 to 4.

Figure 3: Accuracy development on model exclusion of the 19-dimensional model for new physics (pMSSM) for random sampling and active learning, using a random forest as algorithm and a finite pool. True labeling was provided by ATLAS Aad:2015baa (). Active learning quickly starts outperforming random sampling. The decline in accuracy for active learning, starting from a training size of 60,000, is caused by the limited size of the pool and the fact that the pool is depleted of data around the decision boundary. The bands around the curves show the range in which all curves of that colour lie when the experiment was repeated 7 times.

III.1.2 Random Forest with an infinite pool

We replace the finite ATLAS data pool with a sampling procedure in which new points are sampled from a uniform prior of the training volume of SUSY-AI. Although in each iteration only a limited set of candidate points is considered, the fact that this set is sampled anew in each iteration guarantees that the decision boundary is never depleted of new candidate points. Because of this, the pool can be considered infinite. In contrast to the experiment in Section III.1.1, where labeling (i.e., excluded or allowed) was readily available, determining true labeling on these newly sampled data points would be extremely costly. Because of this, SUSY-AI Caron:2016hib () was used as a stand-in for this labeling process. (Since SUSY-AI has an accuracy of 93.2% on the decision boundary described by the ATLAS data Aad2015 (), active learning will not find the decision boundary described by the true labeling in the ATLAS data. However, as the goal of this example is to show that it is possible to find a decision boundary in a high-dimensional parameter space in the first place, we consider this not to be a problem.) Since we are training a Random Forest estimator, we retrained SUSY-AI as a neural network, to make sure the trained Random Forest estimator would not be able to exactly match the SUSY-AI model, as this would compromise the possibility to generalise the result beyond toy examples like this one. The accuracy of this neural network was comparable to the accuracy of the original SUSY-AI. Details on the technical implementation can be found in Appendix B.
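A sketch of the corresponding candidate-generation step: in every iteration a fresh set of candidates is drawn uniformly from the 19-dimensional training volume, and only the selected candidates are sent to the stand-in oracle. The `surrogate_oracle` and `committee_uncertainty` names and the sizes below are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng()

def sample_candidates(n, lower, upper):
    """Uniform prior over the training volume; `lower` and `upper` are arrays
    with the per-parameter bounds of the 19-dimensional space."""
    return rng.uniform(lower, upper, size=(n, len(lower)))

# One iteration with an 'infinite' pool (illustrative sizes):
# candidates = sample_candidates(100_000, lower, upper)
# scores     = committee_uncertainty(committee, candidates)
# selected   = candidates[np.argsort(scores)[-2_500:]]
# X_train    = np.vstack([X_train, selected])
# y_train    = np.concatenate([y_train, surrogate_oracle(selected)])
```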

Figure 4: Accuracy development on model exclusion of the 19-dimensional model for new physics (pMSSM) for random sampling and active learning, using a random forest as algorithm and an infinite pool. True labeling was provided by a machine learning algorithm trained on model points and labels provided by ATLAS Aad:2015baa (). Here active learning is vastly superior to random sampling, yielding a gain in computational time of a factor of 5 to 6. The bands around the curves show the range in which all curves of that colour lie when the experiment was repeated 7 times.

The accuracy development as recorded in this experiment is shown in Figure 4. The bands again correspond to the ranges of the accuracy as measured over 7 independent runs of the experiment. The gain of active learning with respect to random sampling (as described by Equation 2) is 5 to 6. The overall reached accuracy is however lower than in Figure 3, but note that this experiment was stopped after fewer points had been sampled in total than in the previous experiment.

III.1.3 QBDC with an infinite pool

To test the performance of QBDC, the infinite pool experiment above was repeated, but now with a QBDC setup. The technical details of the setup can be found in Appendix B. The accuracy development plot resulting from the experiment can be seen in Figure 5. The bands around the lines representing the accuracies for active learning and random sampling indicate the minimum and maximum obtained accuracy for the corresponding data after running the experiment 7 times. The performance gain (as defined in Equation 2) for active learning in this experiment lies in the range 3 to 4. For a fixed number of samples, QBDC sampling is approximately $N$ times faster than sampling with an ensemble of $N$ committee members, as only one network has to be trained. However, as active learning outperforms random sampling by a factor of 3 to 4, whether this speed-up pays off depends on how expensive training the estimator is in comparison to how much computational time is gained.

Figure 5: Accuracy development on model exclusion of the 19-dimensional model for new physics (pMSSM) for random sampling and active learning using a dropout neural network with infinite pool. True labeling was provided by a machine learning algorithm trained on model points and labels provided by ATLAS Aad:2015baa (). The gain of active learning with respect to random sampling (as described by Equation 2) is 3 to 4. The bands show the range in which all curves of that colour lie when the experiment was repeated 7 times.

Compared to Figures 3 and 4, the accuracies obtained in Figure 5 are significantly higher. This may be caused by the fact that the trained model more strongly resembles the oracle (both are neural networks with a similar architecture), or by the neural network being inherently more capable of capturing the exclusion function. In the two earlier experiments the trained models were Random Forests that tried to replicate the true ATLAS exclusion function and the SUSY-AI neural network, respectively.

III.2 Identifying uncertain regions and steering new searches

Instead of using active learning to iteratively increase the resolution on, for example, a decision boundary, the identification of uncertain regions of the parameter space on which active learning is built can also be used directly to identify regions of interest.

For example, in high energy physics one could train an algorithm to identify model points around the exclusion boundary in a high-dimensional model. These model points could then be used as targets for new searches or even new experiments. This is an advantage over the conventional method of trying to optimise a 2-dimensional exclusion region in a plot, as this method works over the full dimensionality of the model and can thereby take a more detailed account of the underlying theory that is being tested. One could even go a step further by reusing the same pool for these search-improvement studies, so that regions of parameter space that no search has been able to exclude can be identified. Analogously, one could also apply this method to find targets for the design of a new experiment.

To test the application of this technique in the context of searches for new physics we trained a neural network on the publicly available ATLAS exclusion data on the pMSSM-19 Aad2015 (), enhanced with the 13 TeV exclusion information as calculated by Barr:2016inz (). The technical setup is detailed in Appendix B. We sampled model points in the SUSY-AI parameter space Caron:2016hib () using a spectrum generator (SOFTSUSY 4.1.0 softsusy ()) and selected 1,000 points with the highest uncertainty following the QBDC technique outlined in Section III.1.3.

Figure 6 shows the sampled model points in the gluino mass - LSP mass projection. Although the LSP mass was not directly one of the input parameters, the selected points are nevertheless well-sampled in the region of the decision boundary, from which we conclude that the active learning algorithm successfully found the decision boundary in the 19-dimensional model.

We conclude this section by noting that in all the active learning experiments in this section new points were selected exclusively with active learning. In more realistic scenarios the user can of course use a combination of random sampling and active learning, in order not to miss any features in parameter space that were either unexpected or not sampled by the initial dataset.

Figure 6: The model points that were selected in a pool-based sampling, projected on the gluino mass - LSP mass plane. The algorithm did not have direct access to the variables on the axes, but was nevertheless able to sample points in the region around the decision boundary, indicated by the solid black line in the figure. The dashed red lines indicate the boundary of the model, outside of which no data points will be sampled (caused by the fact that no supersymmetric particle can be lighter than the LSP).

IV Conclusion

In this paper we illustrated the possibility of improving the resolution on regions of interest in high-dimensional parameter spaces. We specifically investigated query-by-committee and query-by-dropout-committee as tools to constrain parameters, and the possibility to use the identification of uncertain regions in parameter space to steer the design of new searches. We find that all active learning strategies presented in this paper query the oracle more efficiently than random sampling, up to a factor of 6.

One of the limiting factors of the techniques as presented in this paper is that a pool of candidate points still needs to be sampled from the parameter space. If sampling candidate points randomly yields too few points of high enough interest, generative models can be used to sample candidate points more specifically.

Code showing the implementation of the three investigated active learning techniques is made public on GitHub: https://github.com/bstienen/active-learning.

Acknowledgements

This research is supported by the Netherlands eScience Center grant for the iDark: The intelligent Dark Matter Survey project.

Appendix A Active learning hyperparameters

The active learning procedure as implemented for this paper has three hyperparameters:

  • size_initial: The size of the data set used at the start of the active learning procedure;

  • size_sample: The size of the pool of candidate data points to be sampled in each iteration;

  • size_select: The number of data points to select from the pool of candidate data points and query to the oracle.

Which settings are optimal depends on the problem at hand, although some general statements can be made about the possible values for these hyperparameters. To illustrate this we performed a hyperparameter optimisation for the experiment in Section III.1.2, although it should be noted that this optimisation was performed only for illustration purposes and was not used to configure the experiments in this paper.

The size_initial for example configures how well the first trained machine learning estimator approximates the oracle. If this approximation is bad, the first few sampling iterations will sample points in what will later turn out to be uninteresting regions. A higher value for size_initial would therefore be preferable to a smaller value, although this could diminish the initial motivation for active learning: avoiding having to run the oracle on points that are not interesting with respect to a specific goal.

The size_sample parameter, on the other hand, has an optimum: if chosen too small, the selected samples will be more spread out and possibly less interesting points will be queried to the oracle. If chosen too high, the newly queried data could become focused on a specific subset of the region of interest, because the trained estimator happens to have a local minimum there. The existence of an optimal value for size_sample can be seen in Figure 7.

Figure 7: The dependence of the accuracy in the last iteration of the active learning procedure on the number of candidates in each iteration. The error bars indicate the range within which the accuracies over 7 runs lie. As described in the text, an optimum value can be observed, although it should be noted that this value also depends on the number of data points selected in each iteration.

It should be noted that the location of the optimum does not only depend on size_sample, but also on size_select. If one were to set size_select to 1, the candidate pool is best made as large as possible, in order to be sure that the selected point is really the most informative one available. This would avoid the selection of clustered data points, but it comes at the cost of having to run the procedure for more iterations in order to reach the same size for the final data set, which would be very expensive if the cost of training the ML estimator(s) is high. The dependence of the accuracy on these two variables is shown in Figure 8, in which the accuracy gained in the last step of the active learning procedure is shown for different configurations of these two parameters. The script to generate this figure can be found on GitHub.

Figure 8: The dependence of the accuracy in the last iteration of the active learning procedure on the number of candidates and the number of points selected for querying to the oracle in each iteration. The last iteration was defined as the last iteration before 100,000 data points were selected, meaning that a setup with size_select equal to 500 had more iterations than a setup with a size_select of 7500 for example.

Appendix B Experiment configuration

All networks were trained using Keras keras () with a Tensorflow tensorflow () backend linked to CUDA cuda (). For the Random Forest implementation scikit-learn scikit-learn () was used.

Increase resolution of exclusion boundary

The configuration of the active learning procedure can be found in Table 1. The experiments are denoted by the section in this paper in which they were covered.

III.1.1 III.1.2 III.1.3
Initial dataset 10,000
Step size 2,500
#candidates remaining pool 100,000
Maximum size until pool empty 100,000
Committee size 100 25
#iterations 7
#test points 1,000,000
Table 1: Configuration for the active learning procedures in Section III.1.
Random Forest with a finite pool

The trained Random Forest classifier followed the defaults of scikit-learn scikit-learn (): it consisted of 10 decision trees with Gini impurity as the splitting criterion.

Random Forest with an infinite pool
Layer type Config. Output shape Param. #
Input (None,19) 0
Dense 500 nodes (None, 500) 10,000
Activation selu (None, 500) 0
Dense 100 nodes (None, 100) 50,100
Activation selu (None, 100) 0
Dense 100 nodes (None, 100) 10,100
Activation selu (None, 100) 0
Dense 50 nodes (None, 50) 5,050
Activation selu (None, 50) 0
Dense 2 nodes (None, 2) 102
Activation softmax (None, 2) 0
Total params: 75,352
Table 2: Network architecture for the oracle in the “Random Forest with an infinite pool” and the “QBDC with an infinite pool” experiments.

For active learning we trained a Random Forest randomforest () classifier that consisted of 100 decision trees with Gini impurity as the splitting criterion. All other settings were left at their default values.

As the oracle we used a neural network with the architecture in Table 2. This network was optimised using Adam ADAM () on the binary cross entropy loss. The network was trained using the ATLAS pMSSM-19 dataset Aad:2015baa () for 300 epochs with the EarlyStopping EarlyStopping () callback using a patience of 50.
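For reference, a sketch of how the oracle architecture of Table 2 could be written with the tf.keras API. The optimiser, loss and early stopping follow the description above; data loading, one-hot encoding of the labels and the validation split in the commented call are assumptions.

```python
import tensorflow as tf

# Table 2: Dense(500)-Dense(100)-Dense(100)-Dense(50)-Dense(2), selu activations,
# softmax output, trained with Adam on binary cross entropy
oracle = tf.keras.Sequential([
    tf.keras.layers.Dense(500, activation="selu", input_shape=(19,)),
    tf.keras.layers.Dense(100, activation="selu"),
    tf.keras.layers.Dense(100, activation="selu"),
    tf.keras.layers.Dense(50, activation="selu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
oracle.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
early_stopping = tf.keras.callbacks.EarlyStopping(patience=50)
# oracle.fit(X_train, y_train_onehot, epochs=300, validation_split=0.1,
#            callbacks=[early_stopping])
```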

QBDC with an infinite pool
Layer type Config. Output shape Param. #
Input (None,19) 0
Dense 500 nodes (None, 500) 10,000
Activation relu (None, 500) 0
Dropout 0.2 (None, 500) 0
Dense 100 nodes (None, 100) 50,100
Activation relu (None, 100) 0
Dropout 0.2 (None, 100) 0
Dense 100 nodes (None, 100) 10,100
Activation relu (None, 100) 0
Dropout 0.2 (None, 100) 0
Dense 50 nodes (None, 50) 5,050
Activation relu (None, 50) 0
Dropout 0.2 (None, 50) 0
Dense 2 nodes (None, 2) 102
Activation softmax (None, 2) 0
Total params: 75,352
Table 3: Network architecture for the “QBDC with an infinite pool” experiment.

The network architecture for the trained neural network used for active learning can be found in Table 3. The active learning network was optimized using Adam ADAM () on a binary cross-entropy loss. It was fitted on the data for 1000 epochs with a batch size of 1000 and the EarlyStopping EarlyStopping () callback with a patience of 20. The neural network from the infinite pool experiment described above is used as the oracle in this experiment as well.

Identifying uncertain regions and steering new searches

The network architecture for the trained neural network can be found in Table 3. The network was optimized using Adam ADAM () on a binary cross entropy loss. It was fitted on the data for 1000 epochs with a batch size of 1000 and with the EarlyStopping EarlyStopping () callback using a patience of 50.

The network was trained on the z-score normalised ATLAS dataset Aad:2015baa () of 310,324 data points, of which 10 % was used for validation.
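A sketch of this preprocessing step with scikit-learn (the array names are placeholders; whether the scaler was fitted before or after the validation split is not specified in the text, here it is fitted on the training part):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hold out 10% of the labeled points for validation
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.10)

# z-score normalisation: zero mean and unit variance per input parameter
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_val = scaler.transform(X_val)
```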

References

  • (1) Georges Aad et al. Summary of the ATLAS experiment’s sensitivity to supersymmetry after LHC Run 1 — interpreted in the phenomenological MSSM. JHEP, 10:134, 2015.
  • (2) Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
  • (3) B. C. Allanach. SOFTSUSY: a program for calculating supersymmetric spectra. Comput. Phys. Commun., 143:305–331, 2002.
  • (4) Alan Barr and Jesse Liu. First interpretation of 13 TeV supersymmetry searches in the pMSSM. 2016.
  • (5) Leo Breiman. Random forests. Machine Learning, 45(1):5–32, Oct 2001.
  • (6) Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
  • (7) Sascha Caron, Jong Soo Kim, Krzysztof Rolbiecki, Roberto Ruiz de Austri, and Bob Stienen. The BSM-AI project: SUSY-AI–generalizing LHC limits on supersymmetry with machine learning. Eur. Phys. J., C77(4):257, 2017.
  • (8) François Chollet et al. Keras. https://keras.io, 2015.
  • (9) David Cohn, Les Atlas, and Richard Ladner. Improving generalization with active learning. Machine Learning, 15(2):201–221, May 1994.
  • (10) K. Cranmer, L. Heinrich, and G. Louppe. Level-set estimation with Bayesian optimisation. https://indico.cern.ch/event/702612/contributions/2958660/. Accessed: 2019-02-05.
  • (11) Melanie Ducoffe and Frederic Precioso. QBDC: Query by dropout committee for training deep supervised architecture. arXiv e-prints, page arXiv:1511.06412, Nov 2015.
  • (12) Y. Gal and Z. Ghahramani. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. ArXiv e-prints, June 2015.
  • (13) Jacob R. Gardner, Geoff Pleiss, David Bindel, Kilian Q. Weinberger, and Andrew Gordon Wilson. GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration. arXiv e-prints, page arXiv:1809.11165, September 2018.
  • (14) D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. ArXiv e-prints, December 2014.
  • (15) Stephen P. Martin. A Supersymmetry primer. pages 1–98, 1997. [Adv. Ser. Direct. High Energy Phys.18,1(1998)].
  • (16) N. Morgan and H. Bourlard. Advances in neural information processing systems 2. chapter Generalization and Parameter Estimation in Feedforward Nets: Some Experiments, pages 630–637. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1990.
  • (17) John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron. Scalable parallel programming with cuda. Queue, 6(2):40–53, March 2008.
  • (18) F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  • (19) Remus Pop and Patric Fulop. Deep Ensemble Bayesian Active Learning : Addressing the Mode Collapse issue in Monte Carlo dropout via Ensembles. arXiv e-prints, page arXiv:1811.03897, Nov 2018.
  • (20) Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.
  • (21) Andrew I. Schein and Lyle H. Ungar. Active learning for logistic regression: an evaluation. Machine Learning, 68(3):235–265, Oct 2007.
  • (22) Burr Settles. Active learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 6(1):1–114, 2012.
  • (23) H. S. Seung, M. Opper, and H. Sompolinsky. Query by committee. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT ’92, pages 287–294, New York, NY, USA, 1992. ACM.
  • (24) Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958, 06 2014.
  • (25) The ATLAS collaboration. Summary of the ATLAS experiment’s sensitivity to supersymmetry after LHC Run 1 — interpreted in the phenomenological MSSM. Journal of High Energy Physics, 2015(10):134, Oct 2015.
  • (26) Evgenii Tsymbalov, Maxim Panov, and Alexander Shapeev. Dropout-based active learning for regression. In Wil M. P. van der Aalst, Vladimir Batagelj, Goran Glavaš, Dmitry I. Ignatov, Michael Khachay, Sergei O. Kuznetsov, Olessia Koltsova, Irina A. Lomazova, Natalia Loukachevitch, Amedeo Napoli, Alexander Panchenko, Panos M. Pardalos, Marcello Pelillo, and Andrey V. Savchenko, editors, Analysis of Images, Social Networks and Texts, pages 247–258, Cham, 2018. Springer International Publishing.