Multi-Statistic Approximate Bayesian Computation with Multi-Armed Bandits
Approximate Bayesian computation is an established and popular method for likelihood-free inference with applications in many disciplines. The effectiveness of the method depends critically on the availability of well-performing summary statistics. Summary statistic selection relies heavily on domain knowledge and carefully engineered features, and can be a laborious, time-consuming process. Since the method is sensitive to data dimensionality, the process of selecting summary statistics must balance the need to include informative statistics against the dimensionality of the feature vector. This paper proposes to treat the problem of dynamically selecting an appropriate summary statistic from a given pool of candidate summary statistics as a multi-armed bandit problem. This allows approximate Bayesian computation rejection sampling to dynamically focus on a distribution over well-performing summary statistics as opposed to a fixed set of statistics. The proposed method is unique in that it does not require any pre-processing and is scalable to a large number of candidate statistics. This enables efficient use of a large library of possible time series summary statistics without prior feature engineering. The proposed approach is compared to state-of-the-art methods for summary statistics selection using a challenging test problem from the systems biology literature.
Prashant Singh, Division of Scientific Computing, Department of Information Technology, Uppsala University, SE-751 05 Uppsala, Sweden, email@example.com
Andreas Hellander, Division of Scientific Computing, Department of Information Technology, Uppsala University, SE-751 05 Uppsala, Sweden, firstname.lastname@example.org
Preprint. Work in progress.
1 Introduction
The use of modeling and simulation techniques to supplement real world experiments and observations often involves fitting the parameters of a stochastic simulator or model to observed data. Once the simulator has been tuned to agree with observed data, it can be used to study the corresponding natural process and to generate hypotheses. This parameter inference problem is routinely encountered in various fields of science and engineering [9, 3, 12].
Likelihood-based inference is an intuitive approach that involves deriving a likelihood function that describes the probability of the data given the parameters. However, due to the complexity of the simulators and models involved, it is often not possible to derive an analytical form of the likelihood function. For stochastic models, such as models of gene regulation in systems biology, the likelihood function can be formulated, but the cost of computing it is prohibitive. In such situations, approximate Bayesian computation (ABC) is a popular methodology for likelihood-free approximate inference. ABC-based approaches solve the inference problem by computing the deviation between simulated data and the observed data, and accepting or rejecting candidate parameters based on the deviation as measured by a chosen distance function. In the simplest form, this process is repeated until $N$ samples have been accepted, where $N$ is a user-defined constant. The inferred parameters are then reported as the mean of the parameters corresponding to accepted samples.
The probability of generating samples that are accepted diminishes as data dimensionality increases. A common way of alleviating this problem is to use summary statistics that efficiently capture the important information and patterns within the simulated result (e.g., a lengthy time series) as a low-dimensional representation (e.g., the temporal mean). The distance function then operates on the summary statistic(s) instead of the complete simulation results. Popular summary statistics include simple statistics such as the mean and higher-order moments, median, mode and frequency, but they can also be more complex and based on problem-specific knowledge.
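The reduction of a time series to a handful of summary statistics can be illustrated with a minimal sketch. The function names `summarise` and `lag1_autocorrelation` are hypothetical and chosen only for this illustration; the particular statistics are common examples, not the pool used in the experiments.

```python
import statistics


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of a sequence (biased estimator)."""
    mean = statistics.fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0  # constant series: define the autocorrelation as zero
    numer = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    return numer / denom


def summarise(series):
    """Reduce a time series to a low-dimensional summary (illustrative choice)."""
    return {
        "mean": statistics.fmean(series),
        "median": statistics.median(series),
        "variance": statistics.pvariance(series),
        "lag1_autocorr": lag1_autocorrelation(series),
    }


summary = summarise([1.0, 2.0, 3.0, 4.0, 5.0])
```

A distance function would then compare two such summaries instead of the full series.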
The choice of a summary statistic for a black-box parameter inference problem is not straightforward. While a large pool of automatically extracted statistics associated with time series analysis increases the chance that informative statistics are included, the variance of the estimator increases rapidly with the number of statistics used, degrading the performance of ABC. Conversely, using a small number of chosen statistics increases the risk of poor performance for the specific problem. In the absence of problem-specific information, a concrete and principled approach is needed to automatically select the most applicable summary statistic for parameter inference from a larger pool (tens to hundreds) of candidate statistics. This paper formulates the problem of dynamically finding the most informative summary statistic in each iteration of the parameter inference process as a multi-armed bandit (MAB) problem. Selecting the appropriate statistic corresponds to pulling the optimal arm in each iteration of the algorithm, thereby maximizing the reward or, equivalently, minimizing the regret (the cumulative distance function value corresponding to the sequence of arm pulls). To the best of the authors’ knowledge, this is the first approach exploring summary statistic selection in a MAB context. In contrast to many popular methods for summary statistics selection, the proposed method is computationally inexpensive and does not require any a priori pre-processing.
The paper is organized as follows. Section 2 introduces ABC, discusses the significance of summary statistics and surveys related work. Section 3 concretely defines the problem and describes the novel MAB-inspired approach towards parameter inference. Section 4 demonstrates the efficiency and robustness of the proposed method by inferring parameters of a complex stochastic biochemical reaction network from systems biology. Section 5 presents a detailed analysis of the proposed method and identifies future research directions. Section 6 concludes the paper.
2 Approximate Bayesian computation and summary statistic selection
Let $f(\theta, V)$ be a function representing the simulation-based model that maps model parameters $\theta$ and certain random variables $V$ to responses $y$. The random variables $V$ are part of the simulator and represent the inherent stochasticity of the process, while $\theta$ represents the control parameters of the process. For a fixed set of parameters $\theta$, the responses $y$ fluctuate randomly over multiple simulations owing to the stochasticity represented by $V$.
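Due to the presence of $V$, it is implicitly possible to define a random variable $Y_\theta$ corresponding to the simulation model [16, 28]. For observed data $y_o$ and a given parameter combination $\theta$, the probability of $Y_\theta$ assuming a value in an $\epsilon$-neighborhood $B_\epsilon(y_o)$ around $y_o$ equals the probability of sampling values of $V$ for which $f(\theta, V)$ lies in the neighborhood $B_\epsilon(y_o)$. Let $L(\theta)$ be the likelihood corresponding to parameters $\theta$; then, as the size of the neighborhood approaches 0,

$$L(\theta) = \lim_{\epsilon \to 0} c_\epsilon \Pr\left(Y_\theta \in B_\epsilon(y_o)\right), \tag{1}$$

where $c_\epsilon$ is a constant of proportionality that depends on $\epsilon$. In case of the random variable $Y_\theta$ being discrete, $L(\theta) = \Pr(Y_\theta = y_o)$.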
Let $p(\theta)$ be the prior distribution of parameters $\theta$. The posterior distribution can be estimated by accepting samples $\theta \sim p(\theta)$ with probability proportional to $L(\theta)$. The accepted samples correspond to simulated output being equal to a sample from the fixed dataset.
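However, for complex problems encountered in the real world, the probability of a simulated output exactly matching the observed data is very often exceedingly small. Therefore, the acceptance condition is relaxed to accept samples within a distance $\epsilon$, computed according to a chosen distance function $d$, as

$$d(y_\theta, y_o) \le \epsilon, \tag{2}$$

where $y_\theta$ denotes a simulated realization of the model for parameters $\theta$ and $y_o$ denotes the observed data.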
This forms the basis on which ABC techniques estimate the likelihood of the fixed dataset, given the parameter values. The popular ABC rejection sampling method [34, 28] involves iteratively sampling candidate parameter combinations from the prior and simulating them, with each candidate being accepted according to Eq. 2. The accepted samples form the set of approximate posterior samples.
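The rejection sampling loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the setup used in the experiments: the toy simulator (a unit-variance Gaussian with unknown mean), the uniform prior, the sorted-sample distance and all names (`abc_rejection`, `sorted_sample_distance`, `threshold`, `max_sims`) are hypothetical.

```python
import random
import statistics


def abc_rejection(simulate, prior_sample, observed, distance, threshold,
                  n_accept, max_sims, rng):
    """Plain ABC rejection sampling: keep parameters whose simulated
    data lie within `threshold` of the observed data."""
    accepted = []
    n_sims = 0
    while len(accepted) < n_accept and n_sims < max_sims:
        theta = prior_sample(rng)          # draw a candidate from the prior
        simulated = simulate(theta, rng)   # run the stochastic simulator
        n_sims += 1
        if distance(simulated, observed) <= threshold:
            accepted.append(theta)
    return accepted, n_sims


# Toy problem (assumption): infer the mean of a unit-variance Gaussian.
def simulate(theta, rng, n=50):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)


def sorted_sample_distance(a, b):
    # Mean absolute difference between order statistics of the raw samples.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


rng = random.Random(0)
observed = simulate(2.0, rng)
accepted, n_sims = abc_rejection(simulate, prior_sample, observed,
                                 sorted_sample_distance, threshold=0.5,
                                 n_accept=100, max_sims=100_000, rng=rng)
estimate = statistics.fmean(accepted)  # point estimate: mean of accepted samples
```

The ratio `len(accepted) / n_sims` gives the empirical acceptance rate, which is the quantity that deteriorates as the dimensionality of the compared data grows.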
The distance function is a crucial part of the approximation and, consequently, of the parameter inference process. The fixed data and simulated responses are reduced to one or more summary statistics, or high-level features, that capture certain types of behavior present within the data. Popular summary statistics include mean, variance, entropy, autocorrelation, etc. Equation 2 is modified to incorporate the use of a summary statistic $S$ as

$$d(S(y_\theta), S(y_o)) \le \epsilon. \tag{3}$$
The use of summary statistics is also a part of the approximation, in addition to the relaxed distance computation in Eq. 2. By reducing the complete datasets to a handful of features, there is a risk of losing important information relevant to effective parameter inference. However, using too many summary statistics may lead to large approximation errors due to the ‘curse of dimensionality’ [7, 32]. Indeed, Barber et al. studied the mean squared error (MSE) of a Monte Carlo estimate obtained by the rejection sampling algorithm. They showed that, with optimal hyperparameter tuning of the ABC rejection sampler and under the considered regularity conditions, the MSE was $O(n^{-4/(q+4)})$, with $q$ being the number of summary statistics used and $n$ being the number of simulated datasets. The result holds for large $n$ and an acceptance threshold close to 0. As a consequence, selection of appropriate summary statistics is a crucial task for high-quality parameter inference [20, 29, 6, 33]. Prangle divides methods into the categories of subset selection, projection, and auxiliary likelihood. The following subsections briefly review each of these categories.
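To give a feel for how quickly the convergence rate degrades with the number of statistics, the short sketch below evaluates the leading-order term, assuming the rate takes the form $n^{-4/(q+4)}$ as commonly stated for the result of Barber et al. Constants are omitted, so the numbers are only indicative of scaling; the function names are hypothetical.

```python
def abc_mse_rate(n_sims, n_stats):
    """Leading-order MSE scaling n^(-4/(q+4)); constant factors omitted."""
    return n_sims ** (-4.0 / (n_stats + 4.0))


def sims_for_target(target_rate, n_stats):
    """Number of simulations at which the rate term falls to target_rate."""
    return target_rate ** (-(n_stats + 4.0) / 4.0)


for q in (1, 2, 5, 10, 50):
    # With a fixed budget the rate term grows with q, and the budget needed
    # to hit a fixed target grows exponentially in q.
    print(q, abc_mse_rate(10_000, q), sims_for_target(0.01, q))
```

Under this scaling, each additional statistic multiplies the number of simulations needed for a fixed target by a constant factor, which is the motivation for keeping the statistic used in each comparison low-dimensional.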
2.1 Subset selection
In subset selection methods, a subset of statistics is selected that typically optimizes some criterion on training data. The training data is simulated beforehand. The method of approximate sufficiency adds a candidate summary statistic to, or removes it from, a subset and measures the resulting effect on the ABC posterior. This requires approximating the posterior using ABC. The authors note that implementation of the method is not obvious in higher dimensional parameter spaces. Nunes and Balding propose a method to find a subset $S$ that minimizes the entropy of the resulting ABC posterior. Entropy is used as a measure of informativeness of the posterior, with lower entropy being more informative. Blum et al. argue that lower values of entropy may not always correspond to more accurate inference. Barnes et al. define sufficient statistics as summary statistics that maximize mutual information between $S(y)$ and $\theta$. They derive a necessary condition for sufficiency of $S$, namely that the Kullback-Leibler (KL) divergence of $p(\theta \mid S(y))$ from $p(\theta \mid y)$ is $0$ [6, 32]. This provides a test for adding a candidate statistic $s$ to an existing subset of statistics $S$ to obtain $S'$. $S'$ is more informative than $S$ if the estimated KL divergence of $p(\theta \mid S(y_o))$ from $p(\theta \mid S'(y_o))$ is above a specified threshold. Setting the threshold and finding a sufficient subset require multiple ABC runs [6, 32].
All approaches mentioned above require multiple ABC runs as pre-processing in order to obtain an informative subset of summary statistics from a larger candidate statistics pool. Regularization approaches [10, 36, 11] do not require ABC runs, but require a training set. A linear regression estimator is trained that maps covariates $S(y)$ to responses $\theta$. Variable selection is then performed to find an informative subset. In summary, subset selection methods are interpretable, but can be extremely computationally expensive. For a deeper discussion, the reader is referred to Prangle.
2.2 Projection
Projection methods start with a set of candidate summary statistics $S$, and aim to find an informative lower dimensional projection of $S$, e.g., a linear transformation. This requires a training set of simulated values, and application of dimensionality reduction techniques such as partial least squares, linear regression and boosting. Partial least squares is well known, but lacks theoretical support for use in ABC and is reported to perform poorly. Linear regression and boosting have been shown to perform well. Projection methods are computationally cheaper than subset selection methods as they avoid repeated subset calculations. They also explore a larger space of summaries. However, as the new statistics lie in a space different from that of the original summary statistics, projection methods are less interpretable than subset selection methods.
2.3 Auxiliary likelihoods
Auxiliary likelihood-based methods use performance of summary statistics on simpler, similar problems as an indicator of which statistics are likely to perform well for a complex problem at hand. Essentially, a simpler and tractable likelihood is specified, and used to derive summary statistics [34, 21]. Auxiliary likelihood-based methods rephrase the search for informative summary statistics as a search for informative auxiliary likelihoods. Unlike subset selection and projection methods, the need for multiple ABC runs as pre-processing or training data can be eliminated if domain knowledge of similar, simpler problems is available. However, in the absence of such knowledge, as is often the case when using black-box simulators, training data will be needed to construct auxiliary likelihoods. A deeper discussion can be found in Prangle.
3 Dynamic summary statistics selection using multi-armed bandits
Let $\theta$ represent the parameters of the simulator, and let $S = \{S_1, \ldots, S_K\}$ be a pool of $K$ candidate summary statistics. Consider a given parameter inference problem to be solved using the ABC rejection sampling algorithm. The problem is described by the simulator $f$, the observed data set $y_o$, the prior $p(\theta)$, the distance function $d$, the desired number of accepted samples $N$, the acceptance threshold and the pool of candidate summary statistics $S$. Algorithm 1 outlines the proposed dynamic approach. The difference from standard ABC rejection sampling is the use of a selection method that, in each iteration, identifies the summary statistic to use from the global pool $S$, with the aim of minimizing the number of simulations needed to attain $N$ accepted samples. Here, the selection method may make use of past values of the distances calculated for each summary statistic.
It should be noted that dynamic selection (and the resulting variability) of the selected statistic between rejection sampling iterations has consequences. The traditional rejection sampler identifies a distribution over the static set of supplied summary statistics, while Algorithm 1 identifies a distribution over well-performing summary statistics. The traditional rejection sampler does not incorporate summary statistic selection, while Algorithm 1 performs on-the-fly selection. To make distances comparable across statistics, the distances corresponding to all candidate summary statistics are normalized to lie in $[0, 1]$.
The selection method is constructed here by solving a multi-armed bandit (MAB) problem. The MAB problem deals with the trade-off that an agent faces between exploring the given environment to obtain new information, and exploiting existing knowledge to select the future course of action. The problem was introduced by Robbins, and has since been used extensively in a variety of applications, such as clinical trials, engineering design, recommender systems, etc.
A MAB problem derives its name from the problem setting of a gambler playing a slot machine in a casino. The slot machine consists of multiple arms, and the gambler aims to maximize the amount of money collected in successive arm pulls of the machine. In the stochastic formulation, the problem consists of $K$ probability distributions $D_1, \ldots, D_K$ corresponding to $K$ arms, with associated means $\mu_1, \ldots, \mu_K$ and variances $\sigma_1^2, \ldots, \sigma_K^2$. Each instance of an arm pull results in receiving a reward $r_i \sim D_i$, where $i$ is the index of the pulled arm. The probability distributions are initially unknown, and the goal is to infer the distribution with the highest expected value, while also maximizing the rewards received.
Let $S = \{S_1, \ldots, S_K\}$ be the user-supplied set of summary statistics. Each summary statistic $S_i$ corresponds to an arm in the multi-armed bandit setting, with probability distribution $D_i$ and corresponding mean $\mu_i$. Selecting, or pulling, the arm $i$ is associated with a reward $r_i$ sampled from $D_i$. We assume that $r_i$ decreases with $d(S_i(y_\theta), S_i(y_o))$, where $d$ is a chosen distance function that calculates the variation between a simulated value $y_\theta$ and a value $y_o$ from the fixed dataset in terms of summary statistic $S_i$. The distances corresponding to each summary statistic are normalized to lie in $[0, 1]$. Let $n$ be the number of simulations used to accumulate $N$ accepted samples, and therefore the number of arm pulls. For a fixed value of $n$, the arms must be pulled in a sequence that maximizes the accumulated reward at the end of $n$ pulls. However, considering the ABC rejection sampling algorithm, the goal is to achieve $N$ accepted samples using as few simulations as possible. This also corresponds to selecting an arm such that the distance between simulated and observed data is minimized, thereby maximizing the reward. Let $r_{a_t}$ be the reward for pulling the arm $a_t$ at the $t$-th iteration. In order to maximize the accumulated reward, the quantity $\sum_{t=1}^{n} r_{a_t}$ must be maximized, where $a_t \in \{1, \ldots, K\}$. Considering stochasticity, the expected total reward is maximized,

$$\max_{a_1, \ldots, a_n} \mathbb{E}\left[\sum_{t=1}^{n} r_{a_t}\right] = \max_{a_1, \ldots, a_n} \sum_{t=1}^{n} \mu_{a_t},$$

where $a_t$, for $t = 1, \ldots, n$, is the sequence of selected summary statistics or arm pulls. Maximizing the cumulative mean estimated reward corresponds to minimizing the cumulative distance between the simulated and observed summaries. The intuition is that this will lead to quick convergence of the ABC rejection sampler (in terms of the number of simulations used), and will also improve the quality of parameter inference, as each iteration selects the most appropriate summary statistic for inference (the one corresponding to the least estimated distance).
Several strategies exist for the solution of MAB problems. Popular strategies include $\epsilon$-greedy and its variants ($\epsilon$-first and $\epsilon$-decreasing), upper confidence bound (UCB), and Thompson sampling (TS). For a detailed review of algorithms for the solution of MABs, the reader is referred to [25, 43]. The $\epsilon$-first strategy has been used for the experiments in this paper. It should be noted that the framework proposed herein is independent of any particular strategy, and in principle any algorithm that solves a MAB can be used as the selection method in Algorithm 1. The following text discusses the $\epsilon$-first strategy, and motivates its choice.
3.1 The $\epsilon$-first strategy
The $\epsilon$-greedy family of strategies is the most widely used and the simplest class of MAB algorithms. The $\epsilon$-first strategy is a variant of the $\epsilon$-greedy strategy that starts with a pure exploration phase where a random arm is chosen for the first fraction $\epsilon$ of pulls. Thereafter, a pure exploitation phase ensues where the arm corresponding to the highest mean estimated reward is pulled. The value of $\epsilon$ is user-defined. It has been shown within a PAC framework that a total of $O\left(\frac{K}{\alpha^2} \log \frac{K}{\delta}\right)$ random arm pulls are sufficient to find an $\alpha$-optimal arm with probability at least $1 - \delta$ [39, 19]. Empirical comparisons between various MAB strategies suggest that simpler algorithms like $\epsilon$-greedy outperform more sophisticated, theoretically sound approaches [25, 39] on a majority of problems.
The intuition behind using the $\epsilon$-first strategy as a MAB solution lies in the fact that in the initial iterations of rejection sampling, the behavior of the various summary statistics is unknown. Therefore, it makes sense to perform pure exploration in order to gauge the efficacy and behavior of different statistics. As sampling proceeds and the interplay of the distance function and the various summary statistics is observed, a more pragmatic approach can be followed in the form of exploitation. The formulation of exploration and exploitation in terms of the summary statistic selection problem is described below.
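The two phases of the strategy can be sketched on a generic bandit as follows. This is a minimal illustration, assuming Bernoulli rewards with hypothetical success probabilities; the class name `EpsilonFirst` and its attributes are chosen for this example and do not come from the paper.

```python
import random


class EpsilonFirst:
    """Explore uniformly at random for the first epsilon * horizon pulls,
    then always pull the arm with the highest mean estimated reward."""

    def __init__(self, n_arms, epsilon, horizon, rng):
        self.n_arms = n_arms
        self.n_explore = int(epsilon * horizon)
        self.rng = rng
        self.pulls = 0
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select(self):
        if self.pulls < self.n_explore:
            return self.rng.randrange(self.n_arms)              # pure exploration
        return max(range(self.n_arms), key=lambda i: self.means[i])  # exploitation

    def update(self, arm, reward):
        self.pulls += 1
        self.counts[arm] += 1
        # Incremental update of the running mean reward of the pulled arm.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


rng = random.Random(1)
success_prob = [0.2, 0.5, 0.8]   # hypothetical arms; the last one is best
bandit = EpsilonFirst(n_arms=3, epsilon=0.5, horizon=2000, rng=rng)
for _ in range(2000):
    arm = bandit.select()
    reward = 1.0 if rng.random() < success_prob[arm] else 0.0
    bandit.update(arm, reward)
```

After the run, `bandit.counts` shows roughly uniform pulls from the exploration half plus the entire exploitation half concentrated on the arm with the best estimated mean.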
3.2 Exploration and exploitation
The classical formulation of exploration as per the $\epsilon$-first strategy is to select a summary statistic at random for the first $\epsilon n$ iterations of the ABC rejection sampler, where $n$ is the total number of simulations or arm pulls allowed. The ABC rejection sampling problem setting typically does not include $n$, but instead includes the desired number of accepted samples $N$. It follows from the definition of $N$ and $n$ that, for successful parameter inference, $n \ge N$. The fraction of exploration and exploitation iterations can also be defined in terms of $N$, i.e., explore for $\epsilon N$ iterations of rejection sampling, and exploit until $N$ samples have been accepted.
Exploitation typically corresponds to selecting the arm with the highest mean estimated reward. The rewards are defined in terms of distances between the simulated and observed datasets calculated using the distance function $d$. Let $R$ be a matrix storing the rewards calculated for each arm (summary statistic) pulled in all past iterations $1, \ldots, t-1$. Let the vector $\bar{r}$ denote the column-wise mean of $R$. Then $\bar{r}$ can be used to represent the mean estimated reward of each arm in the current exploitation iteration, based on previous iterations. The arm with the highest estimated reward, $i^* = \arg\max_i \bar{r}_i$, is then chosen as the appropriate summary statistic for the current exploitation iteration. The statistic $S_{i^*}$ is then evaluated to compute distances, and rejection sampling proceeds according to Algorithm 1. The matrix $R$ is updated to include the reward corresponding to $S_{i^*}$.
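Putting exploration, exploitation and rejection sampling together, one possible reading of the dynamic approach is sketched below. Several choices are assumptions made only for this illustration: the toy Gaussian simulator, the three location statistics in the pool, the squashing `d / (1 + d)` used to normalize distances into $[0, 1)$, the reward defined as one minus the normalized distance, and all function and parameter names.

```python
import random
import statistics


def normalise(d):
    # Assumed normalization: maps a non-negative distance into [0, 1).
    return d / (1.0 + d)


def mab_abc(simulate, prior_sample, observed, pool, threshold,
            n_accept, n_explore, max_sims, rng):
    """ABC rejection sampling with epsilon-first selection of one
    summary statistic (arm) per iteration."""
    obs_stats = [stat(observed) for stat in pool]
    reward_sum = [0.0] * len(pool)   # per-arm totals; sum / count equals
    reward_cnt = [0] * len(pool)     # the column-wise mean of the reward matrix
    accepted, pulls = [], []
    for t in range(max_sims):
        if len(accepted) >= n_accept:
            break
        if t < n_explore:
            arm = rng.randrange(len(pool))                      # exploration
        else:
            means = [s / c if c else 0.0
                     for s, c in zip(reward_sum, reward_cnt)]
            arm = max(range(len(pool)), key=means.__getitem__)  # exploitation
        theta = prior_sample(rng)
        simulated = simulate(theta, rng)
        d = normalise(abs(pool[arm](simulated) - obs_stats[arm]))
        reward_sum[arm] += 1.0 - d   # assumed reward: small distance, high reward
        reward_cnt[arm] += 1
        pulls.append(arm)
        if d <= threshold:
            accepted.append(theta)
    return accepted, pulls


def simulate(theta, rng, n=30):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)


def midrange(x):
    return (min(x) + max(x)) / 2.0


pool = [statistics.fmean, statistics.median, midrange]  # hypothetical pool
rng = random.Random(3)
observed = simulate(2.0, rng)
accepted, pulls = mab_abc(simulate, prior_sample, observed, pool,
                          threshold=0.2, n_accept=50, n_explore=300,
                          max_sims=50_000, rng=rng)
estimate = statistics.fmean(accepted)
```

Counting the entries of `pulls` after the exploration phase shows which statistic the selection settled on. Which arm is favoured depends on the normalization and reward definition assumed here.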
4 Experiments
The scalability of the proposed methodology is demonstrated on a challenging inference problem from molecular systems biology. We also compare and contrast it to two popular subset selection methods, namely approximate sufficiency (AS) and minimizing the entropy (ME).
Discrete stochastic models of reaction networks based on the continuous-time Markov process formalism are frequently used to study gene regulatory networks. These models are simulated with the stochastic simulation algorithm (SSA) due to Gillespie, and the outputs of such simulations are statistically exact time series samples from the underlying Markov process model. Inference of the parameters of the models is highly challenging, since the stochastic nature of the simulator makes it computationally intractable to compute likelihoods for non-trivial models. As a consequence, ABC has gained popularity in this area [27, 38, 26].
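For readers unfamiliar with the SSA, the direct method can be sketched on a toy birth-death process. This is not the oscillator network used in the experiments and does not use the GillesPy API; the reaction system, rate constants and function name are assumptions made for illustration.

```python
import random


def gillespie_birth_death(k_birth, k_death, x0, t_end, rng):
    """Direct-method SSA for the toy system
    0 -> X (propensity k_birth) and X -> 0 (propensity k_death * x)."""
    t, x = 0.0, x0
    times, states = [0.0], [x0]
    while True:
        a_birth = k_birth
        a_death = k_death * x
        a_total = a_birth + a_death
        if a_total == 0.0:
            break                          # no reaction can fire any more
        t += rng.expovariate(a_total)      # exponential waiting time
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional
        # to its propensity.
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1
        times.append(t)
        states.append(x)
    return times, states


rng = random.Random(7)
times, states = gillespie_birth_death(k_birth=10.0, k_death=0.1, x0=0,
                                      t_end=50.0, rng=rng)
```

Each call produces one exact sample path of the underlying Markov process. Repeated calls with the same rate constants give different trajectories, which is the stochasticity that ABC has to contend with.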
A complex biochemical reaction network with oscillatory behavior  is used herein as a test problem. This network model involves chemical species undergoing chemical reactions, parameterized by reaction rate constants. The details of the chemical network such as its species and reaction definitions can be found in . The model is implemented in StochSS  and evaluated in Python using GillesPy . All experiments involve the observed data consisting of trajectories generated using GillesPy, with each trajectory spanning time steps. The 15-dimensional search space of parameters during inference runs is represented by the following ranges, where parameter names are consistent with the notation in ,
The parameter values for the observed data were fixed as , i.e., the approximate center of the intervals (4). These parameter values give rise to reliable, but noisy oscillations . Candidate pools of summary statistics are generated using the TSFRESH  time series feature extraction framework. TSFRESH supports extraction of more than features, or summary statistics. The abctools  R library is used for evaluating AS and ME subset selection methods.
4.1 Accuracy and scalability
The inference problem detailed in the previous section is solved for summary statistic pools of sizes varying from to using the proposed method, as well as with traditional ABC with subset selection based on approximate sufficiency (AS) and minimizing the estimated entropy (ME)  as a pre-processing step. The summary static pools were created by randomly selecting statistics to be generated by TSFRESH  out of a possible total of 700. The random selection tests the flexibility of the approach, and increasing subset sizes test the scalability of the proposed methodology. The ABC acceptance threshold was set to accept samples with distances corresponding to top of normalized distance values, i.e., , where . The desired number of accepted samples were set to in order to get a reliable estimate of the mean of inferred parameters. The total number of allowed simulations is set to , and in the MAB algorithm is set to to achieve equal balance between exploration and exploitation.
Table 1 shows the mean absolute error (MAE) in inferred parameters for varying size of summary statistics candidates pool (). As can be seen, all summary statistic selection methods achieve similar inference quality across pool sizes. However, both AS and ME failed to select a subset for due to out-of-memory exceptions. The MAB approach outperforms AS and ME in terms of efficiency and scalability.
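The error measure can be made concrete with a short sketch, assuming the MAE is averaged over the components of the parameter vector. The function name and the example values are hypothetical and are not results from the experiments.

```python
def mean_absolute_error(inferred, reference):
    """Mean absolute error between inferred and reference parameter vectors."""
    if len(inferred) != len(reference):
        raise ValueError("parameter vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(inferred, reference)) / len(reference)


# Hypothetical three-parameter example.
mae = mean_absolute_error([1.0, 2.5, 4.0], [1.5, 2.0, 4.0])
```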
Table 2: Total time taken for parameter inference, and time spent purely on summary statistic selection.
Table 2 shows the total time taken to solve the inference problem, as well as the time purely spent on summary statistics selection. As can be seen, the MAB approach outperforms both subset selection methods across the board. The AS approach is more efficient than ME, but the cost of both methods grows very rapidly as becomes large. In contrast, the MAB approach successfully handles candidate pools as large as .
In order to validate the correctness of results obtained using the MAB-based summary statistic selection method, deeper analysis was performed on one of the sample runs, namely the case of . It was observed that three summary statistics were ranked highly during the exploitation phase. Two of them were based on mass quantiles that calculate the proportion of mass concentration to the left of a certain part of the time series. The third summary statistic was based on Fourier coefficients of the one-dimensional discrete Fourier transform using the Fast Fourier Transform (FFT) algorithm.
A separate ABC rejection sampling run was performed using only the highest ranked summary statistic. The run achieved the desired accepted samples after simulations, with MAE being . Using the top-ranked statistic should intuitively yield faster convergence and lower MAE than the MAB-based approach, since exploration is avoided. The results confirmed this intuition. This also highlights the fact that while the MAB approach works as a black-box inference method with no need for pre-processing, it can also be useful for feature engineering for subsequent, optimized runs.
Further, the top three summary statistics were used together in a separate run of ABC rejection sampling. The summary statistics were combined using the Euclidean norm of . The run fulfilled acceptance criteria within only simulations, with MAE being . The fast fulfillment points towards the informativeness of the top three features identified by the MAB-based approach. Contrarily, using three and ten randomly selected summary statistics for the purpose of comparison did not result in desired acceptance within the relatively large allowed budget of simulations. This points towards the importance of well chosen summary statistics towards effective parameter inference, especially for complex and high-dimensional inference problems.
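One way to combine several statistics into a single distance with the Euclidean norm is sketched below. It assumes that the norm is taken over the vector of per-statistic differences between simulated and observed summaries; the function name and the numbers are hypothetical.

```python
import math


def combined_distance(sim_stats, obs_stats):
    """Euclidean norm of the per-statistic differences."""
    if len(sim_stats) != len(obs_stats):
        raise ValueError("summary vectors must have the same length")
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim_stats, obs_stats)))


# Hypothetical values for three summary statistics.
d = combined_distance([3.0, 0.0, 1.0], [0.0, 4.0, 1.0])
```

If the individual statistics live on very different scales, each difference would typically be normalized before the norm is taken, so that no single statistic dominates.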
5 Discussion and future work
The proposed approach was designed to meet the needs of highly challenging inference problems where hand-curated, optimized summary statistics are not available a priori, but where large pools of possible statistics are readily available. As explained in Sec. 2, existing methods either require extensive pre-processing (subset selection methods), require problem-specific knowledge or training data (auxiliary likelihood methods), or lose the interpretability of the statistics (projection methods). There have also been recent efforts to develop non-parametric ABC algorithms that do away with the need to manually select summary statistics. However, summary statistics ease analysis and understanding of complex problems, such as the test problem in this work. The MAB-based dynamic summary statistic selection method is designed with these considerations in view. Initial experiments in this paper have shown the proposed method to be highly promising. The proposed method was able to infer a relatively large number of parameters (15) in a complex non-linear model of a gene regulatory network, using a very small number of simulations. In contrast to the existing subset selection methods that were tested, the MAB approach scaled to pools with hundreds of statistics, although all methods managed to achieve high-quality inference for smaller pool sizes. The proposed method can also be used in more sophisticated ABC formulations like ABC-sequential Monte Carlo (ABC-SMC).
A distinct advantage of the method is that it is completely black-box in nature and does not require any pre-processing. This makes it possible to use in a wide variety of applications with minimal modifications compared to standard ABC (selecting an appropriate value of $\epsilon$). It also makes large-scale automated summary statistic analysis possible, wherein the user does not have to hand-curate features to be tested. One can simply input hundreds of summary statistics, and analyze in detail only the ones used by the MAB solution during exploitation, for instance. As demonstrated here, this capability is very useful when high-throughput libraries for summary statistics, like TSFRESH, are available. The MAB-based method also maintains the interpretability of summary statistics. It merely acts as a filter to rapidly find the most suitable summary statistic for the inference problem. Moreover, the approach is computationally efficient and does not add substantial computational burden over and above rejection sampling.
Future work includes comparisons with highly tuned ABC setups and the methods in Sec. 2 that make use of domain-specific knowledge. Different formulations of exploration and exploitation will also be evaluated. Although the $\epsilon$-first strategy is used in this work, the proposed framework is independent of a particular class of MAB solutions, and other strategies will be explored in the future.
A current characteristic of the proposed method is that only one summary statistic is chosen from the candidates pool in each iteration, although a distribution over multiple statistics is considered during a complete run of Algorithm 1. It may be the case that a handful of summary statistics taken together offer more information to the distance function than the single best summary statistic in each iteration. Future work includes exploring this question in detail, both theoretically and empirically.
6 Conclusion
A novel methodology for performing multi-statistic parameter inference using the approximate Bayesian computation rejection sampling algorithm was presented in this paper. The problem of dynamically selecting the most appropriate summary statistic in each iteration of rejection sampling was formulated as a multi-armed bandit problem. The method does not require any prior problem-specific knowledge or pre-processing, and is highly scalable and computationally efficient. The efficacy of the proposed method was demonstrated by inferring parameters of a large scale stochastic biochemical reaction network. The proposed approach was shown to efficiently handle appropriate summary statistic selection from a pool of hundreds of candidate statistics.
References
- [1] John H Abel, Brian Drawert, Andreas Hellander, and Linda R Petzold. GillesPy: A Python package for stochastic model building and simulation. IEEE Life Sciences Letters, 2(3):35–38, 2016.
- [2] Simon Aeschbacher, Mark A Beaumont, and Andreas Futschik. A novel approach for choosing summary statistics in approximate Bayesian computation. Genetics, 192(3):1027–1047, 2012.
- [3] Maksat Ashyraliyev, Yves Fomekong-Nanfack, Jaap A Kaandorp, and Joke G Blom. Systems biology: parameter estimation for biochemical models. The FEBS Journal, 276(4):886–902, 2009.
- [4] Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.
- [5] Stuart Barber, Jochen Voss, Mark Webster, et al. The rate of convergence for approximate Bayesian computation. Electronic Journal of Statistics, 9(1):80–105, 2015.
- [6] Chris P Barnes, Sarah Filippi, Michael PH Stumpf, and Thomas Thorne. Considerate approaches to constructing summary statistics for ABC model selection. Statistics and Computing, 22(6):1181–1197, 2012.
- [7] Mark A Beaumont, Wenyang Zhang, and David J Balding. Approximate Bayesian computation in population genetics. Genetics, 162(4):2025–2035, 2002.
- [8] Mark A Beaumont, Jean-Marie Cornuet, Jean-Michel Marin, and Christian P Robert. Adaptive approximate Bayesian computation. Biometrika, 96(4):983–990, 2009.
- [9] James Vere Beck and Kenneth J Arnold. Parameter estimation in engineering and science. James Beck, 1977.
- [10] Michael GB Blum. Choosing the summary statistics and the acceptance rate in approximate Bayesian computation. In Proceedings of COMPSTAT’2010, pages 47–56. Springer, 2010.
- [11] Michael GB Blum, Maria Antonieta Nunes, Dennis Prangle, Scott A Sisson, et al. A comparative review of dimension reduction methods in approximate Bayesian computation. Statistical Science, 28(2):189–208, 2013.
- [12] Webster Cash. Parameter estimation in astronomy through application of the likelihood ratio. The Astrophysical Journal, 228:939–947, 1979.
- [13] Olivier Chapelle and Lihong Li. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, pages 2249–2257, 2011.
- [14] Maximilian Christ, Andreas W Kempa-Liehr, and Michael Feindt. Distributed and parallel time series feature extraction for industrial big data applications. arXiv preprint arXiv:1610.07717, 2016.
- [15] Katalin Csilléry, Michael GB Blum, Oscar E Gaggiotti, and Olivier François. Approximate Bayesian computation (ABC) in practice. Trends in Ecology & Evolution, 25(7):410–418, 2010.
- [16] Peter J Diggle and Richard J Gratton. Monte Carlo methods of inference for implicit statistical models. Journal of the Royal Statistical Society. Series B (Methodological), pages 193–227, 1984.
- Drawert et al.  Brian Drawert, Andreas Hellander, Ben Bales, Debjani Banerjee, Giovanni Bellesia, Bernie J Daigle Jr, Geoffrey Douglas, Mengyuan Gu, Anand Gupta, Stefan Hellander, et al. Stochastic simulation service: bridging the gap between the computational expert and the biologist. PLoS computational biology, 12(12):e1005220, 2016.
- Elowitz et al.  Michael B Elowitz, Arnold J Levine, Eric D Siggia, and Peter S Swain. Stochastic gene expression in a single cell. Science, 297(5584):1183–1186, August 2002.
- Even-Dar et al.  Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Pac bounds for multi-armed bandit and markov decision processes. In International Conference on Computational Learning Theory, pages 255–270. Springer, 2002.
- Fearnhead and Prangle  Paul Fearnhead and Dennis Prangle. Constructing summary statistics for approximate bayesian computation: semi-automatic approximate bayesian computation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(3):419–474, 2012.
- Fu and Li  Yun-Xin Fu and Wen-Hsiung Li. Estimating the age of the common ancestor of a sample of dna sequences. Molecular biology and evolution, 14(2):195–199, 1997.
- Fulcher et al.  Ben D Fulcher, Max A Little, and Nick S Jones. Highly comparative time-series analysis: the empirical structure of time series and their methods. J. R. Soc. Interface, 10(83):20130048, 2013.
- Gillespie  Daniel T Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reacting systems. J. Comput. Phys., 22:403–434, 1976.
- Joyce and Marjoram  Paul Joyce and Paul Marjoram. Approximately sufficient statistics and bayesian computation. Statistical applications in genetics and molecular biology, 7(1), 2008.
- Kuleshov and Precup  Volodymyr Kuleshov and Doina Precup. Algorithms for multi-armed bandit problems. arXiv preprint arXiv:1402.6028, 2014.
- Lenive et al.  Oleg Lenive, Paul D W Kirk, and Michael P H Stumpf. Inferring extrinsic noise from single-cell gene expression data using approximate bayesian computation. BMC Syst. Biol., 10(1):81, August 2016.
- Lillacci and Khammash  Gabriele Lillacci and Mustafa Khammash. The signal within the noise: efficient inference of stochastic gene regulation models using fluorescence histograms and stochastic simulations. Bioinformatics, 29(18):2311–2319, September 2013.
- Lintusaari et al.  Jarno Lintusaari, Michael U Gutmann, Ritabrata Dutta, Samuel Kaski, and Jukka Corander. Fundamentals and recent developments in approximate bayesian computation. Systematic biology, 66(1):e66–e82, 2017.
- Nunes and Balding  Matthew A Nunes and David J Balding. On optimal selection of summary statistics for approximate bayesian computation. Statistical applications in genetics and molecular biology, 9(1), 2010.
- Nunes and Prangle  Matthew Alan Nunes and Dennis Prangle. abctools: an r package for tuning approximate bayesian computation analyses. The R Journal, 7(2):189–205, 2015.
- Park et al.  Mijung Park, Wittawat Jitkrittum, and Dino Sejdinovic. K2-abc: Approximate bayesian computation with kernel embeddings. In Artificial Intelligence and Statistics, pages 398–407, 2016.
- Prangle  Dennis Prangle. Summary statistics in approximate bayesian computation. arXiv preprint arXiv:1512.05633, 2015.
- Prangle et al.  Dennis Prangle, Paul Fearnhead, Murray P Cox, Patrick J Biggs, and Nigel P French. Semi-automatic selection of summary statistics for abc model choice. Statistical applications in genetics and molecular biology, 13(1):67–82, 2014.
- Pritchard et al.  Jonathan K Pritchard, Mark T Seielstad, Anna Perez-Lezaun, and Marcus W Feldman. Population growth of human y chromosomes: a study of y chromosome microsatellites. Molecular biology and evolution, 16(12):1791–1798, 1999.
- Robbins  Herbert Robbins. Some aspects of the sequential design of experiments. In Herbert Robbins Selected Papers, pages 169–177. Springer, 1985.
- Sedki and Pudlo  MA Sedki and P Pudlo. Contribution to the discussion of fearnhead and prangle (2012). constructing summary statistics for approximate bayesian computation: Semi-automatic approximate bayesian computation. Journal of the Royal Statistical Society: Series B, 74:466–467, 2012.
- Severini  Thomas A Severini. Likelihood methods in statistics. Oxford University Press, 2000.
- Sunnåker et al.  Mikael Sunnåker, Alberto Giovanni Busetto, Elina Numminen, Jukka Corander, Matthieu Foll, and Christophe Dessimoz. Approximate bayesian computation. PLoS computational biology, 9(1):e1002803, 2013.
- Vermorel and Mohri  Joannes Vermorel and Mehryar Mohri. Multi-armed bandit algorithms and empirical evaluation. In European conference on machine learning, pages 437–448. Springer, 2005.
- Vilar et al.  José MG Vilar, Hao Yuan Kueh, Naama Barkai, and Stanislas Leibler. Mechanisms of noise-resistance in genetic oscillators. Proceedings of the National Academy of Sciences, 99(9):5988–5992, 2002.
- Watkins  Christopher John Cornish Hellaby Watkins. Learning from delayed rewards. PhD thesis, King’s College, Cambridge, 1989.
- Wegmann et al.  Daniel Wegmann, Christoph Leuenberger, and Laurent Excoffier. Efficient approximate bayesian computation coupled with markov chain monte carlo without likelihood. Genetics, 182(4):1207–1218, 2009.
- Zhou  Li Zhou. A survey on contextual multi-armed bandits. arXiv preprint arXiv:1508.03326, 2015.