Tree pyramidal adaptive importance sampling
Abstract
This paper introduces Tree-Pyramidal Adaptive Importance Sampling (TP-AIS), a novel iterated sampling method that outperforms current state-of-the-art approaches such as deterministic mixture population Monte Carlo (DM-PMC Elvira et al. (2017)), mixture population Monte Carlo (M-PMC Cappé et al. (2008)) and layered adaptive importance sampling (LAIS Martino et al. (2017)).
TP-AIS iteratively builds a proposal distribution parameterized by a tree pyramid, where each tree leaf spans a convex subspace and represents its importance density. After each new sampling operation, a set of tree leaves is subdivided, improving the approximation of the proposal distribution to the target density. Unlike the rest of the methods in the literature, TP-AIS is parameter-free and requires zero manual tuning to achieve its best performance.
Our proposed method is evaluated with randomized target probability density functions of different complexity, and we also analyze its application to different dimensionalities. The results are compared to state-of-the-art iterative importance sampling approaches and other baseline MCMC approaches using the Normalized Effective Sample Size (NESS), the Jensen-Shannon Divergence, and time complexity.
1 Introduction
A central task in the application of probabilistic models is the estimation of latent or unknown variables from observed noisy data. Within the Bayesian framework, this involves combining prior knowledge about the latent variables with the information in the observations to obtain a joint posterior probability distribution. Inference using such models typically involves evaluating queries such as finding values that maximize the posterior (MAP queries), or computing the posterior probability of some variables given evidence on the others (conditional queries). Often, it involves computing expectations of some function of interest with respect to the posterior. In most real-world applications, exact inference is infeasible either because the dimensionality of the latent space is too high or because the posterior distribution has a highly complex form for which expectations are not analytically tractable.
In such situations, we resort to approximate inference. Historically, sampling methods, also known as Monte Carlo (MC) methods, have been the method of choice for such problems. The goal of MC methods is to generate random numbers from a target distribution of interest, $\pi(x)$. These serve as an approximate representation of $\pi(x)$ and can be used to numerically compute approximations to desired quantities such as expectations. Sampling methods are asymptotically exact in that they can generate exact results given infinite computational resources. Within the rich variety of sampling-based methods, a particularly important class are the Markov Chain Monte Carlo (MCMC) methods. In these, samples are not drawn directly from $\pi(x)$ but instead from a Markov process whose stationary distribution is equal to $\pi(x)$. These methods have the elegant property that (under reasonable constraints) the state distribution of the Markov process converges to $\pi(x)$ irrespective of the starting distribution. Further, they scale very well with the dimensionality of the sample space. However, there are often significant challenges in practical implementation. One is that the samples are generated sequentially and hence these methods cannot be easily parallelized unless some underlying structure in either the distribution or the transition mechanism is assumed. More serious are the problems of burn-in and mixing. Burn-in time refers to the number of steps we must wait before being able to collect samples from the chain. This happens because the initial state distribution is arbitrary and we need to wait until it comes close to $\pi(x)$. The mixing time of a Markov chain is the time until the Markov chain is close to $\pi(x)$. Long mixing times occur in multimodal distributions where the regions between modes have low probability. In such situations, the chain might have explored one region well, but it can take a long time for it to transition between the modes.
An alternative to MCMC is the class of importance sampling (IS) methods. Historically, IS has been used to approximate expectations of the form $E_\pi[f(x)]$, rather than generating samples from $\pi(x)$. Samples are drawn from an alternate distribution, $q(x)$, known as the proposal distribution, from which it is easier to generate samples. Each sample is assigned an importance weight to correct the bias introduced by sampling from the wrong distribution. Unlike MCMC methods, there is no phenomenon of burn-in; the sample generation process can be parallelized; and all the generated samples along with their weights are retained. The samples with their weights represent an approximation of the target distribution. The key challenge lies in choosing good proposal distributions. For poor choices of $q(x)$, the set of importance weights may be dominated by a few weights having large values, resulting in an effective sample size much smaller than the apparent sample size. This situation gets exacerbated at higher dimensions and can cause an exponential blow-up in the number of samples required, resulting in very poor sampling efficiency.
We present here, therefore, the Tree Pyramid Adaptive Importance Sampling (TP-AIS) method, which retains the desirable properties of importance sampling (parallelizability, no burn-in) while achieving a higher sampling efficiency than state-of-the-art IS methods. We use a hierarchical data structure called Tree Pyramids to describe the proposal distribution over the input probability space. Each node represents a convex K-dimensional subspace, with nodes further down the tree representing increasingly finer subspaces. Our algorithm adaptively divides the input space such that more samples are used to represent regions of higher probability mass, while fewer samples are used for lower-mass regions. This enables us to efficiently approximate the target distribution $\pi(x)$. Unlike other state-of-the-art IS methods, TP-AIS is completely parameter-free and has the anytime property, which enables trade-offs between computation time and approximation quality.
The paper is structured as follows: Section 2 provides an overview of sampling methods focusing on importance sampling algorithms related to our contribution and comments on their differences. Section 3 details the proposed sampling algorithm. Section 4 describes the methods used for evaluation and Section 5 shows the results and discussion. The paper concludes with the final remarks and future directions in Section 6.
2 Related work
Sampling methods rely on the law of large numbers to guarantee asymptotic convergence to the exact solution. The rate of convergence depends on the algorithm and the problem at hand, and often it is one of the factors that determines the selection of the sampling method to use for a specific application.
2.1 Direct sampling and Markov Chain Monte Carlo
When direct sampling from the target distribution is possible, a common option is to draw a set of samples and use them to build Monte Carlo approximations for the quantities of interest or a Kernel Density Estimate (KDE) that approximates the posterior. Sometimes, the target joint distribution can be represented as a product of factors. If these factors represent conditional distributions $p(x_i \mid pa_i)$, where $pa_i$ is the set of parent nodes of $x_i$, then it is possible to sample sequentially from the conditional distributions using an approach known as Ancestral Sampling Bishop (2006). For more general factors, approaches such as Gibbs sampling can be adopted Geman and Geman (1984). As the complexity of the models increases, direct sampling might become challenging. For these cases, two closely related approaches exist: Importance Sampling (IS) and Approximate Sampling.
IS methods do not perform well in high-dimensional problems, and Markov Chain Monte Carlo (MCMC) or Variational Inference (VI) methods are used instead Geman and Geman (1984); Hastings (1970); Hoffman and Gelman (2014); Duane et al. (1987); Jaakkola (2001). Although in this paper we do not focus on those methods, one of our evaluation baselines is based on the MCMC Metropolis-Hastings algorithm Hastings (1970), and some adaptive IS methods include MCMC steps in their adaptation Martino et al. (2017). Among other problems, MCMC does not approximate the partition function, and VI does not deal well with multimodal target distributions.
2.2 Multi-adaptive importance sampling
As will be discussed in Section 3, the selection of the proposal distribution has a big impact on algorithm performance. The criterion used to select the proposal distribution leads to different IS methods. A well-known method is Rejection Sampling and its generalizations Casella et al. (2004). Although it is an exact simulation method, its rejection mechanism requires many more sampling operations to produce a useful number of samples (low-variance estimates) than other methods.
Multi-IS methods use multiple proposal distributions to generate samples Veach (1997). Moreover, the different mixture components can be weighted and their weights tuned depending on previous samples Owen and Zhou (2000). IS is not naturally iterative, but in practice implementations are. The sequential generation of samples is exploited by adaptive IS methods to adapt the proposal distribution based on previous samples, making the proposal closer to the target and improving sampling efficiency Karamchandani et al. (1989).
Current state-of-the-art methods combine the previous ideas into what is known as adaptive multiple-IS algorithms. Different sampling strategies, weighting schemes and, most importantly, adaptation algorithms result in several methods with different performance and properties Martino et al. (2017); Cappé et al. (2008); Cornuet et al. (2012); Martino et al. (2015); Cappé et al. (2004b). A recent review that includes the aforementioned methods can be found in Bugallo et al. (2017). In this paper, we propose a new adaptation algorithm and a proposal distribution parameterized by a tree pyramid that, in combination, improve the performance over previous methods.
2.3 Stratified and quasi-Monte Carlo methods
For low-dimensional problems, there are deterministic procedures with better convergence rates and accuracy than MC and IS methods; examples are generalized stratified sampling Wessing (2017), Latin hypercube sampling Owen (2013) and other grid-based methods Joshi et al. (2016). However, deterministic sampling strategies scale even worse than IS with dimensionality and do not adapt iteratively as new samples are generated. The algorithm proposed in this paper is at the intersection of stratified sampling and multi-adaptive IS, using a non-uniform multi-resolution tree structure to define the strata that parameterize the mixture components of the proposal distribution.
The usage of tree structures has been explored in the sampling literature, especially in the computer graphics application domain. Agarwal et al. use a hierarchical representation of the image space to create different regular strata; the subdivisions are guided by a custom importance metric Agarwal et al. (2003). Clarberg et al. propose an approach based on wavelet function decomposition that is hierarchically constructed using a KD-tree. After its construction, it is used to warp an initial grid of samples towards regions with more light Clarberg et al. (2005). More recently, Conty and Kulla have shown the computational benefits of an adaptive tree sampling strategy that selects the light source used to compute a sample for a pixel value Estevez and Kulla (2018). Canevet et al., inspired by Monte Carlo Tree Search, proposed an adaptive binary tree to reduce training time in neural networks; the idea focuses on generating the more important samples from the training dataset, weighting them using statistics from the loss function Canevet et al. (2016). These approaches are related to the method proposed in this paper, but are tailored to their application-specific cases.
3 Adaptive importance sampling with tree pyramids
As discussed in Section 1, Importance Sampling (IS) methods focus on approximating unknown values (of an arbitrary function $f(x)$) such as the expected value, variance, skewness, etc. of an unknown PDF $\pi(x)$. Those computations often involve intractable integrals like:
$$E_\pi[f(x)] = \int f(x)\,\pi(x)\,dx$$
that are approximated with a Monte Carlo estimator built by sampling from $\pi(x)$:
$$E_\pi[f(x)] \approx \frac{1}{N}\sum_{i=1}^{N} f(x_i), \qquad x_i \sim \pi(x).$$
When direct sampling from $\pi(x)$ is not possible, IS draws samples from a known proposal distribution $q(x)$ and weights them by the ratio of likelihoods $\pi(x_i)/q(x_i)$, namely the importance weight:
$$E_\pi[f(x)] \approx \frac{1}{N}\sum_{i=1}^{N} w_i\, f(x_i), \qquad w_i = \frac{\pi(x_i)}{q(x_i)}, \qquad x_i \sim q(x),$$
where the variance of the IS estimator is determined by:
$$\sigma_q^2 = \int \frac{\big(f(x)\,\pi(x)\big)^2}{q(x)}\,dx - E_\pi[f(x)]^2.$$
Therefore, the selection of the proposal distribution has a huge impact on the sample efficiency by directly influencing the estimator variance. Usually, the closer $q(x)$ and $\pi(x)$ are, the better the method performs. It is common for sampling methods to have an iterative nature (e.g. Markov Chain Monte Carlo). Adaptive methods take advantage of previous samples drawn from $q(x)$ to compute a new $q(x)$ that is closer to $\pi(x)$, improving the estimator performance.
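As a concrete illustration of the estimators above, the following sketch implements a self-normalized IS estimate with a hypothetical target and proposal (illustrative names, not the paper's implementation):

```python
import numpy as np

def importance_sampling_estimate(f, target_pdf, proposal_sample, proposal_pdf, n):
    """Self-normalized IS estimate of E_pi[f] using samples from proposal q."""
    x = proposal_sample(n)                   # x_i ~ q(x)
    w = target_pdf(x) / proposal_pdf(x)      # importance weights w_i = pi(x_i)/q(x_i)
    return np.sum(w * f(x)) / np.sum(w)      # weighted average corrects the bias

rng = np.random.default_rng(0)
# Target: standard normal pi(x); proposal: wider normal q(x) = N(0, 2^2)
target = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
q_pdf = lambda x: np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))
q_sample = lambda n: rng.normal(0.0, 2.0, size=n)

# E_pi[x^2] = 1 for a standard normal; the IS estimate should be close
est = importance_sampling_estimate(lambda x: x**2, target, q_sample, q_pdf, 100_000)
```

The wider proposal keeps the weight variance finite here; a proposal narrower than the target would make the weights blow up, which is exactly the degeneracy problem discussed above.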
This section provides the details of the adaptive importance sampling method proposed in this paper, which uses Tree Pyramids to describe the proposal distribution and focuses on generating samples in regions of high probability mass in order to efficiently approximate the target posterior PDF $\pi(x)$. After each sampling step, the tree is expanded to accommodate the new sample and improve the approximation of the proposal to the target distribution.
3.1 Tree pyramid parameterized proposal distribution
A K-dimensional tree pyramid (KDTP) is a full tree where each node $n_i$ represents a K-dimensional convex subspace with center $c_i$, radius $r_i$, a sample $x_i$ with corresponding importance weight $w_i$, and a pointer to the set of children of $n_i$. The radius is given by $r_i = r_{root}/2^{l_i}$, where $r_{root}$ is the radius of the root node and $l_i$ is the node level. The level is defined as the minimum number of edges that connect a node with the tree root. A node has either zero or $2^K$ children. The span of a tree is limited by the root radius. However, it is possible to dynamically re-root the tree to double the radius of the representable convex subspace in each dimension. The most common instances of KDTP are full binary trees ($K=1$), quadtrees ($K=2$) and octrees ($K=3$).
Without loss of generality, we use the same radius for all dimensions, making the subspaces hypercubes. This is not a limitation of the method, as it is possible to represent the subspaces with different shapes by allowing dimension-wise radii. If required, for example by the nature of the data, the implementation of dimension-wise radii should be a straightforward extension to the proposed method by defining a radius $r_{i,k}$ for each dimension $k$.
The creation and population of a KDTP is described by the algorithm that expands a node, namely subdivides one of the convex subspaces represented by a leaf node into its children; see Algorithm 1. An example of a 1D tree construction by node expansion is shown in Figure 1.
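The expansion operation can be sketched as follows. This is a minimal illustration under assumed names (`Node`, `expand`), not the paper's Algorithm 1 verbatim: a leaf splits into its $2^K$ children, each with half the radius and a center offset by $\pm r/2$ in every dimension:

```python
import itertools
from dataclasses import dataclass, field

@dataclass
class Node:
    center: tuple          # c_i: center of the K-dimensional hypercube
    radius: float          # r_i: half-width, shared across all dimensions
    weight: float = 0.0    # importance weight of the node's sample
    children: list = field(default_factory=list)

def expand(node):
    """Subdivide a leaf node into its 2^K children, each with half the radius."""
    k = len(node.center)
    half = node.radius / 2.0
    for signs in itertools.product((-1.0, 1.0), repeat=k):
        child_center = tuple(c + s * half for c, s in zip(node.center, signs))
        node.children.append(Node(child_center, half))
    return node.children

root = Node(center=(0.0, 0.0), radius=1.0)   # 2D quadtree root spanning [-1, 1]^2
kids = expand(root)                          # 4 children, each with radius 0.5
```

Note how the child radius $r_{root}/2^{l}$ follows automatically from halving at each level, matching the radius rule above.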
The probability distribution $q(x)$ parameterized by a KDTP is defined as a mixture model of distributions $q_k(x)$:
$$q(x) = \sum_{k=1}^{|\Lambda|} \lambda_k\, q_k(x \mid c_k, r_k), \qquad \sum_{k=1}^{|\Lambda|} \lambda_k = 1, \qquad (1)$$
where the number of mixture components is determined by the cardinality of the set of leaf nodes $\Lambda$, with the location of the kernels described by each leaf node center $c_k$ and the scale determined by its radius $r_k$. For the experimental evaluation of the method, we plugged into Eq. 1 two types of distribution: a Uniform $\mathcal{U}(c_k - r_k,\, c_k + r_k)$ and a Multivariate Normal $\mathcal{N}(c_k, r_k)$.
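Evaluating the mixture of Eq. 1 with uniform leaf kernels can be sketched as below; the helper names and the equal mixture weights are assumptions for illustration only:

```python
import numpy as np

def leaf_uniform_pdf(x, center, radius):
    """Uniform kernel over the leaf hypercube [c - r, c + r]^K."""
    inside = np.all(np.abs(x - center) <= radius)
    return (1.0 / (2.0 * radius) ** len(center)) if inside else 0.0

def mixture_pdf(x, leaves, lambdas):
    """q(x) = sum_k lambda_k * q_k(x | c_k, r_k): Eq. (1) with uniform kernels."""
    return sum(l * leaf_uniform_pdf(x, c, r) for l, (c, r) in zip(lambdas, leaves))

# Two 1D leaves tiling [-1, 0] and [0, 1] with equal mixture weights:
leaves = [(np.array([-0.5]), 0.5), (np.array([0.5]), 0.5)]
lambdas = [0.5, 0.5]
density = mixture_pdf(np.array([0.25]), leaves, lambdas)
```

Since the two leaves tile $[-1, 1]$ without overlap, the mixture is simply the uniform density $0.5$ on that interval, which is a useful sanity check that the kernel normalization is correct.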
3.2 Quasi-Monte Carlo adaptive importance sampling with tree pyramids
Quasi-Monte Carlo methods lie between stratified and Monte Carlo sampling. In our method, the tree structure and the subdivision into convex subspaces are deterministic, while obtaining a sample from each hypercube is stochastic. This paper introduces the concept of non-uniform stratification, which adapts the proposal distribution $q(x)$ to the target distribution $\pi(x)$. See a sequence of sampling and adaptation steps in Figure 2.
Like other AIS algorithms, TP-AIS can be divided into three major blocks: 1) sampling, 2) weighting and 3) adaptation. Algorithm 2 describes the process of tree pyramid adaptive importance sampling, which: 1) draws samples from the proposal distribution parameterized by a tree pyramid (lines 8–12), 2) computes the importance weights (line 13), and 3) adapts the proposal after new samples are drawn (line 14).
The resulting proposal distribution (in this case parameterized by the resulting tree after $N$ samples) can be used as a generative model that approximates the target PDF. It is important to note that this sampling algorithm is anytime: there is no need to wait for all $N$ samples to be generated to obtain a good approximation, as the tree generation can be interrupted and the resulting set of samples and tree structure used as an approximation to the posterior PDF. In what follows, TP-AIS (Algorithm 2) is described in more detail.
First, the tree is initialized by adding the root node at the center of the space, with a radius that spans the desired sampling space determined by the space limits $x_{min}$ and $x_{max}$ (lines 2–3). An initial sample is drawn from the proposal and added to the sample set (line 4). The importance weight of the initial sample is computed, and the root node is updated with the sample and its weight (lines 5–6).
The next steps are repeated until the desired number of samples is obtained: 1) sort the set of leaf nodes by importance density, computed as the product of the volume of the leaf node hypercube and its importance weight (lines 7–9); 2) expand the top-ranked leaves (line 11); and 3) generate samples from the new mixture components induced by the new nodes (line 12) and compute their weights (line 13). Depending on the choices for sampling, weighting and resampling, different flavors of TP-AIS can be implemented.
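The sampling/weighting/adaptation cycle above can be sketched for the 1D case with uniform leaf kernels. This is a simplified self-contained illustration under assumed names, not the paper's reference implementation of Algorithm 2:

```python
import numpy as np

rng = np.random.default_rng(1)

def tp_ais_1d(target_pdf, lo, hi, n_samples):
    """Sketch of the TP-AIS loop (1D, uniform leaf kernels): repeatedly expand
    the leaf with the largest importance density = leaf volume * weight."""
    def draw(c, r):
        x = rng.uniform(c - r, c + r)            # sample uniformly inside the leaf
        w = target_pdf(x) / (1.0 / (2.0 * r))    # w = pi(x) / q_k(x)
        return x, w

    c0, r0 = (lo + hi) / 2.0, (hi - lo) / 2.0    # root spans the whole domain
    x0, w0 = draw(c0, r0)
    leaves, samples = [(c0, r0, w0)], [(x0, w0)]
    while len(samples) < n_samples:
        # 1) sort leaves by importance density (volume * weight), best first
        leaves.sort(key=lambda l: (2.0 * l[1]) * l[2], reverse=True)
        c, r, _ = leaves.pop(0)
        # 2) expand the best leaf into its 2 children; 3) sample and weight each
        for cc in (c - r / 2.0, c + r / 2.0):
            x, w = draw(cc, r / 2.0)
            leaves.append((cc, r / 2.0, w))
            samples.append((x, w))
    return samples

std_normal = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
samples = tp_ais_1d(std_normal, -5.0, 5.0, 200)
```

Because high-weight leaves are expanded first, refinement concentrates around the mode while wide low-weight leaves in the tails stay coarse, which is the non-uniform stratification idea.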
Standard and Deterministic Mixture weights
The algorithm described in Algorithm 2 corresponds to the standard TP-AIS (sTP-AIS), an implementation that follows the standard multiple importance sampling weighting Cappé et al. (2004a), where importance weights are computed using each individual mixture component as the proposal distribution:
$$w_i = \frac{\pi(x_i)}{q_k(x_i \mid c_k, r_k)}, \qquad x_i \sim q_k(x \mid c_k, r_k).$$
Instead, the weights can be computed using all the mixture components, obtaining the Deterministic Mixture (DM) weighting. DM weights come at a higher computational cost (they require the evaluation of all the proposals to compute the weights) but are known to perform better in terms of variance Elvira et al. (2019). DM-TP-AIS can be implemented by plugging into lines 4 and 13 the following weighting:
$$w_i = \frac{\pi(x_i)}{\sum_{k=1}^{|\Lambda|} \lambda_k\, q_k(x_i \mid c_k, r_k)}.$$
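A minimal sketch of DM weighting follows; the helper names are hypothetical and the mixture weights are assumed equal for illustration:

```python
import numpy as np

def dm_weights(xs, target_pdf, component_pdfs, lambdas):
    """Deterministic Mixture weights: the full mixture acts as the proposal,
    w_i = pi(x_i) / sum_k lambda_k * q_k(x_i)."""
    xs = np.asarray(xs, dtype=float)
    mix = sum(l * qk(xs) for l, qk in zip(lambdas, component_pdfs))
    return target_pdf(xs) / mix

# Two uniform components tiling [-1, 1]; target is uniform on [-1, 1]
q1 = lambda x: np.where((x >= -1) & (x < 0), 1.0, 0.0)
q2 = lambda x: np.where((x >= 0) & (x <= 1), 1.0, 0.0)
target = lambda x: np.where(np.abs(x) <= 1, 0.5, 0.0)
w = dm_weights([-0.5, 0.5], target, [q1, q2], [0.5, 0.5])
```

With non-overlapping uniform components, only one component is non-zero at each sample, so every DM weight equals 1 here; this is the uniform special case mentioned below, where DM weighting reduces to the standard one after normalization.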
In the special case of choosing a Uniform distribution to represent each of the subspace proposal distributions, the DM approach reduces to the simple case after normalizing the importance weights.
Leaf resampling
Another variation of sTP-AIS is the resampling strategy. As described in Algorithm 2, each time a sample is generated in a new subspace, it is never removed from the sample set or replaced by another sample. This can be problematic if the target distribution is peaked: early unlucky samples in low-probability regions may span a large subspace, requiring many samples before that subspace is selected for expansion. This problem can be addressed by implementing a resampling strategy: before obtaining a new set of tree leaves (line 8), all the leaf nodes of the tree are resampled and reweighted. This leads to a lower acceptance rate but reduces the variance and increases robustness to multimodal target distributions and unlucky samples. We believe other resampling strategies will lead to improved robustness and acceptance rates, which will be the topic of our future work.
Mixture TP-AIS
Instead of ordering the tree leaves by importance density to select which one to expand, it is possible to generate the new samples directly from the mixture distribution parameterized by the tree pyramid. This sampling approach is similar to the Mixture Population Monte Carlo (M-PMC Cappé et al. (2008)) approach, which, instead of generating one new sample from each individual component of the proposal, samples from the mixture itself. To implement this approach, the selection of the node to expand depends on a sample drawn from the weighted mixture model. The mixture weights depend on the normalized importance volume of the component:
$$\lambda_k = \frac{w_k\, V_k}{\sum_{j=1}^{|\Lambda|} w_j\, V_j}, \qquad V_k = (2 r_k)^K.$$
Then, a sample from the weighted mixture model is obtained by Algorithm 3 in a two-step process. First, a random number is generated to select the sampling distribution (lines 3–6). Second, the sample is obtained by sampling from the selected mixture component. This approach replaces the sort operation in line 9 of Algorithm 2 with a sampling operation and adds a tree search operation to select which node to expand.
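The two-step mixture sampling can be sketched as follows (illustrative names; not the paper's Algorithm 3 verbatim):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_mixture(lambdas, samplers, n):
    """Two-step mixture sampling: a categorical draw selects a component,
    then the sample is drawn from the selected component."""
    lambdas = np.asarray(lambdas, dtype=float)
    lambdas = lambdas / lambdas.sum()              # normalize importance volumes
    out = []
    for _ in range(n):
        k = rng.choice(len(lambdas), p=lambdas)    # step 1: select a component
        out.append(samplers[k]())                  # step 2: sample from it
    return np.array(out)

# Two leaf components: uniform on [-1, 0] and on [0, 1], weighted 0.2 / 0.8
samplers = [lambda: rng.uniform(-1.0, 0.0), lambda: rng.uniform(0.0, 1.0)]
xs = sample_mixture([0.2, 0.8], samplers, 5000)
```

About 80% of the draws should land in the second component's subspace, which is how the normalized importance volumes steer expansion towards high-mass leaves.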
4 Evaluation
4.1 Baselines
We compare the presented TP-AIS method and some of its variations to a variety of state-of-the-art AIS methods: LAIS Martino et al. (2017), APIS Martino et al. (2015), M-PMC Cappé et al. (2008) and DM-PMC Elvira et al. (2017). The comparison is extended beyond AIS methods and includes an MCMC baseline with the Metropolis-Hastings algorithm Hastings (1970), as well as MultiNest Feroz et al. (2014), an evolution of Nested Sampling Skilling (2006) (an algorithm designed for evidence estimation) that addresses the problems Nested Sampling has with multimodal distributions.
4.2 Evaluation metrics
When the task at hand is to draw samples from a target distribution, the Normalized Effective Sample Size (NESS) is a popular measure of sampling efficiency for MCMC and IS methods. It can be interpreted as a factor that determines the number of samples required from the evaluated algorithm to generate an independent sample from the target distribution. This metric takes into account that samples in IS are biased by the proposal distribution, and that MCMC samples are autocorrelated. In IS, an approximation is often defined as the inverse sum of the squared normalized importance weights:
$$\widehat{ESS} = \frac{1}{\sum_{i=1}^{N} \bar{w}_i^2}, \qquad \bar{w}_i = \frac{w_i}{\sum_{j=1}^{N} w_j};$$
in MCMC, the definition of ESS is:
$$ESS = \frac{N}{1 + 2\sum_{i=1}^{\infty} \rho_i},$$
where $N$ is the number of samples and $\rho_i$ is the autocorrelation at the $i$-th MCMC step. Although this metric has received some criticism, it is still widely used and generally provides a good performance indication Martino et al. (2017).
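The IS-style NESS can be computed directly from the raw importance weights; a small sketch (assumed function name):

```python
import numpy as np

def ness(weights):
    """Normalized ESS for IS: ESS = 1 / sum(normalized_w_i^2), divided by N."""
    w = np.asarray(weights, dtype=float)
    w_norm = w / w.sum()                 # normalize the importance weights
    ess = 1.0 / np.sum(w_norm ** 2)      # effective sample size
    return ess / len(w)                  # normalize by the apparent sample size

balanced = ness([1.0, 1.0, 1.0, 1.0])     # perfectly balanced weights -> 1.0
degenerate = ness([1.0, 0.0, 0.0, 0.0])   # one dominant weight -> 1/N = 0.25
```

The two extreme cases illustrate the interpretation: a NESS of 1 means every weighted sample is as good as an independent one, while weight degeneracy drives NESS towards $1/N$.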
Besides ESS, we evaluate the approximation quality of the adapted proposal distributions obtained by AIS methods with a probability distribution similarity metric, the Jensen-Shannon Divergence (a symmetric version of the more popular Kullback–Leibler divergence), shown in Eq. 3.
$$D_{KL}(P \,\|\, Q) = \int p(x) \log\frac{p(x)}{q(x)}\,dx \qquad (2)$$
$$JSD(P \,\|\, Q) = \frac{1}{2} D_{KL}(P \,\|\, M) + \frac{1}{2} D_{KL}(Q \,\|\, M), \qquad M = \frac{1}{2}(P + Q) \qquad (3)$$
Unfortunately, the JSD requires the computation of integrals of intractable densities. Because the parameter space we are considering is bounded, we could approximate them with enough precision by using a small enough discretization step. However, this becomes rapidly intractable as the number of dimensions increases. To address this issue, we use a Monte Carlo estimate of the metric by drawing uniformly distributed samples and computing the empirical JSD of P and Q; see Eq. 4.
$$\widehat{JSD}(P \,\|\, Q) = \frac{V}{N}\sum_{i=1}^{N}\left[\frac{p(x_i)}{2}\log\frac{p(x_i)}{m(x_i)} + \frac{q(x_i)}{2}\log\frac{q(x_i)}{m(x_i)}\right], \qquad x_i \sim \mathcal{U}(\Omega), \quad m(x) = \frac{p(x)+q(x)}{2}, \qquad (4)$$
where $V$ is the volume of the bounded sampling domain $\Omega$.
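This Monte Carlo JSD estimate can be sketched for a bounded 1D domain as follows (assumed names; the paper's evaluation also covers higher dimensions):

```python
import numpy as np

rng = np.random.default_rng(3)

def jsd_mc(p_pdf, q_pdf, lo, hi, n=200_000):
    """Monte Carlo JSD estimate from uniform samples over a bounded 1D domain."""
    x = rng.uniform(lo, hi, size=n)
    vol = hi - lo                         # domain volume (interval length in 1D)
    p, q = p_pdf(x), q_pdf(x)
    m = 0.5 * (p + q)
    # each half-KL integrand; zero-density points contribute 0
    def kl_term(a):
        return np.where(a > 0, a * np.log(a / np.maximum(m, 1e-300)), 0.0)
    integrand = 0.5 * kl_term(p) + 0.5 * kl_term(q)
    return vol * np.mean(integrand)

normal = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
same = jsd_mc(normal, normal, -6.0, 6.0)   # identical densities -> JSD of 0
```

A useful sanity check is that the estimate is 0 for identical densities and is bounded above by $\log 2$ for any pair of distributions.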
The MCMC-based baseline methods do not have a proposal distribution that is adapted to approximate the target distribution. To evaluate how close their generated samples are to the target distribution, we use a Kernel Density Estimate (KDE) of the density built from the generated samples.
Some sampling methods are based on accept/reject mechanisms Hastings (1970); Skilling (2006) or have complex sampling mechanisms Feroz et al. (2014) that can yield a good ESS at an increased computational cost; hence, we include runtime among the evaluation metrics. This is of special interest in cases where the computational cost of the likelihood function is negligible and the sampling strategy becomes the computational bottleneck. An example evaluation with N=25 is depicted in Figure 3. The evaluation proceeds as follows:

1. Select a known PDF $\pi(x)$ as the ground truth distribution.

2. Use the evaluated sampling method to draw $N$ samples.

3. If the IS method used is adaptive, the proposal distribution $q(x)$ after drawing the $N$ samples is used. If the evaluated method only generates samples and does not adapt the proposal, the drawn samples are used to compute a KDE approximate PDF $\hat{\pi}(x)$.

4. Compute the evaluation metrics: ESS, elapsed time, and $JSD(\pi \,\|\, q)$.
4.3 Ground truth distributions
The performance of sampling methods often depends on the properties of the target distribution $\pi(x)$. Thus, we evaluate the methods using different ground truth parametric PDFs with a varied number of modes and randomized moments. We use Gaussian Mixture Models (GMMs) to parameterize the ground truth distributions as follows:
$$\pi(x) = \sum_{m=1}^{M} w_m\, \mathcal{N}(x \mid \mu_m, \Sigma_m), \qquad \sum_{m=1}^{M} w_m = 1,$$
where $\mu_m$ is a vector of means, $\Sigma_m$ is a covariance matrix and $w_m$ is the normalized mixture weight. The subindex $m$ references one of the $M$ mixture components of dimensionality $K$ that compose the mixture model. Examples of 2D GMMs used for evaluation are shown in Figure 4.
To evaluate the sampling methods, we use three different types of GMMs with randomized first and second order moments. i) The first target distribution, referred to as “normal”, has just one mixture component with a randomized mean and covariance. ii) The second target distribution has five mixture components with randomized moments and weights. iii) The third target distribution is a GMM version of the egg crate function, with four equidistant modes per dimension. An example of each ground truth distribution is depicted in Figure 4.
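A possible sketch of generating such randomized GMM ground truths is shown below; the function name and parameter ranges are illustrative assumptions, not the paper's exact randomization:

```python
import numpy as np

rng = np.random.default_rng(4)

def random_gmm(n_modes, dim, spread=3.0):
    """Randomized GMM ground truth: random means, random SPD covariances,
    and normalized mixture weights."""
    mus = rng.uniform(-spread, spread, size=(n_modes, dim))
    covs = []
    for _ in range(n_modes):
        a = rng.normal(size=(dim, dim))
        covs.append(a @ a.T + 0.1 * np.eye(dim))   # symmetric positive definite
    w = rng.uniform(0.2, 1.0, size=n_modes)
    return mus, covs, w / w.sum()                  # weights normalized to sum 1

mus, covs, weights = random_gmm(n_modes=5, dim=2)  # e.g. the 5-mode 2D target
```

Constructing each covariance as $AA^\top + \epsilon I$ guarantees it is symmetric positive definite, so every mixture component is a valid Gaussian.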
5 Results and discussion
In this evaluation, we have considered the simple version of TP-AIS with resampling, which will be the flavor of TP-AIS used unless otherwise mentioned. For more results with other variations of the method, we refer the reader to the additional material.
To obtain the results presented in this section, we have conducted the procedure detailed in Section 4 a hundred times for each combination of dimensionality and randomly sampled target PDF. The random seed was fixed to evaluate all the methods with the same target distributions.
In Figure 5, we show a visualization of the posterior approximation obtained with the different methods evaluated. It can be seen that, for the cases shown, the TP-AIS approach provides a much closer approximation to the target distribution with the same number of samples. In the remainder of this section, the evaluated metrics vs. the number of samples are discussed.
One of the most impactful aspects of importance sampling performance is the similarity between the proposal distribution and the target distribution. We have evaluated this similarity using the Jensen-Shannon Divergence detailed in Section 4. For lower-dimensional (1D–4D) problems, the left column of Figure 6 shows that our TP-AIS method greatly outperforms the baselines by converging much faster to a better approximate solution in the three types of distributions evaluated. For higher-dimensional cases, TP-AIS performance is on par with the other methods; see the right column of Figure 6.
NESS measures the sample efficiency of a method; results for this metric are shown in Figure 7. TP-AIS outperforms existing methods in terms of sample efficiency in low dimensions. In higher dimensions, all the methods yield poor performance, as can be seen in the right column of Figure 7. This low performance is an expected result: in higher dimensions, density functions become more concentrated, causing the variance of the importance weights to skyrocket and, consequently, reducing the ESS.
Regarding the time complexity metric, Figure 8 shows how the evaluated methods scale with the number of samples. Although TP-AIS implements a more sophisticated sampling method, the time required to generate N samples is comparable to the rest of the methods for a reasonable number of samples. A surprising result observed in higher-dimensional problems (Figure 8, right) is the subpar performance of the MCMC-MH approach, which is supposed to scale well with dimensionality. With the increase in dimensions, the target density is more concentrated, causing many of the proposed MC moves to have very low acceptance probability and be rejected, dramatically increasing the time required to generate N samples. This is a well-known problem that MCMC methods suffer from, and it can be alleviated by fine-tuning the MC proposal distribution. It can be seen that the other methods are not impacted by this issue.
Discussion
The presented method provides a significant speed-up of the sampling procedure by increasing the sampling efficiency (see Figure 7). It might seem that the more complex formulation of the proposal distribution and its adaptation could impact the computational complexity of the sampling algorithm. However, although in some cases TP-AIS requires more time to generate a specific number of samples (especially when the desired number of samples is high), its sampling efficiency allows the method to obtain a better proposal distribution approximation with fewer samples. For example, in the 1D GMM case shown in Figure 6, TP-AIS obtains with 100 samples a better accuracy (i.e. a lower JSD) than the other benchmarked methods after 1000 samples.
In contrast to other quasi-Monte Carlo approaches Wessing (2017); Joshi et al. (2016), our proposed algorithm does not lose the anytime property that MCMC methods have, and the sampling process can be halted at any time, allowing for accuracy-time trade-offs.
The main drawback of the proposed approach is dimensionality, which limits its application to low- to mid-dimensional problems. Each new sampling step subdivides the space into $2^K$ subspaces, constraining the number of samples that are generated at each time step. For example, for a 9D problem, each sampling step needs to generate 512 samples, which, depending on the application, might be overwhelming. On the other hand, this fact opens the door to parallel implementations that compute the likelihood of all the new samples concurrently, further improving the number of samples per unit of time that this approach can deliver. In any case, this is a well-known limitation of IS methods in general, and TP-AIS is not an exception.
Another limitation of TP-AIS is the need for boundaries on the sampling space. This can leave regions of non-zero probability outside the approximated space; the probability mass outside the space is then distributed over the sampling space, resulting in density overestimation. However, in practice it is reasonable to assume known boundaries in the domain of the sampled function, and in our evaluation we did not find this issue to be a limitation.
Besides the sampling domain, the parameter-free aspect of TP-AIS is one of its strengths; all the other methods compared have parameters that need to be tuned. Because proposal distribution design has a huge impact on performance, and these parameters govern the adaptation of the proposal distribution, good parameter selection has a strong influence on the inference result. TP-AIS circumvents that problem and enables the user to exploit its full potential without deep domain knowledge.
6 Conclusions
In this paper, we have presented a new adaptive importance sampling algorithm that parameterizes the proposal distribution using tree pyramids, which structure the proposal into partitioned convex subspaces. The exhaustive evaluation with a variety of complex target posteriors has shown the accuracy and sample efficiency of the proposed approach compared to several well-known and state-of-the-art approaches.
We presented the first steps of a promising sampling scheme that combines quasi-Monte Carlo techniques with adaptive importance sampling. More research on resampling strategies, kernel selection and the representation of the subspaces can lead to improved sample efficiency and better approximations. A future extension is to explore a Rao-Blackwellized representation of the subspaces instead of discarding previous samples.
The convergence speed of this sampling technique and its high NESS may enable the application of IS to real-time inference in applications with mid-to-low dimensionality, such as object 6D pose estimation.
Extended state-of-the-art comparison results
Extended TPAIS flavors comparison results
References
S. Agarwal, R. Ramamoorthi, S. Belongie, and H. W. Jensen (2003) Structured importance sampling of environment maps. In ACM SIGGRAPH 2003 Papers, New York, NY, USA, pp. 605–612.
C. M. Bishop (2006) Pattern recognition and machine learning. Springer.
M. F. Bugallo, V. Elvira, L. Martino, D. Luengo, J. Míguez, and P. M. Djurić (2017) Adaptive importance sampling: the past, the present, and the future. IEEE Signal Processing Magazine 34 (4), pp. 60–79.
O. Canévet, C. Jose, and F. Fleuret (2016) Importance sampling tree for large-scale empirical expectation. In Proceedings of the 33rd International Conference on Machine Learning, Vol. 48, pp. 1454–1462.
O. Cappé, A. Guillin, J. M. Marin, and C. P. Robert (2004) Population Monte Carlo. Journal of Computational and Graphical Statistics 13 (4), pp. 907–929.
O. Cappé, R. Douc, A. Guillin, J. M. Marin, and C. P. Robert (2008) Adaptive importance sampling in general mixture classes. Statistics and Computing 18 (4), pp. 447–459.
G. Casella, C. P. Robert, and M. T. Wells (2004) Generalized accept-reject sampling schemes. In A Festschrift for Herman Rubin, Lecture Notes–Monograph Series, Vol. 45, pp. 342–347.
P. Clarberg, W. Jarosz, T. Akenine-Möller, and H. W. Jensen (2005) Wavelet importance sampling. ACM Transactions on Graphics 24 (3), pp. 1166.
J. M. Cornuet, J. M. Marin, A. Mira, and C. P. Robert (2012) Adaptive multiple importance sampling. Scandinavian Journal of Statistics 39 (4), pp. 798–812.
S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth (1987) Hybrid Monte Carlo. Physics Letters B 195 (2), pp. 216–222.
V. Elvira, L. Martino, D. Luengo, and M. F. Bugallo (2017) Improving population Monte Carlo: alternative weighting and resampling schemes. Signal Processing 131, pp. 77–91.
V. Elvira, L. Martino, D. Luengo, and M. F. Bugallo (2019) Generalized multiple importance sampling. Statistical Science 34 (1), pp. 129–155.
A. Conty Estevez and C. Kulla (2018) Importance sampling of many lights with adaptive tree splitting. Proceedings of the ACM on Computer Graphics and Interactive Techniques 1 (2), pp. 25:1–25:17.
F. Feroz, M. P. Hobson, E. Cameron, and A. N. Pettitt (2014) Importance nested sampling and the MultiNest algorithm. arXiv preprint arXiv:1306.2144.
S. Geman and D. Geman (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-6 (6), pp. 721–741.
W. K. Hastings (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 (1), pp. 97–109.
M. D. Hoffman and A. Gelman (2014) The no-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research 15 (1), pp. 1593–1623.
T. S. Jaakkola (2001) Tutorial on variational approximation methods. In Advanced Mean Field Methods: Theory and Practice, pp. 129.
(2016) Improving grid based Bayesian methods. arXiv preprint arXiv:1609.09174.
(1989) Adaptive importance sampling. In Structural Safety and Reliability, pp. 855–862.
L. Martino, V. Elvira, D. Luengo, and J. Corander (2017) Layered adaptive importance sampling. Statistics and Computing 27 (3), pp. 599–623.
L. Martino, V. Elvira, and F. Louzada (2017) Effective sample size for importance sampling based on discrepancy measures. Signal Processing 131, pp. 386–401.
L. Martino, V. Elvira, D. Luengo, and J. Corander (2015) An adaptive population importance sampler: learning from uncertainty. IEEE Transactions on Signal Processing 63 (16), pp. 4422–4437.
A. B. Owen (2013) Monte Carlo theory, methods and examples. Online.
A. B. Owen and Y. Zhou (2000) Safe and effective importance sampling. Journal of the American Statistical Association 95 (449), pp. 135–143.
J. Skilling (2006) Nested sampling for Bayesian computations. Bayesian Analysis 4, pp. 833–860.
E. Veach (1997) Robust Monte Carlo methods for light transport simulation. PhD thesis, Stanford University.
(2017) Experimental analysis of a generalized stratified sampling algorithm for hypercubes. arXiv preprint arXiv:1705.03809.