Generalized Rejection Sampling Schemes and Applications in Signal Processing
Abstract
Bayesian methods and their implementations by means of sophisticated Monte Carlo techniques, such as Markov chain Monte Carlo (MCMC) and particle filters, have become very popular in signal processing over the last years. However, in many problems of practical interest these techniques demand procedures for sampling from probability distributions with nonstandard forms, hence we are often brought back to the consideration of fundamental simulation algorithms, such as rejection sampling (RS). Unfortunately, the use of RS techniques demands the calculation of tight upper bounds for the ratio of the target probability density function (pdf) over the proposal density from which candidate samples are drawn. Except for the class of logconcave target pdf’s, for which an efficient algorithm exists, there are no general methods to analytically determine this bound, which has to be derived from scratch for each specific case. In this paper, we introduce new schemes for (a) obtaining upper bounds for likelihood functions and (b) adaptively computing proposal densities that approximate the target pdf closely. The former class of methods provides the tools to easily sample from a posteriori probability distributions (that appear very often in signal processing problems) by drawing candidates from the prior distribution. However, they are even more useful when they are exploited to derive the generalized adaptive RS (GARS) algorithm introduced in the second part of the paper. The proposed GARS method yields a sequence of proposal densities that converge towards the target pdf and enable a very efficient sampling of a broad class of probability distributions, possibly with multiple modes and nonstandard forms. We provide some simple numerical examples to illustrate the use of the proposed techniques, including an example of target localization using range measurements, often encountered in sensor network applications.
Rejection sampling; adaptive rejection sampling; Gibbs sampling; particle filtering; Monte Carlo integration; sensor networks; target localization.
I Introduction
Bayesian methods have become very popular in signal processing during the past decades and, with them, there has been a surge of interest in the Monte Carlo techniques that are often necessary for the implementation of optimal a posteriori estimators Fitzgerald01 (); Doucet01b (); Robert04 (); Liu04b (). Indeed, Monte Carlo statistical methods are powerful tools for numerical inference and optimization Robert04 (). Currently, there exist several classes of MC techniques, including the popular Markov Chain Monte Carlo (MCMC) Fitzgerald01 (); Larocque02 () and particle filtering Djuric03 (); Doucet01b () families of algorithms, which enjoy numerous applications. However, in many problems of practical interest these techniques demand procedures for sampling from probability distributions with nonstandard forms, hence we are often brought back to the consideration of fundamental simulation algorithms, such as importance sampling DeGroot02 (), inversion procedures Robert04 () and the accept/reject method, also known as rejection sampling (RS).
The RS approach (Robert04, , Chapter 2) is a classical Monte Carlo technique for “universal sampling”. It can be used to generate samples from a target probability density function (pdf) by drawing from a possibly simpler proposal density. The sample is either accepted or rejected by an adequate test of the ratio of the two pdf’s, and it can be proved that accepted samples are actually distributed according to the target density. RS can be applied as a tool by itself, in problems where the goal is to approximate integrals with respect to (w.r.t.) the pdf of interest, but more often it is a useful building block for more sophisticated Monte Carlo procedures Gilks92 (); Gilks94 (); Kunsch05 (). An important limitation of RS methods is the need to analytically establish a bound for the ratio of the target and proposal densities, since there is a lack of general methods for the computation of exact bounds.
One exception is the socalled adaptive rejection sampling (ARS) method Gilks92 (); Gilks92derfree (); Robert04 () which, given a target density, provides a procedure to obtain both a suitable proposal pdf (easy to draw from) and the upper bound for the ratio of the target density over this proposal. Unfortunately, this procedure is only valid when the target pdf is strictly logconcave, which is not the case in most practical cases. Although an extension has been proposed Hoermann95 (); Evans98 () that enables the application of the ARS algorithm with concave distributions (where is a monotonically increasing function, not necessarily the logarithm), it does not address the main limitations of the original method (e.g., the impossibility to draw from multimodal distributions) and is hard to apply, due to the difficulty to find adequate transformations other than the logarithm. Another algorithm, called adaptive rejection metropolis sampling (ARMS) Gilks95 (), is an attempt to extend the ARS to multimodal densities by adding MetropolisHastings steps. However, the use of an MCMC procedure has two important consequences. First, the resulting samples are correlated (unlike in the original ARS method), and, second, for multimodal distributions the Markov Chain often tends to get trapped in a single mode.
In this paper we propose general procedures to apply RS when the target pdf is the posterior density of a signal of interest (SoI) given a collection of observations. Unlike the ARS technique, our methods can handle target pdf’s with several modes (hence nonlogconcave) and, unlike the ARMS algorithm, they do not involve MCMC steps. Hence, the resulting samples are independent and come exactly from the target pdf.
We first tackle the problem of computing an upper bound for the likelihood of the SoI given fixed observations. The proposed solutions, that include both closedform bounds and iterative procedures, are useful when we draw the candidate samples from the prior pdf.
In this second part of the paper, we extend our approach to devise a generalization of the ARS method that can be applied to a broad class of pdf’s, possibly multimodal. The generalized algorithm yields an efficient proposal density, tailored to the target density, that can attain a much better acceptance rate than the prior distribution. We remark that accepted samples from the target pdf are independent and identically distributed (i.i.d).
The remaining of the paper is organized as follows. We formally describe the signal model in Section II. Some useful definitions and basic assumptions are introduced in Section III. In Section IV, we propose a general procedure to compute upper bounds for a large family of likelihood functions. The ARS method is briefly reviewed in Section V, while the main contribution of the paper, the generalization of the ARS algorithm, is introduced in Section VI. Section VII is devoted to simple numerical examples and we conclude with a brief summary in Section VIII.
Ii Model and Problem Statement
Iia Notation
Scalar magnitudes are denoted using regular face letters, e.g., , , while vectors are displayed as boldface letters, e.g., x, X. We indicate random variates with uppercase letters, e.g., , X, while we use lowercase letters to denote the corresponding realizations, e.g., , x. We use letter to denote the true probability density function (pdf) of a random variable or vector. This is an argumentwise notation, common in Bayesian analysis. For two random variables and , is the true pdf of and is the true pdf of , possibly different. The conditional pdf of given is written . Sets are denoted with calligraphic uppercase letters, e.g., .
IiB Signal Model
Many problems in science and engineering involve the estimation of an unobserved SoI, , from a sequence of related observations. We assume an arbitrary prior probability density function (pdf) for the SoI, , and consider scalar random observations, , , which are obtained through nonlinear transformations of the signal X contaminated with additive noise. Formally, we write
(1) 
where is the random observation vector, , are nonlinearities and are independent noise variables, possibly with different distributions for each . We write for the vector of available observations, i.e., a realization of Y.
We assume exponentialtype noise pdf’s, of the form
(2) 
where is real constant and is a function, subsequently referred to as marginal potential, with the following properties:

It is real and non negative, i.e., .

It is increasing () for and decreasing () for .
These conditions imply that has a unique minimum at and, as a consequence has only one maximum (mode) at . Since the noise variables are independent, the joint pdf is easy to construct and we can define a joint potential function as
(3) 
Substituting (2) into (3) yields
(4) 
where is a constant. In subsequent sections we will be interested in a particular class of joint potential functions denoted as
(5) 
where the subscript identifies the specific member of the class. In particular, the function obtained for , is termed quadratic potential.
Let be the vectorvalued nonlinearity defined as . The scalar observations are conditionally independent given a realization of the SoI, , hence the likelihood function , can be factorized as
(6) 
where . The likelihood in (6) induces a system potential function , defined as
(7) 
that depends on x, the observations y, and the function g. Using (4) and (7), we can write the system potential in terms of the joint potential,
(8) 
IiC Rejection Sampling
Assume that we wish to approximate, by sampling, some integral of the form , where is some measurable function of x and is the posterior pdf of the SoI given the observations. Unfortunately, it may not be possible in general to draw directly from , so we need to apply simulation techniques to generate adequate samples. One appealing possibility is to perform RS using the prior, , as a proposal function. In such case, let be a lower bound for the system potential, , so that is an upper bound for the likelihood, . We can generate samples according to the standard RS algorithm.

Set .

Draw samples from and from , where is the uniform pdf in .

If then , else discard and go back to step 2.

Set . If then stop, else go back to step 2.
Then, can be approximated as . The fundamental figure of merit of a rejection sampler is the acceptance rate, i.e., the mean number of accepted samples over the total number of proposed candidates.
In Section IV, we address the problem of analytically calculating the bound . Note that, since the function is monotonous, it is equivalent to maximize w.r.t. x and to minimize the system potential also w.r.t. x. As a consequence, we may focus on the calculation of a lower bound for . Note that this problem is far from trivial. Even for very simple marginal potentials, , , the system potential can be highly multimodal w.r.t. x. See the example in the Section VIIA for an illustration.
Iii Definitions and Assumptions
Hereafter, we restrict our attention to the case of a scalar SoI, . This is done for the sake of clarity, since dealing with the general case requires additional definitions and notations. The techniques to be described in Sections IVVI can be extended to the general case, although this extension is not trivial. The example in Section VIIC illustrates how the proposal methodology is also useful in higher dimensional spaces, though.
For a given vector of observations , we define the set of simple estimates of the SoI as
(9) 
Each equation , in general, can yield zero, one or more simple estimates. We also introduce the maximum likelihood (ML) SoI estimator , as
(10) 
not necessarily unique.
Let us use to denote the support of the vector function g, i.e., . We assume that there exists a partition of (i.e., and , ) such that the subsets are intervals in and we can define functions and , as
(11) 
i.e., is the restriction of to the interval . We further assume that (a) every function is invertible in and (b) every function is either convex in or concave in . Assumptions (a) and (b) together mean that, for every and all , the first derivative is either strictly positive or strictly negative and the second derivative is either nonnegative or nonpositive. As a consequence, there are exactly simple estimates (one per observation) in each subset of the partition, namely for . We write the set of simple estimates in as . Due to the additivity of the noise in (1), if is bounded there may be a nonnegligible probability that (or ), where denotes the closure of set , hence may not exist for some realization . In such case, we define (or , respectively), and admit (respectively, ) as valid simple estimates.
Iv Computation of upper bounds on the likelihood
Iva Basic method
Let y be an arbitrary but fixed realization of the observation vector Y. Our goal is to obtain an analytical method for the computation of a scalar such that . Hereafter, we omit the dependence on the observation vector and write simply . The main difficulty to carry out this calculation is the nonlinearity g, which renders the problem not directly tractable. To circumvent this obstacle, we split the problem into subproblems and address the computation of bounds for each set , , in the partition of . Within , we build adequate linear functions in order to replace the nonlinearities . We require that, for every , the inequalities
(12) 
(13) 
hold jointly for all , and all , where is any closed interval in such that (i.e., any ML estimator of the SoI restricted to , possibly non unique) is contained in . The latter requirement can be fulfilled if we choose (see the Appendix for a proof).
If (12) and (13) hold, we can write
(14) 
which follows easily from the properties (P1) and (P2) of the marginal potential functions as described in Section IIB. Moreover, since and (this function will be subsequently referred as the modified system potential) where and , Eq. (14) implies that , , and, as a consequence,
(15) 
Therefore, it is possible to find a lower bound in for the system potential , denoted , by minimizing the modified potential in .
All that remains is to actually build the linearities . This construction is straightforward and can be described graphically by splitting the problem into two cases. Case 1 corresponds to nonlinearities such that (i.e., is either increasing and convex or decreasing and concave), while case 2 corresponds to functions that comply with (i.e., is either increasing and concave or decreasing and convex), when .
Figure 1 (a)(b) depicts the construction of in case 1. We choose a linear function that connects the point and the point corresponding to the simple estimate, . In the figure, and denote the distances and , respectively. It is apparent that for all , hence inequality (12) is granted. Inequality (13) also holds for all , since and are either simultaneously greater than (or equal to) , or simultaneously lesser than (or equal to) .
Figure 1 (c)(d) depicts the construction of in case 2. We choose a linear function that connects the point and the point corresponding to the simple estimate, . Again, and denote the distances and , respectively. It is apparent from the two plots that inequalities (12) and (13) hold for all .
A special subcase of 1 (respectively, of 2) occurs when (respectively, ). Then, is the tangent to at . If then is a horizontal asymptote of .
It is often possible to find in closedform. If we choose , then is a global lower bound of the system potential. Table I shows an outline of the proposed method, that will be subsequently referred to as bounding method 1 (BM1) for conciseness.
1. Find a partition of the space of the SoI. 
2. Compute the simple estimates for each . 
3. Calculate and build , for and . 
4. Replace with , and minimize to find the lower bound . 
5. Find . 
IvB Iterative Implementation
The quality of the bound depends, to a large extent, on the length of the interval , denoted . This is clear if we think of as a linear approximation on of the nonlinearity . Since we have assumed is continuous and bounded in , the procedure to build in BM1 implies that
(16) 
for all . Therefore, if we consider intervals which are shorter and shorter, then the modified potential function will be closer and closer to the true potential function , and hence the bound will be tighter.
The latter observation suggests a procedure to improve the bound for a given interval . Indeed, let us subdivide into subintervals denoted where and . We refer to the elements in the collection , with , as support points in the interval . We can build linear functions for every subinterval , using the procedure described in Section IVA. We recall that this procedure is graphically depicted in Fig. 1, where we simply need to

substitute by and

when the simple estimate , substitute by ( by ) if (if , respectively).
Using we compute a bound , , and then select . Note that the subscript in indicates how many support points have been used to computed the bound in (which becomes tighter as increases). Moreover if we take a new (arbitrary) support point from the subinterval that contains , and extend the set of support points with it, with , then we can iterate the proposed procedure and obtain a refined version of the bound, denoted .
The proposed iterative algorithm is described, with detail, in Table II. Note that is an iteration index that makes explicit the number of support points . If we plug this iterative procedure for the computation of into BM1 (specifically, replacing steps 3 and 4 of Table I), we obtain a new technique that we will hereafter term bounding method 2 (BM2).
As an illustration, Figure 2 shows four steps of the iterative algorithm. In Figure 2 (a) there are two support points , which yield a single interval . In Figures 2 (b)(c)(d), we successively add a point chosen in the interval that contains the latest bound. In this example, the point is chosen deterministically as the mean of the extremes of the interval .
1. Start with , and . Let and . 
2. Choose an arbitrary interior point in , and update the set of support points . 
3. Sort in ascending order, so that where , , 
and is the number of elements of . 
4. Build for each interval with . 
5. Find , for . 
6. Set the refined bound , and set . 
7. To iterate, go back to step 2. 
IvC Lower bound for quadratic potentials
Assume that the joint potential is quadratic, i.e., for each , and construct the set of linearities , for and . The modified system potential in becomes
(17) 
and it turns out straightforward to compute . Indeed, if we denote and , then we can readily obtain
(18) 
and . It is apparent that . Furthermore, is an approximation of the ML estimator restricted to .
IvD Adaptation of for generic system potentials
If the joint potential is not quadratic, in general it can still be difficult to minimize the modified function , despite the replacement of the nonlinearities with the linear functions . In this section, we propose a method to transform the bound for a quadratic potential, , into a bound for some other, nonquadratic, potential function.
Consider an arbitrary joint potential and assume the availability of an invertible increasing function such that , where denotes the composition of functions. Then, for the system potential we can write
(19) 
and, as consequence, , hence is a lower bound for the nonquadratic system potential constructed from .
For instance, consider the family of joint potentials . Using the monotonicity of norms, it is possible to prove Williams91 () that
(20) 
(21) 
Let . Since this function is, indeed, strictly increasing, we can transform the inequality (20) into
(22) 
which yields
(23) 
hence the transformation of the quadratic bound is a lower bound for with . Similarly, if we let , the inequality (21) yields
(24) 
hence the transformation is a lower bound for when .
It is possible to devise a systematic procedure to find a suitable function given an arbitrary joint potential , where . Let us define the manifold . We can construct by assigning with the maximum of the quadratic potential when , i.e., we define
(25) 
Note that (25) is a constrained optimization problem that can be solved using, e.g., Lagrangian multipliers.
From the definition in (25) we obtain that, , . In particular, since from the definition of , we obtain the desired relationship,
(26) 
We additionally need to check whether is a strictly increasing function of . The two functions in the earlier examples of this Section, and , can be readily found using this method.
IvE Convex marginal potentials
Assume that and that we have already found , and , using the technique in Section IVA. If a marginal potential is convex, the function is also convex in . Indeed, for all
(27) 
where we have used that (since is linear).
As a consequence, if all marginal potentials are convex, then the modified system potential, , is also convex in . This is easily shown using (27), to obtain
(28) 
Therefore, we can use the tangents to at the limit points of (i.e, and ) to find a lower bound for the system potential . Figure 3 (left) depicts a system potential (solid line), the corresponding modified potential (dotted line) and the two tangent lines at and . It is apparent that the intersection of the two tangents yields a lower bound in . Specifically, if we let be the piecewiselinear function composed of the two tangents, then the inequality is satisfied for all .
V Adaptive Rejection Sampling
The adaptive rejection sampling (ARS) Gilks92 () algorithm enables the construction of a sequence of proposal densities, , and bounds tailored to the target density. Its most appealing feature is that each time we draw a sample from a proposal and it is rejected, we can use this sample to build an improved proposal,