Improved Adaptive Rejection Metropolis Sampling Algorithms

# Improved Adaptive Rejection Metropolis Sampling Algorithms

## Abstract

Markov Chain Monte Carlo (MCMC) methods, such as the Metropolis-Hastings (MH) algorithm, are widely used for Bayesian inference. One of the most important challenges for any MCMC method is speeding up the convergence of the Markov chain, which depends crucially on a suitable choice of the proposal density. Adaptive Rejection Metropolis Sampling (ARMS) is a well-known MH scheme that generates samples from one-dimensional target densities making use of adaptive piecewise linear proposals constructed using support points taken from rejected samples. In this work we pinpoint a crucial drawback of the adaptive procedure used in ARMS: support points might never be added inside regions where the proposal is below the target. When this happens in many regions it leads to a poor performance of ARMS, and the sequence of proposals never converges to the target. In order to overcome this limitation, we propose two alternative adaptive schemes that guarantee convergence to the target distribution. These two new schemes improve the adaptive strategy of ARMS, thus allowing us to simplify the construction of the sequence of proposals. Numerical results show that the new algorithms outperform the standard ARMS and other techniques.

M

arkov Chain Monte Carlo (MCMC) methods; Metropolis-Hastings (MH) algorithm; Adaptive Rejection Metropolis Sampling (ARMS).

\IEEEpeerreviewmaketitle

## 1 Introduction

Bayesian inference techniques and their implementation by means of sophisticated Monte Carlo (MC) statistical methods, such as Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC) approaches (also known as particle filters), has become a very active area of research over the last years (Gilks et al., 1995a; Gamerman, 1997; Liu, 2004; Robert and Casella, 2004; Liang et al., 2010). Monte Carlo techniques are very powerful tools for numerical inference, stochastic optimization and simulation of complex systems (Devroye, 1986; Fearnhead, 1998; Doucet et al., 2001; Jaeckel, 2002; Robert and Casella, 2004).

Rejection sampling (RS) (Devroye, 1986; von Neumann, 1951) and the Metropolis-Hastings (MH) algorithm are two well-known classical Monte Carlo methods for universal sampling (Metropolis and Ulam, 1949; Metropolis et al., 1953; Hastings, 1970; Liu, 2004). Indeed, they can be used to generate samples (independent with the RS and correlated with MH) from virtually any target probability density function (pdf) by drawing from a simpler proposal pdf. Consequently, these two methods have been widely diffused and applied. Unfortunately, in some situations they present certain important limitations and drawbacks as we describe below.

In the RS technique the generated samples are either accepted or rejected by an adequate test of the ratio of the target, , and the proposal density, , with .1 An important limitation of RS is the need to analytically establish an upper bound for the ratio of these two pdfs, or equivalently , which is not always an easy task. Moreover, even using the best bound, i.e. , the acceptance rate can be very low if the proposal is not similar to the target pdf. On the oher hand, in the MH algorithm, depending on the choice of the proposal pdf, the correlation among the samples in the Markov chain can be very high (Liu, 2004; Liang et al., 2010; Martino and Míguez, 2010). Correlated samples provide less statistical information and the resulting chain can remain trapped almost indefinitely in a local mode, meaning that convergence will be very slow in practice. Moreover, determining how long the chain needs to be run in order to converge is a difficult task, leading to the common practice of establishing a large enough burn-in period to ensure the chain’s convergence (Gilks et al., 1995a; Gamerman, 1997; Liu, 2004; Liang et al., 2010).

In order to overcome these limitations several extensions of both techniques have been introduced (Devroye, 1986; Liu, 2004; Liang et al., 2010), as well as methods combining them (Tierney, 1991, 1994). Furthermore, in the literature there is great interest in adaptive MCMC approaches that can speed up the convergence of the Markov chain (Haario et al., 2001; Gasemyr, 2003; Andrieu and Moulines, 2006; Andrieu and Thoms, 2008; Holden et al., 2009; Griffin and Walker, 2011). Here we focus on two of the most popular adaptive extensions, pointing out their important limitations, and proposing a novel adaptive technique that can overcome them and leads to a much more efficient approach for drawing samples from the target pdf.

One widely known extension of RS is the class of adaptive rejection sampling (ARS) methods (Gilks, 1992; Gilks and Wild, 1992; Gilks et al., 1997), which is an improvement of the standard RS technique that ensures high acceptance rates with a moderate and bounded computational cost. Indeed, the standard ARS algorithm of (Gilks and Wild, 1992) yields a sequence of proposal functions, , that converge towards the target pdf when the procedure is iterated. The basic mechanism of ARS is the following. Given a set of support points, , ARS builds a proposal pdf composed of truncated exponential pdfs inside the intervals (), considering also the open intervals (corresponding to ) and (associated to ) when . Then, when a newly proposed sample , drawn from , is rejected, it is always added to the set of support points, , which are used to build a refined proposal pdf, , for the next iteration.

Note that the proposal density used in ARS becomes quickly closer to the target pdf and the proportion of accepted samples grows. Consequently, since the proposal pdf is only updated when a sample is rejected and the probability of discarding a sample decreases quickly to zero, the computational cost is bounded and remains moderate, i.e. the computational cost does not diverge due to this smart adaptive strategy that improves the proposal pdf only when and where it is needed. Unfortunately, this algorithm can only be used with log-concave target densities and hence also unimodal. Therefore, several generalizations have been proposed (Gilks et al., 1995b; Hörmann, 1995; Evans and Swartz, 1998; Görür and Teh, 2011; Martino and Míguez, 2011) handling specific classes of target distributions or using jointly a MCMC approach.

Indeed, for instance, another popular technique that combines the ARS and MH approaches in an attempt to overcome the limitations of both methodologies is the Adaptive Rejection Metropolis Sampling (ARMS) (Gilks et al., 1995b). ARMS extends ARS to tackle multimodal and non-log-concave densities by allowing the proposal to remain below the target in some regions and adding a Metropolis-Hastings (MH) step to the algorithm to ensure that the accepted samples are properly distributed according to the target pdf. Note that introducing the MH step means that, unlike in the original ARS method, the resulting samples are correlated. The ARMS technique uses first an RS test on each generated sample and, if this sample is initially accepted, applies also an MH step to determine whether it is finally accepted or not. If a sample is rejected in the RS test (hence, imperatively , as otherwise samples are always initially accepted by the RS step), then it is incorporated to the set of support points, , that is used to build a better proposal pdf for the next iteration, , exactly as in ARS. On the other hand, when a sample is initially accepted, after the MH step the set of support points is never updated, i.e. , meaning that the proposal pdf is not improved and .

This is the crucial point w.r.t. the performance of ARMS. In general, if proper procedures for constructing given a set of support points, , such as the ones proposed in (Gilks et al., 1995b; Meyer et al., 2008) are adopted, the proposal will improve throughout the whole domain . However, inside the regions of the domain where , new support points might never be added. Consequently, the convergence of the sequence to cannot be guaranteed, and it may not occur in some cases, as shown in Section 4 through a simple example. Due to this problem and the correlation among the samples generated, when the target pdf is multimodal the Markov chain generated by the ARMS algorithm tends to get trapped in a single mode despite the (partial) adaptation of the proposal pdf (see e.g. example 2 in (Martino and Míguez, 2010)).

It is important to remark that this drawback is caused by a structural problem of ARMS: the adaptive mechanism proposed for updating the proposal pdf is not complete, since it never adds support points inside regions where . Therefore, even though a suitable procedure for constructing the proposals (see e.g. (Gilks et al., 1995b; Meyer et al., 2008)) can allow them to change inside these regions (eventually obtaining in some cases), the convergence cannot be guaranteed regardless of the procedure used to build the proposal densities. For this reason, the performance of the standard ARMS depends critically on the choice of a very good way to construct the proposal pdf, , that attains almost everywhere, implying that ARMS tends to be reduced to the ARS algorithm. Indeed, note that if the procedure used to build the proposal produces , then the proposal is never improved, i.e. for all , resulting in no adaptation at all.

This structural problem of the ARMS approach can be solved in a trivial way: adding a new support point each time that the MH step is applied. Unfortunately, in this case the computational cost of the algorithm increases rapidly as the number of support points diverges. Moreover, a second major problem is that the convergence of the Markov chain to the invariant density cannot be ensured, as partially discussed in (Gilks et al., 1995b) and in (Gilks et al., 1997) for the case of ARMS-within-Gibbs (see also (Liang et al., 2010, Chapter 8) for further considerations).

In this work we present two enhancements of ARMS that guarantee the convergence of the sequence of proposal densities to the target pdf with a bounded computational cost, as in the standard ARS and ARMS approaches, since the probability of incorporating a new support point quickly decreases to zero. Moreover, the two novel techniques fulfill one of the required condition to assure the convergence of the chain, the so-called diminishing adaptation (see e.g. (Liang et al., 2010, Chapter 8), (Haario et al., 2001; Roberts and Rosenthal, 2007)). The first one is a direct modification of the ARMS procedure that allows us to incorporate support points inside regions where the proposal is below the target with a decreasing probability as the proposal converges to the target. The second one is an adaptive independent MH algorithm that learns from all past generated samples except for the current state of the chain, thus also guaranteeing the convergence of the chain to the invariant density, as shown in (Gasemyr, 2003; Holden et al., 2009). Furthermore, these new strategies also allow us to reduce the complexity in the construction of the proposal pdfs (thus reducing both the effort of writing the code and the computational cost of the resulting algorithm), since they do not require that almost everywhere as in the standard ARMS algorithm in order to guarantee the smooth running of the adaptive mechanism and hence to improve the proposal pdf everywhere in the whole domain . We exemplify this point introducing simpler procedures to construct the sequence of proposal densities and illustrating their good performance through numerical examples.

We remark that, compared to other adaptive Metropolis-Hastings techniques available in literature (Warnes, 2001; Haario et al., 2001, 2006; Cai et al., 2008; Liang et al., 2010), the two new schemes provide a better performance. For instance, two alternative adaptive schemes used to draw samples from one-dimensional target pdfs are the adaptive triangle Metropolis sampling (ATRIMS) and the adaptive trapezoid Metropolis sampling (ATRAMS) proposed in (Cai et al., 2008). However, even though the proposal pdfs are adaptively improved in both of them (ATRIMS and ATRAMS), the sequence of proposals does not converge to the target density, unlike the adaptive strategies proposed in this work. Regarding other adaptive MH algorithms (Warnes, 2001; Haario et al., 2001, 2006; J. M. Keith and Sofronov, 2008; Hanson et al., 2011) only certain parameters of the proposal pdf (which follows a parametric model with a fixed analytical form) are adjusted and optimized, whereas here we improve the entire shape of the proposal density, which becomes closer and closer to the shape of the target density. Finally, another advantage of our schemes is that the proposed algorithms eventually become standard ARS techniques when the constructed proposal pdf is always above the target, i.e. when , thus producing independent samples with acceptance rate that quickly becomes close to one. For all these reasons, drawing from univariate pdfs, our techniques obtain better performance than the other techniques available in literature at the expense of a moderate increase in the computational cost.

The rest of the paper is organized as follows. Background material is presented in Section 2. Then we review and discuss certain limitations of ARMS in Sections 3 and 4. In Section 5 we present the two novel techniques, whereas alternative procedures to construct the proposal pdf are described in Section 6. Finally, numerical simulations are shown in Section 7 and conclusions are drawn in Section 8.

## 2 Background

### 2.1 Notation

We indicate random variables with upper-case letters, e.g. , while we use lower-case letters to denote the corresponding realizations, e.g. . Sets are denoted with calligraphic upper-case letters, e.g. . The domain of the variable of interest, , is denoted as . The normalized target PDF is indicated as , whereas represents the unnormalized target. The normalized proposal PDF is denoted as , whereas is the unnormalized proposal. For simplicity, we also refer to the unnormalized functions and , and in general to all unnormalized but proper PDFs, as densities.

The standard adaptive rejection sampling (ARS) algorithm (Gilks, 1992; Gilks and Wild, 1992; Gilks et al., 1997) enables the construction of a sequence of proposal densities, , tailored to the target density, . Its most appealing feature is that each time that a sample is drawn from a proposal, , and it is rejected, this sample can be used to build an improved proposal, , with a higher mean acceptance rate. Unfortunately, the ARS method can only be applied with target pdfs which are log-concave (and thus unimodal), which is a very stringent constraint for many practical applications (Gilks et al., 1995b; Martino and Míguez, 2011). In this section we briefly review the ARS algorithm, which is the basis for the ARMS method and the subsequent improvements proposed in this paper.

Let us assume that we want to draw samples from a target pdf, , with support , known up a normalization constant. The ARS procedure can be applied when is log-concave, i.e. when

 V(x)≜log(p(x)), (1)

is strictly concave . In this case, let

 St≜{s1,s2,…,smt}⊂D (2)

be the set of support points at time , sorted in ascending order (i.e. ). Note that the number of points can grow with the iteration index . From a piecewise-linear function is constructed such that

 Wt(x)≥V(x), (3)

for all and . Different approaches are possible for constructing this function. For instance, if we denote by the linear function tangent to at (Gilks and Wild, 1992), then a suitable piecewise linear function is:

 Wt(x)≜min{w1(x),…,wmt(x)}  for  x∈D. (4)

In this case, clearly we have , and , as shown in Figure 1(a). However, it is also possible to construct without using the first derivative of , which is involved in the calculation of the tangent lines used in the previous approach, by using secant lines (Gilks, 1992). Indeed, defining the intervals

 Extra open brace or missing close brace (5)

and denoting as the straight line passing through the points and for , a suitable function can be expressed as

 Wt(x)≜⎧⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩L1,2(x),x∈I0=(−∞,s1];L2,3(x),x∈I1=(s1,s2];min{Lj−1,j(x),Lj+1,j+2(x)},x∈Ij=(sj,sj+1], 2≤j≤mt−2;Lmt−2,mt−1(x),x∈Imt−1=(smt−1,smt];Lmt−1,mt(x),x∈Imt=(smt,+∞). (6)

It is apparent also in this case that , and , as shown in Figure 1(b). Therefore, building such that , we have in both cases

 πt(x)≜exp(Wt(x))≥p(x)=exp{V(x)}. (7)

Since is a piecewise linear function, we obtain an exponential-type proposal density, . Note also that knowledge of the area below each exponential piece, for , is required to generate samples from , and, given the functional form of , it is straightforward to compute each of these terms, as well as the normalization constant, with , since .

Therefore, we conclude that, since is a piecewise-exponential function, it is very easy to draw samples from and, since , we can apply the rejection sampling principle. Moreover, when a sample drawn from is rejected, we can incorporate it into the set of support points, i.e. and . Then, we can compute a refined approximation, , following one of the two approaches described above, and a new proposal density, , can be constructed. Table 1 summarizes the procedure followed by the ARS algorithm to generate samples from a target .

It is important to observe that the adaptive strategy followed by ARS includes new support points only when and where they are needed, i.e. when a sample is rejected, indicating that the discrepancy between the shape of the target and the proposal pdfs is potentially large. This discrepancy is measured by computing the ratio . Since approaches when , the ratio becomes closer and closer to one, and the probability of adding a new support point becomes closer and closer to zero, thus ensuring that the computational cost remains bounded. More details about the convergence of ARS are provided below.

### 2.3 Convergence

Note that every time a sample drawn from is rejected, is incorporated as a support point in the new set . As a consequence, a refined lower hull is constructed, yielding a better approximation of the system’s potential function, . In this way, becomes “closer” to and the mean acceptance rate can be expected to become higher. More precisely, the probability of accepting a sample drawn from is

 at(x)≜p(x)πt(x), (8)

and we define the acceptance rate at the -th iteration of the ARS algorithm, denoted as , as the expected value of with respect to the normalized pdf , i.e.,

 ^at≜E[at(x)]=∫Dat(x)~πt(x)dx=∫Dp(x)πt(x)~πt(x)dx=1cπ∫Dp(x)dx=cpcπ≤1, (9)

where and are the proportionality constants for and respectively, i.e.

 cπ=∫Dπt(x)dx,

and

 cp=∫Dp(x)dx,

and we remark that because . Hence, from (9) we conclude that . Equivalently, defining the distance between two curves as

 Dπ|p(t)≜∫D|πt(x)−p(x)|dx=∫D|exp(Wt(x))−exp(V(x))|dx, (10)

then we have . Note that this distance measures the discrepancy between the proposal and the target pdfs. In particular, if decreases the acceptance rate increases, and, since , if, and only if, almost everywhere. Equivalently, if, and only if, almost everywhere.

## 3 Adaptive Rejection Metropolis Sampling Algorithm

The disadvantage of the classical ARS technique is that it can only draw samples from univariate log-concave densities. In general, for non-log-concave and multimodal pdfs it is difficult to build a function that satisfies the condition for all . Recognizing this problem, in (Gilks et al., 1995b) an extension of ARS is suggested to deal with pdfs that are not log-concave by appending a Metropolis-Hastings (MH) algorithm step (Hastings, 1970; Metropolis et al., 1953; Metropolis and Ulam, 1949). The adaptive rejection Metropolis sampling (ARMS) first performs an RS test, and the discarded samples are used to improve the proposal pdf, as in the standard ARS technique. However, if the sample is accepted in the RS test, then the ARMS adds another statistical control using the MH acceptance rule. This guarantees that the accepted samples are distributed according to the target pdf, , even if . Therefore, unlike ARS, the algorithm produces a Markov chain and the resulting samples are correlated. Finally, we remark that the ARMS technique can be seen as an adaptive generalization of the rejection sampling chain proposed in (Tierney, 1991, 1994).

The ARMS algorithm is described in detail in Table 2. The time index denotes the iteration of the chain whereas is the index corresponding to evolution of the sequence of proposal pdfs .

The key stages are steps 4 and 5. On the one hand, in step of Table 2, when a sample is rejected by the RS test (it is possible if and only if ), ARMS adds this new point to the set and uses it to improve the construction of and go back to step 2. On the other hand, when a sample is initially accepted by the RS test (clearly this always happens if ), ARMS enters in step 5 and uses the MH acceptance rule to determine whether it is finally accepted or not. However, notice that ARMS never incorporates this new point to , even if it is finally rejected by the MH step. Observe also that, if and , then the ARMS becomes the standard ARS scheme, since the proposed sample is always accepted in step 5.

Moreover, since depends on the support points, a more rigorous notation would be . However, since never contains the current state of the chain, then the ARMS can be considered an adaptive independent MH algorithm and, for this reason, the dependence on is usually omitted from the notation. Indeed, the ARMS can be also seen as a particular case of the auxiliary variables method in (Besag and Green, 1993).

Finally, we observe that, although in (Gilks et al., 1995b) the ARMS algorithm seems to be tightly linked to a particular mechanism to construct the sequence of the proposal pdfs, in fact these two issues can be studied separately. Therefore, in the following section we describe the two procedures proposed in literature to construct the function in a suitable way (Gilks et al., 1995b; Meyer et al., 2008).

### 3.1 Procedure to build Wt(x) proposed in (Gilks et al., 1995b)

Consider again a set of support points and the intervals , , for and . Moreover, let us denote as the straight line passing through the points and for , and also set

 L−1,0(x)=L0,1(x)≜L1,2(x),
 Lmt,mt+1(x)=Lmt+1,mt+2(x)≜Lmt−1,mt(x).

In the standard ARMS procedure introduced in (Gilks et al., 1995b), the function is piecewise linear and defined as

 Wt(x)=max[Li,i+1(x),min[Li−1,i(x),Li+1,i+2(x)]] for Ii=(si,si+1], (11)

with . The function can be also rewritten in a more explicit form as

 Wt(x)=⎧⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩L1,2(x),x∈I0=(−∞,s1];max{L1,2(x),L2,3(x)},x∈I1=(s1,s2];max{Lj,j+1(x),min{Lj−1,j(x),Lj+1,j+2(x)}},x∈Ij=(sj,sj+1], 2≤j≤mt−2;max{Lmt−1,mt(x),Lmt−2,mt−1(x)},x∈Imt−1=(smt−1,smt];Lmt−1,mt(x),x∈Imt=(smt,+∞). (12)

Figure 2 illustrates this construction for a generic log-pdf with (a) support points and (b) support points.

It is important to remark that, with this construction, the addition of a new point inside an interval also affects the adjacent regions, and . Therefore, if the initial proposal pdf fulfills the inequality in some subset , the proposal pdf can theoretically be improved in the entire domain . However, in practice the convergence of the proposal to the target pdf can be very slow in some cases, and moreover it cannot be ensured in general (see Section 4 for further considerations).

Note also that, if the target pdf is log-concave, then

 max[Lj,j+1(x),min[Lj−1,j(x),Lj+1,j+2(x)]]=min[Lj−1,j(x),Lj+1,j+2(x)],

for , hence this procedure is reduced to Eq. (6) of the standard ARS technique. It is important to observe that, although can be expressed in the compact form of Eq. (11), drawing samples from the proposal, , requires knowing the equation of each straight line that composes , and also the corresponding definition interval (see Figure 2). Hence, the complexity of drawing from is much larger than what equations (11) or (12) might suggest, since it entails the computation of all the intersection points between all the contiguous straight lines (see Figure 2).

Simpler constructions without this requirement could be proposed (see, for instance, Section 6), but the efficiency of the standard ARMS scheme would be severely compromised, since the good performance of ARMS depends critically on the construction of the proposal, as discussed in the following section and shown in the simulations (see Section 7).

Another possible and more complex procedure for building was proposed in (Meyer et al., 2008), involving quadratic approximations of the log-pdf when it is possible, so that the proposal is also formed by truncated Gaussian pdfs. Indeed, parabolic pieces passing through support points are used if only if the resulting quadratic function is concave (in order to obtain truncated Gaussian as proposal pdf in the corresponding interval). Otherwise, linear functions are used to build . The main advantage of this alternative procedure (Meyer et al., 2008) is that it provides a better approximation of the function . However, it does not overcome the critical structural limitation of the standard ARMS technique that we explain in the sequel.

## 4 Structural limitation in the adaptive procedure of ARMS

Although ARMS can be a very effective technique for sampling from univariate non-log-concave pdfs, its performance depends critically on the following two issues:

• The construction of should be such that the condition is satisfied for most intervals as possible, with . That is, the proposal function must stay above the target pdf , inside as many intervals as possible, covering as much of the domain as possible. In this case the adaptive procedure works almost in the entire domain and the proposal density can be improved virtually everywhere. Indeed, if for all and for all , the ARMS is reduced the classical ARS algorithm.

• The addition of a support point within an interval, , with , must entail an improvement of the proposal pdf inside other neighboring intervals when building . This allows the proposal pdf to improve even inside regions where and a support point can never be added at the -th iteration, since we could have by adding a support point in an adjacent interval. For instance, in the procedure described in Section 3.1 (Gilks et al., 1995b), when a support point is added inside , the proposal pdf also changes in the intervals and . Consequently, the drawback of not adding support points within the intervals where is reduced, but may not completely eliminated, as shown through an example below.

In any case, it is important to remark that the convergence of the proposal to the target pdf , cannot be guaranteed using any suitable construction of except for the special case where and , where ARMS is reduced to ARS. This is due to a fundamental structural limitation of the ARMS technique caused by the impossibility of adding support points inside regions where .

Indeed, it is possible that inside some region where , we obtain a sequence of proposals such that for an arbitrarily large value of , and the discrepancy between the proposal and the target pdfs cannot be reduced after an iteration , clearly implying that the proposal does not converge to the target for .

In order to illustrate this structural problem of ARMS we provide below an example of an even more critical situation where for all , i.e. the proposal pdf does not change within an interval . Consider a multi-modal target density, , with function as shown in Figure 3(a). We build using support points and the procedure of Section 3.1 (Gilks et al., 1995b). Note that we have for all in the interval , as shown in Figure 3(a), where the dashed line depicts the tangent line to at .

Figures 3(b) and 3(c) show that we can incorporate new support points ( in Figure 3(b) and in Figure 3(c)) arbitrarily close to the interval ( in Figures 3(a)-(b)) without altering the construction of within ( in Figures 3(a)-(b)). The reason is described in the sequel.

Consider Figure 3(a). The construction of for all depends on the straight lines passing through and , passing through and , and passing through and . Hence, Eq. (11) within this interval can be written as

 Missing or unrecognized delimiter for \big (13)

Looking back at Figure 3(a) it becomes apparent that and . Therefore, for all , and this situation does not change when new support points are added inside the contiguous intervals, as shown in Figures 3(c)-(d). Indeed, consider now the limit case where two points are incorporated arbitrarily close to and , so that we can consider the new support points located exactly at and . In this extreme situation the secant lines of the adjacent intervals become tangent lines, as shown in Figure 3(d), and the minimum between the two tangent lines is represented by the straight line tangent to . Moreover, note that this tangent line stays always below the secant line passing through and , meaning that even in this extreme case.

Therefore, with the adaptive procedure used by the standard ARMS technique to build the proposal pdf, the function could remain unchanged in a subset of the entire domain that contains a mode of the target pdf, as shown in the previous example. Hence, the discrepancy between the proposal and the target , is not reduced at all, as .

A trivial solution for this drawback of the standard ARMS algorithm could be adding new support points within the set each time that the MH step in the ARMS (step 5) is applied. Unfortunately, from a practical point of view the first problem of this approach is the unbounded increase in computational cost since the number of points in grows indefinitely. Another major problem from a theoretical point of view is that the convergence of the Markov chain to the invariant pdf cannot be ensured in this case, since the current state of the chain could be contained in (see (Liang et al., 2010, Chapter 8), (Holden et al., 2009) for a discussion of this issue).

For these two reasons in the following section we propose two variants of the standard ARMS technique that ensure the convergence of the Markov chain to the target density while keeping the computational cost bounded and low.

## 5 Variants of the ARMS algorithm

In this section we describe two alternative strategies to improve the standard ARMS algorithm. We denote these two variants as ARMS and IARMS where the A emphasizes that we incorporate an additional adaptive step to improve the proposal density w.r.t. the standard ARMS. Note that in this section we use the more rigorous notation for the proposal, instead of simply , for clarity with respect to the convergence of the Markov chain.

### 5.1 First scheme: A2Rms

A first possible enhancement of the ARMS technique is summarized below. The basic underlying idea is enhancing the performance of ARMS by introducing the possibility of adding a sample to the support set even when it is initially accepted by the RS test. The procedure followed allows ARMS to incorporate support points inside regions where in a controlled way (i.e. with a decreasing probability as the Markov chain evolves), thus ensuring the convergence to the target and preventing the computational cost from growing indefinitely.

The steps taken by the ARMS algorithm are the following:

• Set (iteration of the chain), , choose an initial value and the time to stop the adaptation, . Let be the needed number of iterations of the Markov chain.

• Given a set of support points, such that , build an approximation of the potential function using a convenient procedure (e.g. the ones described in (Gilks et al., 1995b; Meyer et al., 2008) or the simpler ones proposed in Section 6).

• Draw a sample from and another sample from .

• If , then discard , set , , sort in ascending order, update , and go back to step 2.

• If , then:

• Set with probability

 α=min[1,p(x′)min[p(xk),πt(xk|St)]p(xk)min[p(x′),πt(x′|St)]], (14)

or with probability .

• If , draw and if

 u2>πt(x′|St)p(x′),

set , and sort in ascending order. Otherwise, set , .

• Update and .

• If , go back to step 2.

We remark again that the key point in the ARMS algorithm is that, due to the introduction of step 5.2 w.r.t. the ARMS, new support points can also added inside the regions of the domain where . Note that, when and we accept the proposed sample in the RS test, then we also apply the MH acceptance rule and always accept , as , but we never incorporate to the set of support points, since and . Therefore, if the condition is satisfied in all the domain , then the ARMS is reduced to the standard ARS, just like it happens for ARMS.

It is important to remark that the effort to code and implement the algorithm is virtually unchanged. Moreover, the ratio in the step 5.2 is already calculated in the RS control (steps 4 and 5), hence it is not necessary to evaluate the proposal and the target pdfs again.

Since it is difficult to guarantee that the Markov chain converge to the target pdf, , we have introduced a time to stop the second adaptation step 5.2. Therefore, theoretically the first samples produced by the algorithm should be discarded. Observe that for the ARMS coincides to the standard ARMS.

The issue with the convergence of the chain is due to the fact that we are incorporating the current state , into the set of support points , on which the proposal depends. Therefore, the balance condition using the acceptance function in Eq. (14) could be not satisfied for and those first samples could be seen just as auxiliary random variables obtained as part of the process to construct a good proposal pdf, and thus should be removed from the set of final returned samples. However, it is important to notice the following two facts:

• It is a common practice with MCMC techniques to remove a certain amount of the firstly generated samples in order to diminish the effect of the so called burn-in period.

• We have found out empirically that, even if we set and use all the samples produced by the Markov chain, the ARMS algorithm also provides very good results, as we show in Section 7. Indeed, the probability of incorporating new support points quickly tends to zero as increases, effectively vanishing as . This is due to the fact that, each time that a new point is added, the proposal density is improved and becomes closer and closer to the target pdf in the whole domain . Therefore, the two ratios and approach one and the probability of adding a new support point becomes arbitrarily close to zero. This implies that the computational cost of the algorithm remains bounded, i.e. a very good approximation of the target can be obtained with a finite number of support points adaptively chosen.

Hence, ARMS satisfies the first condition needed to ensure the convergence of the Markov chain to the target pdf, known as diminishing adaptation (see (Liang et al., 2010, Chapter 8), (Haario et al., 2001; Roberts and Rosenthal, 2007)). Unfortunately, it is difficult to assert whether the ARMS with also fulfills the second condition needed to guarantee the convergence of the chain, called bounded convergence (Liang et al., 2010, Chapter 8). For this reason, although the good results obtained in the numerical simulations with (see Section 7) lead us to believe that convergence occurs, it may be safer to set in order to avoid convergence problems in practical applications.

Furthermore, in the next section we introduce a second ARMS scheme for which convergence to the target can be ensured, since it is an adaptive independent MH algorithm (Gasemyr, 2003; Holden et al., 2009).

### 5.2 Second scheme: Independent A2Rms (Ia2Rms)

A second possible improvement of ARMS is an adaptive independent MH algorithm that we indicate as IARMS. The main modification of IARMS w.r.t. ARMS is building the proposal pdf using possibly all the generated past samples but without taking into account the current state of the chain. The IARMS is described in the following.

• Set (iteration of the chain), and choose an initial value .

• Given a set of support points, , such that , build an approximation of the function using a convenient procedure (e.g. the ones described in (Gilks et al., 1995b; Meyer et al., 2008) or the simpler ones proposed in Section 6).

• Draw a sample from and another sample from .

• If , then discard , set and , sort in ascending order, update and go back to step 2.

• If , then:

• Set and with probability

 α=min[1,p(x′)min[p(xk),πt(xk|St)]p(xk)min[p(x′),πt(x′|St)]], (15)

or and with probability .

• If is not already contained in , draw , and if

 u2>πt(