Non-reversibly updating a uniform value
for Metropolis accept/reject decisions
Radford M. Neal
31 January 2020
Abstract. I show how it can be beneficial to express Metropolis accept/reject decisions in terms of comparison with a uniform value, u, and to then update u non-reversibly, as part of the Markov chain state, rather than sampling it independently each iteration. This provides a small improvement for random walk Metropolis and Langevin updates in high dimensions. It produces a larger improvement when using Langevin updates with persistent momentum, giving performance comparable to that of Hamiltonian Monte Carlo (HMC) with long trajectories. This is of significance when some variables are updated by other methods, since if HMC is used, these updates can be done only between trajectories, whereas they can be done more often with Langevin updates. I demonstrate that for a problem with some continuous variables, updated by HMC or Langevin updates, and also discrete variables, updated by Gibbs sampling between updates of the continuous variables, Langevin with persistent momentum and non-reversible updates to u samples nearly a factor of two more efficiently than HMC. Benefits are also seen for a Bayesian neural network model in which hyperparameters are updated by Gibbs sampling.
1 Introduction
In the Metropolis MCMC method, a decision to accept or reject a proposal to move from x to x* can be done by checking whether π(x*)/π(x) > u, where π is the density function for the distribution being sampled, and u is a random variable that is uniform over (0, 1). Standard practice is to generate a value for u independently for each decision. I show here that it can be beneficial to instead update u each iteration without completely forgetting the previous value, using a non-reversible method.
Doing non-reversible updates to u will not change the fraction of proposals that are accepted, but can result in acceptances being clustered together (with rejections similarly clustered). This can be beneficial, for example, when rejections cause reversals of direction, as in Horowitz’s (1991) variant of Langevin updates for continuous variables, in which the momentum “persists” between updates. Compared to the alternative of Hamiltonian Monte Carlo (HMC) with long trajectories (Duane, Kennedy, Pendleton, and Roweth 1987; reviewed in Neal 2011), Langevin updates allow other updates, such as for discrete variables, to be done more often, which I will show can produce an advantage over HMC if persistent momentum is used in conjunction with non-reversible updates for u.
2 The New Method for Accept/Reject Decisions
For any MCMC method, we can augment the variable of interest, x, with density π(x), by a variable u, whose conditional distribution given x is uniform over (0, π(x)). The resulting joint distribution for x and u is uniform over the region where 0 < u < π(x). This is the view underlying “slice sampling” (Neal 2003), in which u is introduced temporarily, by sampling it uniformly from (0, π(x)), and then forgotten once a new x has been chosen. Metropolis updates can also be viewed in this way, with the new x found by accepting or rejecting a proposal, x*, according to whether π(x*) > u, with u newly sampled from (0, π(x)).
However, it is valid to instead retain u in the state between updates, utilizing its current value for accept/reject decisions, and updating this value when desired by any method that leaves invariant the uniform distribution on (0, π(x)), which is the conditional distribution for u given x implied by the joint distribution that is uniform over the (x, u) for which 0 < u < π(x).
Equivalently, one can retain in the state a value s whose distribution is uniform over (0, 1), independent of x, with u corresponding to s π(x) — the state (x, s) is just another way of viewing the state (x, u). Accept/reject decisions are then made by checking whether π(x*)/π(x) > s. Note, however, that if x* is accepted, s must then be updated to s′ = s π(x)/π(x*), which corresponds to u not changing.
Here, I will consider non-reversible updates for s, which translate it by some fixed amount, δ, and perhaps add some noise, reflecting off the boundaries at 0 and 1. It is convenient to express such an update with reflection in terms of a variable v that is uniform over (−1, +1), and to define s = |v|. An update for v can then be done as follows:
(1)  v′ = ((v + δ + n + 1) mod 2) − 1,  where n is the noise (taken as 0 if no noise is used)
For any δ and any distribution for the noise (not depending on the current value of v), this update leaves the uniform distribution over (−1, +1) invariant, since it is simply a wrapped translation, which has a Jacobian of magnitude one, and hence keeps the probability density for the new v the same as that for the old (both being 1/2).
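As a concrete check (my own illustration, not part of the paper's experiments), the wrapped translation can be coded directly. With no noise it is a bijection of (−1, +1) that shifts any evenly spaced grid of points, so after sorting the mapped points, every gap equals the grid spacing except at most one gap where the wrap-around occurred:

```python
import numpy as np

def update_v(v, delta, noise=0.0):
    """Non-reversible update (1): translate v by delta (plus any noise)
    and wrap around the interval (-1, +1)."""
    return (v + delta + noise + 1.0) % 2.0 - 1.0

# With no noise, the map permutes an evenly spaced grid on (-1, +1):
# all gaps between the sorted mapped points equal the grid spacing,
# except at most one gap where the wrap-around occurred.
grid = np.linspace(-0.999, 0.999, 2000)
moved = np.sort(update_v(grid, delta=0.3))
h = grid[1] - grid[0]
assert np.all((moved > -1.0) & (moved < 1.0))
assert int((np.abs(np.diff(moved) - h) > 1e-6).sum()) <= 1
```

Since the map is measure-preserving, a uniform distribution over (−1, +1) is left exactly invariant, as the text argues.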
The full state consists of x and v, with x having density π(x) and, independently, v being uniform over (−1, +1), which corresponds to the conditional distribution (given x) of u = |v| π(x) being uniform over (0, π(x)). If a proposed move from x to x* is accepted, we must change v as follows:
(2)  v′ = v π(x) / π(x*)
This leaves u = |v| π(x) unchanged, and hence the reverse move would also be accepted, as necessary for reversibility. Because of this change to v on acceptance, when π(x) varies continuously, it may not be necessary to include noise in the update for v in (1), but if π(x) has only a finite number of possible values, adding noise may be necessary to ensure ergodicity.
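A small numerical check (my own sketch; the density and the particular values are arbitrary choices) confirms that the rescaling (2) leaves u = |v| π(x) unchanged on acceptance, and that the rescaled v stays inside (−1, +1):

```python
import math

def pi(x):
    # any unnormalized density works; use a standard normal shape
    return math.exp(-0.5 * x * x)

x, v = 0.4, -0.5          # current state; threshold s = |v| = 0.5
x_star = 1.1              # proposal; pi(x_star)/pi(x) = 0.59 > s, so accepted
u_before = abs(v) * pi(x) # the underlying slice variable u

v_new = v * pi(x) / pi(x_star)       # equation (2)
u_after = abs(v_new) * pi(x_star)

assert pi(x_star) / pi(x) > abs(v)   # the move really is accepted
assert math.isclose(u_before, u_after)
assert abs(v_new) < 1                # rescaled v stays in (-1, +1)
```

The last assertion holds in general: acceptance requires π(x*)/π(x) > |v|, which is exactly the condition for |v′| = |v| π(x)/π(x*) to be less than one.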
The hope with these non-reversible updates is that s will move slowly (if δ and the noise amount are small) between values near 0, where acceptance is easy, and values near 1, where acceptance is harder. (But note that s may change in either direction when proposals are accepted, though u will not change.) Non-reversibly updating s will not change the overall acceptance rate, since the distribution of s is still uniform over (0, 1), independent of x, but it is expected that acceptances and rejections will become more clustered — with an accepted proposal more likely to be followed by another acceptance, and a rejected proposal more likely to be followed by another rejection.
We might wish to intermingle Metropolis updates for x that use s to decide on acceptance with other sorts of updates for x — for example, Gibbs sampling updates, or Metropolis updates accepted in the standard fashion. We can do these updates while ignoring v, and then simply resume use of v afterwards, since v is independent of x. We could indeed include several independent v variables in the state, using different v values for different classes of updates, but this generalization is not explored here.
3 Results for random-walk Metropolis updates in high dimensions
A small benefit from non-reversible updating of u can be seen with simple random-walk Metropolis updates. Such updates operate as follows:

1. Propose x* = x + σ a, where a ∼ N(0, I), and σ is a stepsize parameter that needs to be tuned.

2. Accept x* as the next state if π(x*)/π(x) > s; otherwise let the next state again be x.
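As a concrete illustration (my own minimal sketch, not the FBM implementation used for the figures), these updates, combined with the non-reversible scheme of Section 2, can be coded as follows. The target, tuning values, and the 1e-300 guard against log(0) are my choices:

```python
import numpy as np

def rwm_nonrev_u(log_pi, x0, sigma, delta, n_iter, rng):
    """Random-walk Metropolis in which the acceptance threshold s = |v|
    is part of the state and updated by the wrapped translation (1)."""
    x = np.array(x0, dtype=float)
    lp = log_pi(x)
    v = rng.uniform(-1.0, 1.0)
    samples, n_acc = [], 0
    for _ in range(n_iter):
        v = (v + delta + 1.0) % 2.0 - 1.0          # non-reversible update of v
        x_star = x + sigma * rng.standard_normal(x.shape)
        lp_star = log_pi(x_star)
        if lp_star - lp > np.log(abs(v) + 1e-300): # accept if pi(x*)/pi(x) > s
            v *= np.exp(lp - lp_star)              # equation (2): u is unchanged
            x, lp = x_star, lp_star
            n_acc += 1
        samples.append(x.copy())
    return np.array(samples), n_acc / n_iter

rng = np.random.default_rng(1)
draws, acc = rwm_nonrev_u(lambda x: -0.5 * np.sum(x**2),
                          x0=np.zeros(2), sigma=1.0, delta=0.3,
                          n_iter=20000, rng=rng)
assert abs(draws[5000:].mean()) < 0.15 and 0.7 < draws[5000:].var() < 1.3
```

The final assertion is a loose sanity check that the sampler still targets the standard Gaussian; as the text notes, the acceptance rate is unchanged in expectation relative to standard Metropolis.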
Figure 1 compares results of such updates when u is sampled independently each iteration versus being updated non-reversibly as described in Section 2, when sampling a 40-dimensional Gaussian distribution with identity covariance matrix. When estimating the mean of a single coordinate, little difference is seen, but for estimating the mean of the log of the probability density, the non-reversible method with δ = 0.3 is about 1.15 times better.
One possible explanation for the better performance with non-reversible updates is that, as noted by Caracciolo et al. (1994), the performance of Metropolis methods in high dimensions is limited by their ability to sample different values for the log of the density. In d dimensions, the log density typically varies over a range proportional to d^(1/2). A Metropolis update will typically change the log density only by about one — larger decreases in the log density are unlikely to be accepted, and it follows from reversibility that increases in the log density of much more than one must also be rare (once equilibrium has been reached). Since standard Metropolis methods are reversible, these changes of order one will occur in a random walk, and so around d steps will be needed to traverse the range of log densities of about d^(1/2), limiting the speed of mixing.
The gain seen from using non-reversible updates for u may come from helping with this problem. At times when s is small, few proposals will be rejected, and the chain will tend to drift towards smaller values for the log density, with the opposite behaviour at times when s is near one. This could reduce the random walk nature of changes in the log density. Note, however, that this effect is limited by the changes to v (and hence s) required by equation (2).
4 Application to Langevin updates with persistent momentum
I obtained more interesting results when using non-reversible updates of u in conjunction with the one-step, non-reversible version of Hamiltonian Monte Carlo (Duane et al. 1987) due to Horowitz (1991). This method is a form of “Langevin” update in which the momentum is only partially replaced in each iteration, so that the method tends to move persistently in the same direction for many iterations. I briefly present this method below; it is discussed in more detail in my review of Hamiltonian Monte Carlo methods (Neal 2011, Section 5.3).
Hamiltonian Monte Carlo (HMC) works in a state space extended with momentum variables, p, of the same dimension as x, which are newly sampled each iteration. The joint distribution for (x, p) is defined in terms of a “potential energy” function U(x) = − log π(x), with p having an independent N(0, I) distribution. HMC proposes a new value for (x, p) by simulating Hamiltonian dynamics with some number of “leapfrog” steps (and then negating p, so the proposal is reversible). A leapfrog step from (x, p) to (x′, p″) has the form
(3)  p′ = p − (ε/2) ∇U(x)
     x′ = x + ε p′
     p″ = p′ − (ε/2) ∇U(x′)
Some number, ℓ, of such steps may be done to go from the current state to the proposal. If ℓ is large, but the stepsize ε is sufficiently small (so acceptance is likely), a proposed point can be accepted that is distant from the current point, avoiding the slowdown from doing a random walk with small steps. In standard HMC, p is sampled independently at each iteration from the N(0, I) distribution, randomizing the direction in which the next trajectory starts out.
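A direct transcription of a leapfrog step (3) is given below (my own sketch), checked on a standard Gaussian target, for which U(x) = x′x/2 and ∇U(x) = x; the stepsize and step count are arbitrary choices:

```python
import numpy as np

def leapfrog(x, p, grad_U, eps):
    """One leapfrog step, as in equation (3)."""
    p = p - 0.5 * eps * grad_U(x)   # half step for the momentum
    x = x + eps * p                 # full step for the position
    p = p - 0.5 * eps * grad_U(x)   # half step for the momentum
    return x, p

# For U(x) = x'x/2 (standard Gaussian target), H = U(x) + |p|^2/2 stays
# nearly constant over many steps, which is what makes acceptance likely.
grad_U = lambda x: x
x, p = np.array([1.0, 0.0]), np.array([0.0, 1.0])
H0 = 0.5 * (x @ x + p @ p)
for _ in range(100):
    x, p = leapfrog(x, p, grad_U, eps=0.05)
assert abs(0.5 * (x @ x + p @ p) - H0) < 1e-2
```

The near-conservation of H over 100 steps, rather than a steady drift, reflects the symplectic nature of the leapfrog integrator.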
The standard Langevin method is equivalent to HMC with ℓ = 1. This method makes use of gradient information, ∇U(x), which gives it an advantage over simple random-walk Metropolis methods, but it does not suppress random walks, since p is randomized after each step.
In Horowitz’s method, only one leapfrog step is done each iteration, as for standard Langevin updates, but a trick is used so that these steps nevertheless usually keep going in the same direction for many iterations, except when a rejection occurs. The updates with this method operate as follows:

1. Set p′ = α p + (1 − α²)^(1/2) n, where n ∼ N(0, I), and α is a tuning parameter. Set x′ = x.

2. Find (x*, p*) from (x′, p′) with one leapfrog step, with stepsize ε, followed by negation of p*.

3. Accept (x″, p″) = (x*, p*) if π(x*) exp(−|p*|²/2) / [π(x′) exp(−|p′|²/2)] > s, where s is a uniform (0, 1) value (newly sampled, or retained as in Section 2); otherwise let (x″, p″) = (x′, p′).

4. Let the new values be x = x″ and p = −p″.
For α slightly less than 1, Step (1) only slightly changes p. If Step (3) accepts, the negation of the momentum in the proposal is canceled by the negation in Step (4). But a rejection will reverse p, leading the chain to almost double back on itself. Setting α to zero gives the equivalent of the standard Langevin method.
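Steps (1)–(4), combined with the non-reversible update of v from Section 2, can be sketched as follows (my own illustration; the target, the tuning values, and the 1e-300 guard are assumptions, not the paper's experimental settings):

```python
import numpy as np

def persistent_langevin_step(x, p, v, log_pi, grad_log_pi,
                             eps, alpha, delta, rng):
    """One update in the style of Steps (1)-(4), with the acceptance
    threshold s = |v| carried in the state and updated non-reversibly."""
    # Step 1: partial momentum refresh; also do the wrapped update of v
    p = alpha * p + np.sqrt(1.0 - alpha**2) * rng.standard_normal(p.shape)
    v = (v + delta + 1.0) % 2.0 - 1.0
    # Step 2: one leapfrog step (U = -log pi), then negate the momentum
    p_star = p + 0.5 * eps * grad_log_pi(x)
    x_star = x + eps * p_star
    p_star = -(p_star + 0.5 * eps * grad_log_pi(x_star))
    # Step 3: accept if the ratio of joint densities exceeds s = |v|
    log_r = (log_pi(x_star) - 0.5 * p_star @ p_star) \
          - (log_pi(x) - 0.5 * p @ p)
    if log_r > np.log(abs(v) + 1e-300):
        v *= np.exp(-log_r)          # equation (2), for the joint density
        x, p = x_star, p_star
    # Step 4: negate the momentum (cancels Step 2's negation on acceptance)
    return x, -p, v

rng = np.random.default_rng(2)
x, p, v = np.zeros(2), rng.standard_normal(2), rng.uniform(-1.0, 1.0)
xs = np.empty((20000, 2))
for i in range(20000):
    x, p, v = persistent_langevin_step(x, p, v,
                                       lambda z: -0.5 * z @ z, lambda z: -z,
                                       eps=0.5, alpha=0.9, delta=0.03, rng=rng)
    xs[i] = x
assert abs(xs[2000:].mean()) < 0.15 and 0.7 < xs[2000:].var() < 1.3
```

On acceptance the two negations cancel and the momentum continues forward; on rejection the returned momentum is reversed, which is the doubling-back behaviour described above.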
Unfortunately, even with this non-reversibility trick, and an optimally-chosen α, Horowitz’s method is not as efficient as HMC with long trajectories. To reduce the rejection rate, and hence random-walk-inducing reversals of direction, a small, inefficient stepsize (ε) is needed.
But a higher rejection rate would be tolerable if rejections cluster in time, producing long runs with no rejections. For example, rejection-free runs of 20, 0, 20, 0, … are better than rejection-free runs of 10, 10, 10, 10, …, since half of the former runs can potentially move, without a random walk, a distance proportional to 20, whereas the latter runs will move only a distance proportional to 10.
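The arithmetic behind this claim can be checked with a stylized simulation (my own illustration) in which each rejection-free run covers its length in one direction and the direction is re-randomized at each rejection. With the total number of steps held fixed, the root-mean-square displacement grows with the run length:

```python
import numpy as np

rng = np.random.default_rng(0)

def rms_displacement(total_steps, run_length, n_rep=4000):
    """RMS displacement of a walker moving one unit per step, whose
    direction is re-randomized at each rejection, with all
    rejection-free runs having the given length."""
    n_runs = total_steps // run_length
    moves = rng.choice([-1.0, 1.0], size=(n_rep, n_runs)) * run_length
    return np.sqrt((moves.sum(axis=1) ** 2).mean())

# Same total number of steps, but clustering the rejections into longer
# rejection-free runs moves the walker farther -- by about sqrt(2) here,
# since the RMS displacement is proportional to sqrt(run_length).
assert rms_displacement(10000, 20) > 1.25 * rms_displacement(10000, 10)
```

This is only a caricature of the dynamics (real rejections reverse, rather than re-randomize, the direction), but it captures why clustered rejections are preferable at a fixed rejection rate.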
We may hope that such clustering of rejections will be obtainable using non-reversible updates of u, with consequent improvement in the performance of Langevin updates with persistent momentum.
5 Results for Langevin updates with persistent momentum
To test whether non-reversible updating of u can improve the Langevin method with persistent momentum, I first tried sampling a multivariate Gaussian distribution consisting of 16 independent pairs of variables having variances of 1 and correlations of 0.99 (i.e., a 32-dimensional Gaussian with block-diagonal covariance matrix). For this test, the methods are not set up to use knowledge of this correlation structure, mimicking problems in which the dependencies take a more complex, and unknown, form.
The high correlation within each pair and the (moderately) high dimensionality limit the stepsize (ε) that can be used (while avoiding a high rejection rate). This test is therefore representative of problems in which sampling is much more efficient when many consecutive small steps move in the same direction than when a random walk is done.
Figure 2 shows the results. The centre plots show that the run using a non-reversible update for u (with an unscaled stepsize of 0.12, α derived from 0.5, and δ = 0.03) is about 1.6 times better at estimating the mean of the log probability density than the best run of the standard persistent Langevin method (with an unscaled stepsize of 0.10 and α derived from 0.4). The right plots show that rejections are indeed clustered when non-reversible updates are used, which reduces random walks.
For comparison, I tried HMC for this problem, over a grid of stepsizes and trajectory lengths. The autocorrelation time for the log probability density was found for groups of HMC updates comprising 32 leapfrog steps, giving results that are roughly comparable to those shown in Figure 2. The best performance was with a stepsize of 0.07 and trajectories of 16 leapfrog steps, for which the autocorrelation time was 2.04 (though it’s possible that better results might be found with a search over a finer grid). This is worse than the best result of 1.69 for persistent Langevin with non-reversible updates of u, whose scaled stepsize of 0.12/32^(1/6) ≈ 0.067 is, interestingly, close to the optimal stepsize for HMC. Use of a non-reversible update for u has allowed persistent Langevin to match or exceed the performance of HMC for this problem.
I also used these methods to sample from a mixed continuous/discrete distribution, which may be specified as follows (with variables being conditionally independent except as shown):

    u ∼ N(0, 1)
    v | u ∼ N(u, 0.04²)
    y_0, …, y_9, z_0, …, z_9 | u ∼ independent Bernoulli(1/(1+exp(u)))

(The u and v here are variables of the distribution being sampled, matching the names used in the Appendix scripts; they are distinct from the u and v of Section 2.)
This distribution is intended to mimic more complex distributions having both continuous and discrete variables, which are dependent. I will consider only MCMC methods that alternate updates for the continuous variables, u and v, with updates for the 20 binary variables (by Gibbs sampling).
Since the marginal distribution of u is known to be N(0, 1), the quality of the sampling for u is easily checked. Because of the high dependence between u and the 20 binary variables, we might expect that a method that frequently updates both the continuous and the binary variables will do better than a method, such as HMC with long trajectories, in which the binary variables are updated infrequently — i.e., only after substantial computation time is spent updating the continuous variables.
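For reference, the Gibbs sampling update for the binary variables is trivial here, since given u they are independent Bernoulli variables (a minimal sketch; the function name is my own):

```python
import numpy as np

def gibbs_binary(u, n_binary, rng):
    """Gibbs update for the binary variables: given u they are
    independent, each equal to 1 with probability 1/(1 + exp(u))."""
    prob = 1.0 / (1.0 + np.exp(u))
    return (rng.uniform(size=n_binary) < prob).astype(int)

rng = np.random.default_rng(3)
b = gibbs_binary(u=0.0, n_binary=20, rng=rng)   # probability 0.5 at u = 0
assert b.shape == (20,) and set(b.tolist()) <= {0, 1}
```

The dependence runs the other way as well: the conditional density of u given the binary variables includes the product of these Bernoulli probabilities, which is what couples the continuous and discrete updates.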
In assessing performance, I assume that computation time is dominated by updates of the continuous variables, which in a real application would be more numerous, and in particular by evaluation of their log density, conditional on the current values of the binary variables. (This is not actually always the case here, however, due to the simplicity of the problem and to avoidable inefficiencies in the software used.)
I tried HMC with trajectories of 30, 40, or 60 leapfrog steps (ℓ), and stepsizes (ε) of 0.025, 0.030, 0.035, 0.040, and 0.045. Gibbs sampling updates were done between trajectories. Autocorrelation times were found for groups of HMC trajectories comprising 120 leapfrog steps. Persistent Langevin with non-reversible updates of u was tried with δ of 0.003, 0.005, 0.010, and 0.015, each with all combinations of α in 0.98, 0.99, 0.995, 0.9975, 0.9985, and 0.9990 and ε in 0.015, 0.020, 0.025, 0.030, 0.040, and 0.050. Gibbs sampling updates were done after every tenth Langevin update. Autocorrelation times were found for groups of 60 Langevin updates — note that this involves half as many log density gradient evaluations as for a group of HMC trajectories.
Performance was evaluated in terms of estimating the mean of the indicator for u being in (−0.5, 1.5). Autocorrelation time for this function was minimized for HMC with ℓ = 40 and ε = 0.035, with the minimum being 1.53. Autocorrelation time was minimized for persistent Langevin with non-reversible updates of u with δ = 0.010, α = 0.995, and ε = 0.030, with the minimum being 1.67. Considering that the HMC groups require twice as many gradient evaluations, persistent Langevin with non-reversible updates of u is about 1.8 times more efficient than HMC for this problem.
6 Use for Bayesian neural networks
I have also tried using the persistent Langevin method with non-reversible updates for u to sample the posterior distribution of a Bayesian neural network model. Such models (Neal 1995) typically have hyperparameters controlling the variance of groups of weights in the network. It is convenient to use Gibbs sampling updates for these hyperparameters, alternating such updates with HMC updates for the network weights. However, when long trajectories are used for HMC, as is desirable to reduce random-walk behaviour, the Gibbs sampling updates for hyperparameters are done infrequently. Using persistent Langevin updates for the weights would allow the hyperparameters to be updated more frequently, perhaps speeding overall convergence. We may hope that this will work better with a non-reversible update for u.
I tested this approach on a binary classification problem, with 5 inputs, and 300 training cases. A network with one hidden layer of 12 tanh hidden units was used. Only two of the inputs for this problem were relevant, with two more being slightly noisy versions of the two relevant inputs, and one input being independent noise. Five separate hyperparameters controlled the variances of weights out of each input.
For the best-tuned persistent Langevin method with non-reversible updates of u, the average autocorrelation time for the four plausibly-relevant input hyperparameters was 1.25 times smaller than for the best-tuned HMC method. This is an encouraging preliminary result, but tests with more complex networks are needed to assess how much Bayesian neural network training can be improved in this way.
7 References

Caracciolo, S., Pelissetto, A., and Sokal, A. D. (1994) “A general limitation on Monte Carlo algorithms of Metropolis type”, Physical Review Letters, vol. 72, pp. 179-182. Also available at arxiv.org/abs/hep-lat/9307021

Duane, S., Kennedy, A. D., Pendleton, B. J., and Roweth, D. (1987) “Hybrid Monte Carlo”, Physics Letters B, vol. 195, pp. 216-222.

Horowitz, A. M. (1991) “A generalized guided Monte Carlo algorithm”, Physics Letters B, vol. 268, pp. 247-252.

Neal, R. M. (2003) “Slice sampling” (with discussion), Annals of Statistics, vol. 31, pp. 705-767.

Neal, R. M. (2011) “MCMC using Hamiltonian dynamics”, in the Handbook of Markov Chain Monte Carlo, S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng (editors), Chapman & Hall / CRC Press, pp. 113-162. Also available at arxiv.org/abs/1206.1901

Neal, R. M. (1995) Bayesian Learning for Neural Networks, Ph.D. Thesis, Dept. of Computer Science, University of Toronto, 195 pages.
Appendix — Scripts used to produce results
The experimental results reported were obtained using my Software for Flexible Bayesian Modeling (FBM, the release of 2020-01-24), available at www.cs.utoronto.ca/~radford/fbm.software.html or at gitlab.com/radfordneal/fbm. This software takes the form of a number of programs that are run using Unix/Linux bash shell scripts. Here, I will show scripts that can be used to produce single points in the figures, omitting the more complex scripts and programs used to run the methods with multiple parameter settings and plot the results.
Scripts for random-walk Metropolis in high dimensions
For Figure 1, the following commands may be used to produce the results for standard Metropolis sampling with stepsize of 1.8 (before scaling):
dim=40
stepunscaled=1.8
step=`calc "$stepunscaled/Sqrt($dim)"`
len=1001000

echo "STANDARD METROPOLIS"
log=figmet.log

bvg-spec $log 1 1 0 `calc $dim/2`
mc-spec $log repeat $dim metropolis $step
bvg-mc $log $len

echo -n "Rejection Rate: "
bvg-tbl r $log 1001: | series m | tail -1 | sed "s/.*mean://"

echo -n "Autocorrelation time for coordinate: "
bvg-tbl q1 $log 1001: | series mvsac 10 0 | tail -1 | sed "s/.* //"

echo -n "Autocorrelation time for log prob: "
bvg-tbl E $log 1001: | series mvsac 10 `calc $dim/2` | tail -1 | sed "s/.* //"
which produce the following output:
STANDARD METROPOLIS
Rejection Rate: 0.626588
Autocorrelation time for coordinate: 3.475440
Autocorrelation time for log prob: 3.470835
The results in Figure 1 for Metropolis updates with non-reversible updates of u, with stepsize of 1.8 (before scaling) and δ of 0.3, may be produced with the following commands:
dim=40
stepunscaled=1.8
step=`calc "$stepunscaled/Sqrt($dim)"`
delta=0.3
len=1001000

echo "METROPOLIS WITH NONREVERSIBLE UPDATE OF U"
log=figmetnrevu.log

bvg-spec $log 1 1 0 `calc $dim/2`
mc-spec $log slevel $delta repeat $dim metropolis $step
bvg-mc $log $len

echo -n "Rejection Rate: "
bvg-tbl r $log 1001: | series m | tail -1 | sed "s/.*mean://"

echo -n "Autocorrelation time for coordinate: "
bvg-tbl q1 $log 1001: | series mvsac 10 0 | tail -1 | sed "s/.* //"

echo -n "Autocorrelation time for log prob: "
bvg-tbl E $log 1001: | series mvsac 10 `calc $dim/2` | tail -1 | sed "s/.* //"
which produce the following output:
METROPOLIS WITH NONREVERSIBLE UPDATE OF U
Rejection Rate: 0.626545
Autocorrelation time for coordinate: 3.487568
Autocorrelation time for log prob: 3.028137
In both scripts, 1001000 groups of 40 updates are simulated, with the first 1000 groups being discarded as burn-in, and the remainder used to estimate the rejection rate and autocorrelation times. The autocorrelation times are found using estimated autocorrelations up to lag 10, with the true means for a coordinate and for the log probability density being used for these estimates. (Note: Rather than the log probability density itself, the “energy” is used, which gives equivalent results since it is a linear function of the log probability density.)
Scripts for persistent Langevin and HMC for replicated bivariate Gaussian
For Figure 2, the following commands may be used to produce the results for standard persistent Langevin sampling with stepsize of 0.10 (before scaling) and an α value derived from 0.4:
dim=32
stepunscaled=0.10
alphabase=0.4
step=`calc "$stepunscaled/Exp(Log($dim)/6)"`
rep=`calc "v=10*Exp(Log($dim)/3)" "v-Frac(v)"`
decay=`calc "Exp(Log($alphabase)*$step)"`
len=101000

echo "STANDARD PERSISTENT LANGEVIN"
log=figplang.log

bvg-spec $log 1 1 0.99 `calc $dim/2`
mc-spec $log repeat $rep heatbath $decay hybrid 1 $step negate
bvg-mc $log $len

echo -n "Rejection Rate "
bvg-tbl r $log 1001: | series m | tail -1 | sed "s/.*mean://"

echo -n "Autocorrelation time for coordinate: "
bvg-tbl q1 $log 1001: | series vsac 10 0 | tail -1 | sed "s/.* //"

echo -n "Autocorrelation time for log prob: "
bvg-tbl E $log 1001: | series vsac 10 `calc $dim/2` | tail -1 | sed "s/.* //"
which produce the following output:
STANDARD PERSISTENT LANGEVIN
Rejection Rate 0.069295
Autocorrelation time for coordinate: 6.875574
Autocorrelation time for log prob: 2.727262
The results in Figure 2 for persistent Langevin updates with non-reversible updates of u, with stepsize of 0.12 (before scaling), an α value derived from 0.5, and δ of 0.03, may be produced with the following commands:
dim=32
stepunscaled=0.12
alphabase=0.5
step=`calc "$stepunscaled/Exp(Log($dim)/6)"`
rep=`calc "v=10*Exp(Log($dim)/3)" "v-Frac(v)"`
decay=`calc "Exp(Log($alphabase)*$step)"`
delta=0.03
len=101000

echo "PERSISTENT LANGEVIN WITH NONREVERSIBLE UPDATE OF U"
log=figplang.log

bvg-spec $log 1 1 0.99 `calc $dim/2`
mc-spec $log slevel $delta repeat $rep heatbath $decay hybrid 1 $step negate
bvg-mc $log $len

echo -n "Rejection Rate "
bvg-tbl r $log 1001: | series m | tail -1 | sed "s/.*mean://"

echo -n "Autocorrelation time for coordinate: "
bvg-tbl q1 $log 1001: | series vsac 10 0 | tail -1 | sed "s/.* //"

echo -n "Autocorrelation time for log prob: "
bvg-tbl E $log 1001: | series vsac 10 `calc $dim/2` | tail -1 | sed "s/.* //"
which produce the following output:
PERSISTENT LANGEVIN WITH NONREVERSIBLE UPDATE OF U
Rejection Rate 0.119244
Autocorrelation time for coordinate: 2.827302
Autocorrelation time for log prob: 1.686796
Note that autocorrelation times for coordinates are not plotted in Figure 2 because, due to the ability of persistent Langevin methods to produce negative correlations, the coordinate autocorrelation times approach zero as the stepsize goes to zero and α goes to one. So there is no optimal choice of parameters by this criterion (which in this limit does not represent practical utility).
The HMC comparison results were obtained with commands such as the following (for stepsize of 0.07 and 16 leapfrog steps):
dim=32
step=0.07
leaps=16
rep=`calc 32/$leaps`
len=101000

echo "HMC"
log=hmccompare.log

bvg-spec $log 1 1 0.99 `calc $dim/2`
mc-spec $log repeat $rep heatbath hybrid $leaps $step:30
bvg-mc $log $len

echo -n "Rejection Rate "
bvg-tbl r $log 1001: | series m | tail -1 | sed "s/.*mean://"

echo -n "Autocorrelation time for coordinate: "
bvg-tbl q1 $log 1001: | series vsac 10 0 | tail -1 | sed "s/.* //"

echo -n "Autocorrelation time for log prob: "
bvg-tbl E $log 1001: | series vsac 10 `calc $dim/2` | tail -1 | sed "s/.* //"
which produce the following output:
HMC
Rejection Rate 0.142875
Autocorrelation time for coordinate: 3.364492
Autocorrelation time for log prob: 2.038866
In this script, in order to avoid possible periodicity effects, the leapfrog stepsize for HMC is randomized for each trajectory by multiplying the nominal stepsize by the reciprocal of the square root of a Gamma random variable with mean 1 and shape parameter 30/2.
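This randomization can be sketched as follows (my own illustration of the scheme just described, not the FBM code; the nominal stepsize of 0.07 is taken from the script above):

```python
import numpy as np

rng = np.random.default_rng(4)
nominal = 0.07                 # nominal leapfrog stepsize from the script
shape = 30 / 2                 # shape parameter 30/2, as described above
g = rng.gamma(shape, 1.0 / shape, size=100000)   # Gamma with mean 1
eps = nominal / np.sqrt(g)     # randomized stepsize for each trajectory
assert abs(g.mean() - 1.0) < 0.02
```

With this shape parameter, the multiplier 1/sqrt(g) is concentrated near one, so the stepsize is only mildly perturbed, which suffices to break any near-periodicity of the trajectories.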
Scripts for persistent Langevin and HMC for mixed continuous/discrete distribution
The results in Section 5 for persistent Langevin with non-reversible update of u applied to a mixed continuous/discrete distribution, with δ = 0.010, α = 0.995, and ε = 0.030, may be produced with the following commands:
delta=0.010
alpha=0.995
step=0.030

echo "PERSISTENT LANGEVIN / NONREVERSIBLE UPDATE OF U FOR MIXED DISTRIBUTION"
log=mixedplang.log

dist-spec $log "u ~ Gaussian(0,1) + v ~ Gaussian(u,0.04^2) + \
                y0 ~ Bernoulli(1/(1+Exp(u))) + \
                y1 ~ Bernoulli(1/(1+Exp(u))) + \
                y2 ~ Bernoulli(1/(1+Exp(u))) + \
                y3 ~ Bernoulli(1/(1+Exp(u))) + \
                y4 ~ Bernoulli(1/(1+Exp(u))) + \
                y5 ~ Bernoulli(1/(1+Exp(u))) + \
                y6 ~ Bernoulli(1/(1+Exp(u))) + \
                y7 ~ Bernoulli(1/(1+Exp(u))) + \
                y8 ~ Bernoulli(1/(1+Exp(u))) + \
                y9 ~ Bernoulli(1/(1+Exp(u))) + \
                z0 ~ Bernoulli(1/(1+Exp(u))) + \
                z1 ~ Bernoulli(1/(1+Exp(u))) + \
                z2 ~ Bernoulli(1/(1+Exp(u))) + \
                z3 ~ Bernoulli(1/(1+Exp(u))) + \
                z4 ~ Bernoulli(1/(1+Exp(u))) + \
                z5 ~ Bernoulli(1/(1+Exp(u))) + \
                z6 ~ Bernoulli(1/(1+Exp(u))) + \
                z7 ~ Bernoulli(1/(1+Exp(u))) + \
                z8 ~ Bernoulli(1/(1+Exp(u))) + \
                z9 ~ Bernoulli(1/(1+Exp(u)))"

mc-spec $log slevel $delta \
        repeat 6 repeat 10 heatbath $alpha hybrid 1 $step 0:1 negate end \
        binary-gibbs 2:21
dist-mc $log 200000

echo -n "Rejection Rate: "
dist-tbl r $log 1001: | series m | tail -1 | sed "s/.*mean://"

echo -n "Autocorrelation time for I(-0.5,1.5): "
(echo "u<-scan(quiet=T)"; \
 dist-tbl u $log 1001:; echo ""; \
 echo "write.table(as.numeric(u>-0.5&u<1.5),row.names=F,col.names=F)" \
) | R --slave --vanilla | series vsac 15 0.6246553 | tail -1 | sed "s/.* //"
which produce the following output:
PERSISTENT LANGEVIN / NONREVERSIBLE UPDATE OF U FOR MIXED DISTRIBUTION
Rejection Rate: 0.093834
Autocorrelation time for I(-0.5,1.5): 1.666017
Note that each of these iterations performs 60 leapfrog steps.
The results for HMC, with 40 leapfrog steps and ε = 0.035, may be produced with the following commands:
leaps=40
step=0.035
rep=`calc 120/$leaps`

echo "HMC FOR MIXED DISTRIBUTION"
log=mixedhmc.log

dist-spec $log "u ~ Gaussian(0,1) + v ~ Gaussian(u,0.04^2) + \
                y0 ~ Bernoulli(1/(1+Exp(u))) + \
                y1 ~ Bernoulli(1/(1+Exp(u))) + \
                y2 ~ Bernoulli(1/(1+Exp(u))) + \
                y3 ~ Bernoulli(1/(1+Exp(u))) + \
                y4 ~ Bernoulli(1/(1+Exp(u))) + \
                y5 ~ Bernoulli(1/(1+Exp(u))) + \
                y6 ~ Bernoulli(1/(1+Exp(u))) + \
                y7 ~ Bernoulli(1/(1+Exp(u))) + \
                y8 ~ Bernoulli(1/(1+Exp(u))) + \
                y9 ~ Bernoulli(1/(1+Exp(u))) + \
                z0 ~ Bernoulli(1/(1+Exp(u))) + \
                z1 ~ Bernoulli(1/(1+Exp(u))) + \
                z2 ~ Bernoulli(1/(1+Exp(u))) + \
                z3 ~ Bernoulli(1/(1+Exp(u))) + \
                z4 ~ Bernoulli(1/(1+Exp(u))) + \
                z5 ~ Bernoulli(1/(1+Exp(u))) + \
                z6 ~ Bernoulli(1/(1+Exp(u))) + \
                z7 ~ Bernoulli(1/(1+Exp(u))) + \
                z8 ~ Bernoulli(1/(1+Exp(u))) + \
                z9 ~ Bernoulli(1/(1+Exp(u)))"

mc-spec $log repeat $rep heatbath hybrid $leaps $step:10 0:1 binary-gibbs 2:21
dist-mc $log 200000

echo -n "Rejection Rate: "
dist-tbl r $log 1001: | series m | tail -1 | sed "s/.*mean://"

echo -n "Autocorrelation time for I(-0.5,1.5): "
(echo "u<-scan(quiet=T)"; \
 dist-tbl u $log 1001:; echo ""; \
 echo "write.table(as.numeric(u>-0.5&u<1.5),row.names=F,col.names=F)" \
) | R --slave --vanilla | series vsac 15 0.6246553 | tail -1 | sed "s/.* //"
which produce the following output:
HMC FOR MIXED DISTRIBUTION
Rejection Rate: 0.171698
Autocorrelation time for I(-0.5,1.5): 1.527655
Note that each of these iterations performs 120 leapfrog steps, and hence (is assumed to) take twice as long as an iteration of the persistent Langevin script.
Footnotes
 An earlier version of this work was presented at the 2nd Symposium on Advances in Approximate Bayesian Inference, Vancouver, British Columbia, 8 December 2019.
Dept. of Statistical Sciences and Dept. of Computer Science, University of Toronto, Vector Institute Affiliate. http://www.cs.utoronto.ca/~radford/, radford@stat.utoronto.ca
Non-reversible updates of u can also benefit the standard Langevin method. In particular, when sampling from the 40-dimensional Gaussian of Section 3, using non-reversible updates of u (with a suitably chosen δ) results in a factor of 1.19 lower autocorrelation time for the log probability density compared to the standard Langevin method.