A Hamiltonian Monte Carlo Method for Non-Smooth Energy Sampling

Lotfi Chaari, Member, IEEE, Jean-Yves Tourneret, Senior Member, IEEE, Caroline Chaux, Senior Member, IEEE, and Hadj Batatia, Member, IEEE
L. Chaari, J.-Y. Tourneret and H. Batatia are with the University of Toulouse, IRIT - INP-ENSEEIHT (UMR 5505), 2 rue Charles Camichel, BP 7122, Toulouse Cedex 7, France. E-mail: firstname.lastname@enseeiht.fr. C. Chaux is with LATP and CNRS UMR 7353, Aix-Marseille University, 39 rue F. Joliot-Curie, 13453 Marseille Cedex 13, France. E-mail: caroline.chaux@latp.univ-mrs.fr.
Abstract

Efficient sampling from high-dimensional distributions is a challenging issue that is encountered in many large data recovery problems involving Markov chain Monte Carlo schemes. In this context, sampling using Hamiltonian dynamics is one of the recent techniques that have been proposed to exploit the geometry of the target distribution. Such schemes have been shown to be efficient for multi-dimensional sampling but are rather adapted to distributions from the exponential family with smooth energy functions. In this paper, we address the problem of using Hamiltonian dynamics to sample from probability distributions having non-differentiable energy functions, such as those based on the $\ell_1$ norm. Such distributions are increasingly used in sparse signal and image recovery applications. The proposed technique uses a modified leapfrog transform involving a proximal step. The resulting non-smooth Hamiltonian Monte Carlo (ns-HMC) method is tested and validated on a number of experiments. Results show its ability to accurately sample according to various multivariate target distributions. The proposed technique is illustrated on synthetic examples and is applied to an image denoising problem.

Index Terms

Sparse sampling, Bayesian methods, MCMC, Hamiltonian, proximity operator, leapfrog.

I Introduction

Sparse signal and image recovery is a hot topic which has gained a lot of interest during the last decades, especially after the emergence of compressed sensing theory [1]. In addition, most recent applications such as remote sensing [2] and medical image reconstruction [3, 4] generate large and intractable data volumes that have to be processed either independently or jointly. To handle such inverse problems, Bayesian techniques have demonstrated their usefulness, especially when the model hyperparameters are difficult to fix. Such techniques generally rely on a maximum a posteriori (MAP) estimation built upon the signal/image likelihood and priors. Since efficient priors generally have a complicated form, analytical expressions of the MAP estimators are often difficult to obtain. For this reason, many Bayesian techniques resort to Markov chain Monte Carlo (MCMC) sampling [5]. To handle large-dimensional sampling, several techniques have been proposed during the last decades. In addition to the random walk Metropolis-Hastings (MH) algorithm [5], one can mention the work in [6] about efficient high-dimensional importance sampling, the Metropolis-adjusted Langevin algorithm (MALA) of [7, 8], or the high-dimensional Gaussian sampling methods of [9, 10]. To handle log-concave smooth probability distributions, a Hamiltonian Monte Carlo (HMC) sampling technique has recently been proposed in [11, 12]. This technique uses the analogy with kinetic energy conservation in physics to design efficient proposals that better follow the geometry of the target distribution. HMC has recently been investigated in a number of works dealing with multi-dimensional sampling problems for various applications [13, 14], which demonstrates the efficiency of such sampling schemes. These strategies yield efficient sampling, with improved convergence and mixing properties of the simulated chains compared to classical sampling schemes such as the Gibbs and MH algorithms. However, these techniques are only adapted to log-concave probability distributions with smooth energy functions, since the gradient of these functions has to be calculated. This constraint represents a real limitation in applications where sparsity is one of the main processing ingredients, especially for large data. Indeed, sparsity-promoting probability distributions generally have a non-differentiable energy function, such as the Laplace or generalized Gaussian (GG) [15] distributions, which involve $\ell_1$ and $\ell_p$ energy functions, respectively. Such distributions have been used as priors for the target signals or images in a number of works where inverse problems are handled in a Bayesian framework [16, 17, 18]. Using the HMC technique in this context is therefore not possible.
This paper presents a modified HMC scheme that makes it possible to sample from log-concave probability distributions with non-differentiable energy functions. The so-called non-smooth HMC (ns-HMC) sampling scheme relies on a modified leapfrog transform [11, 12] that circumvents the non-differentiability of the target energy function. The modified leapfrog transform relies on the sub-differential and proximity operator concepts [19]. The proposed scheme is validated on a sampling example where samples are drawn from a GG distribution with different shape parameters. It is also illustrated on a signal recovery problem where a sparse regularization scheme is applied to recover a high-dimensional signal.

The rest of the paper is organized as follows. Section II formulates the problem of non-smooth sampling for large data using Hamiltonian dynamics. Section III presents the proposed ns-HMC sampling scheme. This technique is then validated in Section IV to illustrate its efficiency for sampling from non-smooth log-concave distributions. Finally, some conclusions and perspectives are drawn in Section V.

II Problem formulation

Let us consider a signal of interest $\boldsymbol{x} \in \mathbb{R}^N$ and let $f(\boldsymbol{x};\boldsymbol{\theta})$ be its probability distribution function, parametrized by the parameter vector $\boldsymbol{\theta}$. In this work, we focus on an exponential family of distributions such that

$$f(\boldsymbol{x};\boldsymbol{\theta}) \propto \exp\left(-E_{\boldsymbol{\theta}}(\boldsymbol{x})\right) \qquad (1)$$

where $E_{\boldsymbol{\theta}}$ is the energy function. As stated above, in this paper we concentrate on sampling from the class of log-concave probability densities, which means that the energy function $E_{\boldsymbol{\theta}}$ is assumed to be convex but not necessarily differentiable. In addition, we will also make the assumption that $E_{\boldsymbol{\theta}}$ belongs to $\Gamma_0(\mathbb{R}^N)$, the class of proper lower semi-continuous convex functions from $\mathbb{R}^N$ to $]-\infty,+\infty]$. Finally, we will consider probability distributions from which direct sampling is not possible, thus requiring the use of an acceptance-rejection step. Example II.1 presents the case of the GG distribution, which satisfies the above mentioned assumptions.

Example II.1

Let $\lambda > 0$ and $p > 0$ be two real-positive scalars. The generalized Gaussian distribution is defined by the following probability density function

$$f(x;\lambda,p) = \frac{p}{2\lambda\,\Gamma(1/p)}\exp\left(-\frac{|x|^p}{\lambda^p}\right) \qquad (2)$$

for $x \in \mathbb{R}$, where $\Gamma$ denotes the gamma function.

Except for particular values of $p$ such as $p = 2$, the energy function $E_{\boldsymbol{\theta}}(x) = |x|^p/\lambda^p$ (where $\boldsymbol{\theta} = (\lambda, p)$) is not differentiable at $x = 0$. In what follows, we are interested in efficiently drawing samples according to the probability distribution defined in (1). The following section describes the proposed non-smooth sampling scheme.
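For concreteness, the following minimal Python sketch (our own illustration, not code from the paper) evaluates this GG potential energy; for $p = 1$ (the Laplace case) it is convex but has no gradient at the origin, which is precisely the situation the schemes of the next section are designed for.

```python
import numpy as np

def gg_energy(x, lam=1.0, p=1.0):
    """Potential energy of a generalized Gaussian target, E(x) = |x|**p / lam**p.

    For p = 1 (Laplace case) the energy is convex but non-differentiable
    at x = 0: the derivative jumps from -1/lam to 1/lam across the origin.
    """
    return np.abs(x) ** p / lam ** p
```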

III Non-smooth sampling

III-A Hamiltonian Monte Carlo methods

HMC methods [11, 12, 14] are powerful tools that use the principle of Hamiltonian dynamics. These methods were originally proposed by analogy with the evolution of kinetic energy in physics. They usually provide better mixing and convergence properties than standard sampling schemes. Let us assume that $E_{\boldsymbol{\theta}}(\boldsymbol{x})$ represents a potential energy function. If we associate with $\boldsymbol{x}$ a momentum vector $\boldsymbol{q} \in \mathbb{R}^N$, the energy function for the Hamiltonian dynamics is the combination of the potential energy $E_{\boldsymbol{\theta}}(\boldsymbol{x})$ and a kinetic energy $K(\boldsymbol{q})$, i.e.,

$$H(\boldsymbol{x},\boldsymbol{q}) = E_{\boldsymbol{\theta}}(\boldsymbol{x}) + K(\boldsymbol{q}). \qquad (3)$$

The energy function $H$ is called the Hamiltonian and completely describes the considered system. For simplicity reasons, a quadratic kinetic energy corresponding to a unitary diagonal covariance matrix is usually assumed, so that $K(\boldsymbol{q}) = \boldsymbol{q}^{\mathrm{T}}\boldsymbol{q}/2$. The Hamiltonian's equations of motion determine the evolution of the state as a function of time $t$ [12]

$$\frac{dx_i}{dt} = \frac{\partial H}{\partial q_i}, \qquad \frac{dq_i}{dt} = -\frac{\partial H}{\partial x_i}, \qquad i = 1,\ldots,N \qquad (4)$$

where $\partial$ denotes the partial derivative operator. These equations define a transformation $T_s$ that maps the state of the system at time $t$ to the state at time $t+s$. The distribution associated with the Hamiltonian energy defined in (3) is therefore given by

$$f(\boldsymbol{x},\boldsymbol{q}) \propto \exp\left(-H(\boldsymbol{x},\boldsymbol{q})\right) = \exp\left(-E_{\boldsymbol{\theta}}(\boldsymbol{x})\right)\exp\left(-\frac{\boldsymbol{q}^{\mathrm{T}}\boldsymbol{q}}{2}\right). \qquad (5)$$

HMC methods iteratively proceed by alternate updates of samples $\boldsymbol{x}^{(r)}$ and $\boldsymbol{q}^{(r)}$ drawn according to the distribution (5). At iteration $r$, the HMC algorithm starts with the current values of the vectors $\boldsymbol{x}^{(r)}$ and $\boldsymbol{q}^{(r)}$. Two steps have then to be performed. The first one updates the momentum vector by sampling $\boldsymbol{q}^{(r)}$ according to the multivariate Gaussian distribution $\mathcal{N}(\boldsymbol{0},\boldsymbol{I}_N)$, where $\boldsymbol{I}_N$ is the $N \times N$ identity matrix. The second step updates both momentum and position by proposing two candidates $\boldsymbol{x}^\star$ and $\boldsymbol{q}^\star$. These two candidates are generated by simulating the Hamiltonian dynamics, discretized using techniques such as the Euler or leapfrog methods. For instance, the discretization can be performed using $L_f$ steps of the leapfrog method with a stepsize $\epsilon > 0$. The $l_f$th leapfrog discretization, denoted by $T^{(l_f)}$, can be summarized as follows:

$$\boldsymbol{q}^{\left(t+\frac{\epsilon}{2}\right)} = \boldsymbol{q}^{(t)} - \frac{\epsilon}{2}\,\nabla E_{\boldsymbol{\theta}}\!\left(\boldsymbol{x}^{(t)}\right) \qquad (6)$$
$$\boldsymbol{x}^{(t+\epsilon)} = \boldsymbol{x}^{(t)} + \epsilon\,\boldsymbol{q}^{\left(t+\frac{\epsilon}{2}\right)} \qquad (7)$$
$$\boldsymbol{q}^{(t+\epsilon)} = \boldsymbol{q}^{\left(t+\frac{\epsilon}{2}\right)} - \frac{\epsilon}{2}\,\nabla E_{\boldsymbol{\theta}}\!\left(\boldsymbol{x}^{(t+\epsilon)}\right). \qquad (8)$$

After the $L_f$ steps, the proposed candidates are given by $\boldsymbol{x}^\star = \boldsymbol{x}^{(L_f\epsilon)}$ and $\boldsymbol{q}^\star = \boldsymbol{q}^{(L_f\epsilon)}$. These candidates are then accepted using the standard MH rule, i.e., with the following probability

$$\alpha = \min\left\{1,\ \exp\left(H\left(\boldsymbol{x}^{(r)},\boldsymbol{q}^{(r)}\right) - H\left(\boldsymbol{x}^\star,\boldsymbol{q}^\star\right)\right)\right\} \qquad (9)$$

where $H$ is the energy function defined in (3).
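For reference, here is a minimal Python sketch of one standard HMC iteration, combining the leapfrog updates (6)-(8) with the MH rule (9). It assumes a smooth energy with an available gradient; all names (`hmc_step`, `grad_energy`, ...) are our own illustration, not code from the paper.

```python
import numpy as np

def hmc_step(x, energy, grad_energy, eps=0.05, n_leapfrog=20, rng=np.random):
    """One standard HMC iteration for a smooth energy (eqs. (6)-(9))."""
    q = rng.standard_normal(x.shape)           # momentum refreshment q ~ N(0, I)
    x_new, q_new = x.copy(), q.copy()
    q_new -= 0.5 * eps * grad_energy(x_new)    # initial momentum half-step (6)
    for _ in range(n_leapfrog):
        x_new += eps * q_new                   # position step (7)
        q_new -= eps * grad_energy(x_new)      # two half-steps (8) and (6) merged
    q_new += 0.5 * eps * grad_energy(x_new)    # undo the extra half-step
    h_old = energy(x) + 0.5 * np.sum(q * q)        # Hamiltonian (3), current state
    h_new = energy(x_new) + 0.5 * np.sum(q_new * q_new)
    # MH acceptance rule (9)
    return x_new if np.log(rng.uniform()) < h_old - h_new else x
```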

III-B Non-smooth Hamiltonian Monte Carlo schemes

The key step in standard HMC sampling schemes is the approximation of the Hamiltonian dynamics. This approximation allows the random simulation of uncorrelated samples according to a target distribution while exploiting the geometry of its corresponding energy. In this section, we propose two non-smooth Hamiltonian Monte Carlo (ns-HMC) schemes to perform this approximation for non-smooth energy functions. The first scheme is based on the sub-differential operator while the second one is based on proximity operators. For both schemes, the complete procedure to sample $\boldsymbol{x}$ and $\boldsymbol{q}$ is detailed in Algorithms 1 and 2, which describe all the steps necessary to sample from a log-concave target distribution.

III-B1 Scheme 1 - sub-differential based approach


Let us first give the following definition of the sub-differential and a useful example.

Definition III.1

[19, p. 223] Let $\varphi$ be in $\Gamma_0(\mathbb{R}^N)$, the class of proper lower semi-continuous convex functions from $\mathbb{R}^N$ to $]-\infty,+\infty]$. The sub-differential of $\varphi$ at $\boldsymbol{x} \in \mathbb{R}^N$ is the set $\partial\varphi(\boldsymbol{x}) = \left\{\boldsymbol{u} \in \mathbb{R}^N \mid \forall \boldsymbol{y} \in \mathbb{R}^N,\ \langle\boldsymbol{y}-\boldsymbol{x},\boldsymbol{u}\rangle + \varphi(\boldsymbol{x}) \le \varphi(\boldsymbol{y})\right\}$, where $\langle\cdot,\cdot\rangle$ defines the standard scalar product. Every element $\boldsymbol{u} \in \partial\varphi(\boldsymbol{x})$ is a sub-gradient of $\varphi$ at point $\boldsymbol{x}$. If $\varphi$ is differentiable at $\boldsymbol{x}$, the sub-differential reduces to its gradient: $\partial\varphi(\boldsymbol{x}) = \{\nabla\varphi(\boldsymbol{x})\}$.

Example III.1

Let $\varphi: \mathbb{R} \to \mathbb{R}$ be defined as

$$\varphi(x) = |x|. \qquad (10)$$

The sub-differential of $\varphi$ at $x$ is defined by

$$\partial\varphi(x) = \begin{cases} \{\mathrm{sign}(x)\} & \text{if } x \neq 0 \\ [-1,1] & \text{if } x = 0. \end{cases} \qquad (11)$$

If in addition we consider a scalar $\lambda > 0$ and we call $\psi = \lambda\varphi$, then we have $\partial\psi(x) = \lambda\,\partial\varphi(x)$ for every $x \in \mathbb{R}$ [19, Prop. 16.5].
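As an illustration of Example III.1, an element of the sub-differential of $\psi = \lambda|\cdot|$ can be drawn as follows (a sketch of our own; at $x = 0$ the sub-gradient is sampled uniformly in $[-\lambda, \lambda]$, as required by Scheme 1 below).

```python
import numpy as np

def subgrad_l1(x, lam=1.0, rng=np.random):
    """Return an element of the sub-differential of psi(x) = lam*|x| (componentwise).

    Away from 0 the sub-differential is the singleton {lam*sign(x)};
    at 0 it is the whole interval [-lam, lam], from which we draw uniformly.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    g = lam * np.sign(x)
    at_zero = (x == 0)
    g[at_zero] = lam * rng.uniform(-1.0, 1.0, size=np.count_nonzero(at_zero))
    return g
```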

For distributions with smooth energy, the leapfrog method can be used in its basic form, which requires computing the gradient of the potential energy $E_{\boldsymbol{\theta}}$. Since this gradient cannot be determined for non-smooth energy functions, we resort to the following reformulation of the leapfrog scheme using the concept of sub-differential introduced hereabove:

$$\boldsymbol{q}^{\left(t+\frac{\epsilon}{2}\right)} = \boldsymbol{q}^{(t)} - \frac{\epsilon}{2}\,\boldsymbol{u}^{(t)}, \quad \boldsymbol{u}^{(t)} \in \partial E_{\boldsymbol{\theta}}\!\left(\boldsymbol{x}^{(t)}\right) \qquad (12)$$
$$\boldsymbol{x}^{(t+\epsilon)} = \boldsymbol{x}^{(t)} + \epsilon\,\boldsymbol{q}^{\left(t+\frac{\epsilon}{2}\right)} \qquad (13)$$
$$\boldsymbol{q}^{(t+\epsilon)} = \boldsymbol{q}^{\left(t+\frac{\epsilon}{2}\right)} - \frac{\epsilon}{2}\,\boldsymbol{u}^{(t+\epsilon)}, \quad \boldsymbol{u}^{(t+\epsilon)} \in \partial E_{\boldsymbol{\theta}}\!\left(\boldsymbol{x}^{(t+\epsilon)}\right) \qquad (14)$$

where the sub-gradients $\boldsymbol{u}^{(t)}$ and $\boldsymbol{u}^{(t+\epsilon)}$ are sampled uniformly in the corresponding sub-differentials of $E_{\boldsymbol{\theta}}$. This discretization scheme will be denoted by $T_{ns_1}^{(l_f)}$. If $E_{\boldsymbol{\theta}}$ is differentiable, the mapping in (12), (13) and (14) exactly matches the conventional HMC mapping in (6), (7) and (8).

As for the standard HMC scheme, the proposed candidates are defined by $\boldsymbol{x}^\star = \boldsymbol{x}^{(L_f\epsilon)}$ and $\boldsymbol{q}^\star = \boldsymbol{q}^{(L_f\epsilon)}$, which can be computed after $L_f$ leapfrog steps. These candidates are then accepted based on the standard MH rule defined in (9). A code sketch of one such iteration is given below, and the resulting sampling algorithm is summarized in Algorithm 1.
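A minimal Python sketch of one ns-HMC iteration under Scheme 1 could look as follows (our own illustration; `subgrad` returns an element of $\partial E_{\boldsymbol{\theta}}$, e.g. `subgrad_l1` above when $E_{\boldsymbol{\theta}} = \lambda|\cdot|$).

```python
import numpy as np

def ns_hmc_step_subgrad(x, energy, subgrad, eps=0.05, n_leapfrog=20, rng=np.random):
    """One ns-HMC iteration (Scheme 1): leapfrog steps (12)-(14) plus MH rule (9)."""
    q = rng.standard_normal(x.shape)
    x_new, q_new = x.copy(), q.copy()
    for _ in range(n_leapfrog):
        q_new = q_new - 0.5 * eps * subgrad(x_new)   # (12) momentum half-step
        x_new = x_new + eps * q_new                  # (13) position step
        q_new = q_new - 0.5 * eps * subgrad(x_new)   # (14) momentum half-step
    h_old = energy(x) + 0.5 * np.sum(q * q)
    h_new = energy(x_new) + 0.5 * np.sum(q_new * q_new)
    return x_new if np.log(rng.uniform()) < h_old - h_new else x
```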

- Initialize with some $\boldsymbol{x}^{(0)}$.
- Set the iteration number $r = 0$, the number of leapfrog steps $L_f$ and the stepsize $\epsilon$;
for $r = 1, \ldots, S$ do
       - Sample $\boldsymbol{q}^{(r-1)} \sim \mathcal{N}(\boldsymbol{0},\boldsymbol{I}_N)$;
       - Compute $\boldsymbol{q}_{1/2} = \boldsymbol{q}^{(r-1)} - \frac{\epsilon}{2}\,\boldsymbol{u}_0$ with $\boldsymbol{u}_0 \in \partial E_{\boldsymbol{\theta}}\left(\boldsymbol{x}^{(r-1)}\right)$;
       - Compute $\boldsymbol{x}_1 = \boldsymbol{x}^{(r-1)} + \epsilon\,\boldsymbol{q}_{1/2}$;
       for $l_f = 1$ to $L_f - 1$ do
              * Compute $\boldsymbol{q}_{l_f+1/2} = \boldsymbol{q}_{l_f-1/2} - \epsilon\,\boldsymbol{u}_{l_f}$ with $\boldsymbol{u}_{l_f} \in \partial E_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{l_f}\right)$;
              * Compute $\boldsymbol{x}_{l_f+1} = \boldsymbol{x}_{l_f} + \epsilon\,\boldsymbol{q}_{l_f+1/2}$;
       end for
       - Compute $\boldsymbol{q}_{L_f} = \boldsymbol{q}_{L_f-1/2} - \frac{\epsilon}{2}\,\boldsymbol{u}_{L_f}$ with $\boldsymbol{u}_{L_f} \in \partial E_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{L_f}\right)$;
       - Apply the standard MH acceptance/rejection rule (9) by taking $\boldsymbol{x}^\star = \boldsymbol{x}_{L_f}$ and $\boldsymbol{q}^\star = \boldsymbol{q}_{L_f}$;
end for
Algorithm 1 Gibbs sampler using Hamiltonian dynamics for non-smooth log-concave probability distributions: Scheme 1.

Note that we do not need to account for any additional term in the acceptance ratio in (9) since the proposed leapfrog transformation preserves volume. Volume preservation is equivalent to the absolute value of the determinant of the Jacobian matrix of the mapping $T_{ns_1}^{(l_f)}$ being equal to one, and is due to the fact that candidates are proposed according to Hamiltonian dynamics. More precisely, volume preservation can easily be demonstrated by using the concept of Jacobian matrix approximation [20], such as the Clarke generalized Jacobian [21], and by conducting calculations similar to [12, Chapter 5, p. 118]. To facilitate the reading of the paper, the volume preservation issue is addressed in Appendix -A.

III-B2 Scheme 2 - proximal based approach


Since the calculation of the sub-differential is not straightforward for some classes of convex functions, a second scheme modifying the leapfrog steps (12), (13) and (14) can be considered by using the concept of proximity operators. These operators have been found to be fundamental in a number of recent works in convex optimization [22, 23, 24], and more recently in [25], where stochastic proximal algorithms have been investigated. Let us first recall the following definition.

Definition III.2

[19, Definition 12.23][26] Let $\varphi \in \Gamma_0(\mathbb{R}^N)$. For every $\boldsymbol{x} \in \mathbb{R}^N$, the function $\varphi + \frac{1}{2}\|\cdot - \boldsymbol{x}\|^2$ reaches its infimum at a unique point referred to as the proximity operator and denoted by $\mathrm{prox}_\varphi(\boldsymbol{x})$.

Example III.2

For the function $\varphi = |\cdot|$ defined in Example III.1, the proximity operator is given, for every $x \in \mathbb{R}$, by

$$\mathrm{prox}_\varphi(x) = \mathrm{sign}(x)\,\max\{|x|-1,\ 0\} \qquad (15)$$

which is the well-known soft-thresholding operation.
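In code, (15) is the familiar soft threshold; for $\psi = \lambda|\cdot|$ the threshold becomes $\lambda$ (a sketch of our own):

```python
import numpy as np

def prox_abs(x, lam=1.0):
    """Proximity operator of x -> lam*|x|: soft thresholding with threshold lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```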

Many other examples and interesting properties that make this tool very powerful and commonly used in the recent optimization literature are given in [27]. The property of interest here is the following.

Property 1

[28, Prop. 3] Let $\varphi \in \Gamma_0(\mathbb{R}^N)$ and $\boldsymbol{x} \in \mathbb{R}^N$. There exists a unique point $\widehat{\boldsymbol{x}} \in \mathbb{R}^N$ such that $\boldsymbol{x} - \widehat{\boldsymbol{x}} \in \partial\varphi(\widehat{\boldsymbol{x}})$. Using the proximity operator definition hereabove, it turns out that $\widehat{\boldsymbol{x}} = \mathrm{prox}_\varphi(\boldsymbol{x})$.

By modifying the discretization scheme $T_{ns_1}^{(l_f)}$, we propose the following $l_f$th leapfrog discretization scheme, denoted by $T_{ns_2}^{(l_f)}$:

$$\boldsymbol{q}^{\left(t+\frac{\epsilon}{2}\right)} = \boldsymbol{q}^{(t)} - \frac{\epsilon}{2}\left[\boldsymbol{x}^{(t)} - \mathrm{prox}_{E_{\boldsymbol{\theta}}}\!\left(\boldsymbol{x}^{(t)}\right)\right] \qquad (16)$$
$$\boldsymbol{x}^{(t+\epsilon)} = \boldsymbol{x}^{(t)} + \epsilon\,\boldsymbol{q}^{\left(t+\frac{\epsilon}{2}\right)} \qquad (17)$$
$$\boldsymbol{q}^{(t+\epsilon)} = \boldsymbol{q}^{\left(t+\frac{\epsilon}{2}\right)} - \frac{\epsilon}{2}\left[\boldsymbol{x}^{(t+\epsilon)} - \mathrm{prox}_{E_{\boldsymbol{\theta}}}\!\left(\boldsymbol{x}^{(t+\epsilon)}\right)\right]. \qquad (18)$$

Indeed, Property 1 guarantees that $\boldsymbol{x} - \mathrm{prox}_{E_{\boldsymbol{\theta}}}(\boldsymbol{x})$ is a sub-gradient of $E_{\boldsymbol{\theta}}$ at the point $\mathrm{prox}_{E_{\boldsymbol{\theta}}}(\boldsymbol{x})$. If $E_{\boldsymbol{\theta}}$ is differentiable, the mapping in (16), (17) and (18) matches the mapping in (6), (7) and (8); the only difference is that the sub-differential is evaluated at $\mathrm{prox}_{E_{\boldsymbol{\theta}}}(\boldsymbol{x})$ instead of $\boldsymbol{x}$. As for Scheme 1, the proposed candidates are given by $\boldsymbol{x}^\star = \boldsymbol{x}^{(L_f\epsilon)}$ and $\boldsymbol{q}^\star = \boldsymbol{q}^{(L_f\epsilon)}$ after $L_f$ leapfrog steps. These candidates are then accepted based on the standard MH rule (9).
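Scheme 2 therefore only requires a routine computing $\mathrm{prox}_{E_{\boldsymbol{\theta}}}$. A minimal Python sketch of one iteration (our own illustration) follows; by Property 1, $\boldsymbol{x} - \mathrm{prox}_{E_{\boldsymbol{\theta}}}(\boldsymbol{x})$ plays the role of the sub-gradient.

```python
import numpy as np

def ns_hmc_step_prox(x, energy, prox, eps=0.05, n_leapfrog=20, rng=np.random):
    """One ns-HMC iteration (Scheme 2): leapfrog steps (16)-(18) plus MH rule (9).

    Property 1 guarantees that x - prox(x) is a sub-gradient of the energy
    at prox(x), so no explicit sub-differential computation is needed.
    """
    q = rng.standard_normal(x.shape)
    x_new, q_new = x.copy(), q.copy()
    for _ in range(n_leapfrog):
        q_new = q_new - 0.5 * eps * (x_new - prox(x_new))  # (16)
        x_new = x_new + eps * q_new                        # (17)
        q_new = q_new - 0.5 * eps * (x_new - prox(x_new))  # (18)
    h_old = energy(x) + 0.5 * np.sum(q * q)
    h_new = energy(x_new) + 0.5 * np.sum(q_new * q_new)
    return x_new if np.log(rng.uniform()) < h_old - h_new else x
```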

The Gibbs sampler resulting from the transformation $T_{ns_2}^{(l_f)}$ is summarized in Algorithm 2. As for Algorithm 1, and due to the presence of the MH acceptance rule, the samples generated by this algorithm are asymptotically distributed according to the target distribution defined in (1).

- Initialize with some $\boldsymbol{x}^{(0)}$.
- Set the iteration number $r = 0$, the number of leapfrog steps $L_f$ and the stepsize $\epsilon$;
for $r = 1, \ldots, S$ do
       - Sample $\boldsymbol{q}^{(r-1)} \sim \mathcal{N}(\boldsymbol{0},\boldsymbol{I}_N)$;
       - Compute $\boldsymbol{q}_{1/2} = \boldsymbol{q}^{(r-1)} - \frac{\epsilon}{2}\left[\boldsymbol{x}^{(r-1)} - \mathrm{prox}_{E_{\boldsymbol{\theta}}}\left(\boldsymbol{x}^{(r-1)}\right)\right]$;
       - Compute $\boldsymbol{x}_1 = \boldsymbol{x}^{(r-1)} + \epsilon\,\boldsymbol{q}_{1/2}$;
       for $l_f = 1$ to $L_f - 1$ do
              * Compute $\boldsymbol{q}_{l_f+1/2} = \boldsymbol{q}_{l_f-1/2} - \epsilon\left[\boldsymbol{x}_{l_f} - \mathrm{prox}_{E_{\boldsymbol{\theta}}}\left(\boldsymbol{x}_{l_f}\right)\right]$;
              * Compute $\boldsymbol{x}_{l_f+1} = \boldsymbol{x}_{l_f} + \epsilon\,\boldsymbol{q}_{l_f+1/2}$;
       end for
       - Compute $\boldsymbol{q}_{L_f} = \boldsymbol{q}_{L_f-1/2} - \frac{\epsilon}{2}\left[\boldsymbol{x}_{L_f} - \mathrm{prox}_{E_{\boldsymbol{\theta}}}\left(\boldsymbol{x}_{L_f}\right)\right]$;
       - Apply the standard MH acceptance/rejection rule (9) by taking $\boldsymbol{x}^\star = \boldsymbol{x}_{L_f}$ and $\boldsymbol{q}^\star = \boldsymbol{q}_{L_f}$;
end for
Algorithm 2 Gibbs sampler using Hamiltonian dynamics for non-smooth log-concave probability distributions: Scheme 2.

III-B3 Discussion


Fig. 1 illustrates the use of the adopted discretization schemes associated with Algorithms 1 and 2 for approximating a Hamiltonian made up of a quadratic kinetic energy and a potential energy having the following form

$$E_{\boldsymbol{\theta}}(x) = \lambda|x| + \frac{x^2}{2} \qquad (19)$$

where $\lambda > 0$. For this potential energy, the sub-differential can be analytically calculated and is given by

$$\partial E_{\boldsymbol{\theta}} = \lambda\,\partial\varphi + \mathrm{Id} \qquad (20)$$

where $\varphi$ is defined in Example III.1 and $\mathrm{Id}$ is the identity operator. The two proposed algorithms can therefore be compared for this example.

Fig. 1: The potential energy $E_{\boldsymbol{\theta}}$ in (19) (solid black line) and its discretizations using the modified leapfrog schemes $T_{ns_1}$ (squares) and $T_{ns_2}$ (circles), as well as the difference between the two discretizations (dashed blue line).

Fig. 1 shows that the discretized energy is close to the continuous one for the two mappings $T_{ns_1}$ and $T_{ns_2}$. Moreover, the slight difference between the two mappings shows that the two discretization schemes perform very similarly close to the critical region of non-differentiability (the interval $[-\eta, \eta]$ with small $\eta > 0$; see the zoom around the origin in Fig. 1). Fig. 2 illustrates the shape of the proximity operator for the considered energy function $E_{\boldsymbol{\theta}}$, as well as the identity function $\mathrm{Id}$ and the difference $\mathrm{Id} - \mathrm{prox}_{E_{\boldsymbol{\theta}}}$. This figure clearly shows that, due to the thresholding property of the proximity operator, $\mathrm{prox}_{E_{\boldsymbol{\theta}}}(x)$ vanishes close to the origin; in particular, for the considered example, we have $\mathrm{prox}_{E_{\boldsymbol{\theta}}}(x) = 0$ for every $x \in [-\lambda, \lambda]$. This comparison confirms that the two schemes perform similarly, especially close to the non-differentiability point.

Fig. 2: The proximity operator $\mathrm{prox}_{E_{\boldsymbol{\theta}}}$, the identity function $\mathrm{Id}$ and the difference $\mathrm{Id} - \mathrm{prox}_{E_{\boldsymbol{\theta}}}$ for the potential energy in (19).

Since Algorithm 2 is more general than Algorithm 1 and can handle energies for which the sub-differential is not straightforward to compute (while performing equally well close to the critical regions), we will focus on this discretization scheme in our experiments.
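For the energy form assumed in (19), the proximity operator has a simple closed form: minimizing $\lambda|u| + u^2/2 + (u-x)^2/2$ gives the optimality condition $2u + \lambda\,\mathrm{sign}(u) \ni x$, i.e. half a soft threshold. A sketch, under that assumption on (19):

```python
import numpy as np

def prox_example(x, lam=1.0):
    """Prox of E(x) = lam*|x| + x**2/2, i.e. the energy assumed in (19).

    The optimality condition 2u + lam*sign(u) = x yields half a soft
    threshold; prox_example vanishes on [-lam, lam], which is the
    thresholding behaviour discussed around Fig. 2.
    """
    return 0.5 * np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```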

IV Experimental validation

This section validates the proposed ns-HMC scheme for non-smooth log-concave distributions through three experiments. The first two experiments consider the GG distribution, whose energy function is non-differentiable for the values of the shape parameter $p$ considered here. For the third experiment, a Laplace distribution (GG distribution with $p = 1$) is used for an image denoising problem where the clean image is recovered from noisy measurements using a Bayesian regularization scheme involving a sampling technique based on the proposed ns-HMC algorithm.

IV-A Experiment 1: 1D sampling

In the first experiment, 1D sampling is performed for different values of the shape and scale parameters $p$ and $\lambda$ of a GG distribution. Chains generated using the proposed ns-HMC sampling scheme are compared to those obtained with a random walk Metropolis-Hastings (rw-MH) scheme. The rw-MH strategy is used here for comparison since it generally improves the mixing properties of the generated samples when compared to a fixed proposal distribution. Let $x$ be the current sample and $x^\star$ the proposed one. A Gaussian proposal centered on the current sample with unit variance is used for the rw-MH algorithm, i.e., $x^\star \sim \mathcal{N}(x, 1)$. Fig. 3 displays the mean square error (MSE) between the target GG pdf and the histogram of the generated samples with respect to the number of sampled coefficients. This figure shows slightly faster convergence for the proposed ns-HMC scheme compared to the rw-MH algorithm.
To further investigate the sampling efficiency, Fig. 4 displays the autocorrelation functions (ACFs) of the sampled chains for the same values of $p$ and $\lambda$. This figure clearly shows that samples generated using the ns-HMC scheme are less correlated than those generated using rw-MH, which corroborates the fast convergence of the ns-HMC scheme. In fact, the proposed technique does not need any adjustment of the proposal variance, contrary to the rw-MH algorithm, while giving an acceptable level of intra-chain correlation. For the sake of comparison, Fig. 4 also displays the ACFs of chains sampled using a standard MH algorithm with a centered Gaussian proposal $\mathcal{N}(0,1)$. Indeed, it has been reported that rw-MH increases the correlation level within sampled chains [5], while an MH algorithm with an independent proposal provides weakly correlated samples. The comparison between the ACFs corresponding to ns-HMC and MH shows that chains sampled using ns-HMC are as weakly correlated as those of the standard MH algorithm with an independent Gaussian proposal.
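The ACFs of Fig. 4 can be estimated with a few lines of Python (our own sketch, not code from the paper):

```python
import numpy as np

def acf(chain, max_lag=50):
    """Empirical autocorrelation function of a 1D chain, as plotted in Fig. 4."""
    c = np.asarray(chain, dtype=float)
    c = c - c.mean()
    var = c.dot(c)
    return np.array([c[: len(c) - k].dot(c[k:]) / var for k in range(max_lag + 1)])
```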

Fig. 3: MSEs between the target 1D GG pdf and the histogram of the generated samples using the rw-MH and ns-HMC algorithms for two different combinations of $\lambda$ and $p$.

Fig. 4: ACFs of chains sampled using the Metropolis-Hastings (MH) and random walk Metropolis-Hastings (rw-MH) algorithms, in addition to the proposed method (ns-HMC), for two values of $p$.

IV-B Experiment 2: multivariate sampling

In this experiment, sampling is performed according to a multivariate GG distribution. First, sampling using rw-MH is performed with a large number of iterations (long burn-in period) so that the target distribution is guaranteed to be reached. The histogram of the obtained samples is calculated and corresponds to our ground truth. Then, samplings using the proposed ns-HMC method and the rw-MH scheme are performed on the same dataset. The MSE between the obtained histograms and the ground truth is computed versus the number of iterations of the sampler. Fig. 5 displays this MSE for the multivariate GG sampling and for different values of the scale and shape parameters $\lambda$ and $p$. Note that simulations have been performed for the 2D, 3D and 4D cases.

Fig. 5: MSEs between the target GG pdf and the histogram of the generated samples for the rw-MH and ns-HMC algorithms, in the 2D (top), 3D (middle) and 4D (bottom) cases.

The reported MSE values show that the target distribution is approached more rapidly using ns-HMC than using rw-MH. Specifically, for the 2D case, the ns-HMC scheme converges after about 500 iterations for the first parameter setting (resp. 1000 iterations for the second one), while the rw-MH sampler needs about 3500 iterations for both settings. When looking at the same curves in Fig. 5 for the 3D and 4D cases, it is worth noticing that the gap between the two methods in terms of convergence speed increases with the problem dimensionality. This effect is confirmed for all the considered shape and scale parameters. This corroborates the usefulness of the proposed ns-HMC scheme, especially for large data sampling where the convergence speed of the standard MH or rw-MH algorithms degrades with the size of the data.

IV-C Experiment 3: denoising

In this experiment, the performance of the proposed ns-HMC sampling algorithm is analyzed for an image denoising problem. The 2D image displayed in Fig. 6 (top left) has been used as ground truth for this example. An independent identically distributed additive Gaussian noise of variance $\sigma_n^2$ has been added to this image to obtain the noisy image depicted in Fig. 6 (top right). The objective of this third experiment is to promote the sparsity of the wavelet coefficients associated with the target image. To this end, we express the image formation model as a function of the wavelet coefficients $\boldsymbol{z}$, which are related to the ground-truth image $\bar{\boldsymbol{x}}$ through the relation $\bar{\boldsymbol{x}} = \boldsymbol{F}^*\boldsymbol{z}$, where $\boldsymbol{F}^*$ denotes the dual frame operator. The analysis frame operator thus corresponds to $\boldsymbol{F}$ and, as orthonormal bases are considered here, the dual frame operator reduces to the inverse operator, yielding $\boldsymbol{F}^* = \boldsymbol{F}^{-1}$. The observation model can thus be expressed as

$$\boldsymbol{y} = \boldsymbol{F}^*\boldsymbol{z} + \boldsymbol{n} \qquad (21)$$

where $\boldsymbol{y}$ is the observed image, $\boldsymbol{z}$ contains the unknown wavelet coefficients and $\boldsymbol{n}$ is the additive noise. Note that the denoised image can easily be recovered from the estimated wavelet coefficients $\widehat{\boldsymbol{z}}$ by taking $\widehat{\boldsymbol{x}} = \boldsymbol{F}^*\widehat{\boldsymbol{z}}$.
Based on this model and the Gaussian likelihood assumption, a hierarchical Bayesian model has been built using an independent Laplace prior for the wavelet coefficients [29, 30]

$$f(\boldsymbol{z};\lambda) = \prod_{i}\frac{1}{2\lambda}\exp\left(-\frac{|z_i|}{\lambda}\right) \qquad (22)$$

where $\lambda$ is an unknown parameter that is estimated within the proposed Bayesian algorithm. More precisely, an inverse gamma prior distribution is assigned to $\lambda$ [31, 32]

$$f(\lambda;\alpha,\beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\,\lambda^{-\alpha-1}\exp\left(-\frac{\beta}{\lambda}\right) \qquad (23)$$

where $\Gamma$ is the gamma function, and $\alpha$ and $\beta$ are fixed hyperparameters (in our experiments, these hyperparameters have been set to small values yielding a weakly informative prior).
Using a Jeffreys prior for the noise variance ($f(\sigma_n^2) \propto 1/\sigma_n^2$), the full posterior of this denoising model can be derived. The associated Gibbs sampler generates samples according to the conditional distributions of this posterior. The conditional distribution of the wavelet coefficients writes

$$f(\boldsymbol{z}\mid\boldsymbol{y},\sigma_n^2,\lambda) \propto \exp\left(-E(\boldsymbol{z})\right) \qquad (24)$$

where the energy function is defined by $E(\boldsymbol{z}) = \frac{\|\boldsymbol{y}-\boldsymbol{F}^*\boldsymbol{z}\|^2}{2\sigma_n^2} + \frac{\|\boldsymbol{z}\|_1}{\lambda}$. Sampling according to this distribution is performed using the proposed ns-HMC scheme, which requires the calculation of the proximity operator of this energy function, given componentwise by

$$\left[\mathrm{prox}_E(\boldsymbol{z})\right]_i = \frac{\sigma_n^2}{\sigma_n^2+1}\,\mathrm{sign}(v_i)\,\max\left\{|v_i| - \frac{1}{\lambda},\ 0\right\} \qquad (25)$$

where $\boldsymbol{v} = \boldsymbol{z} + \boldsymbol{F}\boldsymbol{y}/\sigma_n^2$; this expression can easily be derived using standard properties of the proximity operator [23, 33, 19]. See Appendix -B for more details.
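In code, (25) amounts to a scaled soft threshold of the point $\boldsymbol{z} + \boldsymbol{F}\boldsymbol{y}/\sigma_n^2$ (a sketch of our own, consistent with the derivation of Appendix -B):

```python
import numpy as np

def prox_denoise_energy(z, Fy, sigma2, lam):
    """Prox of E(z) = ||Fy - z||^2 / (2*sigma2) + ||z||_1 / lam, cf. (25).

    Fy holds the wavelet coefficients of the noisy image (Fy = F y); the
    minimization is separable and reduces to a scaled soft threshold.
    """
    v = z + Fy / sigma2                                   # data-augmented point
    shrink = np.sign(v) * np.maximum(np.abs(v) - 1.0 / lam, 0.0)
    return (sigma2 / (sigma2 + 1.0)) * shrink
```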
Regarding the noise variance and the prior hyperparameter, straightforward calculations lead to the following conditional distributions, which are easy to sample:

$$\sigma_n^2 \mid \boldsymbol{y},\boldsymbol{z} \sim \mathcal{IG}\left(\frac{N}{2},\ \frac{\|\boldsymbol{y}-\boldsymbol{F}^*\boldsymbol{z}\|^2}{2}\right) \qquad (26)$$

$$\lambda \mid \boldsymbol{z} \sim \mathcal{IG}\left(N+\alpha,\ \|\boldsymbol{z}\|_1+\beta\right) \qquad (27)$$

where $\mathcal{IG}$ denotes the inverse gamma distribution and $N$ is the number of image pixels. The estimation of the denoised image is performed based on the sampled wavelet coefficients after an appropriate burn-in period, i.e., after convergence of the Gibbs sampler in Algorithm 3.
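Both conditionals are inverse gamma distributions and can be sampled through the reciprocal of a gamma draw; a sketch of our own (`Fstar`, `alpha`, `beta`, `N` stand for the quantities defined above):

```python
import numpy as np

def sample_inv_gamma(a, b, rng=np.random):
    """Draw from IG(a, b): if g ~ Gamma(shape=a, scale=1/b), then 1/g ~ IG(a, b)."""
    return 1.0 / rng.gamma(a, 1.0 / b)

# Hypothetical usage matching (26)-(27) for an N-pixel image y with
# wavelet coefficients z and synthesis operator Fstar:
#   sigma2 = sample_inv_gamma(N / 2.0, np.sum((y - Fstar(z)) ** 2) / 2.0)
#   lam    = sample_inv_gamma(N + alpha, np.sum(np.abs(z)) + beta)
```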

- Initialize with some $\boldsymbol{z}^{(0)}$, $\sigma_n^{2(0)}$ and $\lambda^{(0)}$.
for $r = 1, \ldots, S$ do
       - Sample $\sigma_n^{2(r)}$ according to (26);
       - Sample $\lambda^{(r)}$ according to (27);
       - Sample $\boldsymbol{z}^{(r)}$ according to its conditional distribution (24) using the proposed ns-HMC scheme;
end for
- After convergence, compute the MMSE estimator $\widehat{\boldsymbol{z}}$ of the wavelet coefficients and return the estimated image $\widehat{\boldsymbol{x}} = \boldsymbol{F}^*\widehat{\boldsymbol{z}}$.
Algorithm 3 Gibbs sampler for image denoising.

An example of a denoised image obtained using Algorithm 3 is displayed in Fig. 6 (bottom left). For the sake of comparison, the image denoised with the Wiener filter is displayed in Fig. 6 (bottom right). From a visual point of view, Algorithm 3 clearly provides a better denoised image than the Wiener filter. Quantitatively speaking, the evaluation of the noisy and denoised images is based on both the SNR (signal-to-noise ratio) and the SSIM (structural similarity) [34]. These values are reported directly in the figure and show the efficiency of the denoising algorithm based on the proposed ns-HMC technique for sampling from the conditional distribution (24) of the wavelet coefficients. As regards computational time, only 1000 iterations are necessary for the proposed algorithm, including a burn-in period of 500 iterations, taking around 9 seconds on a 64-bit 2.00 GHz i7-3667U architecture with a Matlab implementation. For the ns-HMC step, the second scheme (Algorithm 2) has been used.

Fig. 6: Reference (top left), noisy (top right; SNR = 5.68 dB, SSIM = 0.699) and denoised images using Algorithm 3 (bottom left; SNR = 20.48 dB, SSIM = 0.985) and the Wiener filter (bottom right; SNR = 8.44 dB, SSIM = 0.764).

V Conclusion

This paper proposed a solution that makes it feasible to use Hamiltonian dynamics for sampling from log-concave probability distributions with non-smooth energy functions. The proposed sampling technique relies on results from convex optimization and Hamiltonian Monte Carlo methods. More precisely, proximity operators were investigated to address the non-differentiability of the energy function related to the target distribution. Validation results showed that the proposed technique provides faster convergence and interesting decorrelation properties for the sampled chains when compared to more standard methods such as the random walk Metropolis-Hastings algorithm. The proposed technique was evaluated on synthetic data and applied to an image denoising problem. Our results showed that using proximity operators within a Hamiltonian Monte Carlo method allows faster convergence to the target distribution. This conclusion is particularly important for large scale data sampling since the gain in convergence speed increases with the problem dimensionality. In future work, we will investigate this technique for sparse signal recovery problems in which a non-tight linear operator is involved in the observation model.

-A Volume preservation


This appendix studies the volume preservation property both for the continuous Hamiltonian dynamics and for the proposed discretization schemes. Volume preservation is a key property of sampling algorithms involving an acceptance-rejection step, such as the Metropolis within Gibbs algorithm, since it allows a simpler acceptance probability to be used.

-A1 Volume preservation for the Hamiltonian dynamics


Akin to [12], we consider here volume preservation for Hamiltonian dynamics in the one-dimensional case; the multi-dimensional case can then be handled through simple generalizations. Let us denote by $T_\delta$ (see Section III-A) the mapping between the state at time $t$, denoted by $\left(x(t), q(t)\right)$, and the state at time $t+\delta$. For $\delta$ small enough, $T_\delta$ can be approximated by [12]

$$T_\delta\begin{pmatrix}x(t)\\ q(t)\end{pmatrix} = \begin{pmatrix}x(t)\\ q(t)\end{pmatrix} + \delta\begin{pmatrix}\dfrac{dx}{dt}(t)\\[2mm] \dfrac{dq}{dt}(t)\end{pmatrix} + R(\delta) \qquad (28)$$

where $R(\delta)$ involves terms of order $\delta^2$ or higher. After replacing the time derivatives of (4) in (28), and accounting for the fact that the Hamiltonian may be non-differentiable with respect to $x$, the generalized Jacobian matrix of $T_\delta$ can be written as

$$\boldsymbol{J}_\delta = \begin{bmatrix} 1 + \delta\,\dfrac{\partial^2 H}{\partial x\,\partial q} & \delta\,\dfrac{\partial^2 H}{\partial q^2}\\[2mm] -\delta\,u_{xx} & 1 - \delta\,\dfrac{\partial^2 H}{\partial q\,\partial x}\end{bmatrix} \qquad (29)$$

where $\partial$ denotes the sub-gradient and $u_{xx}$ is an element of the second-order sub-differential of $H$ with respect to $x$ and $x$. The determinant of this matrix can therefore be written as

$$\det\boldsymbol{J}_\delta = 1 + \delta\left(\frac{\partial^2 H}{\partial x\,\partial q} - \frac{\partial^2 H}{\partial q\,\partial x}\right) + O(\delta^2) = 1 + O(\delta^2) \qquad (30)$$

where $\det\boldsymbol{J}_\delta$ is the determinant of the matrix $\boldsymbol{J}_\delta$. Following the construction proposed in [12], a time interval $s$ that is not close to zero can be divided into $s/\delta$ steps of length $\delta$, so that the Jacobian determinant of the composed mapping is $\left(1+O(\delta^2)\right)^{s/\delta}$, which tends to one as $\delta \to 0$. This means that the transformation $T_s$ ensures volume preservation.

-A2 Volume preservation for the proposed discretization schemes


It is worth noticing here that the two modified leapfrog discretization schemes $T_{ns_1}$ and $T_{ns_2}$, defined respectively in (12)-(14) and (16)-(18), preserve volume just like the original leapfrog scheme defined in (6)-(8), since they are shear transformations. The interested reader can refer to [12] or [35, page 121] for more details.

-B Proximity operator calculation for the experiment of Section IV-C


The energy function considered in this appendix is the one involved in the conditional distribution of the wavelet coefficients in (24), i.e.,

$$E(\boldsymbol{z}) = \frac{\|\boldsymbol{y}-\boldsymbol{F}^*\boldsymbol{z}\|^2}{2\sigma_n^2} + \frac{\|\boldsymbol{z}\|_1}{\lambda} \qquad (31)$$

where $\sigma_n^2$ is the noise variance and $\lambda$ the hyperparameter of the Laplace prior. In order to use the proposed ns-HMC sampling algorithm, the proximity operator of this function has to be calculated. Following the standard definition of the proximity operator [26, 23], and using the orthonormality of the considered bases (so that $\|\boldsymbol{y}-\boldsymbol{F}^*\boldsymbol{u}\| = \|\boldsymbol{F}\boldsymbol{y}-\boldsymbol{u}\|$), we can write

$$\mathrm{prox}_E(\boldsymbol{z}) = \arg\min_{\boldsymbol{u}}\ \frac{1}{2}\|\boldsymbol{u}-\boldsymbol{z}\|^2 + \frac{\|\boldsymbol{F}\boldsymbol{y}-\boldsymbol{u}\|^2}{2\sigma_n^2} + \frac{\|\boldsymbol{u}\|_1}{\lambda}. \qquad (32)$$

This minimization is separable across components; setting the sub-differential of each one-dimensional objective to zero yields $\left(1+\frac{1}{\sigma_n^2}\right)u_i + \frac{1}{\lambda}\,\partial|u_i| \ni z_i + \frac{[\boldsymbol{F}\boldsymbol{y}]_i}{\sigma_n^2}$, which proves the expression of the proximity operator given in (25).
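The closed form can be checked numerically against a direct minimization of the objective in (32); the short sketch below (our own, reusing `prox_denoise_energy` from Section IV-C) should print a value close to zero.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
z, Fy = rng.standard_normal(5), rng.standard_normal(5)
sigma2, lam = 0.5, 2.0

# Objective of (32): u -> ||u - z||^2/2 + ||Fy - u||^2/(2*sigma2) + ||u||_1/lam
obj = lambda u: (0.5 * np.sum((u - z) ** 2)
                 + np.sum((Fy - u) ** 2) / (2.0 * sigma2)
                 + np.sum(np.abs(u)) / lam)

u_num = minimize(obj, z, method="Nelder-Mead",
                 options={"xatol": 1e-12, "fatol": 1e-12, "maxiter": 20000}).x
u_cf = prox_denoise_energy(z, Fy, sigma2, lam)  # closed form of (25)
print(np.max(np.abs(u_num - u_cf)))             # expected to be ~ 0
```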

References

  • [1] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
  • [2] C. Chaux, P. L. Combettes, J.-C Pesquet, and V. Wajs, “Iterative image deconvolution using overcomplete representations,” in Proc. European Signal Processing Conference (EUSIPCO), Florence, Italy, Sep. 4-8 2006.
  • [3] L. Chaari, P. Ciuciu, S. Mériaux, and J.-C. Pesquet, “Spatio-temporal wavelet regularization for parallel MRI reconstruction: application to functional MRI,” Mag. Reson. Mater. in Phys., Biol. and Med. (MAGMA), vol. 27, pp. 509–529, 2014.
  • [4] L. Boubchir and B. Boashash, “Wavelet denoising based on the MAP estimation using the BKF prior with application to images and EEG signals,” IEEE Trans. Signal Process., vol. 61, no. 8, pp. 1880–1894, Apr. 2013.
  • [5] C. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, New York, 2004.
  • [6] J.-F. Richard and W. Zhang, “Efficient high-dimensional importance sampling,” J. Econom., vol. 141, no. 2, pp. 1385–1411, 2007.
  • [7] G. O. Roberts and R. L. Tweedie, “Exponential convergence of Langevin distributions and their discrete approximations,” Bernoulli, vol. 2, no. 4, pp. 341–363, 1996.
  • [8] M. Pereyra, “Proximal Markov chain Monte Carlo algorithms,” arXiv preprint, 2013, http://arxiv.org/abs/1306.0187.
  • [9] F. Orieux, O. Féron, and J.-F. Giovannelli, “Sampling high-dimensional Gaussian distributions for general linear inverse problems,” Signal Process. Lett., vol. 19, no. 5, pp. 251–254, May 2012.
  • [10] C. Gilavert, S. Moussaoui, and J. Idier, “Efficient Gaussian Sampling for Solving Large-Scale Inverse Problems using MCMC Methods,” IEEE Trans. Signal Process., vol. 63, no. 1, Aug. 2014.
  • [11] K. M. Hanson, “Markov chain Monte Carlo posterior sampling with the Hamiltonian method,” in SPIE Medical Imaging: Image Processing, M. Sonka and K. M. Hanson, eds., 2001, pp. 456–467.
  • [12] R. M. Neal, “MCMC using Hamiltonian dynamics,” in Handbook of Markov Chain Monte Carlo, S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Eds., chapter 5. Chapman and Hall/CRC Press, 2010.
  • [13] A. Pakman and L. Paninski, “Exact Hamiltonian Monte Carlo for truncated multivariate Gaussians,” arXiv preprint, http://arxiv.org/abs/1208.4118.
  • [14] Y. Altmann, N. Dobigeon, and J.-Y. Tourneret, “Unsupervised post-nonlinear unmixing of hyperspectral images using a Hamiltonian Monte Carlo algorithm,” IEEE Trans. Image Process., vol. 23, no. 6, pp. 2663–2675, 2014.
  • [15] H. P. Wai and B. D. Jeffs, “Adaptive image restoration using a generalized Gaussian model for unknown noise,” IEEE Trans. Image Process., vol. 4, no. 10, pp. 1451–1456, Oct. 1995.
  • [16] E. P. Simoncelli and E. H. Adelson, “Noise removal via Bayesian wavelet coring,” in IEEE Int. Conf. on Image Process. (ICIP), Lausanne, Switzerland, Sep. 16-19 1996, pp. 379–382.
  • [17] P. Moulin and J. Liu, “Analysis of multiresolution image denoising schemes using generalized-Gaussian priors,” in IEEE Int. Symp. on Time-Frequency and Time-Scale Analysis, Pittsburgh, USA, Oct. 1998, pp. 633–636.
  • [18] L. Chaari, J.-C. Pesquet, J.-Y. Tourneret, Ph. Ciuciu, and A. Benazza-Benyahia, “A hierarchical Bayesian model for frame representation,” IEEE Trans. Signal Process., vol. 18, no. 11, pp. 5560–5571, Nov. 2010.
  • [19] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, New York, 2011.
  • [20] V. Jeyakumar and D. T. Luc, “Approximate Jacobian matrices for nonsmooth continuous maps and $C^1$-optimization,” SIAM J. Control Optim., vol. 36, no. 5, pp. 1815–1832, 1998.
  • [21] F. H. Clarke, Optimization and nonsmooth analysis, Wiley-Interscience, New York, 1983.
  • [22] P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forward-backward splitting,” Multiscale Model. Simul., vol. 4, pp. 1168–1200, 2005.
  • [23] C. Chaux, P. Combettes, J.-C. Pesquet, and V.R Wajs, “A variational formulation for frame-based inverse problems,” Inv. Prob., vol. 23, no. 4, pp. 1495–1518, Aug. 2007.
  • [24] L. Chaari, J.-C. Pesquet, A. Benazza-Benyahia, and P. Ciuciu, “A wavelet-based regularized reconstruction algorithm for SENSE parallel MRI with applications to neuroimaging,” Med. Image Anal., vol. 15, no. 2, pp. 185–201, Nov. 2011.
  • [25] Y. F. Atchade, G. Fort, and E. Moulines, “On stochastic proximal gradient algorithms,” Arxiv, 2014, http://arxiv.org/abs/1402.2365.
  • [26] J.-J. Moreau, “Proximité et dualité dans un espace hilbertien,” Bulletin de la Société Mathématique de France, vol. 93, pp. 273–299, 1965.
  • [27] P. L. Combettes and J.-C. Pesquet, “Proximal splitting methods in signal processing,” in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, H. H. Bauschke, R. S. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, Eds., pp. 185–212. Springer-Verlag, New York, 2011.
  • [28] J. Douglas and H. H. Rachford, “On the numerical solution of heat conduction problems in two or three space variables,” Trans. Amer. Math. Soc., vol. 82, pp. 421–439, 1956.
  • [29] M. Tipping, “Sparse Bayesian learning and the relevance vector machine,” J. Mach. Learn. Res., vol. 1, pp. 211–244, 2001.
  • [30] M. Seeger, “Bayesian inference and optimal design in the sparse linear model,” J. Mach. Learn. Res., vol. 9, pp. 759–813, 2008.
  • [31] L. Chaari, J.-Y. Tourneret, and H. Batatia, “Sparse Bayesian regularization using Bernoulli-Laplacian priors,” in Proc. European Signal Process. Conf. (EUSIPCO), Marrakech, Morocco, September 9-13, 2013.
  • [32] N. Dobigeon, A. O. Hero, and J.-Y. Tourneret, “Hierarchical Bayesian sparse image reconstruction with application to MRFM,” IEEE Trans. Image Process., vol. 18, no. 9, pp. 2059–2070, Sep. 2009.
  • [33] P. L. Combettes and J.-C. Pesquet, “A proximal decomposition method for solving convex variational inverse problems,” Inv. Prob., vol. 24, no. 6, Dec. 2008, 27 p.
  • [34] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
  • [35] S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Handbook of Markov Chain Monte Carlo, Chapman & Hall/CRC, Boston, USA, 2011.