A Hamiltonian Monte Carlo Method for Non-Smooth Energy Sampling
Abstract
Efficient sampling from high-dimensional distributions is a challenging issue that is encountered in many large data recovery problems involving Markov chain Monte Carlo schemes. In this context, sampling using Hamiltonian dynamics is one of the recent techniques that have been proposed to exploit the geometry of the target distribution. Such schemes have clearly been shown to be efficient for multidimensional sampling, but they are restricted to the exponential family of distributions with smooth energy functions. In this paper, we address the problem of using Hamiltonian dynamics to sample from probability distributions having non-differentiable energy functions, such as those based on the $\ell_1$ norm. Such distributions are increasingly used in sparse signal and image recovery applications. The proposed technique uses a modified leapfrog transform involving a proximal step. The resulting non-smooth Hamiltonian Monte Carlo (nsHMC) method is tested and validated on a number of experiments. Results show its ability to accurately sample according to various multivariate target distributions. The proposed technique is illustrated on synthetic examples and is applied to an image denoising problem.
Sparse sampling, Bayesian methods, MCMC, Hamiltonian, proximity operator, leapfrog.
I Introduction
Sparse signal and image recovery is a topic that has gained considerable interest over the last decades, especially after the emergence of compressed sensing theory [1]. In addition, most recent applications such as remote sensing [2] and medical image reconstruction [3, 4] generate large and intractable data volumes that have to be processed either independently or jointly. To handle such inverse problems, Bayesian techniques have demonstrated their usefulness, especially when the model hyperparameters are difficult to fix. Such techniques generally rely on a maximum a posteriori (MAP) estimation built upon the signal/image likelihood and priors. Since efficient priors generally have a complicated form, analytical expressions of the MAP estimators are often difficult to obtain. For this reason, many Bayesian techniques resort to Markov chain Monte Carlo (MCMC) sampling [5].
To handle large-dimensional sampling problems, several techniques have been proposed during the last decades. In addition to the random walk Metropolis-Hastings (MH) algorithm [5], one can mention the work in [6] on efficient high-dimensional importance sampling, the Metropolis-adjusted Langevin algorithm (MALA) of [7, 8], or the high-dimensional Gaussian sampling methods of [9, 10].
To handle log-concave smooth probability distributions, a Hamiltonian Monte Carlo (HMC) sampling technique has recently been proposed in [11, 12]. This technique uses the analogy with kinetic energy conservation in physics to design efficient proposals that better follow the target distribution's geometry. HMC has recently been investigated in a number of works dealing with multidimensional sampling problems for various applications [13, 14], which demonstrates the efficiency of such sampling schemes. These strategies yield efficient sampling, with improved convergence and mixing properties of the simulated chains compared to classical sampling schemes such as the Gibbs and MH algorithms.
However, these techniques are only adapted to log-concave probability distributions with smooth energy functions, since the gradient of these functions has to be calculated. This constraint represents a real limitation in applications where sparsity is one of the main processing ingredients, especially for large data. Indeed, sparsity-promoting probability distributions generally have a non-differentiable energy function, such as the Laplace or the generalized Gaussian (GG) [15] distributions, which involve $\ell_1$ and $\ell_p$ energy functions, respectively. Such distributions have been used as priors for the target signals or images in a number of works where inverse problems are handled in a Bayesian framework [16, 17, 18]. Using the HMC technique in this context is therefore not possible.
This paper presents a modified HMC scheme that makes it possible to sample from log-concave probability distributions with non-differentiable energy functions. The so-called non-smooth HMC (nsHMC) sampling scheme relies on a modified leapfrog transform [11, 12] that circumvents the non-differentiability of the target energy function. The modified leapfrog transform relies on the subdifferential and proximity operator concepts [19]. The proposed scheme is validated on a sampling example where samples are drawn from a GG distribution with different shape parameters. It is also illustrated on a signal recovery problem where a sparse regularization scheme is applied to recover a high-dimensional signal.
The rest of the paper is organized as follows. Section II formulates the problem of non-smooth sampling for large data using Hamiltonian dynamics. Section III presents the proposed nsHMC sampling scheme. This technique is then validated in Section IV to illustrate its efficiency for sampling from non-smooth log-concave distributions. Finally, some conclusions and perspectives are drawn in Section V.
II Problem formulation
Let us consider a signal of interest $x \in \mathbb{R}^{N}$ and let $f(x;\theta)$ be its probability distribution function, parametrized by the parameter vector $\theta$. In this work, we focus on an exponential family of distributions such that

$f(x;\theta) \propto \exp\big(-E_{\theta}(x)\big)$   (1)

where $E_{\theta}$ is the energy function. As stated above, in this paper we concentrate on sampling from the class of log-concave probability densities, which means that the energy function $E_{\theta}$ is assumed to be convex but not necessarily differentiable. In addition, we will also make the assumption that $E_{\theta}$ belongs to $\Gamma_{0}(\mathbb{R}^{N})$, the class of proper lower semicontinuous convex functions from $\mathbb{R}^{N}$ to $]-\infty, +\infty]$. Finally, we will consider probability distributions from which direct sampling is not possible, so that an acceptance-rejection step is required. Example II.1 presents the case of the GG distribution, which satisfies the above-mentioned assumptions.
Example II.1
Let $\alpha$ and $\beta$ be two real positive scalars. The generalized Gaussian distribution is defined by the following probability density function

$f(x;\alpha,\beta) = \dfrac{\beta}{2\alpha\,\Gamma(1/\beta)}\exp\left(-\left(\dfrac{|x|}{\alpha}\right)^{\beta}\right)$   (2)

for every $x \in \mathbb{R}$. Except for particular values of $\beta$ (e.g., $\beta = 2$, which yields the Gaussian distribution), the energy function $E(x) = (|x|/\alpha)^{\beta}$ is not differentiable everywhere; in particular, it is not differentiable at $x = 0$ when $\beta \leq 1$. In what follows, we are interested in efficiently drawing samples according to the probability distribution defined in (1). The following section describes the proposed non-smooth sampling scheme.
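To make the non-differentiability concrete, the following short Python sketch (the parameter names alpha and beta are illustrative choices, not the paper's notation) evaluates the GG energy and shows that its one-sided difference quotients at the origin disagree when the shape parameter equals 1:

```python
import numpy as np

def gg_energy(x, alpha=1.0, beta=1.0):
    """Energy of a generalized Gaussian: E(x) = (|x| / alpha) ** beta."""
    return (np.abs(x) / alpha) ** beta

# One-sided difference quotients at the origin disagree for beta = 1,
# which exhibits the non-differentiability of the Laplace energy at x = 0.
h = 1e-8
right = (gg_energy(h) - gg_energy(0.0)) / h      # -> +1
left = (gg_energy(-h) - gg_energy(0.0)) / (-h)   # -> -1
print(right, left)
```

The mismatch of the two slopes (+1 versus -1) is precisely what prevents the use of a gradient-based leapfrog discretization for this energy.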
III Non-smooth sampling
III-A Hamiltonian Monte Carlo methods
HMC methods [11, 12, 14] are powerful tools that use the principle of Hamiltonian dynamics. These methods were originally proposed by analogy with kinetic energy evolution, and they usually provide better mixing and convergence properties than standard schemes. Let us assume that $U(x) = E_{\theta}(x)$ represents a potential energy function. If we associate with $x$ a momentum vector $p \in \mathbb{R}^{N}$, the energy function for the Hamiltonian dynamics is the combination of the potential energy $U$ and a kinetic energy $K(p)$, i.e.,

$H(x, p) = U(x) + K(p)$   (3)

The energy function $H$ is called the Hamiltonian and completely describes the considered system. For simplicity reasons, a quadratic kinetic energy corresponding to a unitary diagonal covariance matrix is usually assumed, so that $K(p) = \frac{1}{2}p^{T}p$. The Hamiltonian's equations of motion determine the evolution of the state as a function of time $t$ [12]

$\dfrac{dx_{i}}{dt} = \dfrac{\partial H}{\partial p_{i}} = p_{i}, \qquad \dfrac{dp_{i}}{dt} = -\dfrac{\partial H}{\partial x_{i}} = -\dfrac{\partial U}{\partial x_{i}}$   (4)

where $\partial$ denotes the partial derivative operator. These equations define a transformation $T_{s}$ that maps the state of the system at time $t$ to the state at time $t + s$. The distribution associated with the Hamiltonian energy defined in (3) is therefore given by

$f(x, p) \propto \exp\big(-H(x, p)\big)$   (5)

HMC methods iteratively proceed by alternate updates of the samples $x$ and $p$ drawn according to the distribution (5). At iteration $r$, the HMC algorithm starts with the current values of the vectors $x^{(r)}$ and $p^{(r)}$. Two steps are then performed. The first one updates the momentum vector by sampling $p^{(r,0)}$ according to the multivariate Gaussian distribution $\mathcal{N}(0, I_{N})$, where $I_{N}$ is the $N \times N$ identity matrix, while setting $x^{(r,0)} = x^{(r)}$. The second step updates both momentum and position by proposing two candidates $x^{\star}$ and $p^{\star}$. These two candidates are generated by simulating the Hamiltonian dynamics, which are discretized using a scheme such as the Euler or leapfrog methods. For instance, the discretization can be performed using $L_{f}$ steps of the leapfrog method with a stepsize $\epsilon$. The $l$th leapfrog step will be denoted by $T_{\epsilon}$ and can be summarized as follows
$p^{(r, l+\frac{1}{2})} = p^{(r, l)} - \dfrac{\epsilon}{2}\,\nabla U\big(x^{(r, l)}\big)$   (6)

$x^{(r, l+1)} = x^{(r, l)} + \epsilon\, p^{(r, l+\frac{1}{2})}$   (7)

$p^{(r, l+1)} = p^{(r, l+\frac{1}{2})} - \dfrac{\epsilon}{2}\,\nabla U\big(x^{(r, l+1)}\big)$   (8)

After $L_{f}$ leapfrog steps, the proposed candidates are given by $x^{\star} = x^{(r, L_{f})}$ and $p^{\star} = p^{(r, L_{f})}$. These candidates are then accepted using the standard MH rule, i.e., with the following probability

$\min\left\{1,\ \exp\left[H\big(x^{(r,0)}, p^{(r,0)}\big) - H\big(x^{\star}, p^{\star}\big)\right]\right\}$   (9)

where $H$ is the energy function defined in (3).
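The leapfrog iteration (6)-(8) together with the acceptance rule (9) can be sketched in Python as follows. This is a minimal illustration for a smooth potential, assuming a standard Gaussian target; the function and parameter names are not taken from the paper:

```python
import numpy as np

def hmc_step(x, U, grad_U, eps=0.1, n_leapfrog=20, rng=np.random):
    """One HMC iteration for a smooth potential U, following (6)-(9)."""
    p = rng.standard_normal(x.shape)          # resample momentum ~ N(0, I)
    x_new, p_new = x.copy(), p.copy()
    p_new -= 0.5 * eps * grad_U(x_new)        # initial half momentum step (6)
    for l in range(n_leapfrog):
        x_new += eps * p_new                  # full position step (7)
        g = grad_U(x_new)
        if l < n_leapfrog - 1:
            p_new -= eps * g                  # two consecutive half steps merged
        else:
            p_new -= 0.5 * eps * g            # final half momentum step (8)
    # Metropolis-Hastings acceptance on the Hamiltonian (9)
    H_cur = U(x) + 0.5 * p @ p
    H_prop = U(x_new) + 0.5 * p_new @ p_new
    return x_new if np.log(rng.uniform()) < H_cur - H_prop else x

# Usage: sample a standard Gaussian target, U(x) = ||x||^2 / 2
rng = np.random.default_rng(0)
x = np.zeros(2)
samples = []
for _ in range(2000):
    x = hmc_step(x, lambda z: 0.5 * z @ z, lambda z: z, rng=rng)
    samples.append(x.copy())
```

With a smooth target the acceptance rate is close to one, since the leapfrog error in the Hamiltonian is of order $\epsilon^2$.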
III-B Non-smooth Hamiltonian Monte Carlo schemes
The key step in standard HMC sampling schemes is the approximation of the Hamiltonian dynamics. This approximation allows the random simulation of uncorrelated samples according to a target distribution while exploiting the geometry of its corresponding energy. In this section, we propose two non-smooth Hamiltonian Monte Carlo (nsHMC) schemes to perform this approximation for non-smooth energy functions. The first scheme is based on the subdifferential operator, while the second one is based on proximity operators. For both schemes, the complete procedures used to sample $x$ and $p$ are detailed in Algorithms 1 and 2. These algorithms describe all the steps necessary to sample from a log-concave target distribution.
III-B1 Scheme 1: subdifferential-based approach
Let us first give the following definition of the subdifferential and a useful example.
Definition III.1
[19, p. 223] Let $\varphi$ be in $\Gamma_{0}(\mathbb{R}^{N})$, the class of proper lower semicontinuous convex functions from $\mathbb{R}^{N}$ to $]-\infty, +\infty]$. The subdifferential of $\varphi$ at $x \in \mathbb{R}^{N}$ is the set $\partial\varphi(x) = \left\{u \in \mathbb{R}^{N} \mid \forall y \in \mathbb{R}^{N},\ \langle y - x \mid u\rangle + \varphi(x) \leq \varphi(y)\right\}$, where $\langle\cdot \mid \cdot\rangle$ defines the standard scalar product. Every element $u \in \partial\varphi(x)$ is a subgradient of $\varphi$ at point $x$. If $\varphi$ is differentiable at $x$, the subdifferential reduces to its gradient: $\partial\varphi(x) = \{\nabla\varphi(x)\}$.
Example III.1
Let $\varphi$ be defined as

$\varphi(x) = |x| \qquad \forall x \in \mathbb{R}$   (10)

The subdifferential of $\varphi$ at $x$ is defined by

$\partial\varphi(x) = \begin{cases} \{\operatorname{sign}(x)\} & \text{if } x \neq 0 \\ [-1, 1] & \text{if } x = 0 \end{cases}$   (11)

If in addition we consider a scalar $\lambda > 0$ and we set $\psi = \lambda\varphi$, then we have $\partial\psi(x) = \lambda\,\partial\varphi(x)$ for every $x \in \mathbb{R}$ [19, Prop. 16.5].
For distributions with smooth energy, the basic leapfrog method requires the computation of the gradient of the potential energy $U$. Since this gradient does not exist everywhere for non-smooth energy functions, we resort to the following reformulation of the leapfrog scheme using the concept of subdifferential introduced hereabove

$p^{(r, l+\frac{1}{2})} = p^{(r, l)} - \dfrac{\epsilon}{2}\, u^{(r, l)}$   (12)

$x^{(r, l+1)} = x^{(r, l)} + \epsilon\, p^{(r, l+\frac{1}{2})}$   (13)

$p^{(r, l+1)} = p^{(r, l+\frac{1}{2})} - \dfrac{\epsilon}{2}\, u^{(r, l+1)}$   (14)

where $u^{(r, l)}$ is sampled uniformly in the subdifferential $\partial U\big(x^{(r, l)}\big)$ (and likewise $u^{(r, l+1)}$ in $\partial U\big(x^{(r, l+1)}\big)$). This discretization scheme will be denoted by $T_{\epsilon}^{(1)}$. If $U$ is differentiable, the mapping in (12), (13) and (14) exactly matches the conventional HMC mapping in (6), (7) and (8). As for the standard HMC scheme, the proposed candidates are defined by $x^{\star} = x^{(r, L_{f})}$ and $p^{\star} = p^{(r, L_{f})}$, computed after $L_{f}$ leapfrog steps. These candidates are then accepted based on the standard MH rule defined in (9). The resulting sampling algorithm is summarized in Algorithm 1.
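The subdifferential-based leapfrog (12)-(14) can be sketched as follows. This is an illustrative Python sketch, assuming a Laplace-type energy whose subdifferential is given by Example III.1; the names and parameter values are assumptions, not the paper's implementation:

```python
import numpy as np

def subgrad_abs(x, lam, rng):
    """A subgradient of U(x) = lam * ||x||_1, drawn uniformly in [-lam, lam]
    for the coordinates where x is zero (Example III.1)."""
    g = lam * np.sign(x)
    at_zero = (x == 0)
    g[at_zero] = rng.uniform(-lam, lam, size=at_zero.sum())
    return g

def nshmc1_step(x, U, subgrad, eps=0.1, n_leapfrog=10, rng=None):
    """One nsHMC iteration using the subdifferential leapfrog (12)-(14)."""
    rng = rng or np.random.default_rng()
    p = rng.standard_normal(x.shape)
    x_new, p_new = x.copy(), p.copy()
    for _ in range(n_leapfrog):
        p_half = p_new - 0.5 * eps * subgrad(x_new, rng)   # (12)
        x_new = x_new + eps * p_half                       # (13)
        p_new = p_half - 0.5 * eps * subgrad(x_new, rng)   # (14)
    H_cur = U(x) + 0.5 * p @ p
    H_prop = U(x_new) + 0.5 * p_new @ p_new
    return x_new if np.log(rng.uniform()) < H_cur - H_prop else x

# Usage: sample a 1D Laplace target with energy U(x) = |x|
rng = np.random.default_rng(0)
U = lambda z: np.abs(z).sum()
sg = lambda z, r: subgrad_abs(z, 1.0, r)
x = np.zeros(1)
chain = []
for _ in range(3000):
    x = nshmc1_step(x, U, sg, rng=rng)
    chain.append(x[0])
```

The acceptance test uses the exact Hamiltonian, so the chain targets the Laplace distribution even though the discretized dynamics only use subgradients.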
Note that we do not need to account for any additional term in the acceptance ratio in (9), since volume preservation is ensured by the proposed leapfrog transformation. Volume preservation is equivalent to having an absolute value of the Jacobian determinant of the mapping equal to one, and is due to the fact that candidates are proposed according to Hamiltonian dynamics. More precisely, volume preservation can easily be demonstrated by using the concept of Jacobian matrix approximation [20], such as the Clarke generalization [21], and by conducting calculations similar to [12, Chapter 5, p. 118]. To facilitate the reading of the paper, the volume preservation issue is addressed in Appendix A.
III-B2 Scheme 2: proximal-based approach
Since the calculation of the subdifferential is not straightforward for some classes of convex functions, a second scheme modifying the leapfrog steps (12), (13) and (14) can be considered by using the concept of proximity operators. These operators have been found to be fundamental in a number of recent works in convex optimization [22, 23, 24], and more recently in [25], where stochastic proximal algorithms have been investigated. Let us first recall the following definition.
Definition III.2

[26] Let $\varphi \in \Gamma_{0}(\mathbb{R}^{N})$ and $x \in \mathbb{R}^{N}$. The function $\varphi + \frac{1}{2}\|\cdot - x\|^{2}$ achieves its infimum at a unique point, denoted by $\operatorname{prox}_{\varphi}(x)$ and called the proximity operator of $\varphi$ at $x$.
Example III.2
For the function $\varphi(x) = |x|$ defined in Example III.1, the proximity operator is given by

$\operatorname{prox}_{\varphi}(x) = \operatorname{sign}(x)\max\{|x| - 1,\, 0\}$   (15)
Many other examples, as well as interesting properties that make this tool very powerful and commonly used in the recent optimization literature, are given in [27]. One property of particular interest here is the following.
Property 1
[28, Prop. 3] Let $\varphi \in \Gamma_{0}(\mathbb{R}^{N})$ and $x \in \mathbb{R}^{N}$. There exists a unique point $\widehat{z} \in \mathbb{R}^{N}$ such that $x - \widehat{z} \in \partial\varphi(\widehat{z})$. Using the proximity operator definition hereabove, it turns out that $\widehat{z} = \operatorname{prox}_{\varphi}(x)$.
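Property 1 can be checked numerically for the absolute-value function of Example III.1, whose proximity operator is the soft thresholding of Example III.2 (a small Python sketch; the unit threshold is an assumption made for illustration):

```python
import numpy as np

def prox_abs(x, lam=1.0):
    """Proximity operator of lam * |.| (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Check Property 1: z = prox(x) is the unique point with x - z in lam * d|.|(z)
x = np.array([-2.5, -0.3, 0.0, 0.7, 4.0])
lam = 1.0
z = prox_abs(x, lam)
r = x - z
# At z != 0 the subdifferential is {lam * sign(z)}; at z == 0 it is [-lam, lam].
ok = np.where(z != 0, np.isclose(r, lam * np.sign(z)), np.abs(r) <= lam)
print(ok.all())   # -> True
```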
By modifying the discretization scheme $T_{\epsilon}^{(1)}$, we propose the following $l$th leapfrog discretization scheme, denoted by $T_{\epsilon}^{(2)}$

$p^{(r, l+\frac{1}{2})} = p^{(r, l)} - \dfrac{\epsilon}{2}\left(x^{(r, l)} - \operatorname{prox}_{U}\big(x^{(r, l)}\big)\right)$   (16)

$x^{(r, l+1)} = x^{(r, l)} + \epsilon\, p^{(r, l+\frac{1}{2})}$   (17)

$p^{(r, l+1)} = p^{(r, l+\frac{1}{2})} - \dfrac{\epsilon}{2}\left(x^{(r, l+1)} - \operatorname{prox}_{U}\big(x^{(r, l+1)}\big)\right)$   (18)

If $U$ is differentiable, the mapping $T_{\epsilon}^{(2)}$ in (16), (17) and (18) closely matches the mapping in (6), (7) and (8). The only difference is that, according to Property 1, the subdifferential of $U$ is evaluated at $\operatorname{prox}_{U}(x)$ instead of $x$. As for scheme 1, the proposed candidates are given by $x^{\star} = x^{(r, L_{f})}$ and $p^{\star} = p^{(r, L_{f})}$ after $L_{f}$ leapfrog steps. These candidates are then accepted based on the standard MH rule (9).
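The proximal leapfrog (16)-(18) can be sketched as follows. This is an illustrative Python sketch (names and parameter values are assumptions), using the Laplace energy whose prox is the soft thresholding of Example III.2; by Property 1, the quantity x - prox(x) used below is a subgradient of the energy evaluated at the prox point:

```python
import numpy as np

def nshmc2_step(x, U, prox_U, eps=0.1, n_leapfrog=10, rng=None):
    """One nsHMC iteration using the proximal leapfrog (16)-(18):
    x - prox_U(x) replaces the gradient of U (Property 1)."""
    rng = rng or np.random.default_rng()
    p = rng.standard_normal(x.shape)
    x_new, p_new = x.copy(), p.copy()
    for _ in range(n_leapfrog):
        p_half = p_new - 0.5 * eps * (x_new - prox_U(x_new))   # (16)
        x_new = x_new + eps * p_half                           # (17)
        p_new = p_half - 0.5 * eps * (x_new - prox_U(x_new))   # (18)
    H_cur = U(x) + 0.5 * p @ p
    H_prop = U(x_new) + 0.5 * p_new @ p_new
    return x_new if np.log(rng.uniform()) < H_cur - H_prop else x

# Usage: Laplace energy U(x) = ||x||_1, whose prox is soft thresholding
prox = lambda v: np.sign(v) * np.maximum(np.abs(v) - 1.0, 0.0)
U = lambda v: np.abs(v).sum()
rng = np.random.default_rng(0)
x = np.zeros(1)
chain = []
for _ in range(3000):
    x = nshmc2_step(x, U, prox, rng=rng)
    chain.append(x[0])
```

Since the update is a palindromic composition of shear maps, the proposal remains volume-preserving and reversible under momentum flip, so the exact-Hamiltonian acceptance test keeps the chain valid.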
III-B3 Discussion
Fig. 1 illustrates the use of the adopted discretization schemes associated with Algorithms 1 and 2 in approximating a Hamiltonian made up of a quadratic kinetic energy and a potential energy of the following form

$U(x) = \lambda|x| + \dfrac{x^{2}}{2}$   (19)

where $\lambda > 0$. For this potential energy, the subdifferential can be analytically calculated and is given by

$\partial U = \lambda \operatorname{sign} + \operatorname{Id}$   (20)

where $\operatorname{sign}$ is defined in Example III.1 and $\operatorname{Id}$ is the identity operator. The two proposed algorithms can therefore be compared for this example. Fig. 1 shows that the discretized energy is close to the continuous one for the two mappings $T_{\epsilon}^{(1)}$ and $T_{\epsilon}^{(2)}$. Moreover, the slight difference between the two mappings shows that the two discretization schemes perform very similarly close to the critical region of non-differentiability (see the zoom around the origin in Fig. 1). Fig. 2 illustrates the shape of the proximity operator for the considered energy function $U$, as well as the identity function and the difference $\operatorname{Id} - \operatorname{prox}_{U}$. This figure clearly shows that, due to the thresholding property of the proximity operator, $\operatorname{prox}_{U}(x) = 0$ for $|x| \leq \lambda$; in particular, for the considered example, we have $x - \operatorname{prox}_{U}(x) = x$ for every $x \in [-\lambda, \lambda]$. This comparison confirms that the two schemes perform similarly, especially close to the non-differentiability point.
Since Algorithm 2 is more general than Algorithm 1 and allows us to handle energies for which the subdifferential is not straightforward (while performing well close to the critical regions), we will focus on this discretization scheme in our experiments.
IV Experimental validation
This section validates the proposed nsHMC scheme for non-smooth log-concave distributions through three experiments. The first two experiments consider the GG distribution, whose energy function is non-differentiable for the shape parameter values considered here. For the third experiment, a Laplace distribution (a GG distribution with shape parameter equal to 1) is used for an image denoising problem, where the clean image is recovered from noisy measurements using a Bayesian regularization scheme involving a sampling technique based on the proposed nsHMC algorithm.
IV-A Experiment 1: 1D sampling
In the first experiment, 1D sampling is performed for different values of the shape and scale parameters of a GG distribution. Chains generated using the proposed nsHMC sampling scheme are compared to those obtained with a random walk Metropolis-Hastings (rwMH) scheme. The rwMH strategy is used here for comparison since it generally improves the mixing properties of the generated samples when compared to a fixed proposal distribution. Let $x$ be the current sample and $x^{\star}$ the proposed one. A Gaussian proposal centered on the current sample with unit variance is used for the rwMH algorithm, i.e., $x^{\star} \sim \mathcal{N}(x, 1)$.
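The rwMH update used for comparison can be sketched as follows (illustrative Python with a Laplace-type GG energy; the target and parameter values are assumptions made for the sketch, not the paper's exact settings):

```python
import numpy as np

def rwmh(energy, n_iter, x0=0.0, prop_std=1.0, seed=0):
    """Random walk Metropolis-Hastings with a Gaussian proposal
    centered on the current sample."""
    rng = np.random.default_rng(seed)
    x, chain = x0, []
    for _ in range(n_iter):
        x_prop = x + prop_std * rng.standard_normal()   # x* ~ N(x, prop_std^2)
        # Symmetric proposal: accept with probability min(1, exp(E(x) - E(x*)))
        if np.log(rng.uniform()) < energy(x) - energy(x_prop):
            x = x_prop
        chain.append(x)
    return np.array(chain)

# Usage: GG target with unit scale and shape 1, energy E(x) = |x|
chain = rwmh(lambda v: abs(v), 5000)
```

The proposal variance must typically be tuned by hand, which is precisely the adjustment that the nsHMC scheme avoids.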
Fig. 3 displays the mean square error (MSE) between the target GG pdf and the histogram of the generated samples with respect to the number of sampled coefficients. This figure shows slightly faster convergence for the proposed nsHMC scheme compared to the rwMH algorithm. To further investigate the sampling efficiency, Fig. 4 displays the autocorrelation functions (ACFs) of the sampled chains for the same parameter values. This figure clearly shows that samples generated using the nsHMC scheme are less correlated than those generated using rwMH, which corroborates the fast convergence of the nsHMC scheme. In fact, the proposed technique does not need any adjustment of the proposal variance, contrary to the rwMH algorithm, while giving an acceptable level of intra-chain correlation.
For the sake of comparison, Fig. 4 also displays the ACFs of chains sampled using a standard independent MH algorithm with a centered Gaussian proposal. Indeed, it has been reported that rwMH increases the correlation level within sampled chains [5], while an independent MH algorithm provides weakly correlated samples. The comparison between the ACFs corresponding to nsHMC and MH shows that chains sampled using nsHMC are as weakly correlated as those of the standard MH algorithm with an independent proposal.
IV-B Experiment 2: multivariate sampling
In this experiment, sampling is performed according to a multivariate GG distribution. First, sampling using rwMH is performed with a large number of iterations (long burn-in period) so that the target distribution is guaranteed to be reached. The histogram of the obtained samples is calculated and corresponds to our ground truth. Then, sampling using the proposed nsHMC method and the rwMH scheme is performed on the same dataset. The MSE between the obtained histograms and the ground truth is computed versus the number of iterations of the sampler. Fig. 5 displays this MSE for the multivariate GG sampling and for different values of the scale and shape parameters. Note that simulations have been performed for the 2D, 3D and 4D cases.
[Fig. 5: MSE between the obtained histograms and the ground truth versus iterations, for the 2D, 3D and 4D cases.]
The reported MSE values show that the target distribution is approached more rapidly using nsHMC than rwMH. Specifically, for the 2D case, the nsHMC scheme converges after about 500 to 1000 iterations depending on the parameter setting, while the rwMH sampler needs about 3500 iterations. When looking at the same curves in Fig. 5 for the 3D and 4D cases, it is worth noticing that the gap between the two methods in terms of convergence speed increases with the problem dimensionality. This behavior is observed for all the considered shape and scale parameters. This corroborates the usefulness of the proposed nsHMC scheme, especially for large data sampling where the convergence speed of the standard MH or rwMH algorithms deteriorates with the size of the data.
IV-C Experiment 3: denoising
In this experiment, the performance of the proposed nsHMC sampling algorithm is analyzed for an image denoising problem. The 2D image displayed in Fig. 6 (top left) has been used as ground truth for this example. Independent and identically distributed additive Gaussian noise has been added to this image to obtain the noisy image depicted in Fig. 6 (top right). The objective of this third experiment is to promote the sparsity of the wavelet coefficients associated with the target image. To this end, we express the image formation model as a function of the wavelet coefficients $x$, which are related to the ground-truth image through the dual frame operator $F^{*}$. The analysis frame operator thus corresponds to $F$, and since orthonormal bases are considered here, the dual frame operator reduces to the inverse operator, yielding $F^{*} = F^{-1}$. The observation model can thus be expressed as

$y = F^{*}x + n$   (21)

where $y$ is the observed image, $x$ contains the unknown wavelet coefficients, and $n$ is the additive noise. Note that the denoised image can easily be recovered from the estimated wavelet coefficients $\widehat{x}$ by taking $\widehat{y} = F^{*}\widehat{x}$.
Based on this model and the Gaussian likelihood assumption, a hierarchical Bayesian model has been built using an independent Laplace prior for the wavelet coefficients [29, 30]

$f(x \mid \lambda) \propto \exp\left(-\dfrac{\|x\|_{1}}{\lambda}\right)$   (22)

where $\lambda$ is an unknown parameter that is estimated within the proposed Bayesian algorithm. More precisely, an inverse-gamma prior distribution is assigned to $\lambda$ [31, 32]

$f(\lambda \mid a, b) = \dfrac{b^{a}}{\Gamma(a)}\,\lambda^{-a-1}\exp\left(-\dfrac{b}{\lambda}\right)$   (23)

where $\Gamma(\cdot)$ is the gamma function, and $a$ and $b$ are fixed hyperparameters (set to small values in our experiments).
Using a Jeffreys prior for the noise variance $\sigma^{2}$ (i.e., $f(\sigma^{2}) \propto 1/\sigma^{2}$), the full posterior of this denoising model can be derived. The associated Gibbs sampler generates samples according to the conditional distributions of the posterior. The conditional distribution of the wavelet coefficients writes

$f(x \mid y, \sigma^{2}, \lambda) \propto \exp\big(-U(x)\big)$   (24)

where the energy function is defined by $U(x) = \dfrac{\|y - F^{*}x\|^{2}}{2\sigma^{2}} + \dfrac{\|x\|_{1}}{\lambda}$. Sampling according to this distribution is performed using the proposed nsHMC scheme, which requires the calculation of the proximity operator of this energy function, given (componentwise) by

$\operatorname{prox}_{U}(x) = \operatorname{sign}(v)\max\left\{|v| - \tilde{\lambda},\, 0\right\}, \qquad v = \dfrac{x + Fy/\sigma^{2}}{1 + 1/\sigma^{2}}$   (25)

where $\tilde{\lambda} = \dfrac{1/\lambda}{1 + 1/\sigma^{2}}$, which can easily be derived using standard properties of the proximity operators [23, 33, 19]. See Appendix B for more details.
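As a numerical sanity check (an illustrative sketch, not the paper's implementation), the following Python snippet implements this soft-thresholding expression under the orthonormal-basis assumption, taking $F$ as the identity for simplicity, and verifies that the returned point minimizes the proximal objective:

```python
import numpy as np

def prox_denoise(x, Fy, sigma2, lam):
    """Prox of U(x) = ||y - F* x||^2 / (2 sigma2) + ||x||_1 / lam,
    assuming F is orthonormal (Fy holds the analysis coefficients F y)."""
    v = (x + Fy / sigma2) / (1.0 + 1.0 / sigma2)
    thr = (1.0 / lam) / (1.0 + 1.0 / sigma2)
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

# Numerical check with F = Id: the prox point should minimize
# phi(u) = 0.5 ||u - x||^2 + ||y - u||^2 / (2 sigma2) + ||u||_1 / lam
rng = np.random.default_rng(1)
x, y = rng.standard_normal(5), rng.standard_normal(5)
sigma2, lam = 0.5, 2.0
u_star = prox_denoise(x, y, sigma2, lam)
phi = lambda u: (0.5 * np.sum((u - x) ** 2)
                 + np.sum((y - u) ** 2) / (2 * sigma2)
                 + np.abs(u).sum() / lam)
perturbed = [phi(u_star + 0.01 * rng.standard_normal(5)) for _ in range(100)]
print(phi(u_star) <= min(perturbed))   # -> True
```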
Regarding the noise variance and the prior hyperparameter, straightforward calculations lead to the following conditional distributions, which are easy to sample

$\sigma^{2} \mid y, x \sim \mathcal{IG}\left(\dfrac{N}{2},\ \dfrac{\|y - F^{*}x\|^{2}}{2}\right)$   (26)

$\lambda \mid x \sim \mathcal{IG}\left(N + a,\ \|x\|_{1} + b\right)$   (27)

where $\mathcal{IG}$ denotes the inverse-gamma distribution and $N$ is the number of wavelet coefficients. The estimation of the denoised image is performed based on the sampled wavelet coefficients after an appropriate burn-in period, i.e., after convergence of the Gibbs sampler in Algorithm 3.
An example of a denoised image obtained with Algorithm 3 is displayed in Fig. 6 (bottom left). For the sake of comparison, an image denoised using the Wiener filter is displayed in Fig. 6 (bottom right). From a visual point of view, Algorithm 3 clearly provides a better denoised image than the Wiener filter. Quantitatively speaking, the noisy and denoised images are evaluated in terms of both SNR (signal-to-noise ratio) and SSIM [34] (structural similarity). These values are reported directly in the figure and show the efficiency of the denoising algorithm based on the proposed nsHMC technique for sampling from the conditional distribution of the wavelet coefficients. Regarding computational time, only 1000 iterations (involving a burn-in period of 500 iterations) are necessary for the proposed algorithm, taking around 9 seconds on a 64-bit 2.00 GHz i7-3667U architecture with a Matlab implementation. For the nsHMC step, the second scheme has been used.
V Conclusion
This paper proposed a solution that makes feasible the use of Hamiltonian dynamics for sampling according to log-concave probability distributions with non-smooth energy functions. The proposed sampling technique relies on results from convex optimization and Hamiltonian Monte Carlo methods. More precisely, proximity operators were investigated to address the non-differentiability of the energy function related to the target distribution. Validation results showed that the proposed technique provides faster convergence and interesting decorrelation properties for the sampled chains when compared to more standard methods such as the random walk Metropolis-Hastings algorithm. The proposed technique was evaluated on synthetic data and applied to an image denoising problem. Our results showed that the use of proximity operators in a Hamiltonian Monte Carlo method yields faster convergence to the target distribution. This conclusion is particularly important for large-scale data sampling since the gain in convergence speed increases with the problem dimensionality. In future work, we will focus on extending this technique to sparse signal recovery problems where a non-tight linear operator is involved in the observation model.
A Volume preservation
This appendix studies the volume preservation condition both for the Hamiltonian dynamics and for the proposed discretization schemes. Volume preservation is a key property of sampling algorithms involving an acceptance-rejection step, such as the Metropolis-within-Gibbs algorithm, since it allows a simpler acceptance probability to be obtained.
A1 Volume preservation for the Hamiltonian dynamics
Akin to [12], we consider here volume preservation for Hamiltonian dynamics in the one-dimensional case. The multidimensional case can then be handled through simple generalizations. Let us denote by $T_{\delta}$ (see Section III-A) the mapping between the state at time $t$, denoted by $(x(t), p(t))$, and the state at time $t + \delta$. For $\delta$ small enough, $T_{\delta}$ can be approximated by [12]

$T_{\delta}\big(x(t), p(t)\big) = \begin{pmatrix} x(t) \\ p(t) \end{pmatrix} + \delta \begin{pmatrix} dx/dt \\ dp/dt \end{pmatrix} + O(\delta^{2})$   (28)

where $O(\delta^{2})$ involves terms of order $\delta^{2}$ or higher. After replacing the time derivatives of (4) in (28), and accounting for the fact that the Hamiltonian may be non-differentiable with respect to $x$, the generalized Jacobian matrix of $T_{\delta}$ can be written as

$J_{\delta} = \begin{pmatrix} 1 & \delta \\ -\delta\, g & 1 \end{pmatrix} + O(\delta^{2})$   (29)

where $g$ is an element of the generalized second-order subdifferential of $U$ with respect to $x$, in the sense of the Clarke generalization [21]. The determinant of this matrix can therefore be written as

$\det J_{\delta} = 1 + O(\delta^{2})$   (30)

Following the construction proposed in [12], it turns out that for a time interval $s$ that is not close to zero, $\left|\det J_{s}\right| = 1$, which means that the transformation $T_{s}$ ensures volume preservation.
A2 Volume preservation for the proposed discretization schemes
It is worth noticing here that the two modified leapfrog discretization schemes $T_{\epsilon}^{(1)}$ and $T_{\epsilon}^{(2)}$, defined respectively in (12)-(14) and (16)-(18), like the original leapfrog scheme defined in (6)-(8), preserve volume since they are compositions of shear transformations. The interested reader can refer to [12] or [35, page 121] for more details.
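The shear property can be checked numerically: the finite-difference Jacobian of one leapfrog step has unit determinant (an illustrative sketch with an assumed quadratic potential, for which the leapfrog map is linear):

```python
import numpy as np

def leapfrog(x, p, grad_U, eps):
    """One leapfrog step (6)-(8) for a scalar state (x, p)."""
    p = p - 0.5 * eps * grad_U(x)
    x = x + eps * p
    p = p - 0.5 * eps * grad_U(x)
    return x, p

# Finite-difference Jacobian of the map (x, p) -> leapfrog(x, p) for
# U(x) = x^2 / 2; its determinant should equal 1 (shear transformations).
grad_U = lambda z: z
eps, h = 0.3, 1e-6
x0, p0 = 0.7, -0.4
f = lambda s: np.array(leapfrog(s[0], s[1], grad_U, eps))
J = np.column_stack([
    (f(np.array([x0 + h, p0])) - f(np.array([x0 - h, p0]))) / (2 * h),
    (f(np.array([x0, p0 + h])) - f(np.array([x0, p0 - h]))) / (2 * h),
])
print(abs(np.linalg.det(J) - 1.0) < 1e-6)   # -> True
```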
B Proximity operator calculation for the experiment of Section IV-C
The energy function considered in this appendix is the one involved in the conditional distribution of the wavelet coefficients in (24), i.e.,

$U(x) = \dfrac{\|y - F^{*}x\|^{2}}{2\sigma^{2}} + \dfrac{\|x\|_{1}}{\lambda}$   (31)

where $F^{*}$ is an orthonormal synthesis operator and $\lambda, \sigma^{2} > 0$. In order to use the proposed nsHMC sampling algorithm, the proximity operator of the function $U$ has to be calculated. Following the standard definition of the proximity operator [26, 23], and using $F F^{*} = \operatorname{Id}$, we can write

$\operatorname{prox}_{U}(x) = \underset{u}{\operatorname{argmin}}\ \dfrac{1}{2}\|u - x\|^{2} + \dfrac{\|Fy - u\|^{2}}{2\sigma^{2}} + \dfrac{\|u\|_{1}}{\lambda} = \underset{u}{\operatorname{argmin}}\ \dfrac{1 + 1/\sigma^{2}}{2}\left\|u - \dfrac{x + Fy/\sigma^{2}}{1 + 1/\sigma^{2}}\right\|^{2} + \dfrac{\|u\|_{1}}{\lambda}$   (32)

which reduces to a componentwise soft thresholding of $v = \big(x + Fy/\sigma^{2}\big)/\big(1 + 1/\sigma^{2}\big)$ with threshold $\tilde{\lambda} = (1/\lambda)/\big(1 + 1/\sigma^{2}\big)$. This proves the expression of the proximity operator given in (25).
References
 [1] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
 [2] C. Chaux, P. L. Combettes, J.-C. Pesquet, and V. R. Wajs, “Iterative image deconvolution using overcomplete representations,” in Proc. European Signal Processing Conference (EUSIPCO), Florence, Italy, Sep. 4–8, 2006.
 [3] L. Chaari, P. Ciuciu, S. Mériaux, and J.C. Pesquet, “Spatiotemporal wavelet regularization for parallel MRI reconstruction: application to functional MRI,” Mag. Reson. Mater. in Phys., Biol. and Med. (MAGMA), vol. 27, pp. 509–529, 2014.
 [4] L. Boubchir and B. Boashash, “Wavelet denoising based on the MAP estimation using the BKF prior with application to images and EEG signals,” IEEE Trans. Signal Process., vol. 61, no. 8, pp. 1880–1894, Apr. 2013.
 [5] C. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, New York, 2004.
 [6] J.-F. Richard and W. Zhang, “Efficient high-dimensional importance sampling,” J. Econom., vol. 141, no. 2, pp. 1385–1411, 2007.
 [7] G. O. Roberts and R. L. Tweedie, “Exponential convergence of Langevin distributions and their discrete approximations,” Bernoulli, vol. 2, no. 4, pp. 341–363, 1996.
 [8] M. Pereyra, “Proximal Markov chain Monte Carlo algorithms,” ArXiv, 2013, http://arxiv.org/abs/1306.0187.
 [9] F. Orieux, O. Féron, and J.-F. Giovannelli, “Sampling high-dimensional Gaussian distributions for general linear inverse problems,” IEEE Signal Process. Lett., vol. 19, no. 5, pp. 251–254, May 2012.
 [10] C. Gilavert, S. Moussaoui, and J. Idier, “Efficient Gaussian Sampling for Solving LargeScale Inverse Problems using MCMC Methods,” IEEE Trans. Signal Process., vol. 63, no. 1, Aug. 2014.
 [11] K. M. Hanson, “Markov chain Monte Carlo posterior sampling with the Hamiltonian method,” in SPIE Medical Imaging: Image Processing, M. Sonka and K. M. Hanson, eds., 2001, pp. 456–467.
 [12] R. M. Neal, “MCMC using Hamiltonian dynamics,” in Handbook of Markov Chain Monte Carlo, S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Eds., chapter 5. Chapman and Hall/CRC Press, 2010.
 [13] A. Pakman and L. Paninski, “Exact Hamiltonian Monte Carlo for truncated multivariate Gaussians,” http://arxiv.org/abs/1208.4118.
 [14] Y. Altmann, N. Dobigeon, and J.Y. Tourneret, “Unsupervised postnonlinear unmixing of hyperspectral images using a Hamiltonian Monte Carlo algorithm,” IEEE Trans. Image Process., vol. 23, no. 6, pp. 2663–2675, 2014.
 [15] H. P. Wai and B. D. Jeffs, “Adaptive image restoration using a generalized Gaussian model for unknown noise,” IEEE Trans. Image Process., vol. 4, no. 10, pp. 1451–1456, Oct. 1995.
 [16] E. P. Simoncelli and E. H. Adelson, “Noise removal via Bayesian wavelet coring,” in IEEE Int. Conf. on Image Process. (ICIP), Lausanne, Switzerland, Sep. 16–19, 1996, pp. 379–382.
 [17] P. Moulin and J. Liu, “Analysis of multiresolution image denoising schemes using generalized-Gaussian priors,” in IEEE Int. Symp. on Time-Frequency and Time-Scale Analysis, Pittsburgh, USA, Oct. 1998, pp. 633–636.
 [18] L. Chaari, J.-C. Pesquet, J.-Y. Tourneret, Ph. Ciuciu, and A. Benazza-Benyahia, “A hierarchical Bayesian model for frame representation,” IEEE Trans. Signal Process., vol. 58, no. 11, pp. 5560–5571, Nov. 2010.
 [19] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, New York, 2011.
 [20] V. Jeyakumar and D. T. Luc, “Approximate Jacobian matrices for nonsmooth continuous maps and C$^{1}$-optimization,” SIAM J. Control Optim., vol. 36, no. 5, pp. 1815–1832, 1998.
 [21] F. H. Clarke, Optimization and nonsmooth analysis, WileyInterscience, New York, 1983.
 [22] P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forwardbackward splitting,” Multiscale Model. Simul., vol. 4, pp. 1168–1200, 2005.
 [23] C. Chaux, P. L. Combettes, J.-C. Pesquet, and V. R. Wajs, “A variational formulation for frame-based inverse problems,” Inv. Prob., vol. 23, no. 4, pp. 1495–1518, Aug. 2007.
 [24] L. Chaari, J.C. Pesquet, A. BenazzaBenyahia, and P. Ciuciu, “A waveletbased regularized reconstruction algorithm for SENSE parallel MRI with applications to neuroimaging,” Med. Image Anal., vol. 15, no. 2, pp. 185–201, Nov. 2011.
 [25] Y. F. Atchade, G. Fort, and E. Moulines, “On stochastic proximal gradient algorithms,” Arxiv, 2014, http://arxiv.org/abs/1402.2365.
 [26] J.J. Moreau, “Proximité et dualité dans un espace hilbertien,” Bulletin de la Société Mathématique de France, vol. 93, pp. 273–299, 1965.
 [27] P. L. Combettes and J.-C. Pesquet, “Proximal splitting methods in signal processing,” in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, H. H. Bauschke, R. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, Eds., pp. 185–212. Springer-Verlag, New York, 2011.
 [28] J. Douglas and H. H. Rachford, “On the numerical solution of heat conduction problems in two or three space variables,” Trans. Amer. Math. Soc., vol. 82, pp. 421–439., 1956.
 [29] M. Tipping, “Sparse Bayesian learning and the relevance vector machine,” J. Mach. Learn. Res., vol. 1, pp. 211–244, 2001.
 [30] M. Seeger, “Bayesian inference and optimal design in the sparse linear model,” J. Mach. Learn. Res., vol. 9, pp. 759–813, 2008.
 [31] L. Chaari, J.-Y. Tourneret, and H. Batatia, “Sparse Bayesian regularization using Bernoulli-Laplacian priors,” in Proc. European Signal Process. Conf. (EUSIPCO), Marrakech, Morocco, Sep. 9–13, 2013.
 [32] N. Dobigeon, A. O. Hero, and J.Y. Tourneret, “Hierarchical Bayesian sparse image reconstruction with application to MRFM,” IEEE Trans. Image Process., vol. 19, no. 9, pp. 2059–2070, Sept. 2009.
 [33] P. L. Combettes and J.C. Pesquet, “A proximal decomposition method for solving convex variational inverse problems,” Inv. Prob., vol. 24, no. 6, Dec. 2008, 27 p.
 [34] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
 [35] S. Brooks, A. Gelman, G. Jones, and X.L. Meng, Handbook of Markov Chain Monte Carlo, Chapman & Hall/CRC, Boston, USA, 2011.