A Hamiltonian Monte Carlo Method for Non-Smooth Energy Sampling
Efficient sampling from high-dimensional distributions is a challenging issue encountered in many large data recovery problems involving Markov chain Monte Carlo schemes. In this context, sampling using Hamiltonian dynamics is one of the recent techniques proposed to exploit the geometry of the target distribution. Such schemes have been shown to be efficient for multi-dimensional sampling, but are restricted to the exponential family of distributions with smooth energy functions. In this paper, we address the problem of using Hamiltonian dynamics to sample from probability distributions having non-differentiable energy functions, such as those involving an $\ell_1$ term. Such distributions are increasingly used in sparse signal and image recovery applications. The proposed technique uses a modified leapfrog transform involving a proximal step. The resulting non-smooth Hamiltonian Monte Carlo (ns-HMC) method is tested and validated on a number of experiments. Results show its ability to accurately sample according to various multivariate target distributions. The proposed technique is illustrated on synthetic examples and applied to an image denoising problem.
Sparse sampling, Bayesian methods, MCMC, Hamiltonian, proximity operator, leapfrog.
Sparse signal and image recovery is a hot topic which has gained a lot of interest during the last decades, especially after the emergence of the compressed sensing theory [1]. In addition, most recent applications such as remote sensing and medical image reconstruction [3, 4] generate large and intractable data volumes that have to be processed either independently or jointly. To handle such inverse problems, Bayesian techniques have
demonstrated their usefulness especially when the model hyperparameters are difficult to fix. Such techniques generally rely on
a maximum a posteriori (MAP) estimation built upon the signal/image likelihood and priors. Since efficient priors generally have a
complicated form, analytical expressions of the MAP estimators are often difficult to obtain.
For this reason, many Bayesian techniques
resort to Markov chain Monte Carlo (MCMC) sampling techniques [5].
To handle large-dimensional sampling, several techniques have
been proposed during the last decades. In addition to the random walk Metropolis-Hastings (MH) algorithm [5], one can mention the work in [6] about efficient high-dimensional importance sampling, the Metropolis-adjusted Langevin algorithm (MALA) of [7, 8], or the high-dimensional Gaussian sampling methods of [9, 10].
To handle log-concave smooth probability distributions, a Hamiltonian Monte Carlo (HMC) sampling technique has recently been proposed
in [11, 12]. This technique uses the analogy with the kinetic energy conservation in physics
to design efficient proposals that better follow the target distribution’s geometry. HMC has recently been investigated in a number of
works dealing with multi-dimensional sampling problems for various applications [13, 14], which demonstrates the
efficiency of such sampling schemes. Efficient sampling is obtained using these strategies where the convergence and mixing properties
of the simulated chains are improved compared to classical sampling schemes such as the Gibbs and MH algorithms.
However, these techniques are only adapted to log-concave probability distributions with smooth energy functions since the gradient of these
functions has to be calculated. This constraint represents a real limitation in some applications
where sparsity is one of the main processing ingredients especially
for large data. Indeed, sparsity promoting
probability distributions generally have a non-differentiable energy function, such as the Laplace or the generalized Gaussian (GG) [15] distributions, whose energy functions involve $|x|$ and $|x|^p$ terms, respectively. Such distributions have been used as priors for the
target signals or images in a number of works where inverse problems are handled in a Bayesian
framework [16, 17, 18].
Using the HMC technique in this context is therefore not possible.
This paper presents a modified HMC scheme that makes it possible to sample from log-concave probability distributions with non-differentiable energy functions. The so-called non-smooth HMC (ns-HMC) sampling scheme relies on a modified leapfrog transform [11, 12] that circumvents the non-differentiability of the target energy function. The modified leapfrog transform relies on the sub-differential and proximity operator concepts [19]. The proposed scheme is validated on a sampling example where samples are drawn from a GG distribution with different shape parameters. It is also illustrated on a signal recovery problem where a sparse regularization scheme is applied to recover a high-dimensional signal.
The rest of the paper is organized as follows. Section II formulates the problem of non-smooth sampling for large data using Hamiltonian dynamics. Section III presents the proposed ns-HMC sampling scheme. This technique is then validated in Section IV to illustrate its efficiency for sampling from non-smooth log-concave distributions. Finally, some conclusions and perspectives are drawn in Section V.
II Problem formulation
Let us consider a signal of interest $x \in \mathbb{R}^N$ and let $f(x|\theta)$ be its probability distribution function, parametrized by the parameter vector $\theta$. In this work, we focus on an exponential family of distributions such that
$$f(x|\theta) \propto \exp\left(-E_\theta(x)\right), \qquad (1)$$
where $E_\theta$ is the energy function. As stated above, in this paper we concentrate on sampling from the class of log-concave probability densities, which means that the energy function $E_\theta$ is assumed to be convex but not necessarily differentiable. In addition, we will also make the assumption that $E_\theta$ belongs to $\Gamma_0(\mathbb{R}^N)$, the class of proper lower semi-continuous convex functions from $\mathbb{R}^N$ to $]-\infty, +\infty]$. Finally, we will consider probability distributions from which direct sampling is not possible and requires the use of an acceptance-rejection step. Example II.1 presents the case of the GG distribution, which satisfies the above-mentioned assumptions.
Let $\gamma$ and $p$ be two real-positive scalars. The generalized Gaussian distribution is defined by the following probability density function
$$f(x|\gamma, p) = \frac{p}{2\gamma\,\Gamma(1/p)} \exp\left(-\left(\frac{|x|}{\gamma}\right)^{p}\right). \qquad (2)$$
Except for particular values of $p$ (such as $p = 2$), the energy function $(|x|/\gamma)^p$ is not differentiable at $x = 0$ (in particular when $p \leq 1$). In what follows, we are interested in efficiently drawing samples according to the probability distribution defined in (1). The following section describes the proposed non-smooth sampling scheme.
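As a minimal numerical sketch of the GG density in (2), the following snippet evaluates the energy and the normalized density; the names `gg_energy` and `gg_pdf` are illustrative and not part of the paper's notation.

```python
import math
import numpy as np

def gg_energy(x, gamma=1.0, p=1.0):
    """Generalized Gaussian energy (|x|/gamma)^p; non-smooth at x = 0 when p <= 1."""
    return (np.abs(x) / gamma) ** p

def gg_pdf(x, gamma=1.0, p=1.0):
    """GG density with the standard normalization p / (2 * gamma * Gamma(1/p))."""
    norm = p / (2.0 * gamma * math.gamma(1.0 / p))
    return norm * np.exp(-gg_energy(x, gamma, p))
```

For $p = 1$ and $\gamma = 1$ this reduces to the Laplace density with value $1/2$ at the origin, and for $p = 2$ to a Gaussian shape with value $1/\sqrt{\pi}$ at the origin.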
III Non-smooth sampling
III-A Hamiltonian Monte Carlo methods
HMC methods [11, 12, 14] are powerful tools that use the principle of Hamiltonian dynamics. These methods were originally proposed by analogy with kinetic energy evolution. They usually provide better mixing and convergence properties than standard schemes. Let us assume that $E_\theta(x)$ represents a potential energy function. If we associate with $x$ a momentum vector $q \in \mathbb{R}^N$, the energy function for the Hamiltonian dynamics is the combination of the potential energy $E_\theta(x)$ and a kinetic energy $K(q)$, i.e.,
$$H(x, q) = E_\theta(x) + K(q). \qquad (3)$$
The energy function $H$ is called the Hamiltonian and completely describes the considered system. For simplicity, a quadratic kinetic energy corresponding to a unitary diagonal covariance matrix is usually assumed, so that $K(q) = \frac{1}{2}q^T q$. Hamilton's equations of motion determine the evolution of the state $(x, q)$ as a function of time $t$
$$\frac{\partial x}{\partial t} = \frac{\partial H}{\partial q}, \qquad \frac{\partial q}{\partial t} = -\frac{\partial H}{\partial x}, \qquad (4)$$
where $\partial$ denotes the partial derivative operator. These equations define a transformation $T_s$ that maps the state of the system at time $t$ to the state at time $t + s$. The distribution associated with the Hamiltonian dynamics energy defined in (3) is therefore given by
$$f(x, q) \propto \exp\left(-H(x, q)\right) = \exp\left(-E_\theta(x)\right)\exp\left(-K(q)\right).$$
HMC methods iteratively proceed by alternate updates of the samples $x^{(r)}$ and $q^{(r)}$ drawn according to the joint distribution of $(x, q)$. At iteration $r$, the HMC algorithm starts with the current values of vectors $x^{(r)}$ and $q^{(r)}$. Two steps are then performed. The first one updates the momentum vector, leading to $q^{(r+1/2)}$ sampled according to the multivariate Gaussian distribution $\mathcal{N}(0, I_N)$, where $I_N$ is the identity matrix. The second step updates both momentum and position by proposing two candidates $x^*$ and $q^*$. These two candidates are generated by simulating the Hamiltonian dynamics, which are discretized using some discretization techniques such as the Euler or leapfrog methods. For instance, the discretization can be performed using $L_f$ steps of the leapfrog method with a stepsize $\epsilon$. The $l$th leapfrog discretization will be denoted by $T_\epsilon^{(l)}$ and can be summarized as follows
$$q^{\left(t + \epsilon/2\right)} = q^{(t)} - \frac{\epsilon}{2} \nabla E_\theta\left(x^{(t)}\right) \qquad (6)$$
$$x^{(t + \epsilon)} = x^{(t)} + \epsilon\, q^{\left(t + \epsilon/2\right)} \qquad (7)$$
$$q^{(t + \epsilon)} = q^{\left(t + \epsilon/2\right)} - \frac{\epsilon}{2} \nabla E_\theta\left(x^{(t + \epsilon)}\right). \qquad (8)$$
After the $L_f$ steps, the proposed candidates are given by $x^*$ and $q^*$. These candidates are then accepted using the standard MH rule, i.e., with the following probability
$$\min\left\{1, \exp\left[H\left(x^{(r)}, q^{(r+1/2)}\right) - H\left(x^*, q^*\right)\right]\right\}, \qquad (9)$$
where $H$ is the energy function defined in (3).
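The standard HMC transition described above (momentum refresh, leapfrog proposal (6)-(8), MH acceptance (9)) can be sketched for a smooth energy as follows; the function name `hmc_step` and its arguments are illustrative.

```python
import numpy as np

def hmc_step(x, energy, grad_energy, eps=0.1, n_leapfrog=10, rng=None):
    """One HMC transition for a smooth energy E: momentum refresh,
    leapfrog proposal (6)-(8), then Metropolis-Hastings acceptance (9)."""
    rng = np.random.default_rng() if rng is None else rng
    q = rng.standard_normal(x.shape)                  # q ~ N(0, I), K(q) = q.q / 2
    x_new = x.copy()
    q_new = q - 0.5 * eps * grad_energy(x_new)        # (6): half momentum step
    for i in range(n_leapfrog):
        x_new = x_new + eps * q_new                   # (7): full position step
        if i < n_leapfrog - 1:
            q_new = q_new - eps * grad_energy(x_new)  # two merged half steps
    q_new = q_new - 0.5 * eps * grad_energy(x_new)    # (8): final half momentum step
    # (9): accept with probability min{1, exp(H_old - H_new)}
    h_old = energy(x) + 0.5 * float(q @ q)
    h_new = energy(x_new) + 0.5 * float(q_new @ q_new)
    return x_new if np.log(rng.uniform()) < h_old - h_new else x
```

For instance, iterating `hmc_step` with `energy = lambda v: 0.5 * float(v @ v)` and `grad_energy = lambda v: v` samples from a standard Gaussian distribution.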
III-B Non-smooth Hamiltonian Monte Carlo schemes
The key step in standard HMC sampling schemes is the approximation of the Hamiltonian dynamics.
This approximation allows the random simulation of uncorrelated samples according to a target distribution while exploiting the
geometry of its corresponding energy. In this section, we propose two non-smooth Hamiltonian Monte Carlo (ns-HMC) schemes
to perform this approximation for
non-smooth energy functions. The first scheme is based on the subdifferential operator while the second one is based on proximity operators.
For both schemes, the complete procedure to sample $x$ and $q$ is detailed in Algorithms 1 and 2. These algorithms describe all the steps necessary to sample from a log-concave target distribution.
III-B1 Scheme 1 - subdifferential based approach
Let us first give the following definition of the sub-differential and a useful example.
[19, p. 223] Let $\varphi$ be in $\Gamma_0(\mathbb{R}^N)$, the class of proper lower semi-continuous convex functions from $\mathbb{R}^N$ to $]-\infty, +\infty]$. The sub-differential of $\varphi$ at $x$ is the set $\partial\varphi(x) = \left\{u \in \mathbb{R}^N \mid \forall y \in \mathbb{R}^N,\ \langle y - x \mid u\rangle + \varphi(x) \leq \varphi(y)\right\}$, where $\langle \cdot \mid \cdot \rangle$ defines the standard scalar product. Every element $u \in \partial\varphi(x)$ is a sub-gradient of $\varphi$ at point $x$. If $\varphi$ is differentiable, the sub-differential reduces to its gradient: $\partial\varphi(x) = \{\nabla\varphi(x)\}$.
Let $\varphi$ be defined as
$$\varphi(x) = \|x\|_1 = \sum_{i=1}^{N} |x_i|.$$
The sub-differential of $\varphi$ at $x$ is defined componentwise by
$$\left[\partial\varphi(x)\right]_i = \begin{cases} \{\operatorname{sign}(x_i)\} & \text{if } x_i \neq 0 \\ [-1, 1] & \text{if } x_i = 0. \end{cases}$$
If in addition we consider a scalar $\lambda > 0$ and set $\psi = \lambda\varphi$, then for every $x$ we have $\partial\psi(x) = \lambda\,\partial\varphi(x)$ [19, Prop. 16.5].
As for distributions with smooth energy, we would like to use the leapfrog method, whose basic form requires computing the gradient of the potential energy $E_\theta$. Since this gradient cannot be determined for non-smooth energy functions, we resort to the following reformulation of the leapfrog scheme using the concept of sub-differential introduced hereabove
$$q^{\left(t + \epsilon/2\right)} = q^{(t)} - \frac{\epsilon}{2}\, u^{(t)}, \quad u^{(t)} \in \partial E_\theta\left(x^{(t)}\right) \qquad (12)$$
$$x^{(t + \epsilon)} = x^{(t)} + \epsilon\, q^{\left(t + \epsilon/2\right)} \qquad (13)$$
$$q^{(t + \epsilon)} = q^{\left(t + \epsilon/2\right)} - \frac{\epsilon}{2}\, u^{(t + \epsilon)}, \quad u^{(t + \epsilon)} \in \partial E_\theta\left(x^{(t + \epsilon)}\right), \qquad (14)$$
where each sub-gradient $u$ is sampled uniformly in the corresponding sub-differential of $E_\theta$. This discretization scheme will be denoted by $\widetilde{T}_\epsilon^{(l)}$. If $E_\theta$ is differentiable, the mapping in (12), (13) and (14) exactly matches the conventional HMC mapping in (6), (7) and (8).
As for the standard HMC scheme, the proposed candidates are defined by $x^*$ and $q^*$, computed after $L_f$ leapfrog steps. These candidates are then accepted based on the standard MH rule defined in (9). The resulting sampling algorithm is summarized in Algorithm 1.
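The modified leapfrog of scheme 1 can be sketched for the non-smooth energy $\lambda\|x\|_1$, whose sub-differential was given in the example above; the function names are illustrative, and the uniform draw at the non-differentiable point follows the scheme's prescription of sampling uniformly in the sub-differential.

```python
import numpy as np

def subgrad_abs(x, lam=1.0, rng=None):
    """A subgradient of lam * ||x||_1 (elementwise): lam * sign(x_i) where x_i != 0,
    and a value drawn uniformly in [-lam, lam] at the non-differentiable point 0."""
    rng = np.random.default_rng() if rng is None else rng
    g = lam * np.sign(x).astype(float)
    at_zero = (x == 0)
    g[at_zero] = lam * rng.uniform(-1.0, 1.0, size=int(at_zero.sum()))
    return g

def ns_leapfrog_scheme1(x, q, lam=1.0, eps=0.1, n_steps=10, rng=None):
    """Scheme 1 leapfrog (12)-(14): the gradient of the potential is replaced
    by a randomly chosen subgradient of the non-smooth energy lam * ||x||_1."""
    rng = np.random.default_rng() if rng is None else rng
    q = q - 0.5 * eps * subgrad_abs(x, lam, rng)    # (12)
    for i in range(n_steps):
        x = x + eps * q                             # (13)
        if i < n_steps - 1:
            q = q - eps * subgrad_abs(x, lam, rng)  # two merged half steps
    q = q - 0.5 * eps * subgrad_abs(x, lam, rng)    # (14)
    return x, q
```

Away from the origin the subgradient is deterministic, so the trajectory coincides with an ordinary leapfrog path for the energy $\lambda|x|$.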
Note that no additional term needs to be accounted for in the acceptance ratio (9) since volume preservation is ensured by the Metropolis update. Volume preservation is equivalent to the mapping having a Jacobian matrix whose determinant has an absolute value equal to one, and follows from the fact that candidates are proposed according to Hamiltonian dynamics. More precisely, volume preservation can easily be demonstrated by using the concept of Jacobian matrix approximation [20], such as the Clarke generalization [21], and by conducting calculations similar to [12, Chapter 5, p. 118]. To facilitate the reading of the paper, the volume preservation issue is addressed in Appendix -A.
III-B2 Scheme 2 - proximal based approach
Since the calculation of the sub-differential is not straightforward for some classes of convex functions, a second scheme modifying the leapfrog steps (12), (13) and (14) can be considered by using the concept of proximity operators. These operators have been found to be fundamental in a number of recent works in convex optimization [22, 23, 24], and more recently in [25], where stochastic proximal algorithms have been investigated. Let us first recall the following definition: for $\varphi \in \Gamma_0(\mathbb{R}^N)$ and $x \in \mathbb{R}^N$, the proximity operator is defined by [26]
$$\operatorname{prox}_{\varphi}(x) = \arg\min_{y \in \mathbb{R}^N}\ \varphi(y) + \frac{1}{2}\|y - x\|^2.$$
For the function $\varphi = \|\cdot\|_1$ defined in Example III.1, the proximity operator of $\lambda\varphi$ is given componentwise by the soft-thresholding rule
$$\left[\operatorname{prox}_{\lambda\varphi}(x)\right]_i = \operatorname{sign}(x_i)\,\max\left\{|x_i| - \lambda,\, 0\right\}.$$
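This soft-thresholding rule can be checked numerically; `prox_abs` is an illustrative name.

```python
import numpy as np

def prox_abs(x, lam=1.0):
    """Proximity operator of lam * ||.||_1: componentwise soft thresholding
    prox(x)_i = sign(x_i) * max(|x_i| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```

The defining property of the proximity operator is easy to verify here: $x - \operatorname{prox}(x)$ lies in $\lambda\,\partial\|\cdot\|_1$ evaluated at $\operatorname{prox}(x)$.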
Many other examples, as well as interesting properties that make this tool very powerful and commonly used in the recent optimization literature, are given in the literature on proximal methods [26, 27]. One property of particular interest here is the following.
[28, Prop. 3] Let $\varphi \in \Gamma_0(\mathbb{R}^N)$ and $x \in \mathbb{R}^N$. There exists a unique point $\widehat{p} \in \mathbb{R}^N$ such that $x - \widehat{p} \in \partial\varphi(\widehat{p})$. Using the proximity operator definition hereabove, it turns out that $\widehat{p} = \operatorname{prox}_{\varphi}(x)$.
By modifying the discretization scheme $\widetilde{T}_\epsilon^{(l)}$, we propose the following $l$-th leapfrog discretization scheme, denoted by $\widehat{T}_\epsilon^{(l)}$,
$$q^{\left(t + \epsilon/2\right)} = q^{(t)} - \frac{\epsilon}{2}\left[x^{(t)} - \operatorname{prox}_{E_\theta}\left(x^{(t)}\right)\right] \qquad (16)$$
$$x^{(t + \epsilon)} = x^{(t)} + \epsilon\, q^{\left(t + \epsilon/2\right)} \qquad (17)$$
$$q^{(t + \epsilon)} = q^{\left(t + \epsilon/2\right)} - \frac{\epsilon}{2}\left[x^{(t + \epsilon)} - \operatorname{prox}_{E_\theta}\left(x^{(t + \epsilon)}\right)\right]. \qquad (18)$$
If $E_\theta$ is differentiable, the mapping in (16), (17) and (18) matches the mapping in (6), (7) and (8), the only difference being that the (sub-)gradient of $E_\theta$ is evaluated at $\operatorname{prox}_{E_\theta}(x)$ instead of $x$. As for scheme 1, the proposed candidates are given by $x^*$ and $q^*$ after $L_f$ leapfrog steps. These candidates are then accepted based on the standard MH rule (9).
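A deterministic sketch of the scheme 2 leapfrog (16)-(18), where each (sub)gradient evaluation is replaced by $x - \operatorname{prox}_{E}(x)$; the argument `prox_e` stands for the proximity operator of the target energy, and the function name is illustrative.

```python
import numpy as np

def ns_leapfrog_scheme2(x, q, prox_e, eps=0.1, n_steps=10):
    """Scheme 2 leapfrog sketch (16)-(18): the half momentum steps use
    x - prox_E(x), which is a subgradient of E evaluated at prox_E(x)."""
    q = q - 0.5 * eps * (x - prox_e(x))    # (16)
    for i in range(n_steps):
        x = x + eps * q                    # (17)
        if i < n_steps - 1:
            q = q - eps * (x - prox_e(x))  # two merged half steps
    q = q - 0.5 * eps * (x - prox_e(x))    # (18)
    return x, q
```

With `prox_e` set to the soft-thresholding operator of the previous example, this reproduces the scheme 1 trajectory away from the origin, while remaining well defined everywhere.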
Fig. 1 illustrates the use of the adopted discretization schemes associated with Algorithms 1 and 2 in approximating a Hamiltonian made up of a quadratic kinetic energy and a potential energy of the form
$$E(x) = \frac{x^2}{2} + \lambda|x|,$$
where $\lambda > 0$. For this potential energy, the sub-differential can be calculated analytically and is given by
$$\partial E(x) = \mathrm{Id}(x) + \lambda\,\partial\varphi(x),$$
where $\varphi$ is defined in Example III.1 and $\mathrm{Id}$ is the identity operator. The two proposed algorithms can therefore be compared on this example.
Fig. 1 shows that the discretized energy is close to the continuous one for the two mappings $\widetilde{T}_\epsilon$ and $\widehat{T}_\epsilon$. Moreover, the slight difference between the two mappings shows that the two discretization schemes perform very similarly close to the critical region of non-differentiability (a small interval around the origin, see the zoom in Fig. 1). Fig. 2 illustrates the shape of the proximity operator for the considered energy function $E$, as well as the identity function and the difference between them. This figure clearly shows that, due to the thresholding property of the proximity operator, the quantity $x - \operatorname{prox}_E(x)$ coincides with a sub-gradient of $E$ away from the origin. This comparison confirms that the two schemes perform similarly, especially close to the non-differentiability point.
Since Algorithm 2 is more general than Algorithm 1 and allows us to handle energies for which the sub-differential is not straightforward (while performing well especially close to the critical regions), we will focus on this discretization scheme for our experiments.
IV Experimental validation
This section validates the proposed ns-HMC scheme for non-smooth log-concave distributions through three experiments. The first two experiments consider the GG distribution, whose energy function is non-differentiable for the values of the shape parameter considered here. For the third experiment, a Laplace distribution (a GG distribution with $p = 1$) is used for an image denoising problem, where the clean image is recovered from noisy measurements using a Bayesian regularization scheme involving a sampling technique based on the proposed ns-HMC algorithm.
IV-A Experiment 1: 1D sampling
In the first experiment, 1D sampling is performed for different values of the shape and scale parameters of a GG distribution. Chains generated using the proposed ns-HMC sampling scheme are compared to the ones obtained with a random walk Metropolis-Hastings (rw-MH) scheme. The rw-MH strategy is used here for comparison since it generally improves the mixing properties of the generated samples when compared to a fixed proposal distribution. Let $x^{(r)}$ be the current sample and $x^*$ the proposed one. A Gaussian proposal centered on the current sample with unit variance, $x^* \sim \mathcal{N}(x^{(r)}, 1)$, is used for the rw-MH algorithm.
Fig. 3 displays the mean square error (MSE) between the target GG pdf and the histogram of the generated samples with respect to
the number of sampled coefficients.
This figure shows slightly faster convergence for the proposed ns-HMC scheme compared to the rw-MH algorithm.
To further investigate the sampling efficiency, Fig. 4 displays the autocorrelation functions (ACFs) of the sampled chains for the same parameter values. This figure clearly shows that samples generated using the ns-HMC scheme are less correlated than those generated using rw-MH, which corroborates the fast convergence of the ns-HMC scheme. In fact, contrary to the rw-MH algorithm, the proposed technique does not need any adjustment of the proposal variance, while giving an acceptable level of intra-chain correlation. For the sake of comparison, Fig. 4 also displays the ACFs of chains sampled using a standard MH algorithm with a centered Gaussian proposal. Indeed, it has been reported that rw-MH increases the correlation level within sampled chains, while an MH algorithm with a fixed proposal provides weakly correlated samples. The comparison between the ACFs corresponding to ns-HMC and MH shows that chains sampled using ns-HMC are as weakly correlated as those of the standard MH algorithm with a fixed Gaussian proposal.
IV-B Experiment 2: multivariate sampling
In this experiment, sampling is performed according to a multivariate GG distribution. First, sampling using rw-MH is performed with a large number of iterations (long burn-in period) so that the target distribution is guaranteed to be reached. The histogram of the obtained samples is calculated and serves as our ground truth. Then, sampling using the proposed ns-HMC method and the rw-MH scheme is performed on the same dataset. The MSE between the obtained histograms and the ground truth is computed versus the number of iterations of the sampler. Fig. 5 displays this MSE for the multivariate GG sampling and for different values of the scale and shape parameters. Note that simulations have been performed for the 2D, 3D and 4D cases.
The reported MSE values show that the target distribution is approached more rapidly using ns-HMC than rw-MH. Specifically, for the 2D case, the ns-HMC scheme converges after about 500 to 1000 iterations depending on the shape parameter, while the rw-MH sampler needs about 3500 iterations. When looking at the same curves in Fig. 5 for the 3D and 4D cases, it is worth noticing that the gap between the two methods in terms of convergence speed increases with the problem dimensionality. This effect is confirmed for the different shape and scale parameters. This corroborates the usefulness of the proposed ns-HMC scheme, especially for large data sampling where the convergence speed of the standard MH or rw-MH algorithms deteriorates with the size of the data.
IV-C Experiment 3: denoising
In this experiment, the performance of the proposed ns-HMC sampling algorithm is analyzed for an image denoising problem. The 2D image displayed in Fig. 6 [top-left] has been used as ground truth for this example. Independent identically distributed additive Gaussian noise of variance $\sigma^2$ has been added to this image to obtain the noisy image depicted in Fig. 6 [top-right]. The objective of this third experiment is to promote the sparsity of the wavelet coefficients associated with the target image. To this end, we express the image formation model as a function of the wavelet coefficients $z$, which are related to the ground truth image $x$ through the relation $x = F^* z$, where $F^*$ denotes the dual frame operator. The analysis frame operator thus corresponds to $F$, and since orthonormal bases are considered here, the dual frame operator reduces to the inverse operator, yielding $F^* = F^{-1}$. The observation model can thus be expressed as
$$y = F^* z + n,$$
where $y$ is the observed image, $z$ contains the unknown wavelet coefficients and $n$ is the additive noise. Note that the denoised image can easily be recovered from the estimated wavelet coefficients $\widehat{z}$ by taking $\widehat{x} = F^* \widehat{z}$.
Based on this model and the Gaussian likelihood assumption, a hierarchical Bayesian model has been built using an independent Laplace prior for the wavelet coefficients [29, 30]
$$f(z \mid \lambda) = \prod_{i=1}^{N} \frac{1}{2\lambda} \exp\left(-\frac{|z_i|}{\lambda}\right),$$
completed with an inverse-gamma hyperprior $f(\lambda \mid a, b) = \frac{b^{a}}{\Gamma(a)}\, \lambda^{-a-1} \exp(-b/\lambda)$, where $\Gamma$ is the gamma function and $a$ and $b$ are fixed hyperparameters (their values being fixed in our experiments).
Using a Jeffreys prior for the noise variance ($f(\sigma^2) \propto 1/\sigma^2$), the full posterior of this denoising model can be derived. The associated Gibbs sampler generates samples according to the conditional distributions of the posterior. The conditional distribution of the wavelet coefficients writes
$$f(z \mid y, \sigma^2, \lambda) \propto \exp\left(-E(z)\right), \qquad (24)$$
where the energy function is defined by $E(z) = \frac{1}{2\sigma^2}\|y - F^* z\|^2 + \frac{1}{\lambda}\|z\|_1$. Sampling according to this distribution is performed using the proposed ns-HMC scheme, which requires the calculation of the proximity operator of its energy function, given by
$$\operatorname{prox}_{E}(v) = \operatorname{sign}(m)\,\max\left\{|m| - \frac{\sigma^2}{\lambda(1 + \sigma^2)},\, 0\right\}, \quad \text{with } m = \frac{F y + \sigma^2 v}{1 + \sigma^2}, \qquad (25)$$
where the sign, absolute value and maximum operate componentwise; these quantities can easily be calculated using standard properties of the proximity operators [23, 33, 19]. See Appendix -B for more details.
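The closed-form prox of a quadratic-plus-$\ell_1$ energy follows from completing the square; a scalar sketch of this calculation, with `y`, `sigma2` and `lam` as illustrative stand-ins for the quantities appearing in the conditional distribution, checked below against a brute-force minimization of the prox objective:

```python
import numpy as np

def prox_quad_l1(x, y, sigma2=1.0, lam=1.0):
    """Prox of E(z) = (z - y)^2 / (2*sigma2) + |z| / lam, evaluated at x.
    Completing the square merges the two quadratics into one centered at m,
    leaving a soft thresholding with an effective threshold thr."""
    a = 1.0 / sigma2
    m = (a * y + x) / (a + 1.0)    # minimizer of the combined quadratic part
    thr = 1.0 / (lam * (a + 1.0))  # effective soft threshold
    return np.sign(m) * np.maximum(np.abs(m) - thr, 0.0)
```

Since the wavelet basis is orthonormal, the multivariate prox decouples into such scalar problems, one per coefficient.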
Regarding the noise variance and the prior hyperparameter, straightforward calculations lead to the following conditional distributions, which are easy to sample:
$$\sigma^2 \mid y, z \sim \mathcal{IG}\left(\frac{N}{2}, \frac{\|y - F^* z\|^2}{2}\right), \qquad \lambda \mid z \sim \mathcal{IG}\left(N + a, \|z\|_1 + b\right),$$
where $\mathcal{IG}$ denotes the inverse gamma distribution and $a$ and $b$ are the fixed hyperparameters of the prior on $\lambda$. The estimation of the denoised image is performed based on the wavelet coefficients sampled after an appropriate burn-in period, i.e., after convergence of the Gibbs sampler in Algorithm 3.
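The inverse-gamma conditionals are straightforward to sample; a sketch for the noise variance under the Gaussian likelihood and Jeffreys prior assumed in the text (the function name is illustrative, and the hyperparameter conditional can be sampled the same way with its own shape and scale).

```python
import numpy as np

def sample_noise_variance(residual, rng=None):
    """Sample sigma^2 from its conditional IG(N/2, ||residual||^2 / 2),
    which follows from a Gaussian likelihood and the Jeffreys prior 1/sigma^2.
    An inverse-gamma draw is the reciprocal of a unit-scale gamma draw."""
    rng = np.random.default_rng() if rng is None else rng
    n = residual.size
    shape = n / 2.0
    scale = float(residual @ residual) / 2.0
    return scale / rng.gamma(shape)  # if G ~ Gamma(shape, 1), scale/G ~ IG(shape, scale)
```

Here `residual` would be the flattened vector $y - F^* z$ computed at the current Gibbs iteration.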
An example of a denoised image obtained with Algorithm 3 is displayed in Fig. 6 [bottom-left]. For the sake of comparison, the image denoised with a Wiener filter is displayed in Fig. 6 [bottom-right]. From a visual point of view, Algorithm 3 clearly provides a better denoised image than the Wiener filter. Quantitatively speaking, the evaluation of the noisy and denoised images is based on both the SNR (signal-to-noise ratio) and the SSIM [34] (structural similarity) measures. These values are reported directly in the figure and show the efficiency of the denoising algorithm based on the proposed ns-HMC technique to sample from the conditional distribution of the wavelet coefficients. As regards the computational time, only 1000 iterations are necessary for the proposed algorithm, involving a burn-in period of 500 iterations and taking around 9 seconds on a 64-bit 2.00 GHz i7-3667U architecture with a Matlab implementation. For the ns-HMC step, the second scheme has been used.
This paper proposed a solution to make feasible the use of Hamiltonian dynamics for sampling according to log-concave probability distributions with non-smooth energy functions. The proposed sampling technique relies on some interesting results from convex optimization and Hamiltonian Monte Carlo methods. More precisely, proximity operators were investigated to address the non-differentiability problem of the energy function related to the target distribution. Validation results showed that the proposed technique provides faster convergence and interesting decorrelation properties for the sampled chains when compared to more standard methods such as the random walk Metropolis Hastings algorithm. The proposed technique was evaluated on synthetic data and applied to an image denoising problem. Our results showed that the use of proximity operators in a Hamiltonian Monte Carlo method allows faster convergence to the target distribution to be obtained. This conclusion is particularly important for large scale data sampling since the gain in convergence speed increases with the problem dimensionality. In a future work, we will focus on the investigation of this technique for sparse signal recovery where a non-tight linear operator is involved in the observation model.
-A Volume preservation
This appendix studies the volume preservation condition both for the Hamiltonian dynamics and the proposed discretization schemes. Volume preservation is a key property of sampling algorithms involving an acceptance-rejection step such as the Metropolis within Gibbs algorithm since it allows simpler acceptance probability to be obtained.
-A1 Volume preservation for the Hamiltonian dynamics
Akin to [12], we consider here volume preservation of the Hamiltonian dynamics in the one-dimensional case; the multi-dimensional case can then be handled through simple generalizations. Let us denote by $T_s$ (see Section III-A) the mapping between the state at time $t$, denoted by $(x(t), q(t))$, and the state at time $t + s$. For $s$ small enough, $T_s$ can be approximated by the first-order expansion
$$T_s\big(x(t), q(t)\big) = \begin{bmatrix} x(t) \\ q(t) \end{bmatrix} + s \begin{bmatrix} dx/dt \\ dq/dt \end{bmatrix} + O(s^2), \qquad (28)$$
where $O(s^2)$ involves terms of order $s^2$ or higher. After replacing the time derivatives in (28) by the Hamiltonian motion equations, and accounting for the fact that the Hamiltonian may be non-differentiable with respect to $x$, the generalized Jacobian matrix can be written as
$$B_s = \begin{bmatrix} 1 + s\,\dfrac{\partial^2 H}{\partial q\,\partial x} & s\,\dfrac{\partial^2 H}{\partial q^2} \\[2mm] -s\,u_{xx} & 1 - s\,\dfrac{\partial^2 H}{\partial x\,\partial q} \end{bmatrix},$$
where $u_{xx}$ denotes an element of the second-order sub-differential of $H$ with respect to $x$. The determinant of this matrix can therefore be written as
$$\det B_s = 1 + O(s^2).$$
Following the construction of [12], composing such mappings over any finite time interval and letting the expansion step tend to zero, it turns out that $|\det B_s| = 1$, which means that the transformation $T_s$ ensures volume preservation.
-A2 Volume preservation for the proposed discretization schemes
It is worth noticing here that the two modified leapfrog discretization schemes defined respectively in (12)-(14) and (16)-(18), like the original leapfrog scheme defined in (6)-(8), preserve volume since they are shear transformations. The interested reader can refer to [35, page 121] for more details.
-B Proximity operator calculation for the experiment of Section IV-C
The energy function considered in this appendix is the one involved in the conditional distribution of the wavelet coefficients in (24), i.e.,
$$E(z) = \frac{1}{2\sigma^2}\|y - F^* z\|^2 + \frac{1}{\lambda}\|z\|_1,$$
where $y$ denotes the observed image and $F^*$ the (orthonormal) dual frame operator. In order to use the proposed ns-HMC sampling algorithm, the proximity operator of the function $E$ has to be calculated. Following the standard definition of the proximity operator [26, 23], and using the orthonormality of the considered bases to rewrite the data term as $\|F y - z\|^2$, we can complete the square and reduce the minimization to a componentwise soft thresholding, which proves the expression of the proximity operator given in (25).
-  D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
-  C. Chaux, P. L. Combettes, J.-C Pesquet, and V. Wajs, “Iterative image deconvolution using overcomplete representations,” in Proc. European Signal Processing Conference (EUSIPCO), Florence, Italy, Sep. 4-8 2006.
-  L. Chaari, P. Ciuciu, S. Mériaux, and J.-C. Pesquet, “Spatio-temporal wavelet regularization for parallel MRI reconstruction: application to functional MRI,” Mag. Reson. Mater. in Phys., Biol. and Med. (MAGMA), vol. 27, pp. 509–529, 2014.
-  L. Boubchir and B. Boashash, “Wavelet denoising based on the MAP estimation using the BKF prior with application to images and EEG signals,” IEEE Trans. Signal Process., vol. 61, no. 8, pp. 1880–1894, Apr. 2013.
-  C. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, New York, 2004.
-  J.-F. Richard and Wei Zhang, “Efficient high-dimensional importance sampling,” J. Econom., vol. 141, no. 2, pp. 1385 – 1411, 2007.
-  G. O. Roberts and R. L. Tweedie, “Exponential convergence of Langevin distributions and their discrete approximations,” Bernoulli, vol. 1, no. 4, pp. 341–363, 1996.
-  M. Pereyra, “Proximal Markov chain Monte Carlo algorithms,” Arxiv, 2013, http://arxiv.org/abs/1306.0187.
-  F. Orieux, O. Féron, and J.-F. Giovannelli, “Sampling high-dimensional Gaussian distributions for general linear inverse problems,” Signal Process. Lett., vol. 19, no. 5, pp. 251–254, May 2012.
-  C. Gilavert, S. Moussaoui, and J. Idier, “Efficient Gaussian Sampling for Solving Large-Scale Inverse Problems using MCMC Methods,” IEEE Trans. Signal Process., vol. 63, no. 1, Aug. 2014.
-  K. M. Hanson, “Markov chain Monte Carlo posterior sampling with the Hamiltonian method,” in SPIE Medical Imaging: Image Processing, M. Sonka and K. M. Hanson, eds., 2001, pp. 456–467.
-  R. M. Neal, “MCMC using Hamiltonian dynamics,” in Handbook of Markov Chain Monte Carlo, S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Eds., chapter 5. Chapman and Hall/CRC Press, 2010.
-  A. Pakman and L. Paninski, “Exact Hamiltonian Monte Carlo for truncated multivariate Gaussians,” http://arxiv.org/abs/1208.4118.
-  Y. Altmann, N. Dobigeon, and J.-Y. Tourneret, “Unsupervised post-nonlinear unmixing of hyperspectral images using a Hamiltonian Monte Carlo algorithm,” IEEE Trans. Image Process., vol. 23, no. 6, pp. 2663–2675, 2014.
-  H. P. Wai and B. D. Jeffs, “Adaptive image restoration using a generalized Gaussian model for unknown noise,” IEEE Trans. Image Process., vol. 4, no. 10, pp. 1451–1456, Oct. 1995.
-  E. P. Simoncelli and E. H. Adelson, “Noise removal via Bayesian wavelet coring,” in IEEE Int. Conf. on Image Process. (ICIP), Lausanne, Switzerland, Sep. 16-19 1996, pp. 379–382.
-  P. Moulin and J. Liu, “Analysis of multiresolution image denoising schemes using generalized-Gaussian priors,” in IEEE Int. Symp. on Time-Frequency and Time-Scale Analysis, Pittsburgh, USA, Oct. 1998, pp. 633–636.
-  L. Chaari, J.-C. Pesquet, J.-Y. Tourneret, Ph. Ciuciu, and A. Benazza-Benyahia, “A hierarchical Bayesian model for frame representation,” IEEE Trans. Signal Process., vol. 18, no. 11, pp. 5560–5571, Nov. 2010.
-  H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, New York, 2011.
-  V. Jeyakumar and D. T. Luc, “Approximate Jacobian matrices for nonsmooth continuous maps and C¹-optimization,” SIAM J. Control and Optim., vol. 36, no. 5, pp. 1815–1832, 1998.
-  F. H. Clarke, Optimization and nonsmooth analysis, Wiley-Interscience, New York, 1983.
-  P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forward-backward splitting,” Multiscale Model. Simul., vol. 4, pp. 1168–1200, 2005.
-  C. Chaux, P. Combettes, J.-C. Pesquet, and V.R Wajs, “A variational formulation for frame-based inverse problems,” Inv. Prob., vol. 23, no. 4, pp. 1495–1518, Aug. 2007.
-  L. Chaari, J.-C. Pesquet, A. Benazza-Benyahia, and P. Ciuciu, “A wavelet-based regularized reconstruction algorithm for SENSE parallel MRI with applications to neuroimaging,” Med. Image Anal., vol. 15, no. 2, pp. 185–201, Nov. 2011.
-  Y. F. Atchade, G. Fort, and E. Moulines, “On stochastic proximal gradient algorithms,” Arxiv, 2014, http://arxiv.org/abs/1402.2365.
-  J.-J. Moreau, “Proximité et dualité dans un espace hilbertien,” Bulletin de la Société Mathématique de France, vol. 93, pp. 273–299, 1965.
-  P. L. Combettes and J.-C. Pesquet, “Proximal splitting methods in signal processing,” in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, H. H. Bauschke, R. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, Eds., pp. 185–212. Springer-Verlag, New York, 2011.
-  J. Douglas and H. H. Rachford, “On the numerical solution of heat conduction problems in two or three space variables,” Trans. Amer. Math. Soc., vol. 82, pp. 421–439, 1956.
-  M. Tipping, “Sparse Bayesian learning and the relevance vector machine,” J. Mach. Learn. Res., vol. 1, pp. 211–244, 2001.
-  M. Seeger, “Bayesian inference and optimal design in the sparse linear model,” J. Mach. Learn. Res., vol. 9, pp. 759–813, 2008.
-  L. Chaari, J.-Y. Tourneret, and H. Batatia, “Sparse Bayesian regularization using Bernoulli-Laplacian priors,” in Proc. European Signal Process. Conf. (EUSIPCO), Marrakech, Morocco, September 9-13, 2013.
-  N. Dobigeon, A. O. Hero, and J.-Y. Tourneret, “Hierarchical Bayesian sparse image reconstruction with application to MRFM,” IEEE Trans. Image Process., vol. 19, no. 9, pp. 2059–2070, Sept. 2009.
-  P. L. Combettes and J.-C. Pesquet, “A proximal decomposition method for solving convex variational inverse problems,” Inv. Prob., vol. 24, no. 6, Dec. 2008, 27 p.
-  Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
-  S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Handbook of Markov Chain Monte Carlo, Chapman & Hall/CRC, Boston, USA, 2011.