SUREmap: Predicting Uncertainty in CNN-based Image Reconstructions using Stein’s Unbiased Risk Estimate

Convolutional neural networks (CNNs) have emerged as a powerful tool for solving computational imaging reconstruction problems. However, CNNs are generally black boxes that are difficult to understand. Accordingly, it is challenging to know when they will work and, more importantly, when they will fail. This limitation is a major barrier to their use in safety-critical applications like medical imaging: Is that blob in the reconstruction an artifact or a tumor?

In this work we use Stein’s unbiased risk estimate (SURE) to develop per-pixel confidence intervals, in the form of heatmaps, for compressive sensing reconstruction using the approximate message passing (AMP) framework with CNN-based denoisers. These heatmaps tell end-users how much to trust an image formed by a CNN, which could greatly improve the utility of CNNs in various computational imaging applications.


Ruangrawee Kitichotkul, Christopher A. Metzler, Frank Ong, Gordon Wetzstein¹
Department of Electrical Engineering, Stanford University
350 Jane Stanford Way, Stanford, CA
{rk22,cmetzler}


Keywords: Compressive Sensing, Approximate Message Passing, CNN, MRI

1 Introduction

Computational imaging (CI) systems, like magnetic resonance imaging (MRI), can generally be described by the equation

$$y = Ax + \eta, \tag{1}$$

where $y$ denotes the $m$ measurements, $A$ models the $m \times n$ linear measurement operator/matrix, $x$ is the vectorized latent image with $n$ pixels, and $\eta$ is additive noise. The goal of a computational imaging reconstruction algorithm is to reconstruct $x$ from $y$.

When there are fewer measurements than unknowns, $m < n$, the reconstruction problem is underdetermined and is known as compressive sensing (CS). CS reconstruction algorithms impose a prior, implicitly or explicitly, to form a reconstruction, $\hat{x}$, of the latent image $x$ from the measurements $y$. While historically this prior was sparsity in some basis [5], the sparsity model has largely been superseded: Modern “hand-designed” methods achieve far better performance by imposing more elaborate priors, such as non-local self-similarity [7]. Meanwhile, learning-based methods, which impose priors with convolutional neural networks (CNNs), offer better performance still [31].

CNNs learn priors from vast quantities of training data, which they use to tune thousands to millions of parameters. In general, it is unclear how each parameter contributes to the performance of the algorithm and it is difficult to know if and when a CNN-based method will successfully reconstruct an image.

Expected mean squared error (MSE), i.e. risk, is the gold standard for evaluating a CS reconstruction algorithm. However, in general computing the risk requires access to the ground truth image – which defeats the point of reconstruction in the first place.

In this work, we demonstrate that when used in conjunction with the approximate message passing (AMP) framework [9], which decouples the CS reconstruction problem into a series of additive white Gaussian noise (AWGN) denoising problems, one can accurately calculate the expected per-pixel MSE associated with CS reconstruction using Stein’s unbiased risk estimate (SURE) [30]. Consequently, we can generate heatmaps of low-pass filtered per-pixel MSE estimates without requiring access to the latent image. We also apply this framework to the Variable Density AMP (VDAMP) algorithm [26], an MRI reconstruction algorithm which decouples the problem into a series of additive colored Gaussian noise denoising problems. These uncertainty heatmaps could inform end-users about the reliability of image reconstructions and could also serve as supplementary information for an artifact-removal algorithm [15] or to guide an adaptive sampling strategy [17].

2 Related Work

Researchers have long sought to quantify the uncertainty associated with CNN-based reconstructions. The importance of this problem was recently highlighted in [2, 13], where the authors showed how slight perturbations to a compressively sampled MRI signal can lead to vastly different, but still plausible looking, reconstructions.

If one assumes the latent image lies in the range of a generative network, one can use RIP-like conditions to guarantee recovery when the network is sufficiently expansive [4, 16] or invertible [12]. By looking at the distribution of an invertible network’s latent variables, one can then estimate the uncertainty associated with a reconstruction [3].

Alternatively, when dealing with probabilistic neural networks, as exemplified by variational autoencoders [18], one can sample from the posterior $p(x \mid y)$, and thereby reason about the variance, but not the bias, associated with the reconstruction [1]. Similarly, bootstrap and jackknife resampling methods [33] as well as a combination of variational dropout and input-dependent noise models [32] can be used to estimate the variance of a reconstruction. One can even train a CNN to identify motion artifacts [19].

The majority of these methods, however, can only characterize the variance associated with the reconstruction. They do not accurately predict the mean squared error, which is affected by bias as well.

Recently, Edupuganti et al. predicted the per-pixel mean-squared error associated with reconstructed MRI images using SURE [10]. However, in order to apply SURE, their method assumes that the difference between the true signal and an initial estimate, formed with density compensated least squares (DCLS), follows a distribution that is both Gaussian and white. As demonstrated in Figure 1, the latter assumption does not hold in practice: The “effective noise”, i.e., the difference between the estimate and the truth, demonstrates obvious structure when represented in the wavelet domain. These correlations invalidate the standard SURE approach, which applies only to i.i.d. Gaussian noise.

3 Background

3.1 Stein’s Unbiased Risk Estimate

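SURE was first developed by its namesake several decades ago [30]. Given a noisy signal $y = x + w$ with $x \in \mathbb{R}^n$, where $w$ follows a Gaussian distribution with known covariance $\sigma^2 I$, SURE states that one can form an unbiased estimate of the mean squared error (MSE) of a denoiser $f$, $\frac{1}{n}\|f(y) - x\|_2^2$, via the expression

$$\mathrm{SURE}(f, y) = \frac{1}{n}\|f(y) - y\|_2^2 - \sigma^2 + \frac{2\sigma^2}{n}\,\mathrm{div}_y f(y), \tag{2}$$

where $\mathrm{div}_y f(y)$ denotes the divergence of $f$, defined as

$$\mathrm{div}_y f(y) = \sum_{i=1}^{n} \frac{\partial f_i(y)}{\partial y_i}. \tag{3}$$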

Since its introduction, SURE has been used extensively to tune algorithms. It is at the heart of the well-known SURE-shrink denoising algorithm [8] and has been used extensively for tuning the parameters within various iterative reconstruction algorithms as well [27, 14, 26, 25]. SURE has also been combined with deep learning to train CNNs without ground truth data [23, 29, 35] and been used to predict the error associated with a denoising algorithm’s reconstruction [6].
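As a concrete illustration, the following minimal sketch evaluates the standard AWGN form of SURE, $\frac{1}{n}\|f(y)-y\|_2^2 - \sigma^2 + \frac{2\sigma^2}{n}\mathrm{div}_y f(y)$, for soft thresholding, the denoiser behind SURE-shrink. Its divergence has a closed form: the number of entries whose magnitude exceeds the threshold. The test signal, noise level, threshold, and function names are hypothetical choices made for illustration only.

```python
import numpy as np


def soft_threshold(y, tau):
    """Soft-thresholding denoiser f(y) = sign(y) * max(|y| - tau, 0)."""
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)


def sure_soft_threshold(y, tau, sigma):
    """SURE for soft thresholding under AWGN with standard deviation sigma.

    The divergence is available in closed form: it equals the number of
    entries of y whose magnitude exceeds the threshold tau.
    """
    n = y.size
    residual = np.sum((soft_threshold(y, tau) - y) ** 2) / n
    divergence = np.count_nonzero(np.abs(y) > tau)
    return residual - sigma**2 + 2.0 * sigma**2 * divergence / n


# Hypothetical test problem: a sparse signal observed in white Gaussian noise.
rng = np.random.default_rng(0)
n, sigma, tau = 100_000, 0.5, 0.75
x = np.zeros(n)
x[: n // 20] = rng.normal(0.0, 3.0, size=n // 20)
y = x + sigma * rng.normal(size=n)

true_mse = np.mean((soft_threshold(y, tau) - x) ** 2)  # needs the ground truth
estimate = sure_soft_threshold(y, tau, sigma)          # does not
print(f"true MSE = {true_mse:.4f}, SURE = {estimate:.4f}")
```

The estimate uses only the noisy signal and the noise level, yet it should track the true MSE closely when $n$ is large.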



Figure 1: DCLS effective noise. An illustration of the effective noise in the wavelet domain following a density compensated least squares reconstruction of a compressively sampled MRI image: The noise does not follow an i.i.d. Gaussian distribution.


Figure 2: VDAMP effective noise. An illustration of the effective noise in the wavelet domain within an iteration of VDAMP while reconstructing a compressively sampled MRI image: The effective noise is approximately i.i.d. within each wavelet subband.

3.2 Approximate Message Passing

Approximate message passing (AMP), presented in Algorithm 1, is a simple iterative algorithm for reconstructing a signal from i.i.d. Gaussian measurements [9], i.e., $A_{ij} \sim \mathcal{N}(0, 1/m)$ for all $i, j$. AMP resembles a projected gradient descent algorithm but comes with an additional term known as the Onsager correction. The Onsager correction ensures that at every iteration $t$ the effective noise, that is the difference between the denoiser input $r^t$ and the ground truth signal $x$, follows a white Gaussian distribution with variance $\sigma_t^2$.

Input : Observation , Denoiser , Measurement matrix
Output : Reconstructed image
Initialize ;
for  do
end for
Algorithm 1 AMP
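The AMP iteration can be sketched as follows. This is not a transcription of Algorithm 1; it follows the commonly published AMP/D-AMP form [9, 21], with soft thresholding standing in for the CNN denoiser so that the divergence in the Onsager correction has a closed form. The function names, the threshold multiplier `alpha`, the iteration count, and the test problem are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(r, tau):
    """Stand-in denoiser (assumption): soft thresholding instead of a CNN."""
    return np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)


def amp(y, A, num_iters=30, alpha=1.5):
    """Sketch of AMP with a soft-thresholding denoiser.

    Returns the estimate x, the last denoiser input r (approximately
    x_true plus white Gaussian noise) and the estimated noise level.
    """
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()
    for _ in range(num_iters):
        sigma_hat = np.linalg.norm(z) / np.sqrt(m)  # effective noise level
        r = x + A.T @ z                             # noisy pseudo-image
        x = soft_threshold(r, alpha * sigma_hat)    # denoising step
        divergence = np.count_nonzero(x)            # closed-form divergence
        z = y - A @ x + z * divergence / m          # residual + Onsager term
    return x, r, sigma_hat


# Hypothetical test problem: k-sparse signal, i.i.d. Gaussian measurements.
rng = np.random.default_rng(0)
n, m, k = 1000, 500, 50
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
y = A @ x_true + 0.01 * rng.normal(size=m)

x_hat, r, sigma_hat = amp(y, A)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative error = {rel_err:.3f}")
print(f"effective noise std = {np.std(r - x_true):.4f}, estimate = {sigma_hat:.4f}")
```

Comparing the empirical standard deviation of `r - x_true` with `sigma_hat` is a quick way to check that the effective noise level is tracked, which is the property SURE relies on.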

Variable Density AMP (VDAMP) is a recent extension to AMP designed to solve the CS reconstruction problem when dealing with variable density sampled Fourier measurements [26]. Through multiscale updates in the wavelet domain, it ensures that the effective noise follows a colored Gaussian distribution with a known covariance matrix. This covariance matrix is diagonal when represented in the wavelet domain. Figure 2 illustrates the empirical distribution of the effective noise associated with VDAMP.

While originally designed with simple, soft-thresholding based denoisers, both AMP and VDAMP can be extended to incorporate more advanced denoisers, such as CNNs. The resulting Denoising-based AMP (D-AMP) and VDAMP (D-VDAMP) algorithms offer state-of-the-art performance when dealing with i.i.d. Gaussian and variable density sampled Fourier measurements, respectively [21, 22, 24].

4 Method

In this work, we combine SURE with the denoising-based versions of AMP and VDAMP to generate per-pixel mean-squared error estimates associated with reconstructions of compressively sampled images.

4.1 Uncertainty Quantification for D-AMP

At each iteration, D-AMP solves a denoising problem described by

$$r^t = x + w^t, \qquad w^t \sim \mathcal{N}(0, \sigma_t^2 I), \tag{4}$$

where $w^t$ is the noise at the $t$-th iteration, and $r^t$ is the corresponding noisy image. The final estimate formed by $T$ iterations of AMP is $\hat{x} = f(r^T)$, where $f$ is the denoiser. Because this is the output of a simple AWGN denoising problem, SURE can be used to estimate the mean squared error associated with this reconstruction.

When a closed form expression for the divergence $\mathrm{div}_y f(y)$ is not available, it can be estimated with the following Monte-Carlo estimate [28]

$$\mathrm{div}_y f(y) \approx \frac{1}{K}\sum_{k=1}^{K} b_k^{\top}\,\frac{f(y + \epsilon b_k) - f(y)}{\epsilon}, \tag{5}$$

where $b_k \sim \mathcal{N}(0, I)$, $\epsilon$ is a small, fixed number, and $K$ is the number of Monte-Carlo samples used in the approximation. To generate a per-pixel SURE heatmap, we compute SURE for overlapping patches of the reconstruction and average the result.
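The patch-wise procedure can be sketched as follows: compute per-pixel SURE terms with a Monte-Carlo divergence, average them within each patch to get a patch SURE, and average the SURE values of all overlapping patches that cover each pixel. The 3×3 mean filter stands in for the CNN denoiser. The step size `eps`, the patch width, the stride, the test image, and all function names are assumptions made for illustration, not values from this work.

```python
import numpy as np


def box_blur(img):
    """Hypothetical stand-in denoiser: 3x3 mean filter with circular borders."""
    out = np.zeros_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += np.roll(img, (dy, dx), axis=(0, 1))
    return out / 9.0


def mc_divergence_terms(denoiser, y, eps, num_samples, rng):
    """Per-pixel Monte-Carlo divergence terms; their sum estimates div f(y)."""
    fy = denoiser(y)
    terms = np.zeros_like(y)
    for _ in range(num_samples):
        b = rng.standard_normal(y.shape)
        terms += b * (denoiser(y + eps * b) - fy) / eps
    return terms / num_samples


def sure_heatmap(denoiser, y, sigma, patch=8, stride=4,
                 eps=1e-3, num_samples=3, rng=None):
    """Average the SURE values of overlapping patches into a per-pixel map."""
    rng = np.random.default_rng() if rng is None else rng
    div_terms = mc_divergence_terms(denoiser, y, eps, num_samples, rng)
    per_pixel = (denoiser(y) - y) ** 2 - sigma**2 + 2.0 * sigma**2 * div_terms
    heat = np.zeros_like(y)
    count = np.zeros_like(y)
    height, width = y.shape
    for i in range(0, height - patch + 1, stride):
        for j in range(0, width - patch + 1, stride):
            heat[i:i + patch, j:j + patch] += per_pixel[i:i + patch, j:j + patch].mean()
            count[i:i + patch, j:j + patch] += 1
    return heat / np.maximum(count, 1)


# Hypothetical test image: textured left half, flat right half.
rng = np.random.default_rng(0)
size, sigma = 64, 0.1
rows, cols = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
x = np.sin(2 * np.pi * rows / 8) * np.sin(2 * np.pi * cols / 8)
x[:, size // 2:] = 0.0
y = x + sigma * rng.standard_normal(x.shape)

heat = sure_heatmap(box_blur, y, sigma, rng=rng)
true_mse = np.mean((box_blur(y) - x) ** 2)
print(f"mean of heatmap = {heat.mean():.4f}, true MSE = {true_mse:.4f}")
```

In this toy example the blur distorts the textured half much more than the flat half, so the heatmap should be markedly brighter on the left even though the ground truth is never used to build it.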

4.2 Uncertainty Quantification for D-VDAMP

At each iteration, D-VDAMP solves a denoising problem described by

$$r^t = x + \Psi^{H} w^t, \qquad w^t \sim \mathcal{CN}(0, \Sigma_t), \tag{6}$$

where $\mathcal{CN}(0, \Sigma)$ denotes a circular Gaussian distribution with independent real and imaginary parts, each of which has mean $0$ and the covariance matrix $\Sigma$, and $\Psi$ denotes the wavelet transform matrix. (We use a four-level 2-D Haar transform throughout this paper.) As before, the final estimate associated with the denoising-based version of VDAMP is $\hat{x} = f(r^T)$.²

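As demonstrated in the Generalized SURE work [11], an unbiased risk estimate for removing colored Gaussian noise, $y = x + w$ with $w \sim \mathcal{N}(0, C)$, is

$$\frac{1}{n}\|f(y) - y\|_2^2 - \frac{1}{n}\mathrm{tr}(C) + \frac{2}{n}\mathrm{tr}\big(C\,J_f(y)\big), \tag{7}$$

where $J_f(y)$ is the Jacobian of the denoiser $f$ at $y$, with entries $[J_f(y)]_{ij} = \partial f_i(y) / \partial y_j$.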

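We can extend this estimate to the complex case by noting that $\|a\|_2^2 = \|\Re(a)\|_2^2 + \|\Im(a)\|_2^2$ for any complex vector $a$, so the risk is the sum of the risks of the real and imaginary parts. We next note that with $C = \Psi^{H}\Sigma\Psi$, where $\Psi$ is the orthonormal wavelet transform and $\Sigma$ is the wavelet-domain noise covariance, the similarity invariance of the trace function implies $\mathrm{tr}(C) = \mathrm{tr}(\Sigma)$. We are then left with

$$\frac{1}{n}\|f(y) - y\|_2^2 - \frac{1}{n}\mathrm{tr}(\Sigma) + \frac{2}{n}\mathrm{tr}\big(\Psi^{H}\Sigma\Psi\,J_f(y)\big), \tag{8}$$

where $y$ stands for either the real or the imaginary part of $r^T$ and $J_f(y)$ is the Jacobian of the denoiser $f$ at $y$.

To estimate the divergence, we let $g(u) = \Psi f(\Psi^{H} u)$, so that $J_g(\Psi y) = \Psi J_f(y) \Psi^{H}$. Now we have

$$\mathrm{tr}\big(\Psi^{H}\Sigma\Psi\,J_f(y)\big) = \mathrm{tr}\big(\Sigma\,J_g(\Psi y)\big) \tag{9}$$

and can use the Monte-Carlo approximation (5) to obtain

$$\mathrm{tr}\big(\Sigma\,J_g(\Psi y)\big) \approx \frac{1}{K}\sum_{k=1}^{K} b_k^{\top}\,\Sigma\,\frac{g(\Psi y + \epsilon b_k) - g(\Psi y)}{\epsilon}, \qquad b_k \sim \mathcal{N}(0, I), \tag{10}$$

which we apply independently to both the real and imaginary parts of $r^T$.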

As before, we generate per-pixel SURE heatmaps by averaging the overlapping estimated risks of square patches.
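The colored-noise risk estimate can be sketched for a real-valued signal as follows, assuming the noise covariance is diagonal in an orthonormal wavelet domain, so that the estimate reads $\frac{1}{n}\big(\|f(y)-y\|_2^2 - \mathrm{tr}(\Sigma) + 2\,\mathrm{tr}(\Sigma J_g)\big)$ with $g(u) = \Psi f(\Psi^{\top} u)$. A one-level 1-D Haar transform replaces the four-level 2-D transform, and a subband-wise linear shrinkage replaces the CNN denoiser. The function names, the variances in `tau`, the shrinkage factors, and the step size `eps` are all hypothetical.

```python
import numpy as np


def haar(v):
    """One-level orthonormal Haar analysis (Psi v): averages, then details."""
    even, odd = v[0::2], v[1::2]
    return np.concatenate([even + odd, even - odd]) / np.sqrt(2.0)


def ihaar(u):
    """One-level orthonormal Haar synthesis (Psi^T u), the inverse of haar."""
    half = u.size // 2
    v = np.empty_like(u)
    v[0::2] = (u[:half] + u[half:]) / np.sqrt(2.0)
    v[1::2] = (u[:half] - u[half:]) / np.sqrt(2.0)
    return v


def colored_sure(denoiser, y, tau, eps=1e-3, num_samples=8, rng=None):
    """Risk estimate for y = x + Psi^T n with n ~ N(0, diag(tau)), real case."""
    rng = np.random.default_rng() if rng is None else rng
    n = y.size
    u = haar(y)

    def g(coeffs):
        # The denoiser expressed in the wavelet domain: g(u) = Psi f(Psi^T u).
        return haar(denoiser(ihaar(coeffs)))

    gu = g(u)
    trace = 0.0
    for _ in range(num_samples):
        b = rng.standard_normal(n)
        trace += b @ (tau * (g(u + eps * b) - gu)) / eps  # b^T Sigma (dg / eps)
    trace /= num_samples
    residual = np.sum((denoiser(y) - y) ** 2)
    return (residual - np.sum(tau) + 2.0 * trace) / n


# Hypothetical test problem: smooth signal, noisier detail subband.
rng = np.random.default_rng(0)
n = 2 ** 16
x = np.sin(2 * np.pi * 4 * np.arange(n) / n)
tau = np.concatenate([np.full(n // 2, 0.04), np.full(n // 2, 0.25)])
y = x + ihaar(np.sqrt(tau) * rng.standard_normal(n))

shrink = np.concatenate([np.full(n // 2, 0.9), np.full(n // 2, 0.2)])


def denoiser(v):
    """Hypothetical denoiser: shrink each Haar subband by a fixed factor."""
    return ihaar(shrink * haar(v))


estimate = colored_sure(denoiser, y, tau, rng=rng)
true_mse = np.mean((denoiser(y) - x) ** 2)
print(f"true MSE = {true_mse:.4f}, colored-noise SURE = {estimate:.4f}")
```

For complex data the same function would be called once on the real part and once on the imaginary part, and the two estimates summed.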

5 Experiment

5.1 Setting

We test our SURE heatmap generation method with CS reconstructions using D-AMP (Gaussian measurement matrices) and D-VDAMP (subsampled Fourier measurement matrices). For D-AMP, the sampling rate, $m/n$, is 5% and the SNRs are 23 dB and 18 dB for the natural image and the MR image, respectively. For D-VDAMP, the sampling rate is 25% and the SNR is 20 dB. The Fourier coefficients were selected using polynomial variable density sampling [20]. Both D-AMP and D-VDAMP used a collection of DnCNN [34] denoisers trained for multiple noise levels from to . The natural images were while the MR images were .



Figure 4: Normalized absolute difference between the SURE heatmap and the patch-average (effectively low-pass filtered) MSE heatmap, which is generated by averaging overlapping patches of MSEs in the same fashion as the SURE heatmap generation. Data is for a CS reconstruction using D-AMP.

5.2 Accuracy-resolution trade-off


[Figure 5 panels, left to right: MSE; Patch width = 1; Patch width = 16]

Figure 5: SURE heatmaps with small patch sizes. The left heatmap is the MSE. The middle and the right heatmaps are SURE heatmaps of a CS reconstruction with D-AMP generated by using patch widths of 1 pixel and 16 pixels respectively. The number of Monte-Carlo samples, $K$, is 3 for both heatmaps.

We first investigate the accuracy of the MSE estimate as a function of patch size. Figure 4 shows how the difference between the SURE estimate and the true MSE of the image changes as one increases the patch sizes used in the SURE estimates. We observe that, due primarily to the reduced variance of the data fidelity term ($\|f(y) - y\|_2^2$, with $f$ the denoiser and $y$ its noisy input), the SURE heatmaps become more accurate as the patch size increases. Increasing the number of Monte-Carlo samples, $K$, has only a slight effect on the accuracy of the estimate. Figure 5 compares the heatmaps formed with various patch sizes. While smaller patch sizes give higher resolution, larger patch sizes result in more accurate MSE estimates. We found patches provided a nice trade-off between resolution and accuracy.
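A rough back-of-the-envelope calculation suggests why the data fidelity term dominates at small patch sizes. It assumes an idealized denoiser whose residual on a patch of $p$ pixels is essentially the noise itself, $f(y) - y \approx -w$ with $w_i \sim \mathcal{N}(0, \sigma^2)$ i.i.d.; this is an illustrative assumption, not a result from the experiments.

```latex
\begin{align*}
\mathbb{E}\!\left[\frac{1}{p}\sum_{i=1}^{p} w_i^2\right] &= \sigma^2,
&
\operatorname{Var}\!\left[\frac{1}{p}\sum_{i=1}^{p} w_i^2\right] &= \frac{2\sigma^4}{p},
&
\text{std} &= \sigma^2\sqrt{\frac{2}{p}}. \\[4pt]
\text{Patch width } 1\ (p = 1):&\quad \text{std} = \sqrt{2}\,\sigma^2 \approx 1.41\,\sigma^2,
&
\text{Patch width } 16\ (p = 256):&\quad \text{std} = \sigma^2\sqrt{\tfrac{2}{256}} \approx 0.088\,\sigma^2.
\end{align*}
```

Under this idealization, a single-pixel patch has a fluctuation larger than the quantity being estimated, while a 16-pixel-wide patch reduces it by a factor of 16.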

5.3 Results

Figure 3 shows the SURE heatmaps for D-AMP and D-VDAMP reconstructions using a patch size of pixels. While somewhat low resolution, the shapes and magnitudes of the heatmaps closely follow the true pixelwise MSEs. These heatmaps, which do not require the ground truth, could prove valuable for medical diagnosis and other safety-critical applications.

6 References



  1. R.K. was supported by the Stanford Research Experience for Undergraduates (REU) program. C.M. was supported by an appointment to the Intelligence Community Postdoctoral Research Fellowship Program at Stanford University administered by Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the U.S. Department of Energy and the Office of the Director of National Intelligence (ODNI). G.W. was supported by an NSF CAREER Award (IIS 1553333), a Sloan Fellowship, and a PECASE by the ARL.
  2. The original VDAMP work, which was based on soft wavelet thresholding, included an additional gradient step after denoising [26]. In [24], the authors found this term hurts the algorithm’s performance when dealing with more advanced denoising algorithms, and so we do not adopt it here.


  1. J. Adler and O. Öktem (2019) Deep posterior sampling: uncertainty quantification for large scale inverse problems. In International Conference on Medical Imaging with Deep Learning–Extended Abstract Track, Cited by: §2.
  2. V. Antun, F. Renna, C. Poon, B. Adcock and A. C. Hansen (2019) On instabilities of deep learning in image reconstruction-does ai come at a cost? arXiv preprint arXiv:1902.05300. Cited by: §2.
  3. L. Ardizzone, J. Kruse, S. Wirkert, D. Rahner, E. W. Pellegrini, R. S. Klessen, L. Maier-Hein, C. Rother and U. Köthe (2018) Analyzing inverse problems with invertible neural networks. arXiv preprint arXiv:1808.04730. Cited by: §2.
  4. A. Bora, A. Jalal, E. Price and A. G. Dimakis (2017) Compressed sensing using generative models. In International Conference on Machine Learning, pp. 537–546. Cited by: §2.
  5. E. J. Candès, J. Romberg and T. Tao (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on information theory 52 (2), pp. 489–509. Cited by: §1.
  6. C. Deledalle, V. Duval and J. Salmon (2012) Non-local methods with shape-adaptive patches (nlm-sap). Journal of Mathematical Imaging and Vision 43 (2), pp. 103–120. Cited by: §3.1.
  7. W. Dong, G. Shi, X. Li, Y. Ma and F. Huang (2014) Compressive sensing via nonlocal low-rank regularization. IEEE Transactions on Image Processing 23 (8), pp. 3618–3632. Cited by: §1.
  8. D. L. Donoho and I. M. Johnstone (1995) Adapting to unknown smoothness via wavelet shrinkage. Journal of the american statistical association 90 (432), pp. 1200–1224. Cited by: §3.1.
  9. D. L. Donoho, A. Maleki and A. Montanari (2009) Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences 106 (45), pp. 18914–18919. Cited by: §1, §3.2.
  10. V. Edupuganti, M. Mardani, S. Vasanawala and J. Pauly (2020) Uncertainty quantification in deep mri reconstruction. IEEE Transactions on Medical Imaging. Cited by: §2.
  11. Y. C. Eldar (2009) Generalized sure for exponential families: applications to regularization. IEEE Transactions on Signal Processing 57 (2), pp. 471–481. Cited by: §4.2.
  12. A. C. Gilbert, Y. Zhang, K. Lee, Y. Zhang and H. Lee (2017) Towards understanding the invertibility of convolutional neural networks. arXiv preprint arXiv:1705.08664. Cited by: §2.
  13. N. M. Gottschling, V. Antun, B. Adcock and A. C. Hansen (2020) The troublesome kernel: why deep learning for inverse problems is typically unstable. arXiv preprint arXiv:2001.01258. Cited by: §2.
  14. C. Guo and M. E. Davies (2015) Near optimal compressed sensing without priors: parametric sure approximate message passing. IEEE Transactions on Signal Processing 63 (8), pp. 2130–2141. Cited by: §3.1.
  15. S. Guo, Z. Yan, K. Zhang, W. Zuo and L. Zhang (2019) Toward convolutional blind denoising of real photographs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1712–1722. Cited by: §1.
  16. P. Hand and V. Voroninski (2018) Global guarantees for enforcing deep generative priors by empirical risk. In Conference On Learning Theory, pp. 970–978. Cited by: §2.
  17. S. Ji, Y. Xue and L. Carin (2008) Bayesian compressive sensing. IEEE Transactions on signal processing 56 (6), pp. 2346–2356. Cited by: §1.
  18. D. P. Kingma and M. Welling (2019) An introduction to variational autoencoders. arXiv preprint arXiv:1906.02691. Cited by: §2.
  19. T. Küstner, A. Liebgott, L. Mauch, P. Martirosian, F. Bamberg, K. Nikolaou, B. Yang, F. Schick and S. Gatidis (2018) Automated reference-free detection of motion artifacts in magnetic resonance images. Magnetic Resonance Materials in Physics, Biology and Medicine 31 (2), pp. 243–256. Cited by: §2.
  20. M. Lustig, D. Donoho and J. M. Pauly (2007) Sparse mri: the application of compressed sensing for rapid mr imaging. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 58 (6), pp. 1182–1195. Cited by: §5.1.
  21. C. A. Metzler, A. Maleki and R. G. Baraniuk (2016) From denoising to compressed sensing. IEEE Transactions on Information Theory 62 (9), pp. 5117–5144. Cited by: §3.2.
  22. C. A. Metzler, A. Mousavi and R. Baraniuk (2017) Learned d-amp: principled neural network based compressive image recovery. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan and R. Garnett (Eds.), pp. 1772–1783. Cited by: §3.2.
  23. C. A. Metzler, A. Mousavi, R. Heckel and R. G. Baraniuk (2018) Unsupervised learning with stein’s unbiased risk estimator. arXiv preprint arXiv:1805.10531. Cited by: §3.1.
  24. C. A. Metzler and G. Wetzstein D-vdamp: denoising-based approximate message passing for compressive mri. Under Review. Cited by: §3.2, footnote 1.
  25. C. Millard, A. T. Hess, B. Mailhe and J. Tanner (2020) An approximate message passing algorithm for rapid parameter-free compressed sensing mri. In 2020 IEEE International Conference on Image Processing (ICIP), pp. 91–95. Cited by: §3.1.
  26. C. Millard, A. T. Hess, B. Mailhé and J. Tanner (2020) Approximate message passing with a colored aliasing model for variable density fourier sampled images. arXiv preprint arXiv:2003.02701. Cited by: §1, §3.1, §3.2, footnote 1.
  27. A. Mousavi, A. Maleki and R. G. Baraniuk (2013) Parameterless optimal approximate message passing. arXiv preprint arXiv:1311.0035. Cited by: §3.1.
  28. S. Ramani, T. Blu and M. Unser (2008) Monte-carlo sure: a black-box optimization of regularization parameters for general denoising algorithms. IEEE Transactions on Image Processing 17 (9), pp. 1540–1554. Cited by: §4.1.
  29. S. Soltanayev and S. Y. Chun (2018) Training deep learning based denoisers without ground truth data. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi and R. Garnett (Eds.), pp. 3257–3267. Cited by: §3.1.
  30. C. M. Stein (1981) Estimation of the mean of a multivariate normal distribution. The Annals of Statistics 9 (6), pp. 1135–1151. Cited by: §1, §3.1.
  31. J. Sun, H. Li and Z. Xu (2016) Deep admm-net for compressive sensing mri. In Advances in neural information processing systems, pp. 10–18. Cited by: §1.
  32. R. Tanno, D. Worrall, E. Kaden, A. Ghosh, F. Grussu, A. Bizzi, S. N. Sotiropoulos, A. Criminisi and D. C. Alexander (2019) Uncertainty quantification in deep learning for safer neuroimage enhancement. arXiv preprint arXiv:1907.13418. Cited by: §2.
  33. M. Tygert, R. Ward and J. Zbontar (2018) Compressed sensing with a jackknife and a bootstrap. arXiv preprint arXiv:1809.06959. Cited by: §2.
  34. K. Zhang, W. Zuo, Y. Chen, D. Meng and L. Zhang (2017) Beyond a gaussian denoiser: residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing 26 (7), pp. 3142–3155. Cited by: §5.1.
  35. M. Zhussip, S. Soltanayev and S. Y. Chun (2019) Training deep learning based image denoisers from undersampled measurements without ground truth and without image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10255–10264. Cited by: §3.1.