# Deep Probabilistic Imaging: Uncertainty Quantification and Multi-modal Solution Characterization for Computational Imaging

## Abstract

Computational image reconstruction algorithms generally produce a single image without any measure of uncertainty or confidence. Regularized Maximum Likelihood (RML) and feed-forward deep learning approaches for inverse problems typically focus on recovering a point estimate. This is a serious limitation when working with underdetermined imaging systems, where it is conceivable that multiple image modes would be consistent with the measured data. Characterizing the space of probable images that explain the observational data is therefore crucial. In this paper, we propose a variational deep probabilistic imaging approach to quantify reconstruction uncertainty. Deep Probabilistic Imaging (DPI) employs an untrained deep generative model to estimate a posterior distribution of an unobserved image. This approach does not require any training data; instead, it optimizes the weights of a neural network to generate image samples that fit a particular measurement dataset. Once the network weights have been learned, the posterior distribution can be efficiently sampled. We demonstrate this approach in the context of interferometric radio imaging, which is used for black hole imaging with the Event Horizon Telescope.

## Introduction

Uncertainty quantification and multi-modal solution characterization are essential for analyzing the results of underdetermined imaging systems. In computational imaging, reconstruction methods solve an inverse problem to recover a hidden image from measured data. When this inverse problem is ill-posed there are infinite image solutions that fit the observed data. Occasionally these varied images lead to different scientific interpretations; thus it is important to be able to characterize the distribution of possibilities. In a Bayesian framework, this problem could ideally be addressed by accurately modeling the measurement noise, formulating an estimation problem, and computing the posterior distribution of the hidden image. However, this analytical approach is only tractable in simple cases. When the inverse problem is non-convex or the measurement noise is complicated (e.g., non-Gaussian) the posterior distribution can quickly become intractable to compute analytically.

In this paper, we propose Deep Probabilistic Imaging (DPI), a new variational inference approach that employs a deep generative model to learn a reconstructed image’s posterior distribution. More specifically, we represent the posterior probability distribution using an invertible flow-based generative model. By training with a traditional maximum a posteriori (MAP) loss, along with a loss that encourages distribution entropy, the network converges to a generative model that approximately samples from the image posterior; the model returns a posterior sample image as output for a random Gaussian sample input. Our proposed approach enables uncertainty quantification and multi-modal solution characterization in non-convex inverse imaging problems. We demonstrate our method on both synthetic and real astronomical radio interferometric imaging data. High resolution radio interferometric imaging often requires a highly non-convex forward model, occasionally leading to multi-modal solutions.

## Related Work

### Computational Imaging

The goal of a computational image reconstruction method is to recover a hidden image from measured data. Imaging systems are often represented by a deterministic forward model, , where are the observed measurements of a hidden image, . A regularized maximum likelihood (RML) optimization can be solved to reconstruct the image:

(1) |

where is the reconstructed image, is the data fitting loss between the measured data and the forward model prediction, is a regularization function on the image, and is the hyper-parameter balancing the data fitting loss and the regularization term. The regularization function is necessary for obtaining a unique solution in under-constrained systems. Commonly used regularization functions include L1 norm, total variation (TV) or total squared variation (TSV) [5, 19], maximum entropy (MEM)[25], multivariate Gaussian [29], and sparsity in the wavelet transformed domain [9].

Assuming the forward model and measurement noise statistics are known, one can derive the probability of measurements, , conditioned on the hidden image, , as . In a Bayesian estimation framework, the regularized inverse problem can be interpreted as a maximum a posteriori (MAP) estimation problem:

(2) |

where the log-likelihood of the measurements correspond to the negative data fitting loss in Eq. 1, and the prior distribution of the image defines a regularization function (also referred to as an image prior).

#### Deep Image Prior

In lieu of constraining the image via an explicit regularization function, , Deep Image Prior approaches[27] parameterize the image as a deep neural network and implicitly force the image to respect low-level image statistics via the structure of a neural network:

(3) |

In this approach, is a deep generator, are the neural network weights, are the optimized weights, and is a randomly chosen, but fixed, hidden state. The network weights are randomly initialized and optimized using gradient descent. This approach has been demonstrated in many applications, including inpainting, denoising and phase retrieval[27, 4]. Note that since the hidden state, , is fixed, only a single reconstructed image is derived after optimization.

### Uncertainty Quantification

Uncertainty quantification is important for understanding the confidence of recovered solutions in inverse problems.

#### MCMC Sampling

One widely used approach for quantifying the uncertainty in image reconstruction is the Markov Chain Monte Carlo (MCMC) method. MCMC provides a way to approximate the posterior distribution of a hidden image via sampling [2, 7]. However, the MCMC approach can be prohibitively slow for high dimensional inverse problems [8].

#### Variational Bayesian Methods

Variational inference is widely used to approximate intractable posterior distributions [3, 1]. Instead of directly computing the exact posterior distribution, variational Bayesian methods posit a simple family of density functions and solve an optimization problem to find a member of this family closest to the target posterior distribution. Variational Bayesian methods are much faster than sampling methods (e.g., MCMC), and typically achieve comparable performance [15]. The performance of variational methods depends on the modeling capacity of the density function family and the complexity of the target distribution. A commonly used technique for simplifying the density function is the mean-field approximation[17], where the distributions of each hidden variables are assumed independent. The density function can also be parameterized using neural networks, such as flow-based generative models [23].

#### Bayesian Neural Networks

Deep learning has become a powerful tool for computational imaging reconstruction[22]. Most current deep learning approaches only focus on the point estimate of the hidden image; however, Bayesian neural networks [21, 13] have been developed to quantify the uncertainty of deep learning image reconstructions[28]. In particular, the weights of a Bayesian network are modeled as probability distributions, so that different predictions are obtained every time the network is executed. Although these approaches can achieve impressive performance, they rely on supervised learning and only describe the reconstruction uncertainty conditioned on a training set.

#### Empirical Sampling

An alternative empirical approach for obtaining multiple candidate reconstructions is to solve a regularized inverse problem (Eq. 1) multiple times with different choices of regularizer hyper-parameters (e.g. in Eq. 1) and image initializations. This approach was used in [26] to characterize the uncertainty of the reconstructed black hole image M87*. Although the mean and standard deviation of these images provide a measure of uncertainty, there is no expectation that these samples satisfy a posterior distribution defined by the measurement data. In fact, this method only quantifies the reconstruction uncertainty due to choices in the reconstruction methods, such as regularizer hyper-parameters, instead of the uncertainty due to measurement noise and sparsity.

### Flow-based Generative Models

Flow-based generative models are a class of generative models used in machine learning and computer vision for probability density estimation. These models approximate an arbitrary probability distribution by learning a invertible transformation of a generic distribution (e.g. normal distribution). Mathematically, a flow-based generative model is represented as

(4) |

where is an invertible deep neural network parameterized by that links a sample from the target distribution, , with a hidden state, . The application of invertible transformations enables both efficient sampling, as well as exact log-likelihood computation. The log-likelihood of a sample can be analytically computed based on the “change of variables theorem”:

(5) |

where is the determinant of the generative model’s Jacobian matrix.

To keep the computation of the second term tractable, the neural network function, , is restricted to forms such as NICE[11], Real-NVP[12] and Glow[18]. In these network architectures, the Jacobian matrix is a multiplication of only lower triangular matrices or quasi-diagonal matrices, which leads to efficient computation of the determinant.

## Method

In this paper, we propose a variational Bayesian method to learn an approximate posterior distribution for the purpose of efficiently characterizing uncertainty in underdetermined imaging systems. We parameterize the latent image distribution using a flow-based generative model,

(6) |

and learn the model’s weights by minimizing the KullbackâLeibler (KL) divergence between the generative model distribution, , and the image posterior distribution, :

(7) | ||||

Note that this loss can be interpreted as an an expectation over the maximum a posteriori (MAP) loss from Eq. 2 with an added term to encourage entropy on the image distribution. Minimizing the negative entropy term, , prevents the generative model from collapsing to a deterministic solution.

For most deep generative models the sample likelihood, , cannot be evaluated exactly. However, since a sample’s likelihood can be computed according to Eq. 5 for flow-based models, this stochastic optimization problem can be rewritten as

(8) |

Approximating the expectation using a Monte Carlo method, and replacing the data likelihood and prior terms with the data fitting loss and regularization functions from Eq. 1, we obtain the optimization problem

(9) |

where , is the number of Monte Carlo samples, and the term is omitted since its expectation is constant. The expectation of the data fitting loss and image regularization loss are optimized by sampling images from the generative model . Note that when does not define the true data likelihood, or does not define an image prior, the learned network only models an approximate image posterior instead of the true posterior distribution.

The data fitting loss and the regularization function are often empirically defined and may not match reality perfectly. Recalling that the third term is related to the entropy of the learned distribution, similar to -VAE[16], we introduce another tuning parameter to control the diversity of the generative model samples,

(10) |

When the uncertainty of the reconstructed images seems smaller than expected, we can increase to encourage higher entropy of the generative distribution; otherwise, we can reduce to reduce the diversity of reconstructions. Larger also encourages more exploration during training, which can be used to accelerate convergence.

## Toy Examples

We first study our method using two-dimensional toy examples. Assuming is two-dimensional, and the joint distribution is given exactly by the potential function , Eq. Method can be simplified to

(11) |

For the following toy tests the generative model is designed using a Real-NVP architecture with 32 affine coupling layers [12].

We test the approach on 3 joint distribution functions: (a) a Gaussian mixture potential, (b) a bowtie potential, and (c) a Sinusoidal potential. Fig. 1 shows the true and learned probability density function in these three cases for . Qualitatively, the learned generative model distributions match the true distributions well.

As derived in Eq. 7, the posterior should be best learned by for entropy weight . To test this claim, we adjust from to to see how the entropy term influences the learned generative model distribution. According to the graph of KL divergence versus in Fig. 1, the learned distributions match the true distributions best when the entropy weight equals . This verifies the theoretical expectation presented in the Method section.

Since the generative model is a transformation of a continuous multivariate Gaussian distribution, the learned distribution is also continuous. This leads to a common issue in flow-based generative models: there are often a few samples located in the high loss regions when the modes are not connected (see Fig. 1 distribution (a) and (b)). Some approaches[14] have been proposed recently to solve this problem, however, in this paper we neglect this issue and leave it for future work.

## Interferometric Imaging Case Study

We demonstrate our proposed approach on the problem of radio interferometric astronomical imaging [24]. In interferometric imaging, radio telescopes are linked to obtain sparse spatial frequency measurements of an astronomical target. These Fourier measurements are then used to recover the underlying astronomical image. Since the measurements are often very sparse and noisy, there can be multiple image modes that fit the observed data well. The Event Horizon Telescope (EHT) used this technique to take the first picture of a black hole, by linking telescopes across the globe[26]. Fig. 2 shows the spatial frequency (Fourier) samples that can be measured by a 9-telescope EHT array when observing the black hole Sagittarius A* (Sgr A*)^{1}

Each Fourier measurement, or so-called “visibility”, is obtained by correlating the data from a pair of telescopes. The measurement equation for each visibility is given by

(12) |

where and index the telescopes, represents time, and extracts the Fourier component from image corresponding to the baseline between telescope and at time . The measurement noise comes from three sources: (1) time-dependent telescope-based gain error, and , (2) time-dependent telescope-based phase error, and , and (3) baseline-based Gaussian thermal noise, . The standard derivation of thermal noise depends on each telescope’s System Equivalent Flux Density (SEFD):

(13) |

When the gain and phase errors are reasonably small, the interferometric imaging problem is approximately a convex inverse problem. However, when the gain errors and the phase errors (caused by atmospheric turbulence and instrument miscalibration) are large, the noisy visibilities can be combined into robust data products that are invariant to telescope-based errors, termed closure phase, , and closure amplitude, [10]:

(14) |

These nonlinear “closure quantities” can be used to constrain non-convex image reconstruction

(15) |

where and are the standard deviations of the corresponding closure term computed based on SEFDs. Note the closure quantities are not independent and the corresponding standard deviations are derived from linearization, so Eq. 15 only approximates the true data log-likelihood.

In the following contents, we demonstrate our Deep Probabilistic Imaging (DPI) approach on both convex reconstruction with complex visibilities and non-convex reconstruction with closure quantities using both synthetic and real datasets. With this new approach, we successfully quantify the uncertainty of interferometric images, as well as detect multiple modes in some data sets.

#### Implementation

For all interferometric imaging results we use a DPI flow-based Real-NVP [12] network architecture with 48 affine coupling layers. The scale-shift function of each layer is modeled as a U-Net style fully connected network with five hidden layers and skip connections. We train the model using Adam with a batch size of 32 for 20,000 epochs. We note that a limitation of our general approach is the modeling capacity of the flow-based generative model used. We find the proposed architecture is satisfactory for characterizing uncertainty in images of size 32 32 pixels.

### Convex Imaging with Visibilities

In this section, we demonstrate DPI on convex interferometric imaging. In particular, the gain and phase errors in Eq. 12 are assumed to be zero (i.e., and ) so that complex visibilities are a linear function of the latent image. Since the thermal noise on is independent and Gaussian, we write the conditional likelihood as

(16) |

where is a column vectorized image with pixels, is a column vector of complex visibility measurements, is a under-sampled Fourier transform matrix of size , and is a measurement covariance matrix derived according to the telescopes’ SEFD.

In order to verify that the flow-based generative model can learn the posterior distribution of reconstructed images, we employ a multivariate Gaussian image prior,

(17) |

where is a mean image, and is the covariance matrix defined by the empirical power spectrum of an image[6]. Since both the measurement and image prior follow Gaussian distributions, the reconstructed image’s posterior distribution is also a Gaussian distribution and can be analytically derived as

(18) |

Using the specified data likelihood and prior, we train a DPI flow-based network to produce image samples of size 3232 pixels with a field of view of 160 micro-arcseconds (). Fig. 3 demonstrates DPI on a synthetic interferometric imaging example, and compares the learned generative model distribution with the analytical posterior distribution. Visibility measurements are derived from the synthetic black hole image shown in the top of Fig. 3 (with a total flux of 2 Janskys). The second row of the Fig. 3 shows the analytic posterior’s mean, standard deviation, and full covariance. The third and fourth rows of the figure show the mean, standard deviation, and full covariance empirically derived from DPI samples under two slightly different architectures: (A) the third row uses a model with 48 affine coupling layers, and (B) the fourth row adds an additional Softplus layer to the model to enforce the physical constraint of non-negativity in the image distribution. Without a non-negativity constraint, Architecture A’s learned distribution is very similar to the analytical posterior distribution, since it is solving a same Bayesian estimation problem as defined in Eq. Convex Imaging with Visibilities. However, this Bayesian estimation problem does not constrain the image to be positive; as a result, the central depression in the image has an inflated standard deviation. Architecture B’s non-negative model results in a more constraining uncertainty map while achieving a slightly higher resolution reconstruction. This example also demonstrates how DPI can introduce implicit regularization through the neural network architecture.

Image distributions with different levels of sample diversity can be learned by adjusting the entropy loss weight, . As expected, both generative models reach lowest KL divergence with the analytic distribution when .

### Non-convex Imaging with Closure Quantities

In this section, we demonstrate DPI on non-convex interferometric imaging, where we reconstruct images using the closure quantities defined in Eq. 14. With this non-convex forward model, the posterior of reconstructed images cannot be analytically computed, but it can be estimated using DPI. In all DPI reconstructions, the resulting images are 3232 pixels and result from the non-negative Real-NVP model discussed above (Architecture B).

#### Multi-modal Posterior Distributions

A serious challenge for non-convex image reconstruction is the potential for multi-modal posterior distributions: visually-different solutions fit the measurement data reasonably well. In some cases, multiple modes can be identified by optimizing a regularized maximum likelihood (RML) loss with different image initializations; for example, Fig. 5 (top) shows ten RML reconstructed images obtained using the closure quantities (Eq. 15) from the target shown in Fig. 3 and the multivariate Gaussian regularizer defined in Eq. 17. From these results two potential image modes, which appear to be roughly 180 degree rotations of one another, clearly stand out as fitting the data well. Fig. 5 (bottom) shows ten images sampled from a DPI flow-based generative model learned with multivariate Gaussian regularization. Note that the single generative model has captured the two modes identified by multiple runs of RML.

Fig. 4 analyzes 1024 generative samples from a DPI model learned with multivariate Gaussian regularization. The dimensionality reduction t-SNE plot[20] indicates a clustering of samples into two modes. Figure 4 (middle) shows the pixel-wise mean, standard deviation and fractional standard deviation of samples for each mode.
The distributions of data fitting loss (reduced ) for images in each mode are shown for both closure phase and log closure amplitude constraints; a reduced value of 1 is optimal. Although it can be difficult to tell which image is correct by inspecting the statistics of a single image, by analyzing the histogram of statistics for each mode it becomes clearer which mode corresponds with the true image.
In the supplemental material^{2}

#### Real Interferometric Data

In Fig. 7 we demonstrate the performance of DPI using the publicly available EHT 2017 data, which was used to produce the first image of the black hole in M87.
In accordance with [26], we use a data fitting loss with not only closure quantities (Eq. 15) but also roughly-calibrated visibility amplitudes. We pair this data likelihood with a combined maximum entropy (MEM) and total squared variation (TSV) regularizer (see the supplementary material for details^{3}

### Visualizing Uncertainty

DPI sampling provides a means to visualizing uncertainty, especially in cases where closed form approximations are insufficient. By embedding samples from our DPI model in a two-dimensional space, we are able to visualize the posterior’s approximate structure. Fig. 6 plots the embedding of DPI samples obtained using t-SNE, and shows the images corresponding to some of these samples. By plotting the blurred truth image in the same embedding we see that the truth often lies close to the posterior samples in the embedded space, though it is not guaranteed. The posterior is sensitive to the choice of image prior, and is most informative when the true image is a sample from the imposed prior.

The mean and standard deviation of DPI samples, as well as the average absolute error with respect to the blurred truth, are shown for each learned distribution in Fig. 6. Since closure quantities do not constrain the absolute flux or position of the reconstructed source, we first normalize and align samples from the closure-constrained DPI model to account for scaled and shifted copies of similar images. Note that the pixel-wise standard deviation aligns well with areas of high error in the generative model’s samples.

## Conclusion

In this paper, we present deep probabilistic imaging (DPI): a new framework for uncertainty quantification and multi-modal solution characterization for underdetermined image reconstruction problems. The method parameterizes the posterior distribution of the reconstructed image as an untrained flow-based generative model, and learns the neural network weights using a loss that incorporates the conditional data likelihood, prior of image samples, and the model’s distribution entropy.

We demonstrate the proposed method on toy examples as well as both synthetic and real interferometric imaging problems. Experiments show the proposed method can approximately learn the image posterior distribution in both convex and non-convex inverse problems, which enables efficiently quantifying the uncertainty of reconstructed images and detecting multi-modal solutions.

### Footnotes

- The largest sampled spatial frequency determines the interferometer’s nominal resolution of 25 for the EHT. In this paper, we neglect evolution of Sgr A* and assume it is static over the course of a night.
- http://users.cms.caltech.edu/~hesun/DPIsupplement.pdf
- footnotemark:

### References

- (2019-07) Unified radio interferometric calibration and imaging with joint uncertainty quantification. \aap 627, pp. A134. External Links: Document, 1903.11169 Cited by: Variational Bayesian Methods.
- (2012) MCMC-based image reconstruction with uncertainty quantification. SIAM Journal on Scientific Computing 34 (3), pp. A1316–A1332. Cited by: MCMC Sampling.
- (2017) Variational inference: a review for statisticians. Journal of the American statistical Association 112 (518), pp. 859–877. Cited by: Variational Bayesian Methods.
- (2020) Deep phase decoder: self-calibrating phase microscopy with an untrained deep neural network. Optica 7 (6), pp. 559–562. Cited by: Deep Image Prior.
- (1993) A generalized gaussian image model for edge-preserving map estimation. IEEE Transactions on image processing 2 (3), pp. 296–310. Cited by: Computational Imaging.
- (2018) Reconstructing video of time-varying sources from radio interferometric measurements. IEEE Transactions on Computational Imaging 4 (4), pp. 512–527. Cited by: Convex Imaging with Visibilities.
- (2020) Hybrid very long baseline interferometry imaging and modeling with themis. The Astrophysical Journal 898 (1), pp. 9. Cited by: MCMC Sampling.
- (2018) Uncertainty quantification for radio interferometric imaging–i. proximal mcmc methods. Monthly Notices of the Royal Astronomical Society 480 (3), pp. 4154–4169. Cited by: MCMC Sampling.
- (2007) Sparsity and incoherence in compressive sampling. Inverse problems 23 (3), pp. 969. Cited by: Computational Imaging.
- (2018) Interferometric imaging directly with closure phases and closure amplitudes. The Astrophysical Journal 857 (1), pp. 23. Cited by: Interferometric Imaging Case Study.
- (2014) Nice: non-linear independent components estimation. arXiv preprint arXiv:1410.8516. Cited by: Flow-based Generative Models.
- (2016) Density estimation using real nvp. arXiv preprint arXiv:1605.08803. Cited by: Flow-based Generative Models, Toy Examples, Implementation.
- (2016) Uncertainty in deep learning. University of Cambridge 1 (3). Cited by: Bayesian Neural Networks.
- (2020) Flow contrastive estimation of energy-based models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7518–7528. Cited by: Toy Examples.
- (2012) Nonparametric variational inference. arXiv preprint arXiv:1206.4665. Cited by: Variational Bayesian Methods.
- (2016) Beta-vae: learning basic visual concepts with a constrained variational framework. Cited by: Method.
- (1999) An introduction to variational methods for graphical models. Machine learning 37 (2), pp. 183–233. Cited by: Variational Bayesian Methods.
- (2018) Glow: generative flow with invertible 1x1 convolutions. In Advances in neural information processing systems, pp. 10215–10224. Cited by: Flow-based Generative Models.
- (2018) Superresolution interferometric imaging with sparse modeling using total squared variation: application to imaging the black hole shadow. The Astrophysical Journal 858 (1), pp. 56. Cited by: Computational Imaging.
- (2008) Visualizing data using t-sne. Journal of machine learning research 9 (Nov), pp. 2579–2605. Cited by: Multi-modal Posterior Distributions.
- (1995) Bayesian neural networks and density networks. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 354 (1), pp. 73–80. Cited by: Bayesian Neural Networks.
- (2020) Deep learning techniques for inverse problems in imaging. IEEE Journal on Selected Areas in Information Theory. Cited by: Bayesian Neural Networks.
- (2015) Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770. Cited by: Variational Bayesian Methods.
- (2017) Interferometry and synthesis in radio astronomy. Springer Nature. Cited by: Interferometric Imaging Case Study.
- (1984) Maximum entropy image reconstruction-general algorithm. Monthly notices of the royal astronomical society 211, pp. 111. Cited by: Computational Imaging.
- (2019) First m87 event horizon telescope results. iv. imaging the central supermassive black hole. The Astrophysical Journal Letters 875 (1), pp. L4. Cited by: Empirical Sampling, Figure 7, Real Interferometric Data, Interferometric Imaging Case Study.
- (2018) Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9446–9454. Cited by: Deep Image Prior.
- (2019) Reliable deep-learning-based phase imaging with uncertainty quantification. Optica 6 (5), pp. 618–629. Cited by: Bayesian Neural Networks.
- (2011) From learning models of natural image patches to whole image restoration. In 2011 International Conference on Computer Vision, pp. 479–486. Cited by: Computational Imaging.