Learning a Probabilistic Strategy for Computational Imaging Sensor Selection
Optimized sensing is important for computational imaging in low-resource environments, when images must be recovered from severely limited measurements. In this paper, we propose a physics-constrained, fully differentiable autoencoder that learns a probabilistic sensor-sampling strategy for optimized sensor design. The proposed method learns a system’s preferred sampling distribution, which characterizes the correlations between different sensor selections, as a binary, fully-connected Ising model. The learned probabilistic model is realized with a Gibbs-sampling-inspired network architecture and is trained end-to-end with a reconstruction network for efficient co-design. The proposed framework is applicable to sensor-selection problems in a variety of computational imaging applications. In this paper, we demonstrate the approach in the context of a very-long-baseline-interferometry (VLBI) array design task, where sensor correlations and atmospheric noise present unique challenges. We demonstrate results broadly consistent with expectation, and draw attention to particular structures preferred in the telescope array geometry that can be leveraged to plan future observations and design array expansions.
Computational imaging systems tightly integrate algorithm and sensor design, making it possible to observe phenomena previously difficult or impossible to see with traditional sensors. A common constraint in these imaging systems is that they must operate in low-resource environments. For instance, the measurements collected are severely limited due to radiation dosage in computed tomography (CT), speed in Magnetic Resonance Imaging (MRI) and microscopy, and cost in very-long-baseline-interferometry (VLBI). Imaging methods are then designed to intelligently fill in the gaps of missing information in order to recover the targeted image. We propose a framework to jointly learn the ideal sampling structure of non-linear, correlated measurements simultaneously with the image recovery procedure, for the purpose of sensor design. We apply this framework to study optimal VLBI telescope array geometries in different observing conditions.
A computational imaging system typically consists of two primary components: the sensing system and the image reconstruction method. The sensing system is most often described by a physics-based forward model, which defines how the selected measurements, y, are related to the underlying image, x. These measurements often experience noise and do not contain enough information on their own to fully specify x. Therefore, imaging methods are designed to recover an image, x̂, from the observed measurements by imposing pre-defined assumptions, such as image priors or hard physical constraints.
In addition to developing image reconstruction methods, optimization of the sensor is critical for producing high-quality reconstructions. In many applications the placement (i.e., sampling) of sensors controls the nature of the information gathered in measurements. It is especially important that this information be optimized for a specific task when the measurement space cannot be fully observed. Previous work in 3D reconstruction, microscopy [33, 17], and astronomical imaging [35, 4] has proposed strategies to optimize sensor placement to achieve maximally informative measurements. Seminal work in compressed sensing showed that, under particular sampling distributions, it is possible to recover a sparse signal with near-perfect accuracy; this idea revolutionized a number of real-world applications, including MRI. However, these theoretical results only hold under certain assumptions, such as a linear forward model or limited noise. Consequently, the corresponding sampling strategies are likely under-optimized for complex imaging systems and could be improved.
The optimal sensing problem is typically idealized and considered independently from the image reconstruction problem. However, these two problems are intimately related: existing imaging algorithms inform what data we are capable of interpreting and therefore should collect, and the particular data we measure informs what imaging algorithms we should use. Inspired by this insight, we address co-designing the sensor-sampling strategy and the image reconstruction algorithm. The complexity of real-world measurements makes it difficult to tackle this co-design problem analytically. Therefore, we propose to solve a joint optimization problem using deep learning, where the optimal sampling policy is represented as a trainable joint probability distribution over candidate sensors. The proposed framework enables 1) tackling complicated sampling problems where measurements are correlated and experience substantial noise, 2) exploring multiple plausible sampling strategies simultaneously rather than focusing on a single deterministic design, and 3) qualitatively understanding the importance of each sensor and the correlations among sensors.
The content of the paper is organized as follows. In Sec. 2, we review related work on reconstruction algorithms and optimal sensing. In Sec. 3, we present our method that jointly optimizes the sensor-sampling pattern and image reconstruction. In Sec. 4, we demonstrate our method on the astronomical imaging VLBI array design problem to identify which telescopes are most important to observe with in different conditions. In Sec. 5, we summarize our work and discuss additional sensor-selection applications.
2 Related Work
2.1 Image Reconstruction Techniques
Regularized Inverse Model
A common approach taken in computational imaging is to reconstruct the target image from limited sensor measurements by solving a regularized inverse problem:

x̂ = argmin_x L(y, f(x)) + λ R(x),

where f(·) is the forward model of the underlying imaging system, which produces measurements y from the true image x; L(·,·) is inspired by the data negative log-likelihood; R(·) is a regularization function; λ is the coefficient balancing the two terms; and x̂ is the reconstructed image. The data-fit and regularization terms quantify the image’s fit with the measurements and the prior knowledge, respectively. The regularization term shrinks the solution domain, consequently making the solution unique even with an incomplete set of measurements. Typically, this inverse problem can be solved using iterative optimization algorithms when explicit image regularizer functions are imposed, such as total variation (TV), maximum entropy (MEM), sparsity in the Fourier or wavelet domain, or locally-learned distributions.
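As a minimal numerical sketch of such a regularized inverse problem, the following assumes a toy linear forward model `A` and a quadratic (Tikhonov) regularizer, a special case chosen only because it admits a closed-form minimizer; all names and values are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear forward model: A maps a 16-pixel image to 8 noisy measurements.
n_pix, n_meas = 16, 8
A = rng.normal(size=(n_meas, n_pix))
x_true = rng.normal(size=n_pix)
y = A @ x_true + 0.01 * rng.normal(size=n_meas)

# Regularized inverse: x_hat = argmin_x ||y - A x||^2 + lam * ||x||^2.
# For this quadratic regularizer the minimizer has a closed form.
lam = 0.1
x_hat = np.linalg.solve(A.T @ A + lam * np.eye(n_pix), A.T @ y)

# The regularizer makes the otherwise underdetermined problem uniquely solvable.
assert x_hat.shape == (n_pix,)
```

The regularizer here plays exactly the role described above: with only 8 measurements for 16 unknowns, `A.T @ A` is singular, and adding `lam * I` restores a unique solution.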
Learned Inverse Model
Deep learning techniques have also been used for computational imaging reconstruction by either directly training a deep reconstruction neural network or inserting deep learning “priors” into model-based optimization frameworks. Unlike many classic computer vision tasks, such as face or object recognition, computational imaging problems are fundamentally related to the physics of a known imaging system. Therefore, recent deep learning based reconstruction methods focus on incorporating known physics to improve computational efficiency, reconstruction accuracy, and robustness. For example, this can be done by adding physics-informed regularization to the neural network loss function, only learning the residuals between ground truth and physics-based solutions, or unrolling the physics-based optimization procedure as a neural network [14, 15].
2.2 Sensor-Sampling Strategies
Conventionally, sparse sensor sampling is achieved by adopting the Nyquist-Shannon sampling theorem. For instance, in imaging applications with Fourier component measurements, such as VLBI and MRI, since the images are approximately band-limited, the signal sampling rate in the Fourier domain (e.g., k-space) can be reduced to twice the maximum frequency. This observation can be used to define metrics that characterize sampling quality independent of image reconstruction techniques. Modern advances in the theory of compressed sensing demonstrate that, when the underlying image is sparse, certain sampling schemes can achieve the same quality reconstruction with an even smaller number of samples [8, 9]. However, both Nyquist sampling and compressed sensing typically consider only linear forward models and assume either no noise or only i.i.d. Gaussian noise, which is unrealistic for many real-world problems.
Recently, approaches have been proposed to explore the joint optimization of a sensing strategy and image reconstruction method using deep neural networks. This idea has been used to design the color multiplexing pattern of camera sensors, the LED illumination pattern in Fourier ptychography, the optical parameters of a camera lens, and the exposure pattern in video compressed sensing. However, all these approaches yield only a single deterministic sensing pattern, ignoring the possibility of multiple, equally good sensing strategies. In a recent study of “compressed sensing” MRI, the sampling strategy in k-space is modeled with Bernoulli distributions, so that multiple sensing patterns are explored simultaneously. This formulation helps characterize which sampling frequencies are important for different anatomy. However, none of these prior approaches address complicated non-Gaussian noise, or explicitly model the correlations among measurements.
3 Proposed Method
We propose a learning method that jointly optimizes the sensor-sampling distribution and the image reconstruction method for forward models that contain correlated measurements and complicated noise. In Sec. 3.1, we discuss the problem setup. In Sec. 3.2 we introduce the sensor-sampling distribution, formulated as an Ising model. In Sec. 3.3 we discuss the image reconstruction sub-network architecture.
3.1 Joint Sensing and Imaging Optimization
We formulate the joint sensing and imaging optimization as a deep autoencoder. This autoencoder is trained to identify an efficient image representation that is then used to make limited observations, from which an image can be recovered. The encoder represents the sensing strategy and the decoder represents the image reconstruction method. Both the encoder and decoder are formulated as deep neural networks in this framework. However, to ensure the sensing system design obeys physical constraints, the encoder samples from all possible measurements generated by the physics simulator.
This optimization problem can be written as

min_{θ, φ} E_{m ∼ p_θ(m)} [ Σ_x L(x, G_φ(M_m(f(x)))) + γ R(m) ],

where m is a sensor-sampling pattern, whose distribution p_θ(m) is parameterized by θ; L(·,·) is a function defining the similarity between the reconstruction result and the true image x in the image training set; R(m) is the regularization on the sensing samples; G_φ is the image reconstruction neural network parameterized by φ; and f(·) is the sensing function that produces the observed measurements. In this loss function, the first term quantifies the image reconstruction performance, while the second term quantifies desired properties of the sensor-sampling pattern (e.g., sparsity; see Sec. 3.2.3). These two terms are balanced by the hyper-parameter γ. The sampling pattern m is a binary vector that indicates which of the sensor sources is selected or excluded from observations. This is equivalent to selecting elements from the forward model’s vector of measurements, i.e.
y_m = M_m(f(x)),

where M_m(·) is the binary selection function based on the mask m, whose dimensionality is the same as that of the forward model f(·). This formulation is a generalization of prior work that used a similar optimization designed specifically for MRI acquisition.
Based on Eq. (2) and Eq. (3), we build a joint sensing and imaging autoencoder, as shown in Fig. 1. The trainable blocks in our neural network are the sensor-sampling distribution p_θ(m) (from which the mask m is sampled) and the reconstruction network G_φ, where θ and φ are the trainable network parameters. In the following sections we describe how to build Eq. (2) as a single, differentiable neural network and solve for θ and φ simultaneously.
3.2 Physics-Constrained Encoder
The goal of the encoder in Fig. 1 is to select measurements from the physics-based forward model to pass on to the decoder. The forward model and procedure for masking measurements are both well defined; the challenge is to design a network architecture that can sample a mask from a parameterized distribution. In order to capture the correlations among observations, we model the proposed sensor-sampling distribution in Eq. (2) as a fully-connected Ising model (a binary Markov random field). In this section we explain the Ising model sensor-sampling distribution and describe how to sample from this model using a differentiable network architecture.
In an Ising model, the state of an N-element system is represented by a set of binary random variables, z = (z_1, …, z_N) with z_i ∈ {−1, +1}. The total energy of a fully-connected Ising model is given by the Hamiltonian function,

E(z) = −Σ_i a_i z_i − Σ_{i<j} W_{ij} z_i z_j,

where {a_i, W_{ij}} are the Ising model parameters. The first term models the energy of single elements and the second term models the potential from the interactions between pairs of elements. The state’s probability can be derived according to the Boltzmann distribution

p(z) = exp(−E(z)) / Z,

where Z is the partition function,

Z = Σ_z exp(−E(z)).

The lower the total energy of a state, the higher the probability it will occur.
We use a fully-connected Ising model to model the presence (+1) or absence (−1) of a particular sensor, and to capture correlations among sensors. Parameter a_i characterizes how important it is to include the i-th sensor, and W_{ij} characterizes how the i-th and the j-th sensor are positively or negatively correlated: if W_{ij} > 0, observing the i-th and the j-th sensor is favored or disfavored together; while if W_{ij} < 0, the i-th sensor is disfavored when the j-th is selected, and vice versa. Although an Ising model only explicitly defines pairwise correlations, higher-order correlations among sensors are implicitly captured by the model. These higher-order correlations can be partially extracted by interpreting the Ising model as an affinity graph and performing clique analysis on the pruned edges (see an example in Sec. 4.3.3).
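The energy and Boltzmann probability described above can be checked on a toy model by brute-force enumeration; the parameter values for `a` and `W` below are illustrative, not learned parameters from the paper.

```python
import numpy as np
from itertools import product

# Toy 3-sensor Ising model with illustrative bias (a) and coupling (W) values.
a = np.array([0.5, 0.2, -0.3])
W = np.array([[0.0, 1.0, -0.8],
              [1.0, 0.0, 0.1],
              [-0.8, 0.1, 0.0]])

def energy(z):
    """Hamiltonian E(z) = -sum_i a_i z_i - sum_{i<j} W_ij z_i z_j for z_i in {-1,+1}."""
    return -a @ z - 0.5 * z @ W @ z   # factor 0.5: the quadratic form counts each pair twice

# Boltzmann distribution: enumerate all 2^3 states to compute the partition function Z.
states = np.array(list(product([-1, 1], repeat=3)))
energies = np.array([energy(z) for z in states])
probs = np.exp(-energies)
probs /= probs.sum()

# Lower total energy  <->  higher probability.
assert np.argmax(probs) == np.argmin(energies)
```

Enumeration is only feasible for tiny N; the exponential growth of the state space is exactly why the paper resorts to MCMC sampling below.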
Solving the optimization problem in Eq. 2 requires sampling the Ising model according to the parameters {a_i, W_{ij}}. However, unlike simple distributions such as Bernoulli, Uniform, or Gaussian, an Ising model is difficult to sample directly, because the binary variables are correlated and the number of possible states grows exponentially with the number of elements. To build a computationally efficient and fully differentiable sampling network, we propose a Markov chain Monte Carlo (MCMC) inspired method.
Differentiable Gibbs Sampling Sub-Network
We design a sub-network to perform successive MCMC sampling iterations of an Ising model (parameterized by {a_i, W_{ij}}), which enables sampling masks m from the learned distribution and facilitates computing Eq. (2).
MCMC methods have been widely used for approximating complicated statistical distributions. In a commonly used MCMC method, Gibbs sampling, one samples from the joint probability distribution by conditionally sampling each variable in turn until convergence. The detailed procedure of Gibbs sampling for a binary state space can be summarized as:
1. Randomly draw an initial sample z^(0) from an independent Bernoulli distribution.
2. Compute the next sample z^(t+1) based on z^(t), with each component sequentially updated using its conditional distribution given the other states: for each component i, draw a random number u from a uniform distribution U[0, 1]; if u is less than the conditional probability that the component equals 1, assign z_i^(t+1) = 1; otherwise, assign z_i^(t+1) = −1.
3. Repeat step 2 for T iterations.
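The procedure above can be sketched in NumPy for a fully-connected Ising model; the conditional used in each component update follows from the Boltzmann distribution, and all coupling values here are illustrative.

```python
import numpy as np

def gibbs_sample_ising(a, W, n_iters=50, rng=None):
    """Gibbs sampling for a fully connected Ising model with states in {-1, +1}.

    In each sweep, every component z_i is resampled from its conditional
    p(z_i = +1 | z_rest) = sigmoid(2 * (a_i + sum_{j != i} W_ij z_j)),
    which follows from the Boltzmann distribution over the two values of z_i.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = len(a)
    z = rng.choice([-1.0, 1.0], size=n)      # step 1: random initial state
    for _ in range(n_iters):                 # step 3: repeat the sweeps
        for i in range(n):                   # step 2: sequential component updates
            local_field = a[i] + W[i] @ z - W[i, i] * z[i]
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * local_field))
            z[i] = 1.0 if rng.uniform() < p_plus else -1.0
    return z

# A strong positive coupling makes the two sensors agree in most samples.
rng = np.random.default_rng(0)
a = np.zeros(2)
W = np.array([[0.0, 2.0], [2.0, 0.0]])
samples = np.array([gibbs_sample_ising(a, W, rng=rng) for _ in range(200)])
agree = np.mean(samples[:, 0] == samples[:, 1])
assert agree > 0.9
```

With coupling W_12 = 2, the stationary probability of agreement is e²/(e² + e⁻²) ≈ 0.98, which the empirical frequency reflects.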
Gibbs sampling is often used to sample from Ising models, as the conditional probability of an Ising model can also be described as an Ising model:

p(z_u | z_k) ∝ exp( Σ_{i∈u} ã_i z_i + Σ_{i,j∈u} W_{ij} z_i z_j ), with ã_i = a_i + Σ_{j∈k} W_{ij} z_j,

where z_k and z_u are the known and unknown/to-be-updated subsystems respectively, and ã_i defines the new Ising model parameter conditioned on the known subsystem. Eq. (10) indicates the conditional distribution of a single element is a Bernoulli distribution, making it possible to implement each Gibbs sampling iteration as a neural network layer.
In order to sample a mask from the distribution p_θ(m), we design a neural network where each dense layer corresponds to an iteration of Gibbs sampling for an Ising model. Each Gibbs sampling layer can be mathematically described as a function of the previous state z^(t):

z_i^(t+1) = sign( σ( 2 (a_i + Σ_j W_{ij} z_j) ) − u_i ), u_i ∼ U[0, 1],

where sign(·) is a sign step function and σ(·) is a sigmoid function, σ(x) = 1 / (1 + e^{−x}).
The initial sample z^(0) is initialized using uniform random values. Eq. (11) defines a layered Ising sampling function, which converges to a sample from the Ising model after a sufficient number of iterations. However, the sign function in Eq. (11) has no useful gradient (zero or undefined), rendering back-propagation strategies impossible. To address this, we replace the sign function with a hyperbolic tangent function,

tanh(s · x),

where s is the slope. Using a moderate slope enables us to approximately binarize the states to {−1, +1} between successive Gibbs layers, while still allowing gradients to propagate through the network.
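A small numerical check of this relaxation, with an illustrative slope value: away from the decision boundary the tanh output is nearly binary, yet its derivative (unlike that of the sign function) is nonzero where it matters.

```python
import numpy as np

# The exact Gibbs update binarizes with sign(), whose derivative is zero almost
# everywhere, so no gradient would reach the Ising parameters. Replacing sign(u)
# with tanh(s * u) keeps the output near {-1, +1} while remaining differentiable.
s = 5.0                      # illustrative slope, not a value from the paper
u = np.linspace(-2, 2, 9)

hard = np.sign(u)
soft = np.tanh(s * u)
soft_grad = s * (1.0 - np.tanh(s * u) ** 2)   # d/du tanh(s * u)

# Away from u = 0 the relaxation is nearly binary...
assert np.all(np.abs(soft[np.abs(u) > 0.5]) > 0.98)
# ...but its gradient is largest exactly at the decision boundary u = 0.
assert np.argmax(soft_grad) == np.argmin(np.abs(u))
```

Larger slopes tighten the binarization but shrink the gradient signal, which is why a moderate slope is used between layers.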
To obtain the sampling mask, m, we apply a strong sigmoid function to the final sampling layer of the Ising model to convert the states from approximately {−1, +1} to nearly binary states m ∈ {0, 1}:

m_i = σ( s′ · z_i^(T) ),

where s′ is the slope of the sigmoid function, which should be large enough to approximate the Heaviside function well.
For notational simplicity, we represent this sampling procedure as a nested function,

m = G_Ising(u; a, W),

with the Ising model coefficients {a_i, W_{ij}} as its parameters, uniformly distributed random numbers u as inputs, and the binary samples m as outputs. The architecture of the Gibbs sampling neural network is shown at the bottom of Fig. 1.
Ising Model Regularization
To encourage sparse samples and promote sample diversity (in order to explore multiple observational strategies), we define the regularization on Ising samples, R(m), as consisting of two terms:

R(m) = β₁ Σ_i m_i − β₂ E(z),

where the first term is a regularization on the number of participating sensors (sparsity loss), and the second term is the negative Hamiltonian of the Ising samples (diversity loss). The Hamiltonian of an Ising model is a good approximation of the distribution entropy, as the entropy only differs from the expected Hamiltonian by the log partition function,

H(p) = E_{z∼p}[ E(z) ] + log Z.

Therefore, maximizing the Hamiltonian typically increases the system entropy, thus diversifying the samples from our Ising model. Substituting Eq. (16) into Eq. (2) and approximating the expectation using the empirical average via MCMC samples, we derive a stochastic formulation of the joint optimization problem under Ising sampling.
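The two-term regularizer can be sketched as follows; the coefficient names `beta_sparse` and `beta_div` are placeholders for the two regularization weights, not notation from the paper.

```python
import numpy as np

def ising_energy(z, a, W):
    """Hamiltonian E(z) = -sum_i a_i z_i - sum_{i<j} W_ij z_i z_j for z in {-1,+1}^n."""
    return -a @ z - 0.5 * z @ W @ z

def sampling_regularizer(z, a, W, beta_sparse=0.1, beta_div=0.01):
    """Sketch of the two-term regularizer on an Ising sample.

    - sparsity loss: the number of selected sensors (entries equal to +1);
    - diversity loss: the negative Hamiltonian, so that minimizing the total
      loss pushes the expected energy up, which (up to the log-partition term)
      raises the entropy of the learned distribution.
    """
    sparsity = np.sum(z == 1)
    diversity = -ising_energy(z, a, W)
    return beta_sparse * sparsity + beta_div * diversity

# With no biases or couplings, the regularizer reduces to a pure sparsity penalty:
a0, W0 = np.zeros(3), np.zeros((3, 3))
dense = sampling_regularizer(np.array([1.0, 1.0, 1.0]), a0, W0)
sparse = sampling_regularizer(np.array([-1.0, -1.0, 1.0]), a0, W0)
assert dense > sparse
```

A negative `beta_sparse` would instead reward selecting more sensors, which is the behavior referenced in the regularization grid search of Sec. 4.3.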
3.3 Image Reconstruction Decoder
The image reconstruction decoder sub-network is not restricted to any particular neural architecture, but should have enough model capacity to reconstruct under multiple sensing scenarios simultaneously. For imaging problems with a complicated forward model and noise, physics-based neural networks (PbNN) and unrolled neural networks [14, 31] have proven successful. However, with sufficient training data, general deep learning frameworks such as the U-Net and its variants [26, 20] often perform sufficiently well. The U-Net combines local and global information during image reconstruction by incorporating hierarchical down-sampling and up-sampling blocks and multiple skip connections in its architecture. In our VLBI case study, two variants of the U-Net are used for reconstruction. We discuss the details of their architectures in Sec. 4.2.3.
4 Case Study: VLBI Array Design
Very-Long-Baseline-Interferometry (VLBI) is a radio interferometry technique for high-resolution astronomical imaging. In VLBI, telescopes are linked to form an imperfect, virtual telescope whose maximum resolution is set by the longest baseline between participating telescopes. Recently, this technique was used by the Event Horizon Telescope (EHT) to capture the first image of a black hole, the image of M87*, by joining telescopes (observing at 230 GHz) from across the globe. Building on this recent success, the EHT is continuing to improve its telescope array by adding new sites, in order to improve M87*’s image quality as well as image additional astronomical sources, such as the Milky Way’s black hole, Sagittarius A*.
It is important to carefully study which new sites should be added to the next-generation EHT to best improve future results. Adding a new telescope to the existing EHT VLBI array is extremely expensive: equipping a pre-built telescope to work as part of the array costs over two million dollars, and the construction of a new telescope can easily cost over 10 million dollars. Therefore, limited budgets require selecting only a handful of new sites that can be added to the future array. Since only a select number of ground sites are suitable for constructing telescopes that can observe at 230 GHz (e.g., due to altitude and weather), the problem of choosing telescope locations for the next-generation EHT reduces to a site selection problem. Studying the impact of a telescope on imaging quality is not only important for planning an EHT expansion but also becomes a powerful tool during an observing campaign; when coordinating future observations it is essential to evaluate the impact of losing a particular telescope – should costly observations proceed if weather is poor or there are instrumental problems at a particular site?
We optimize the design of a VLBI array by solving a site selection problem. The goal is to decide which sites from a given list we should build into the array, and which sites can be excluded without substantial loss to the reconstructed image. Although many factors are ultimately important for site selection (e.g., construction cost, weather stability), in this paper we focus solely on the location of sites, as the array geometry usually has the largest influence over resulting image quality. As explained in Sec. 4.1, VLBI measurements are produced from data collected at multiple telescopes, so it is important to consider how telescopes interact with one another. We capture the telescope correlations through the fully-connected binary Ising model, and optimize the telescope sampling distribution jointly with the image reconstruction method using the proposed approach.
4.1 Problem Setup
In radio interferometry (e.g., VLBI), data simultaneously collected from a pair of telescopes is used to produce a single measurement. Each measurement corresponds to a two-dimensional Fourier component of the image, with the frequency related to the baseline between the two participating telescopes, projected in the direction of the observed astronomical source. This measurement is typically referred to as a visibility. For an N-telescope array, there are N(N − 1)/2 possible visibility measurements at a single time. Due to Earth’s rotation, the projected baselines change over time, causing sparse elliptical measurements in the spatial-frequency domain to be collected over the course of a night (see Fig. 2).
The measurement equation for each visibility, y_{ij}, is given by

y_{ij}(t) = g_i g_j e^{i(φ_i − φ_j)} F_{ij,t}(x) + n_{ij},

for telescope pair (i, j). F_{ij,t}(x) extracts a Fourier component from the image x corresponding to the baseline between telescopes i and j at time t. Each measurement experiences time-dependent telescope-based gain, g_i, and phase error, φ_i, as well as baseline-based thermal noise, n_{ij} ∼ N(0, σ_{ij}²). The standard deviation of the thermal noise depends on each telescope’s “system equivalent flux density” (SEFD):

σ_{ij} ∝ sqrt( SEFD_i · SEFD_j / (2 Δν t_int) ),

where a higher SEFD indicates the measurements using that telescope contain more thermal noise (i.e., lower SNR). When atmospheric turbulence is not considered, g_i = 1 and φ_i = 0. Although the absolute gains can often be sufficiently calibrated, phase errors due to atmospheric turbulence cannot be calibrated a priori since φ_i varies rapidly and is uniformly sampled from [0, 2π). Thus, in the case of atmospheric error, recovering an image amounts to solving a phase retrieval problem.
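The SEFD-based noise level can be sketched with the standard radiometer relation; the bandwidth and integration time below are illustrative, and real pipelines include an extra quantization-efficiency factor omitted here.

```python
import numpy as np

def thermal_noise_std(sefd_i, sefd_j, bandwidth_hz, t_int_s):
    """Thermal noise std of a visibility on baseline (i, j).

    Standard radiometer relation: sigma_ij = sqrt(SEFD_i * SEFD_j / (2 * dnu * t)).
    (A quantization-efficiency correction factor is omitted for simplicity.)
    """
    return np.sqrt(sefd_i * sefd_j / (2.0 * bandwidth_hz * t_int_s))

# ALMA (SEFD 90 Jy) paired with SMT (SEFD 5000 Jy), SEFDs from Table I;
# the 2 GHz bandwidth and 10 s integration time are illustrative.
sigma = thermal_noise_std(90.0, 5000.0, 2e9, 10.0)

# Lower-SEFD stations yield lower-noise (higher-SNR) baselines.
assert sigma < thermal_noise_std(10000.0, 5000.0, 2e9, 10.0)
```

Because the noise scales with the geometric mean of the two SEFDs, a single very sensitive station such as ALMA improves every baseline it participates in, consistent with its high learned activity in Sec. 4.3.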
In the case of atmospheric noise, data products called closure quantities can be used to constrain the image reconstruction. Closure phase is obtained by multiplying three visibilities from telescopes in a closed loop:

ψ_{ijk} = ∠( y_{ij} y_{jk} y_{ki} ).

Since the atmospheric errors are site dependent, ψ_{ijk} is robust to any station-based phase errors. Data products called closure amplitudes are robust to gain errors. Since amplitudes can often be much better calibrated than phases, in this paper we do not consider closure amplitudes but make use of visibility amplitudes, |y_{ij}|.
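The cancellation of station-based phase errors around a closed triangle can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(1)

# True visibility phases on the three baselines of a closed triangle (1, 2, 3).
phi12, phi23, phi31 = rng.uniform(-np.pi, np.pi, size=3)

# Station-based atmospheric phase errors corrupt each baseline as
# phi_ij -> phi_ij + theta_i - theta_j.
theta = rng.uniform(-np.pi, np.pi, size=3)
obs12 = phi12 + theta[0] - theta[1]
obs23 = phi23 + theta[1] - theta[2]
obs31 = phi31 + theta[2] - theta[0]

# Closure phase: the station terms cancel around the loop,
# (theta_1 - theta_2) + (theta_2 - theta_3) + (theta_3 - theta_1) = 0.
closure_obs = np.angle(np.exp(1j * (obs12 + obs23 + obs31)))
closure_true = np.angle(np.exp(1j * (phi12 + phi23 + phi31)))
assert np.isclose(closure_obs, closure_true, atol=1e-6)
```

This is why closure phases survive atmospheric turbulence: any per-station error enters each loop twice with opposite signs.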
Table I: Telescope sites and their SEFDs.

| Site | Location | SEFD (Jy) |
| --- | --- | --- |
| PV | Pico Veleta, Spain | 1400 |
| PDB | Plateau de Bure, France | 1500 |
| ALMA | Atacama Desert, Chile | 90 |
| APEX | Atacama Desert, Chile | 3500 |
| LMT | Sierra Negra, Mexico | 600 |
| SMT | Mt. Graham, Arizona | 5000 |
| OVRO | Owens Valley, California | 10000 |
| GLT | Thule Air Base, Greenland | 10000 |
4.2 Implementation Details
In this paper, we consider two arrays of radio telescopes: “EHT+” and “FUTURE” (Fig. 3). “EHT+” includes twelve telescope sites, as listed in Table I. Eight of these sites (marked with a star) were used by the EHT in 2017, while the other four sites host existing radio telescopes that plan to eventually join the EHT. Each telescope’s SEFD (i.e., noise coefficient) is reported in the table. The “FUTURE” array consists of nine additional sites that do not currently host a suitable telescope, but have sufficient atmospheric conditions for a constructed telescope to successfully observe at 230 GHz with the EHT.
Ising Model Sampler
The VLBI forward model f(x) produces a vector of complex visibilities, or of visibility amplitudes and closure phases. The mask, m, is a binary vector representing whether each telescope is selected for observation. We define the masking function for each VLBI measurement as m_i m_j for a visibility or visibility amplitude on baseline (i, j), and m_i m_j m_k for a closure phase on triangle (i, j, k). The mask is generated from the Ising model sampler in Section 3.2, which is built using five MCMC layers. The sampling order of telescopes for each MCMC layer is randomly shuffled for each training trial. In Section 4.3, we run the joint optimization five times for each experiment to approximate the mean and standard deviation of the learned Ising model parameters. The slopes for the binary conversion layers (Eq. 13 and Eq. 14) are set empirically.
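The telescope-level masking rule can be sketched directly; a measurement survives only if every telescope it involves is selected.

```python
import numpy as np

def visibility_mask(m, i, j):
    """A visibility on baseline (i, j) is kept only if both telescopes are selected."""
    return m[i] * m[j]

def closure_phase_mask(m, i, j, k):
    """A closure phase on triangle (i, j, k) requires all three telescopes."""
    return m[i] * m[j] * m[k]

# Binary telescope-selection mask for a hypothetical 4-site array.
m = np.array([1, 1, 0, 1])
assert visibility_mask(m, 0, 1) == 1       # both sites selected
assert visibility_mask(m, 1, 2) == 0       # site 2 excluded, baseline lost
assert closure_phase_mask(m, 0, 1, 3) == 1
assert closure_phase_mask(m, 0, 2, 3) == 0
```

This multiplicative structure is what couples the sensors: dropping one telescope silently removes every visibility and closure phase it participates in, which is why pairwise (and higher-order) correlations matter in the learned Ising model.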
Image Reconstruction Architectures
Two different neural network architectures (Fig. 4) are used for image reconstruction (Section 3.3), with inputs corresponding to either complex visibilities or visibility amplitudes and closure phases. Both networks output an image at the target pixel resolution.
In the case of no atmospheric turbulence, reconstruction network A is used (Fig. 4). The input to this network is a vector of complex visibilities. The network architecture consists of one fully connected layer and a U-Net (four down-sampling layers + four up-sampling layers). Ideally, the fully connected layer transforms the Fourier measurements from the frequency domain back to the image domain. Then, the U-Net removes aliasing effects caused by missing data and noise. Since the image’s phase information is preserved when there is no atmospheric turbulence, we can define the image similarity loss as an ℓ2 norm of the difference between the reconstructed and blurred original image:

L(x, x̂) = ‖ x̂ − K ⊛ x ‖₂,
where K is a Gaussian kernel that defines our desired reconstruction resolution. The nominal resolution of our telescope array (expressed as a full-width-half-max (FWHM)) is found using the longest baseline. Image reconstructions that recover structure smaller than the nominal resolution are super-resolving the target.
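For a rough sense of scale, the nominal resolution of a 230 GHz array can be estimated as λ/b_max for the longest baseline; the baseline length used below is an illustrative Earth-scale value, not a specific EHT baseline.

```python
import numpy as np

# Nominal angular resolution ~ lambda / b_max for the longest baseline.
c = 299792458.0          # speed of light, m/s
nu = 230e9               # observing frequency, Hz
b_max = 1.0e7            # illustrative ~Earth-scale baseline length, m

theta_rad = (c / nu) / b_max                  # wavelength over baseline
theta_uas = np.degrees(theta_rad) * 3600e6    # radians -> micro-arcseconds

# Earth-scale 230 GHz baselines resolve a few tens of micro-arcseconds.
assert 10 < theta_uas < 100
```

Targets like Sgr A* and M87* subtend tens of micro-arcseconds, so Earth-scale baselines sit right at the edge of resolving them, which is why the longest (e.g., SPT) baselines dominate at high target resolutions.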
In the case of atmospheric turbulence, reconstruction network B is used (Fig. 4). The input is constructed from visibility amplitudes and closure phases (rather than complex visibilities). The reconstruction network architecture is designed through physical intuition in order to handle the more challenging phase-retrieval problem. First, the visibility amplitudes and closure phases are sent to three general dense layers whose purpose is to disentangle the corrupted visibility phases. Second, the disentangled phases are combined with the visibility amplitudes to produce modified complex visibilities. These modified complex visibilities are then passed to the same architecture used above (Dense + U-Net). Since the absolute phase is lost in the presence of atmospheric noise, we use a shift-invariant loss that represents the negative inner product between the true and reconstructed images.
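One common way to realize such a shift-invariant objective is to take the inner product maximized over all circular shifts, computed in one pass via the FFT cross-correlation theorem; this is a sketch, and the paper's exact loss may differ in normalization.

```python
import numpy as np

def shift_invariant_loss(x_true, x_rec):
    """Negative inner product between images, maximized over circular shifts.

    The circular cross-correlation for every shift is computed at once as
    ifft2(fft2(x_true) * conj(fft2(x_rec))); taking its maximum makes the
    loss insensitive to a global translation of the reconstruction.
    """
    xcorr = np.fft.ifft2(np.fft.fft2(x_true) * np.conj(np.fft.fft2(x_rec))).real
    return -xcorr.max()

rng = np.random.default_rng(0)
x = rng.random((8, 8))
x_shifted = np.roll(x, shift=(2, 3), axis=(0, 1))

# The loss is unchanged when the reconstruction is a pure translation.
assert np.isclose(shift_invariant_loss(x, x), shift_invariant_loss(x, x_shifted))
```

Translation invariance is exactly the property needed here, since closure phases fix the image only up to an absolute position on the sky.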
We train our joint optimization problem using images from the “Fashion-MNIST” dataset, and validate on the rest of “Fashion-MNIST”, classic “MNIST”, and simple geometric model images meant to resemble a black hole. Each image corresponds to a 100 micro-arcsecond (μas) field of view (FOV) with a total flux of 1 Jansky (Jy), unless otherwise specified. The training set is augmented by introducing local deformations and random rotations to the original “Fashion-MNIST” images. Data augmentation is found to be especially important for training reconstruction network B to avoid overfitting. Each network is trained until convergence with Adam in TensorFlow, requiring between 50 and 300 epochs at a fixed learning rate. VLBI visibility measurements are produced using the eht-imaging Python library.
We demonstrate the proposed optimization framework in a variety of different VLBI scenarios. The purpose of the proposed method is to learn sensor-sampling designs in complex scenarios where intuition is limited. We choose to show some examples in simple scenarios where intuition can be used to confirm that our method provides reasonable results (e.g., Section 4.3.2). However, what is most interesting are the more realistic, complicated cases that cannot be as easily judged from intuition (e.g., Section 4.3.3).
Fig. 5 shows a representative Ising model learned with our proposed method. It has been trained using the complex visibility reconstruction network with representative regularization coefficients. In this example, we considered the “EHT+” array with the black hole Sagittarius A* (Sgr A*) as the science target. We define the blurring kernel in the loss as a Gaussian with a FWHM of 0.5× the nominal beam. To investigate the influence of baseline geometry on the recovered distributions, we did not include any noise on the measurements.
Note that in the site correlation plot, the two pairs of telescopes at the same location, ALMA-APEX in Chile and JCMT-SMA in Hawaii, are both highly negatively correlated. This agrees with intuition, as the baselines from any telescope to either co-located telescope produce identical measurements (when ignoring noise). The per-site parameters, a_i, represent the activity of each telescope site, and are roughly proportional to how frequently a telescope is sampled. As can be seen, the most important sites in the EHT network for Sgr A* observations are ALMA, APEX, SPT, and LMT, which produce measurements covering low to high spatial frequencies. In contrast, since GLT provides no measurements (it is occluded by Earth, as Sgr A* is in the southern hemisphere), it is the least important site and has a negative activity parameter.
Effect of Sparsity and Diversity Regularization
We conduct a grid search on the sparsity and diversity regularization coefficients and explore their influence on the learned optimal sensor-sampling distributions. In our experiments, the diversity regularization coefficient and the sparsity regularization coefficient are each swept over a range of values, including negative sparsity weights; negative sparsity regularization favors observing with more telescopes. Apart from the regularization weights, we use the representative parameters described in the beginning of Section 4.3.
Fig. 8 shows the diversity-sparsity relationship recovered from different combinations of regularization weights. The diversity regularizer characterizes the entropy of the Ising model: when the diversity (i.e., the Ising model energy) is large, all telescope sites have around a 50% chance of being selected or excluded. Therefore, the mean number of selected telescopes gradually converges to six out of twelve with increasing diversity regularization. As shown by the array-size histogram, increasing the sparsity regularizer results in fewer sampled telescopes.
Effect of Reconstruction Resolution
We investigate the optimal sensor-sampling for different reconstruction resolution requirements by optimizing the joint sensing and imaging network at four different fractions (1, 0.75, 0.5, and 0.25) of the nominal resolution. Apart from the target resolution, we use the representative parameters described in the beginning of Section 4.3. Fig. 7 reports the activity of the four telescopes (ALMA, APEX, LMT, and SPT) that change most with the target resolution, and the learned correlation matrix at 1x, 0.75x, and 0.25x the nominal resolution. The correlation matrix at 0.5x the nominal resolution can be found in Fig. 5.
As the resolution becomes higher (right to left in the activity plot), LMT becomes increasingly less important, but ALMA, APEX and especially SPT become more important. This agrees with intuition, as SPT baselines measure the highest spatial-frequency information for Sgr A* and LMT is contained in many of the shorter baselines probing low spatial-frequencies. The correlation coefficients between short-baseline pairs (such as APEX-ALMA, JCMT-SMA) become lower at higher resolution, while the correlation coefficients between long-baseline pairs (such as SPT-APEX, SPT-ALMA, SPT-JCMT and SPT-SMA) become higher. This satisfies physical intuition as long baselines are most important for high-resolution reconstructions.
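The intuition that long baselines probe fine structure follows from diffraction: a baseline of length B observing at wavelength λ resolves angular scales of roughly λ/B. A back-of-the-envelope sketch with illustrative numbers:

```python
import math

# Diffraction-limited angular scale probed by a baseline of length B at
# observing wavelength lam: theta ~ lam / B.
# Illustrative numbers: 1.3 mm wavelength (EHT-like), long vs. short baseline.
lam = 1.3e-3                                   # wavelength [m]
rad_to_uas = 180.0 / math.pi * 3600.0 * 1e6    # radians -> microarcseconds
theta_long = lam / 8.0e6 * rad_to_uas          # ~8,000 km baseline
theta_short = lam / 1.0e5 * rad_to_uas         # ~100 km baseline
# theta_long comes out at a few tens of microarcseconds, while the short
# baseline probes angular scales ~80x coarser (lower spatial frequencies).
```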
Fig. 7 shows the blurred images and reconstruction results at different resolutions. We select one example from each type of validation dataset (Fashion-MNIST, MNIST and geometric models). More precisely, we show the reconstruction mean and standard deviation obtained by taking 1000 telescope samples from each learned model. The higher resolution reconstructions are more challenging as they attempt to super-resolve the image, and as expected they have larger standard deviations. Although the goal of this work is not to develop a new image reconstruction method, we find that the proposed neural network appears to produce promising results for the simplified VLBI reconstruction problem, and is powerful enough to be able to tackle a variety of possible array configurations simultaneously.
Effect of Noise
The previous experiments assume no noise in the VLBI measurements. In this section, we compare the optimal telescope sensor sampling under six different noise assumptions: (1) no noise, (2) equal thermal noise (using the average thermal noise derived from Table I), (3) site-varying thermal noise (as derived from Table I), (4) atmospheric phase noise, (5) atmospheric phase noise and equal thermal noise, and (6) atmospheric phase noise and site-varying thermal noise. For the latter three cases, the visibility amplitudes and closure phases are used for the image reconstruction. All cases assume a target source of Sgr A* with a constant total flux of 1 Jy using the “EHT+” array.
Fig. 9 shows the learned Ising model parameters for these cases. As can be seen in the site activity plots, the sensor-sampling distributions are not significantly influenced by thermal noise alone: cases (1), (2) and (3) are almost identical. However, atmospheric noise significantly impacts telescope selection. In the presence of atmospheric phase error, LMT and OVRO become much more important, while SPT becomes less important. This effect is further amplified when both thermal noise and atmospheric errors appear. Short baselines, such as OVRO-LMT, are sampled more frequently, suggesting that it is especially important to include shorter baselines in high-resolution VLBI imaging when dealing with atmospheric phase errors.
Based on the undirected graph defined by the Ising model correlation matrix, we can also analyze all “cliques” of size three to understand the importance of each closure triangle (i.e., closure phase measurement). In this affinity graph, telescopes are vertices and the pairwise correlations define edge weights. We define a subset of three telescopes as a “three-clique” if the correlation between each pair is significantly positive, i.e., if each of the three pairwise correlation coefficients exceeds a threshold τ.
For instance, in case (6), eight three-cliques can be found in the corresponding graph for the chosen threshold. After ranking them with a simple metric on the pairwise correlations, we find the five most important three-cliques are “LMT-OVRO-JCMT”, “LMT-OVRO-SMA”, “APEX-LMT-JCMT”, “ALMA-LMT-JCMT” and “LMT-JCMT-KP”, all of which form triangles with similar baseline lengths (see Fig. 3). This suggests that closure phases from sites that are nearly colinear may not be as important for image reconstruction with atmospheric phase error.
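Extracting such three-cliques from a learned correlation matrix is straightforward. In the sketch below, the threshold τ and the ranking metric (the sum of the three pairwise correlations) are illustrative choices, not necessarily those used in our experiments:

```python
from itertools import combinations

import numpy as np

def find_three_cliques(W, names, tau):
    """Return three-cliques (triangles whose pairwise correlations all
    exceed tau), ranked by the sum of their three pairwise correlations.

    W: symmetric (n, n) learned correlation matrix; names: site labels.
    """
    cliques = []
    for i, j, k in combinations(range(len(names)), 3):
        pair = (W[i, j], W[i, k], W[j, k])
        if min(pair) > tau:
            cliques.append(((names[i], names[j], names[k]), sum(pair)))
    # Strongest triangles first.
    return sorted(cliques, key=lambda c: -c[1])

# Toy 4-site example with one strongly connected triangle.
names = ["A", "B", "C", "D"]
W = np.array([
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.1],
    [0.8, 0.7, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
])
top = find_three_cliques(W, names, tau=0.5)
# the only clique found is ('A', 'B', 'C')
```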
EHT Design for Different Science Targets
In this section, we compare the results for two different targets, Sgr A* and M87*, which lie in the southern and northern celestial hemispheres, respectively (Sgr A* declination: -29.24°; M87* declination: +12.39°). Therefore, Sgr A* and M87* should have different preferences for telescope site selection. Fig. 11 reports the learned EHT sensor-sampling distributions considering either no noise or only atmospheric phase noise. Under both noise assumptions, SPT (occluded for M87* observations) becomes less important for M87* compared with Sgr A*, while PV, PDB and GLT become more important.
Site Selection for “FUTURE” array
We study the site selection for a “FUTURE” array, which includes nine additional telescope sites beyond the twelve in “EHT+”. The same representative parameters described at the beginning of Sec. 4.3 are applied and no noise is introduced. The learned Ising model parameters for both Sgr A* and M87* are shown in Fig. 11. As can be seen, the results successfully capture the negative correlations among co-located telescopes, such as ALMA-APEX-VLT-BOL and JCMT-SMA-KAUAI.
Since the conditional distribution of a subset of Ising model elements is also an Ising model (shown in Eq. 10), we also report the conditional activities and correlations of nine additional sites assuming all the “EHT+” telescopes are selected. Although nearly all nine sites have negative activities (due to the sparsity constraint of the original Ising model), the rank of these sites’ importance still holds. We find that telescopes in Africa, particularly DRAK, would be helpful in constraining both Sgr A* and M87* reconstructions. However, we caution that more factors should be considered in the model before drawing conclusions about the importance of any one site, such as realistic atmospheric noise, weather assumptions, the science target’s expected evolution, and both social and economic costs.
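The conditional-Ising property used here has a simple closed form. Under a {0,1} parameterization p(z) ∝ exp(zᵀWz/2 + bᵀz) (an assumed notation for this sketch), clamping a subset of sites leaves an Ising model over the remaining sites whose correlations are the corresponding sub-block of W and whose activities are shifted by the couplings to the clamped sites:

```python
import numpy as np

def condition_ising(W, b, fixed_idx, fixed_val):
    """Condition an Ising model p(z) ~ exp(z^T W z / 2 + b^T z), z in {0,1}^n,
    on z[fixed_idx] = fixed_val. The result is again an Ising model over the
    free variables: same correlation sub-block, shifted activities.
    """
    n = len(b)
    free = np.array([i for i in range(n) if i not in set(fixed_idx)])
    fixed_idx = np.asarray(fixed_idx)
    fixed_val = np.asarray(fixed_val, dtype=float)
    W_cond = W[np.ix_(free, free)]
    b_cond = b[free] + W[np.ix_(free, fixed_idx)] @ fixed_val
    return W_cond, b_cond

# Example: clamp the first two of four sites "on"; a negatively coupled
# third site then gets a lower conditional activity.
W = np.array([
    [0.0, 0.2, -1.0, 0.0],
    [0.2, 0.0, -1.0, 0.0],
    [-1.0, -1.0, 0.0, 0.3],
    [0.0, 0.0, 0.3, 0.0],
])
b = np.array([0.5, 0.5, 0.5, 0.5])
W_c, b_c = condition_ising(W, b, fixed_idx=[0, 1], fixed_val=[1.0, 1.0])
# b_c is [0.5 - 2.0, 0.5] = [-1.5, 0.5]
```

Fixing all existing array sites to “on” in this way shifts each candidate site's activity by its total coupling to the existing array, which is how the reported conditional activities of the nine additional sites can be computed.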
Swapping Sampling Strategies
In this section, we study the interaction of learned sampling strategies (encoders) and reconstruction methods (decoders) in order to demonstrate the advantage of co-design. In particular, we swap the sampling strategy learned for different types of noise (no noise vs. atmospheric phase noise) and science targets (Sgr A* vs. M87*), while keeping the reconstruction methods fixed. For instance, for two co-designed networks, we evaluate the quality of images reconstructed with the decoder of network 1 when the telescope measurements were sampled from the policy learned by network 2, and vice versa. To meaningfully compare networks, the sparsity and diversity regularization coefficients were tuned so that the expected numbers of sampled telescopes in the EHT+ array were nearly equal across networks (approximately eight).
Table II reports the shift-invariant reconstruction losses (Eq. 23) of various “recombined” autoencoders for a FWHM of 0.5x the nominal beam. Each row represents a learned reconstruction method and each column represents a learned sensor-sampling distribution. The reconstruction metric is computed over 1000 test images from the MNIST dataset. As expected, the optimal sampling strategy and reconstruction method are those that are learned jointly: a particular VLBI array sampling policy works best with its corresponding reconstruction method.
Fig. 12 shows the mean reconstruction and standard deviation resulting from 1000 sampling trials of the same truth image (see Fig. 7) for each “recombined” autoencoder. The standard deviation is significantly lower for autoencoders that were co-designed. These simple tests help confirm that co-designed sensor-sampling and reconstruction strategies outperform independently learned ones.
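The swap experiment can be sketched with simple stand-ins: independent Bernoulli samplers in place of the learned Ising sampling policies, ridge reconstructions in place of the learned decoders, and a random linear operator in place of the VLBI forward model (all hypothetical). The cross-evaluation table is then filled as in Table II, with rows as decoders and columns as samplers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a random linear "forward model", Bernoulli
# sampling policies, and ridge-regularized least-squares "decoders".
n_meas, n_pix = 12, 8
A = rng.normal(size=(n_meas, n_pix))      # forward model
x_true = rng.normal(size=n_pix)           # truth image (flattened)
y = A @ x_true                            # noiseless measurements

def make_sampler(p):
    # Independent Bernoulli mask over measurements (stand-in for a
    # learned Ising sampling policy).
    return lambda: (rng.random(n_meas) < p).astype(float)

def make_decoder(lam):
    # Ridge reconstruction from the masked measurements.
    def decode(mask):
        Am, ym = A * mask[:, None], y * mask
        return np.linalg.solve(Am.T @ Am + lam * np.eye(n_pix), Am.T @ ym)
    return decode

samplers = [make_sampler(0.9), make_sampler(0.5)]
decoders = [make_decoder(1e-3), make_decoder(1.0)]

# Cross-evaluation table: rows = decoders, columns = sampling policies.
loss = np.zeros((len(decoders), len(samplers)))
for r, decode in enumerate(decoders):
    for c, sample in enumerate(samplers):
        recons = np.stack([decode(sample()) for _ in range(200)])
        mean, std = recons.mean(axis=0), recons.std(axis=0)  # as in Fig. 12
        loss[r, c] = np.mean((recons - x_true) ** 2)
```

In this toy setting the diagonal pairing is not forced to win, but the table makes the protocol explicit; in the paper, the jointly trained encoder-decoder pairs achieve the lowest losses.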
Optimal sensing is important for resource-limited image reconstruction. We presented an approach to learn an optimal sensor-sampling distribution for a computational imaging system. This method optimizes an Ising sensor-sampling distribution jointly with an image reconstruction method, implemented in a physics-constrained, fully differentiable, autoencoder.
We demonstrated the proposed framework on a VLBI telescope array design task, where sensor correlations and atmospheric noise present unique challenges. Experiments show that the proposed technique can be helpful in planning future telescope array designs and developing observation strategies. We also believe the proposed framework is applicable to a variety of other sensor selection and sampling problems in computational imaging, such as the optimization of LED array illumination patterns in Fourier ptychography and the sampling pattern in k-space for fast MRI. Changing the application simply requires changing the physics-based forward model and the decoder in Eq. 2, along with the training data. By co-designing the Ising model sampling distribution simultaneously with the reconstruction decoder, we can better optimize the design of future computational imaging systems.
The authors would like to thank Lindy Blackburn, Alexander Raymond, Michael Johnson, and Sheperd Doeleman for helpful discussions on the constraints of a next-generation EHT array, and Michael Kellman for helpful discussions on Fourier ptychography. This work was supported by NSF award 1935980: “Next Generation Event Horizon Telescope Design,” and Beyond Limits.
He Sun is a postdoctoral researcher in Computing and Mathematical Sciences (CMS) at the California Institute of Technology. He received his PhD from Princeton University in 2019 and his BS degree from Peking University in 2014. His research focuses on adaptive optics and computational imaging, especially their applications in astrophysical and biomedical sciences.
Adrian V. Dalca is an assistant professor at Harvard Medical School, Massachusetts General Hospital, and a Research Scientist in CSAIL, Massachusetts Institute of Technology. His research focuses on machine learning techniques and probabilistic models for medical image analysis. Driven by clinical questions, he develops core learning algorithms, as well as registration, segmentation and imputation methods aimed at clinical-sourced datasets and broadly applicable in image analysis.
Katherine L. Bouman is an assistant professor in Computing and Mathematical Sciences (CMS) and Electrical Engineering (EE) and a Rosenberg Scholar at the California Institute of Technology (Caltech) in Pasadena, California. Her group combines ideas from signal processing, computer vision, machine learning, and physics to design computational imaging systems. These systems tightly integrate algorithm and sensor design, making it possible to observe phenomena previously difficult or impossible to measure with traditional approaches. She was recognized as a co-recipient of the Breakthrough Prize in fundamental physics, and was named the 2020 imaging “Scientist of the Year” by the Society for Imaging Science and Technology for her work capturing the first picture of a black hole with the international Event Horizon Telescope Collaboration.
- Our analysis conservatively assumes each new telescope in the “FUTURE” array has an SEFD of 10000.