Learning a Probabilistic Strategy for Computational Imaging Sensor Selection
Abstract
Optimized sensing is important for computational imaging in low-resource environments, where images must be recovered from severely limited measurements. In this paper, we propose a physics-constrained, fully differentiable autoencoder that learns a probabilistic sensor-sampling strategy for optimized sensor design. The proposed method learns a system's preferred sampling distribution, which characterizes the correlations between different sensor selections, as a binary, fully-connected Ising model. The learned probabilistic model is realized with a Gibbs-sampling-inspired network architecture and is trained end-to-end with a reconstruction network for efficient co-design. The proposed framework is applicable to sensor-selection problems in a variety of computational imaging applications. In this paper, we demonstrate the approach in the context of a very-long-baseline-interferometry (VLBI) array design task, where sensor correlations and atmospheric noise present unique challenges. We demonstrate results broadly consistent with expectation, and draw attention to particular structures preferred in the telescope array geometry that can be leveraged to plan future observations and design array expansions.
1 Introduction
Computational imaging systems tightly integrate algorithm and sensor design, making it possible to observe phenomena previously difficult or impossible to see with traditional sensors. A common constraint in these imaging systems is that they must operate in low-resource environments. For instance, the measurements collected are severely limited due to radiation dosage in computed tomography (CT) [19], speed in magnetic resonance imaging (MRI) [21] and microscopy [25], and cost in very-long-baseline interferometry (VLBI) [7]. Imaging methods are then designed to intelligently fill in the gaps of missing information in order to recover the targeted image. We propose a framework that jointly learns the ideal sampling structure of nonlinear, correlated measurements together with the image recovery procedure, for the purpose of sensor design. We apply this framework to study optimal VLBI telescope array geometries under different observing conditions.
A computational imaging system typically consists of two primary components: the sensing system and the image reconstruction method. The sensing system is most often described by a physics-based forward model $f(\cdot)$, which defines how the selected measurements, $y$, are related to the underlying image, $x$. These measurements often experience noise and do not contain enough information on their own to fully specify $x$. Therefore, imaging methods are designed to recover an image, $\hat{x}$, from the observed measurements by imposing predefined assumptions, such as image priors or hard physical constraints.
In addition to developing image reconstruction methods, optimizing the sensor is critical for producing high-quality reconstructions. In many applications the placement (i.e., sampling) of sensors controls the nature of the information gathered in measurements. It is especially important that this information be optimized for a specific task when the measurement space cannot be fully observed. Previous work in 3D reconstruction [23], microscopy [33, 17], and astronomical imaging [35, 4] has proposed strategies to optimize sensor placement to achieve maximally informative measurements. Seminal work in compressed sensing showed that, under particular sampling distributions, it is possible to recover a sparse signal with near-perfect accuracy [8]; this idea revolutionized a number of real-world applications, including MRI [22]. However, these theoretical results only hold under certain assumptions, such as a linear forward model or limited noise. Consequently, these sampling strategies are likely under-optimized for complex imaging systems and could be improved.
The optimal sensing problem is typically idealized and considered independently from the image reconstruction problem. However, the two problems are intimately related: existing imaging algorithms inform what data we are capable of interpreting and therefore should collect, and the particular data we measure informs what imaging algorithms we should use. Inspired by this insight, we address co-designing the sensor-sampling strategy and the image reconstruction algorithm. The complexity of real-world measurements makes it difficult to tackle this co-design problem analytically. Therefore, we propose to solve a joint optimization problem using deep learning, where the optimal sampling policy is represented as a trainable joint probability distribution over candidate sensors. The proposed framework enables 1) tackling complicated sampling problems where measurements are correlated and experience substantial noise, 2) exploring multiple plausible sampling strategies simultaneously rather than focusing on a single deterministic design, and 3) qualitatively understanding the importance of each sensor and the correlations among sensors.
The paper is organized as follows. In Sec. 2, we review related work on reconstruction algorithms and optimal sensing. In Sec. 3, we present our method, which jointly optimizes the sensor-sampling pattern and image reconstruction. In Sec. 4, we demonstrate our method on the astronomical imaging VLBI array design problem to identify which telescopes are most important to observe with under different conditions. In Sec. 5, we summarize our work and discuss additional sensor-selection applications.
2 Related Work
2.1 Image Reconstruction Techniques
Regularized Inverse Model
A common approach taken in computational imaging is to reconstruct the target image from limited sensor measurements by solving a regularized inverse problem:
(1)  $\hat{x} = \arg\min_{x} \; \mathcal{L}\big(y, f(x)\big) + \lambda \mathcal{R}(x)$

where $f(\cdot)$ is the forward model of the underlying imaging system, which produces measurements $y$ from the true image $x$; $\mathcal{L}(\cdot,\cdot)$ is inspired by the data negative log-likelihood; $\mathcal{R}(\cdot)$ is a regularization function; $\lambda$ is the coefficient balancing the two terms; and $\hat{x}$ is the reconstructed image. The $\mathcal{L}$ and $\mathcal{R}$ terms quantify the image's fit with the measurements and with prior knowledge, respectively. The regularization term shrinks the solution domain, consequently making the solution unique even with an incomplete set of measurements. Typically, this inverse problem is solved using iterative optimization algorithms when explicit image regularizers are imposed, such as total variation (TV) [5], maximum entropy (MEM) [30], sparsity in a Fourier- or wavelet-transformed domain [10], or locally-learned distributions [6].
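As a concrete illustration of the regularized inverse problem of Eq. (1), the sketch below minimizes a quadratic data-fit term plus an $\ell_2$ (Tikhonov) penalty by gradient descent, assuming a toy linear forward model. The function name, matrix sizes, and step size are illustrative choices, not the paper's setup; the priors named above (TV, MEM, wavelet sparsity) would replace the $\ell_2$ term in practice.

```python
import numpy as np

def solve_regularized_inverse(F, y, lam=1e-3, lr=0.005, n_iter=2000):
    """Minimize ||F x - y||_2^2 + lam * ||x||_2^2 by gradient descent.
    A toy stand-in for Eq. (1) with a linear forward model F and an l2
    regularizer (real pipelines would use e.g. TV or entropy priors)."""
    x = np.zeros(F.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * F.T @ (F @ x - y) + 2.0 * lam * x
        x -= lr * grad
    return x

rng = np.random.default_rng(0)
F = rng.normal(size=(20, 4))       # toy, well-conditioned sensing matrix
x_true = rng.normal(size=4)
y = F @ x_true                     # noiseless measurements
x_hat = solve_regularized_inverse(F, y)
```

With a small regularization weight and noiseless data, the recovered `x_hat` essentially matches `x_true`; increasing `lam` trades data fit for solution stability.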
Learned Inverse Model
Deep learning techniques have also been used for computational imaging reconstruction, either by directly training a deep reconstruction neural network [28] or by inserting deep learning "priors" into model-based optimization frameworks [34]. Unlike many classic computer vision tasks, such as face or object recognition, computational imaging problems are fundamentally related to the physics of a known imaging system. Therefore, recent deep-learning-based reconstruction methods focus on incorporating known physics to improve computational efficiency, reconstruction accuracy, and robustness. For example, this can be done by adding physics-informed regularization to the neural network loss function [13], only learning the residuals between ground truth and physics-based solutions [16], or unrolling the physics-based optimization procedure as a neural network [14, 15].
2.2 SensorSampling Strategies
Analytic Strategies
Conventionally, sparse sensor sampling is guided by the Nyquist-Shannon sampling theorem. For instance, in imaging applications with Fourier component measurements, such as VLBI and MRI, since the images are approximately band-limited, the signal sampling rate in the Fourier domain (i.e., the spatial-frequency space) can be reduced to twice the maximum frequency [27]. This observation can be used to define metrics that characterize sampling quality independent of the image reconstruction technique [24]. Modern advances in the theory of compressed sensing demonstrate that, when the underlying image is sparse, certain sampling schemes can achieve the same quality reconstruction with an even smaller number of samples [8, 9]. However, both Nyquist sampling and compressed sensing typically consider only linear forward models and assume no noise or only i.i.d. Gaussian noise, which is unrealistic for many real-world problems.
Learned Strategies
Recently, approaches have been proposed to explore the joint optimization of a sensing strategy and an image reconstruction method using deep neural networks. This idea has been used to design the color multiplexing pattern of camera sensors [12], the LED illumination pattern in Fourier ptychography [17], the optical parameters of a camera lens [29], and the exposure pattern in video compressed sensing [37]. However, all of these approaches yield only a single deterministic sensing pattern, ignoring the possibility of multiple, equally good sensing strategies. In a recent study of "compressed sensing" MRI, the sampling strategy in $k$-space is modeled with Bernoulli distributions, so that multiple sensing patterns are explored simultaneously [2]. This formulation helps characterize which sampling frequencies are important for different anatomy. However, none of these prior approaches address complicated non-Gaussian noise or explicitly model the correlations among measurements.
3 Proposed Method
We propose a learning method that jointly optimizes the sensor-sampling distribution and the image reconstruction method for forward models that contain correlated measurements and complicated noise. In Sec. 3.1, we discuss the problem setup. In Sec. 3.2 we introduce the sensor-sampling distribution, formulated as an Ising model. In Sec. 3.3 we discuss the image reconstruction sub-network architecture.
3.1 Joint Sensing and Imaging Optimization
We formulate the joint sensing and imaging optimization as a deep autoencoder. This autoencoder is trained to identify an efficient image representation that is then used to make limited observations, from which an image can be recovered. The encoder represents the sensing strategy and the decoder represents the image reconstruction method. Both the encoder and decoder are formulated as deep neural networks in this framework. However, to ensure the sensing system design obeys physical constraints, the encoder samples from all possible measurements generated by the physics simulator.
This optimization problem can be written as
(2)  $\min_{\phi, \theta} \; \mathbb{E}_{w \sim p_\phi(w)} \Big[ \sum_{x \in \mathcal{X}} \mathcal{L}\big(x, h_\theta(A(w, x))\big) + \gamma \mathcal{R}(w) \Big]$

where $w$ is a sensor-sampling pattern, whose distribution $p_\phi(w)$ is parameterized by $\phi$; $\mathcal{L}(\cdot,\cdot)$ is a function defining the similarity between the reconstruction result and the true image $x$ in the image training set $\mathcal{X}$; $\mathcal{R}(w)$ is the regularization on the sensing samples; $h_\theta(\cdot)$ is the image reconstruction neural network parameterized by $\theta$; and $A(w, x)$ is the sensing function that produces the observed measurements. In this loss function, the first term quantifies the image reconstruction performance, while the second term quantifies desired properties of the sensor-sampling pattern (e.g., sparsity; see Sec. 3.2.3). The two terms are balanced by the hyperparameter $\gamma$. The sampling pattern $w$ is a binary vector that indicates which of the sensor sources are selected or excluded from observations. This is equivalent to selecting elements from the forward model's vector of measurements, i.e.
(3)  $A(w, x) = S(w) \odot f(x)$

where $S(\cdot)$ is the binary selection function based on the mask $w$, whose output dimensionality is the same as that of the forward model $f(x)$. This formulation is a generalization of [2], which used a similar optimization designed specifically for MRI acquisition.
Based on Eq. (2) and Eq. (3), we build a joint sensing and imaging autoencoder, as shown in Fig. 1. The trainable blocks in our neural network are the sensor-sampling distribution $p_\phi(w)$ (from which the mask is sampled) and the reconstruction network $h_\theta$, where $\phi$ and $\theta$ are the trainable network parameters. In the following sections we describe how to express Eq. (2) as a single, differentiable neural network and solve for $\phi$ and $\theta$ simultaneously.
3.2 PhysicsConstrained Encoder
The goal of the encoder in Fig. 1 is to select measurements from the physics-based forward model to pass on to the decoder. The forward model and the procedure for masking measurements are both well defined; the challenge is to design a network architecture that can sample a mask from a parameterized distribution. In order to capture the correlations among observations, we model the proposed sensor-sampling distribution in Eq. (2) as a fully-connected Ising model (a binary Markov random field). In this section we explain the Ising model sensor-sampling distribution and describe how to sample from this model using a differentiable network architecture.
Ising model
In an Ising model, the state of an $N$-element system is represented by a set of binary random variables, $z = [z_1, \ldots, z_N]$ with $z_i \in \{-1, +1\}$. The total energy of a fully-connected Ising model is given by the Hamiltonian function,

(4)  $H(z) = -\sum_{i=1}^{N} b_i z_i - \sum_{i=1}^{N} \sum_{j > i} W_{ij} z_i z_j$

where $\{b_i, W_{ij}\}$ are the Ising model parameters. The first term models the energy of single elements and the second term models the potential from interactions between pairs of elements. The state's probability can be derived according to the Boltzmann distribution

(5)  $p(z) = \frac{1}{Z} e^{-H(z)}$

where $Z$ is the partition function,

(6)  $Z = \sum_{z \in \{-1, +1\}^N} e^{-H(z)}.$
The lower the total energy of a state, the higher the probability it will occur.
We use a fully-connected Ising model to model the presence ($+1$) or absence ($-1$) of a particular sensor, and to capture correlations among sensors. The parameter $b_i$ characterizes how important it is to include the $i$-th sensor, and $W_{ij}$ characterizes whether the $i$-th and $j$-th sensors are positively or negatively correlated: if $W_{ij} > 0$, observing with the $i$-th and $j$-th sensors together is favored; if $W_{ij} < 0$, the $i$-th sensor is disfavored when the $j$-th is selected, and vice versa. Although an Ising model only explicitly defines pairwise correlations, higher-order correlations among sensors are implicitly captured by the model. These higher-order correlations can be partially extracted by interpreting the Ising model as an affinity graph and performing clique analysis on the pruned edges (see an example in Sec. 4.3.3).
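The Hamiltonian and Boltzmann distribution of Eqs. (4)-(6) can be computed exactly for small systems by enumerating all $2^N$ states. The minimal NumPy sketch below (illustrative only; the actual model is never enumerated during training) shows how a strong negative coupling between two sensors, such as two co-located telescopes, makes jointly selecting both of them unlikely.

```python
import itertools
import numpy as np

def hamiltonian(z, b, W):
    """Energy of a state z in {-1,+1}^N: H = -sum_i b_i z_i - sum_{i<j} W_ij z_i z_j."""
    pair = 0.0
    n = len(z)
    for i in range(n):
        for j in range(i + 1, n):
            pair += W[i, j] * z[i] * z[j]
    return -np.dot(b, z) - pair

def boltzmann(b, W):
    """Exact Boltzmann distribution over all 2^N states (small N only)."""
    states = [np.array(s) for s in itertools.product([-1, 1], repeat=len(b))]
    energies = np.array([hamiltonian(s, b, W) for s in states])
    p = np.exp(-energies)
    return states, p / p.sum()

# Two sensors with a strong negative coupling: the anti-aligned states
# (exactly one sensor selected) have lower energy, hence higher probability.
b = np.array([0.0, 0.0])
W = np.zeros((2, 2)); W[0, 1] = -2.0
states, p = boltzmann(b, W)
```

Here `states` is ordered $(-1,-1), (-1,+1), (+1,-1), (+1,+1)$, and the two middle (anti-aligned) states carry most of the probability mass.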
Solving the optimization problem in Eq. (2) requires sampling the Ising model according to the parameters $\{b_i, W_{ij}\}$. However, unlike simple distributions such as Bernoulli, uniform, or Gaussian, it is difficult to sample an Ising model directly, because the binary variables are correlated and the number of possible states grows exponentially with the number of elements $N$. To build a computationally efficient and fully differentiable sampling network, we propose a Markov chain Monte Carlo (MCMC) inspired method.
Differentiable Gibbs Sampling Sub-Network
We design a sub-network to perform successive MCMC sampling iterations of an Ising model (parameterized by $\{b_i, W_{ij}\}$), which enables sampling masks $w$ and facilitates computing the expectation in Eq. (2).
MCMC methods have been widely used for approximating complicated statistical distributions. In a commonly used MCMC method, Gibbs sampling, one samples from the joint probability distribution by conditionally sampling each variable in turn until convergence. The detailed procedure of Gibbs sampling for a binary state space can be summarized as:
1. Randomly draw an initial sample
   (7)  $z^{(0)} = [z_1^{(0)}, z_2^{(0)}, \ldots, z_N^{(0)}]$
   from an independent Bernoulli distribution.

2. Compute the next sample $z^{(t+1)}$ based on $z^{(t)}$, with each component sequentially updated using its conditional distribution given the other states,
   (8)  $p_i = p\big(z_i^{(t+1)} = 1 \,\big|\, z_1^{(t+1)}, \ldots, z_{i-1}^{(t+1)}, z_{i+1}^{(t)}, \ldots, z_N^{(t)}\big).$
   For each $i$, draw a random number from a uniform distribution,
   (9)  $u_i \sim \mathcal{U}[0, 1].$
   If $u_i < p_i$, assign $z_i^{(t+1)} = 1$; otherwise, assign $z_i^{(t+1)} = -1$.

3. Repeat step 2 for $T$ iterations.
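The three steps above can be sketched directly in NumPy. For an Ising model, the conditional in step 2 reduces to a Bernoulli probability (a sigmoid of the local field; see Eq. (10) below), which this sketch uses; the function name and parameter defaults are illustrative.

```python
import numpy as np

def gibbs_sample_ising(b, W, n_iter=50, rng=None):
    """Gibbs sampling for a fully-connected Ising model with z_i in {-1,+1}.

    The conditional of one element given the rest is Bernoulli:
        p(z_i = +1 | z_rest) = sigmoid(2 * (b_i + sum_{j != i} W_ij z_j)).
    Assumes W is symmetric with zero diagonal.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(b)
    z = rng.choice([-1.0, 1.0], size=n)      # step 1: random initial state
    for _ in range(n_iter):                  # step 3: repeat T iterations
        for i in range(n):                   # step 2: sequential conditional updates
            field = b[i] + W[i].dot(z) - W[i, i] * z[i]
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
            z[i] = 1.0 if rng.uniform() < p_plus else -1.0
    return z

# With strong positive biases and no couplings, every element is
# almost surely sampled as +1 (i.e., every sensor selected).
b = np.full(4, 5.0)
W = np.zeros((4, 4))
z = gibbs_sample_ising(b, W, rng=np.random.default_rng(1))
```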
Gibbs sampling is often used to sample from Ising models, as the conditional probability of an Ising model can itself be described as an Ising model:

(10)  $p(z_U \mid z_K) \propto \exp\Big( \sum_{i \in U} \tilde{b}_i z_i + \sum_{i, j \in U,\, j > i} W_{ij} z_i z_j \Big), \qquad \tilde{b}_i = b_i + \sum_{k \in K} W_{ik} z_k$

where $K$ and $U$ index the known and unknown/to-be-updated subsystems, respectively, and $\tilde{b}_i$ defines the new Ising model parameter conditioned on the known subsystem. Eq. (10) indicates that the conditional distribution of a single element is a Bernoulli distribution, making it possible to implement each Gibbs sampling iteration as a neural network layer.
In order to sample a mask from the distribution $p_\phi(w)$, we design a neural network in which each dense layer corresponds to one iteration of Gibbs sampling for an Ising model. Each Gibbs sampling layer can be mathematically described as a function of $z^{(t)}$:

(11)  $z_i^{(t+1)} = \mathrm{sign}\Big[ \sigma\Big( 2 \big( b_i + \textstyle\sum_{j \neq i} W_{ij} z_j \big) \Big) - u_i \Big], \qquad u_i \sim \mathcal{U}[0, 1]$

where $\mathrm{sign}[\cdot]$ is a sign step function and $\sigma(\cdot)$ is a sigmoid function,

(12)  $\sigma(a) = \frac{1}{1 + e^{-a}}.$
The initial sample $z^{(0)}$ is initialized using uniform random values. Eq. (11) defines a layered Ising sampling function, which converges to a sample from the Ising model after a sufficient number of iterations. However, the sign function in Eq. (11) has no usable gradient (zero or undefined), rendering back-propagation strategies impossible. To address this, we replace the sign function with a hyperbolic tangent function,

(13)  $z_i^{(t+1)} = \tanh\Big[ \alpha \Big( \sigma\Big( 2 \big( b_i + \textstyle\sum_{j \neq i} W_{ij} z_j \big) \Big) - u_i \Big) \Big]$

where $\alpha$ controls the slope. Using a moderate slope enables us to approximately binarize the states to $\{-1, +1\}$ between successive Gibbs layers, while still allowing gradients to propagate through the network.
To obtain the sampling mask, $w$, we apply a steep sigmoid to the final sampling layer of the Ising model to convert the states from approximately $\{-1, +1\}$ to nearly binary states in $\{0, 1\}$:

(14)  $w = \sigma\big( \beta z^{(T)} \big)$

where $\beta$ is the slope of the sigmoid function, which should be large enough to approximate the Heaviside function well.
For notational simplicity, we represent this sampling procedure as a nested function,

(15)  $w = G_{\{b, W\}}(u)$

with the Ising model coefficients $\{b_i, W_{ij}\}$ as its parameters, uniformly distributed random numbers $u$ as inputs, and the nearly binary samples $w$ as outputs. The architecture of the Gibbs sampling neural network is shown at the bottom of Fig. 1.
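A minimal NumPy sketch of the relaxed sampling layers: the hard sign of each Gibbs iteration is replaced by a scaled `tanh`, and a final steep sigmoid converts the states to a near-binary mask. The slope values `alpha` and `beta` here are illustrative placeholders, not the paper's settings, and a real implementation would express this in an autodiff framework so gradients flow to `b` and `W`.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def relaxed_gibbs_mask(b, W, u, alpha=10.0, beta=100.0):
    """Differentiable approximation of Ising Gibbs sampling.

    `u` holds pre-drawn uniform random numbers, one row per Gibbs layer.
    Each layer softly "samples" every element via tanh(alpha * .), and the
    final steep sigmoid (slope beta) maps ~{-1,+1} states to a ~{0,1} mask.
    Assumes W is symmetric with zero diagonal.
    """
    n_layers, n = u.shape
    z = np.tanh(alpha * (u[0] - 0.5))            # soft random initialization
    for t in range(1, n_layers):
        for i in range(n):
            field = b[i] + W[i].dot(z) - W[i, i] * z[i]
            p_plus = sigmoid(2.0 * field)        # conditional P(z_i = +1 | rest)
            z[i] = np.tanh(alpha * (p_plus - u[t, i]))   # soft sample
    return sigmoid(beta * z)

rng = np.random.default_rng(0)
b, W = np.zeros(6), np.zeros((6, 6))
m = relaxed_gibbs_mask(b, W, rng.uniform(size=(5, 6)))
```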
Ising Model Regularization
To encourage sparse samples and promote sample diversity (in order to explore multiple observational strategies), we define the regularization on the Ising samples, $\mathcal{R}(w)$, as consisting of two terms:

(16)  $\mathcal{R}(w) = \lambda_1 \sum_{i=1}^{N} w_i - \lambda_2 H(z)$

where the first term penalizes the number of participating sensors (sparsity loss), and the second term is the negative Hamiltonian of the Ising samples (diversity loss). The Hamiltonian of an Ising model is a good approximation of the distribution entropy, as the negative log-probability differs from it only by the log partition function,

(17)  $-\log p(z) = H(z) + \log Z.$

Therefore, maximizing the expected Hamiltonian typically increases the system entropy, thus diversifying the samples from our Ising model. Substituting Eq. (16) into Eq. (2) and approximating the expectation with the empirical average over MCMC samples, we derive a stochastic formulation of the joint optimization problem under Ising sampling,

(18)  $\min_{\{b, W\},\, \theta} \; \frac{1}{M} \sum_{m=1}^{M} \Big[ \sum_{x \in \mathcal{X}} \mathcal{L}\big(x, h_\theta(S(w^{(m)}) \odot f(x))\big) + \gamma \mathcal{R}(w^{(m)}) \Big], \qquad w^{(m)} = G_{\{b, W\}}(u^{(m)})$

where the $w^{(m)}$ are mask samples drawn via the Gibbs sampling network.
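The two-term regularizer can be sketched as below. The coefficient names and values are illustrative (the paper only specifies a sparsity term and a negative-Hamiltonian diversity term, not these symbols), and the Hamiltonian here assumes a symmetric coupling matrix with zero diagonal.

```python
import numpy as np

def ising_regularizer(w, z, b, W, lam_sparse=1.0, lam_div=0.1):
    """Sparsity term (number of selected sensors) minus a weighted
    Hamiltonian (diversity term). `w` is the {0,1} mask and `z` the
    corresponding {-1,+1} Ising state; coefficient names are illustrative."""
    H = -b.dot(z) - 0.5 * z.dot(W).dot(z)   # Hamiltonian (W symmetric, zero diag)
    return lam_sparse * w.sum() - lam_div * H

# All three sensors selected, unit biases, no couplings:
# sparsity term = 3, Hamiltonian = -3, so the regularizer is 3 + 0.3.
w = np.ones(3)
z = 2.0 * w - 1.0
b = np.ones(3)
W = np.zeros((3, 3))
reg = ising_regularizer(w, z, b, W)
```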
3.3 Image Reconstruction Decoder
The image reconstruction decoder sub-network is not restricted to any particular neural architecture, but should have enough model capacity to reconstruct under multiple sensing scenarios simultaneously. For imaging problems with a complicated forward model and noise, physics-based neural networks (PbNN) [17] and unrolled neural networks [14, 31] have proven successful. However, with sufficient training data, general deep learning architectures such as the U-Net and its variants [26, 20] often perform sufficiently well. The U-Net combines local and global information during image reconstruction by incorporating hierarchical downsampling and upsampling blocks and multiple skip connections in its architecture. In our VLBI case study, two variants of the U-Net are used for reconstruction. We discuss the details of their architectures in Sec. 4.2.3.
4 Case Study: VLBI Array Design
Very-long-baseline interferometry (VLBI) is a radio interferometry technique for high-resolution astronomical imaging. In VLBI, telescopes are linked to form an imperfect, virtual telescope with a maximum resolution defined by the longest baseline between participating telescopes [32]. Recently, this technique was used by the Event Horizon Telescope (EHT) to capture the first image of a black hole, the image of M87* [1], by joining telescopes (observing at 230 GHz) from across the globe. Building on this recent success, the EHT is continuing to improve its telescope array by adding new sites, in order to improve M87*'s image quality as well as to image additional astronomical sources, such as the Milky Way's black hole, Sagittarius A*.
It is important to carefully study which new sites should be added to the next-generation EHT to best improve future results. Adding a new telescope to the existing EHT VLBI array is extremely expensive: equipping a pre-built telescope to work as part of the array costs over two million dollars, and the construction of a new telescope can easily cost over 10 million dollars [3]. Therefore, limited budgets require selecting only a handful of new sites that can be added to the future array. Since only a select number of ground sites are suitable for constructing telescopes that can observe at 230 GHz (e.g., due to altitude and weather), the problem of choosing telescope locations for the next-generation EHT reduces to a site selection problem. Studying the impact of a telescope on imaging quality is not only important for planning an EHT expansion but also becomes a powerful tool during an observing campaign; when coordinating future observations it is essential to evaluate the impact of losing a particular telescope: should costly observations proceed if weather is poor or there are instrumental problems at a particular site?
We optimize the design of a VLBI array by solving a site selection problem. The goal is to decide which sites from a given list should be built into the array, and which sites can be excluded without substantial loss to the reconstructed image. Although many factors are ultimately important for site selection (e.g., construction cost, weather stability), in this paper we focus solely on the location of sites, as the array geometry usually has the largest influence over the resulting image quality. As explained in Sec. 4.1, VLBI measurements are produced from data collected at multiple telescopes, so it is important to consider how telescopes interact with one another. We capture the telescope correlations through the fully-connected binary Ising model, and optimize the telescope sampling distribution jointly with the image reconstruction method using the proposed approach.
4.1 Problem Setup
In radio interferometry (e.g., VLBI), data simultaneously collected from a pair of telescopes are used to produce a single measurement. Each measurement corresponds to a two-dimensional Fourier component of the image, with the frequency related to the baseline between the two participating telescopes, projected in the direction of the observed astronomical source. This measurement is typically referred to as a visibility. For an $N$-telescope array, there are $N(N-1)/2$ possible visibility measurements at a single time. Due to Earth's rotation, the projected baselines change over time, causing sparse, elliptical measurement tracks in the spatial-frequency domain to be collected over the course of a night (see Fig. 2).
The measurement equation for each visibility, $y_{ij}$, is given by

(19)  $y_{ij}(t) = g_i(t)\, g_j(t)\, e^{-i(\phi_i(t) - \phi_j(t))}\, F_{ij,t}(x) + n_{ij}(t)$

for telescope pair $(i, j)$. Here $F_{ij,t}(x)$ extracts the Fourier component from the image corresponding to the baseline between telescopes $i$ and $j$ at time $t$. Each measurement experiences a time-dependent, telescope-based gain, $g_i(t)$, and phase error, $\phi_i(t)$, as well as baseline-based thermal noise, $n_{ij}(t) \sim \mathcal{N}(0, \sigma_{ij}^2)$. The standard deviation of the thermal noise depends on each telescope's "system equivalent flux density" (SEFD):

(20)  $\sigma_{ij} \propto \sqrt{\mathrm{SEFD}_i \times \mathrm{SEFD}_j}$

where a higher SEFD indicates that measurements using that telescope contain more thermal noise (i.e., have lower SNR) [32]. When atmospheric turbulence is not considered, $g_i(t) = 1$ and $\phi_i(t) = 0$. Although the absolute gains can often be sufficiently calibrated, phase errors due to atmospheric turbulence cannot be calibrated a priori, since $\phi_i(t)$ varies rapidly and is uniformly sampled from $[0, 2\pi)$ [32]. Thus, in the case of atmospheric error, recovering an image amounts to solving a phase retrieval problem.
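A common form of the SEFD-based noise law (e.g., in standard radio interferometry references, with a 2-bit quantization efficiency factor of 0.88; the exact constants here are an assumption, not stated in this paper) can be sketched as:

```python
import numpy as np

def thermal_noise_std(sefd_i, sefd_j, bandwidth_hz, t_int_s):
    """Baseline thermal-noise standard deviation (Jy) from station SEFDs,
    sigma_ij = (1/0.88) * sqrt(SEFD_i * SEFD_j / (2 * bandwidth * t_int)).
    The 1/0.88 factor (2-bit quantization loss) and the formula's constants
    are the conventional radio-interferometry values, assumed here."""
    return np.sqrt(sefd_i * sefd_j / (2.0 * bandwidth_hz * t_int_s)) / 0.88

# Using SEFDs from Table 1: a baseline to ALMA (SEFD 90) is far less
# noisy than one between two high-SEFD stations such as SPT and KP.
sigma_strong = thermal_noise_std(90.0, 600.0, 4e9, 10.0)      # ALMA-LMT
sigma_weak = thermal_noise_std(5000.0, 10000.0, 4e9, 10.0)    # SPT-KP
```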
In the case of atmospheric noise, data products called closure quantities can be used to constrain the image reconstruction [11]. The closure phase is obtained from the product of three visibilities around a closed loop of telescopes $(i, j, k)$:

(21)  $y^{\,\mathrm{cp}}_{ijk}(t) = \angle \big( y_{ij}(t)\, y_{jk}(t)\, y_{ki}(t) \big)$

Since the atmospheric errors are site-dependent, $y^{\,\mathrm{cp}}_{ijk}$ is robust to any phase errors $\phi_i$. Data products called closure amplitudes are similarly robust to gain errors [11]. Since amplitudes can often be much better calibrated than phases, in this paper we do not consider closure amplitudes but instead make use of visibility amplitudes, $|y_{ij}|$.
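The cancellation of station-based phase errors around a closed triangle can be verified numerically. The sketch below (with made-up visibility values) corrupts each visibility with random per-station phases and checks that the closure phase is unchanged.

```python
import numpy as np

def closure_phase(v12, v23, v31):
    """Closure phase: angle of the bispectrum, i.e., of the product of
    the three visibilities around a closed telescope triangle."""
    return np.angle(v12 * v23 * v31)

# Ideal visibilities on the triangle of baselines (1,2), (2,3), (3,1).
v12 = 1.0 + 0.5j
v23 = 0.3 - 0.8j
v31 = -0.6 + 0.2j

# Station-based atmospheric phase errors phi_i corrupt each visibility as
# v_ij -> v_ij * exp(-1j * (phi_i - phi_j)); around the loop they cancel.
rng = np.random.default_rng(0)
phi = rng.uniform(0, 2 * np.pi, size=3)
c12 = v12 * np.exp(-1j * (phi[0] - phi[1]))
c23 = v23 * np.exp(-1j * (phi[1] - phi[2]))
c31 = v31 * np.exp(-1j * (phi[2] - phi[0]))
```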
Sites  Location  SEFD (Jy)
PV*  Pico Veleta, Spain  1400
PDB  Plateau de Bure, France  1500
ALMA*  Atacama Desert, Chile  90
APEX*  Atacama Desert, Chile  3500
LMT*  Sierra Negra, Mexico  600
SMT*  Mt. Graham, Arizona  5000
SPT*  South Pole  5000
OVRO  Owens Valley, California  10000
JCMT*  Maunakea, Hawai'i  6000
SMA*  Maunakea, Hawai'i  4900
KP  Tucson, Arizona  10000
GLT  Thule Air Base, Greenland  10000
4.2 Implementation Details
Telescope Arrays
In this paper, we consider two arrays of radio telescopes: "EHT+" and "FUTURE" (Fig. 3). "EHT+" includes twelve telescope sites, as listed in Table I. Eight of these sites (marked with a star) were used by the EHT in 2017 [1], while the other four sites host existing radio telescopes that plan to eventually join the EHT. Each telescope's SEFD (i.e., noise coefficient) is reported in the table. The "FUTURE" array consists of nine additional sites that do not currently host a suitable telescope, but have sufficient atmospheric conditions for a constructed telescope to successfully observe at 230 GHz with the EHT.
Ising Model Sampler
The VLBI forward model $f(x)$ produces a vector of either complex visibilities or visibility amplitudes and closure phases. The mask, $w$, is a binary vector representing whether each telescope is selected for observation. We define the masking function for each VLBI measurement in $S(w)$ as $w_i w_j$ for a visibility or visibility amplitude $y_{ij}$, and $w_i w_j w_k$ for a closure phase $y^{\,\mathrm{cp}}_{ijk}$. The mask is generated by the Ising model sampler from Sec. 3.2, which is built using five MCMC layers. The sampling order of the telescopes in each MCMC layer is randomly shuffled for each training trial. In Sec. 4.3, we run the joint optimization five times for each experiment to approximate the mean and standard deviation of the learned Ising model parameters. The slopes for the binary conversion layers (Eq. (13) and Eq. (14)) are set empirically.
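The telescope-mask-to-measurement-mask mapping described above can be sketched as follows; the function name and data layout are illustrative, not the paper's implementation. A visibility survives only if both of its stations are selected, and a closure phase only if all three are.

```python
import numpy as np

def measurement_mask(w, station_tuples):
    """Per-measurement mask from the per-telescope mask w: the entry for a
    visibility (i, j) is w_i * w_j; for a closure phase (i, j, k) it is
    w_i * w_j * w_k."""
    return np.array([np.prod([w[t] for t in tup]) for tup in station_tuples])

w = np.array([1, 0, 1, 1])   # telescope 1 is dropped from the array
station_tuples = [(0, 1),    # visibility using the dropped station -> masked
                  (0, 2),
                  (2, 3),
                  (0, 2, 3)] # closure phase among three surviving stations
m = measurement_mask(w, station_tuples)
```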
Image Reconstruction Architectures
Two different neural network architectures (Fig. 4) are used for image reconstruction (Sec. 3.3), with inputs corresponding to either complex visibilities or visibility amplitudes and closure phases. Both networks output the reconstructed pixel image.
In the case of no atmospheric turbulence, reconstruction network A is used (Fig. 4). The input to this network is the vector of complex visibilities. The network architecture consists of one fully connected layer and a U-Net (four downsizing layers + four upsizing layers). Ideally, the fully connected layer transforms the Fourier measurements from the frequency domain back to the image domain; the U-Net then removes aliasing effects caused by missing data and noise. Since the image's phase information is preserved when there is no atmospheric turbulence, we can define the image similarity loss, $\mathcal{L}$, as the norm of the difference between the reconstructed image and a blurred version of the original image:

(22)  $\mathcal{L}\big(x, \hat{x}\big) = \big\| G * x - \hat{x} \big\|$

where $G$ is a Gaussian kernel that defines our desired reconstruction resolution. The nominal resolution of the telescope array (expressed as a full-width-half-maximum (FWHM)) is set by the longest baseline. Image reconstructions that recover structure smaller than the nominal resolution are super-resolving the target.
In the case of atmospheric turbulence, reconstruction network B is used (Fig. 4). The input is constructed from visibility amplitudes and closure phases (rather than complex visibilities). The reconstruction network architecture is designed through physical intuition in order to handle the more challenging phase-retrieval problem. First, the visibility amplitudes and closure phases are passed to three general dense layers whose purpose is to disentangle the corrupted visibility phases. Second, the disentangled phases are combined with the visibility amplitudes to produce modified complex visibilities. These modified complex visibilities are then passed to the same architecture used above (Dense + U-Net). Since the absolute phase is lost in the presence of atmospheric noise, we use a shift-invariant loss that represents the negative inner product between the true and reconstructed images:

(23)  $\mathcal{L}\big(x, \hat{x}\big) = -\max_{\delta} \, \big\langle G * x, \; \mathrm{shift}_\delta(\hat{x}) \big\rangle$
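One way to realize a shift-invariant negative-inner-product loss is to maximize the inner product over all cyclic shifts via an FFT cross-correlation; this is a plausible sketch of the idea, not necessarily the paper's exact loss. A shifted copy of an image then incurs the same loss as the image itself.

```python
import numpy as np

def shift_invariant_loss(x_true, x_rec):
    """Negative inner product between the two images, maximized over all
    cyclic shifts; the cross-correlation over every shift is computed at
    once via the FFT."""
    xc = np.fft.ifft2(np.fft.fft2(x_true) * np.conj(np.fft.fft2(x_rec))).real
    return -xc.max()

# A cyclically shifted reconstruction scores the same as a perfect one.
rng = np.random.default_rng(0)
x = rng.uniform(size=(8, 8))
shifted = np.roll(x, shift=(2, 3), axis=(0, 1))
```

For identical images the optimum is at zero shift, so the loss equals the negative squared norm of the image.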
Training
We train our joint optimization using images from the "Fashion-MNIST" dataset [36], and validate on the rest of "Fashion-MNIST", classic "MNIST", and simple geometric model images meant to resemble a black hole. Each image corresponds to a 100 microarcsecond (μas) field of view (FOV) with a total flux of 1 Jansky (Jy), unless otherwise specified. The training set is augmented by introducing local deformations and random rotations to the original "Fashion-MNIST" images. Data augmentation is found to be especially important for training reconstruction network B to avoid overfitting. Each network is trained until convergence with Adam in TensorFlow [18], requiring between 50 and 300 epochs. VLBI visibility measurements are produced using the eht-imaging Python library [11].
4.3 Experiments
We demonstrate the proposed optimization framework in a variety of different VLBI scenarios. The purpose of the proposed method is to learn sensor-sampling designs in complex scenarios where intuition is limited. We choose to show some examples in simple scenarios where intuition can be used to confirm that our method provides reasonable results (e.g., Sec. 4.3.2). However, what is most interesting are the more realistic, complicated cases that cannot be as easily judged from intuition (e.g., Sec. 4.3.3).
Fig. 5 shows a representative Ising model learned by our proposed method. It has been trained using the complex visibility reconstruction network with representative regularization coefficients. In this example, we considered the "EHT+" array with the black hole Sagittarius A* (Sgr A*) as the science target. We define the blurring kernel $G$ in Eq. (22) as a Gaussian with a FWHM of 0.5 times the nominal beam. To investigate the influence of baseline geometry on the recovered distributions, we did not include any noise on the measurements.
Note that in the site correlation plot, the two pairs of co-located telescopes, ALMA-APEX in Chile and JCMT-SMA in Hawai'i, are both highly negatively correlated. This agrees with intuition, as the baselines from any telescope to either co-located telescope produce identical measurements (when ignoring noise). The activity parameters represent the importance of each telescope site, and are roughly proportional to how frequently a telescope is sampled. As can be seen, the most important sites in the EHT network for Sgr A* observations are ALMA, APEX, SPT, and LMT, which produce measurements covering low to high spatial frequencies. On the contrary, since GLT provides no measurements (it is occluded by the Earth, as Sgr A* is in the southern sky), it is the least important site and has a negative activity parameter.
Effect of Sparsity and Diversity Regularization
We conduct a grid search over the sparsity and diversity regularization coefficients and explore their influence on the learned optimal sensor-sampling distributions. In our experiments, the diversity regularization coefficient and the sparsity regularization coefficient are each varied over a wide range; negative sparsity regularization favors observing with more telescopes. Apart from the regularization weights, we use the representative parameters described at the beginning of Sec. 4.3.
Fig. 8 shows the diversity-sparsity relationship recovered for different combinations of regularization weights. The diversity regularizer characterizes the entropy of the Ising model: when the diversity regularization is large, every telescope site has around a 50% chance of being selected or excluded. Therefore, the mean number of selected telescopes gradually converges to six out of twelve with increasing diversity regularization. As shown by the array-size histogram, increasing the sparsity regularizer results in fewer sampled telescopes.
Effect of Reconstruction Resolution
We investigate the optimal sensor sampling for different reconstruction resolution requirements by optimizing the joint sensing and imaging network at four fractions (1, 0.75, 0.5 and 0.25) of the nominal resolution. Apart from the target resolution, we use the representative parameters described at the beginning of Section 4.3. Fig. 7 reports the activity of the four telescopes (ALMA, APEX, LMT and SPT) that change most with the target resolution, along with the learned correlation matrices at 1×, 0.75× and 0.25× the nominal resolution. The correlation matrix at 0.5× the nominal resolution can be found in Fig. 5.
As the resolution becomes higher (right to left in the activity plot), LMT becomes increasingly less important, while ALMA, APEX and especially SPT become more important. This agrees with intuition, as SPT baselines measure the highest spatial-frequency information for Sgr A*, while LMT is contained in many of the shorter baselines probing low spatial frequencies. The correlation coefficients between short-baseline pairs (such as APEX-ALMA and JCMT-SMA) decrease at higher resolution, while those between long-baseline pairs (such as SPT-APEX, SPT-ALMA, SPT-JCMT and SPT-SMA) increase. This satisfies physical intuition, as long baselines are most important for high-resolution reconstructions.
Fig. 7 shows the blurred truth images and reconstruction results at the different resolutions. We select one example from each type of validation dataset (Fashion-MNIST, MNIST and geometric models). More precisely, we show the reconstruction mean and standard deviation obtained by taking 1000 telescope samples from each learned model. The higher-resolution reconstructions are more challenging because they attempt to super-resolve the image, and as expected they have larger standard deviations. Although the goal of this work is not to develop a new image reconstruction method, we find that the proposed neural network produces promising results for the simplified VLBI reconstruction problem, and is powerful enough to handle a variety of possible array configurations simultaneously.
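For concreteness, the target blur in these experiments is a Gaussian kernel specified by a FWHM relative to the nominal beam. A minimal sketch of constructing such a kernel (the nominal beam width in pixels below is a hypothetical value, not one from the paper):

```python
import numpy as np

def gaussian_kernel(fwhm_px, size=None):
    """Normalized 2D Gaussian blur kernel for a given FWHM in pixels.
    FWHM relates to the standard deviation by FWHM = 2*sqrt(2*ln 2)*sigma."""
    sigma = fwhm_px / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    if size is None:
        size = int(6 * sigma) | 1  # odd window spanning about +/- 3 sigma
    r = np.arange(size) - size // 2
    g = np.exp(-(r[:, None] ** 2 + r[None, :] ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

nominal_fwhm_px = 8.0                       # hypothetical nominal beam width
k = gaussian_kernel(0.5 * nominal_fwhm_px)  # target at 0.5x nominal resolution
```

Smaller FWHM fractions leave more high-spatial-frequency content in the target, which is what makes the 1× reconstructions harder than the 0.25× ones.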
Effect of Noise
The previous experiments assume no noise in the VLBI measurements. In this section, we compare the optimal telescope sampling under six different noise assumptions: (1) no noise; (2) equal thermal noise (using the average thermal noise derived from Table I); (3) site-varying thermal noise (as derived from Table I); (4) atmospheric phase noise; (5) atmospheric phase noise and equal thermal noise; and (6) atmospheric phase noise and site-varying thermal noise. For the latter three cases, the visibility amplitudes and closure phases are used for image reconstruction. All cases assume Sgr A* as the target source, with a constant total flux of 1 Jy, observed using the “EHT+” array.
Fig. 9 shows the learned Ising model parameters for these cases. As the site activity plots show, the sensor-sampling distributions are not significantly influenced by thermal noise alone: cases (1), (2) and (3) are almost identical. However, atmospheric noise significantly impacts telescope selection. In the presence of atmospheric phase error, LMT and OVRO become much more important, while SPT becomes less important. This effect is further amplified when both thermal noise and atmospheric errors are present. Short baselines, such as OVRO-LMT, are sampled more frequently, suggesting that it is especially important to include shorter baselines in high-resolution VLBI imaging when dealing with atmospheric phase errors.
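The robustness of closure phases, which motivates their use in cases (4)-(6), follows because station-based atmospheric phase errors cancel around a closed triangle of baselines. A short numerical check of this cancellation:

```python
import numpy as np

def closure_phase(v12, v23, v31):
    """Closure phase (radians): the phase of the bispectrum, i.e. the product
    of the three visibilities around a telescope triangle (1, 2, 3)."""
    return np.angle(v12 * v23 * v31)

# a station-based phase error phi_i corrupts visibility v_ij by
# exp(1j*(phi_i - phi_j)); around a closed triangle the errors cancel
rng = np.random.default_rng(0)
v12, v23, v31 = np.exp(1j * rng.uniform(-np.pi, np.pi, 3))
phi = rng.uniform(-np.pi, np.pi, 3)          # per-site atmospheric errors
c12 = v12 * np.exp(1j * (phi[0] - phi[1]))   # corrupted visibilities
c23 = v23 * np.exp(1j * (phi[1] - phi[2]))
c31 = v31 * np.exp(1j * (phi[2] - phi[0]))
cp_clean = closure_phase(v12, v23, v31)
cp_corrupt = closure_phase(c12, c23, c31)    # equal up to floating-point error
```

Visibility amplitudes are similarly immune to station-based phase errors, which is why amplitudes and closure phases together form the atmospheric-noise-robust data products used here.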
Based on the undirected graph defined by the Ising model correlation matrix, we can also analyze all “cliques” of size three to understand the importance of each closure triangle (i.e., closure phase measurement). In the affinity graph, telescopes are vertices and the pairwise correlations define edge weights. We define a subset of three telescopes $\{i,j,k\}$ as a “three-clique” if the correlation between each pair is significantly positive (larger than a threshold $\tau$), i.e.,
$$J_{ij} > \tau, \qquad J_{jk} > \tau, \qquad J_{ik} > \tau. \tag{24}$$
For instance, in case (6), eight three-cliques can be found in the corresponding graph for an appropriate choice of the threshold. After ranking them with a simple metric,
$$s_{ijk} = J_{ij} + J_{jk} + J_{ik}, \tag{25}$$
we find the five most important three-cliques are “LMT-OVRO-JCMT”, “LMT-OVRO-SMA”, “APEX-LMT-JCMT”, “ALMA-LMT-JCMT” and “LMT-JCMT-KP”, all of which form triangles with similar baseline lengths (see Fig. 3). This suggests that closure phases from sites that are nearly collinear may not be as important for image reconstruction under atmospheric phase error.
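The clique search itself is straightforward. The sketch below enumerates telescope triples whose pairwise correlations all exceed a threshold and ranks them by the sum of the three correlations; this sum is one plausible instance of a simple ranking metric, and the matrix values here are hypothetical, not the learned ones:

```python
import numpy as np
from itertools import combinations

def three_cliques(J, names, thresh):
    """Find telescope triples whose pairwise Ising correlations all exceed
    `thresh`, ranked (descending) by the sum of the three correlations."""
    cliques = []
    for i, j, k in combinations(range(len(names)), 3):
        if min(J[i, j], J[i, k], J[j, k]) > thresh:
            score = J[i, j] + J[i, k] + J[j, k]
            cliques.append((score, (names[i], names[j], names[k])))
    return sorted(cliques, reverse=True)

# toy symmetric correlation matrix for four sites (hypothetical values)
names = ["LMT", "OVRO", "JCMT", "SPT"]
J = np.array([[0.0, 0.8, 0.7, 0.1],
              [0.8, 0.0, 0.6, 0.0],
              [0.7, 0.6, 0.0, 0.2],
              [0.1, 0.0, 0.2, 0.0]])
ranked = three_cliques(J, names, thresh=0.5)  # only LMT-OVRO-JCMT qualifies
```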
EHT Design for Different Science Targets
In this section, we compare the results for two different targets, Sgr A* and M87*, which lie in the southern and northern celestial hemispheres, respectively (Sgr A* declination: −29.24°; M87* declination: +12.39°). Sgr A* and M87* should therefore have different preferences for telescope site selection. Fig. 11 reports the learned EHT sensor-sampling distributions considering either no noise or only atmospheric phase noise. Under both noise assumptions, SPT (occluded for M87* observations) becomes less important for M87* compared with Sgr A*, while PV, PDB and GLT become more important.
Site Selection for “FUTURE” array
We study site selection for a “FUTURE” array, which includes nine additional telescope sites beyond the twelve in “EHT+”. The same representative parameters described at the beginning of Sec. 4.3 are applied, and no noise is introduced. The learned Ising model parameters for both Sgr A* and M87* are shown in Fig. 11. As can be seen, the results successfully capture the negative correlations among co-located telescopes, such as ALMA-APEX-VLT-BOL and JCMT-SMA-KAUAI.
Since the conditional distribution of a subset of Ising model elements is itself an Ising model (shown in Eq. 10), we also report the conditional activities and correlations of the nine additional sites assuming all the “EHT+” telescopes are selected. Although nearly all nine sites have negative conditional activities (due to the sparsity constraint on the original Ising model), the relative ranking of the sites' importance still holds. We find that telescopes in Africa, particularly DRAK, would be helpful in constraining both Sgr A* and M87* reconstructions. However, we caution that more factors should be considered in the model before drawing conclusions about the importance of any one site, such as realistic atmospheric noise, weather assumptions, the science target's expected evolution, and both social and economic costs.
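The conditioning step can be sketched as follows, assuming the binary {0,1} parameterization p(x) ∝ exp(z·x + x^T J x / 2); the paper's Eq. 10 gives the precise form. Conditioning on a fixed subset shifts the free sites' activities by their couplings to the fixed sites and leaves their pairwise couplings unchanged (the values below are hypothetical):

```python
import numpy as np

def condition_ising(z, J, idx_fixed, values):
    """Conditional Ising model over the remaining sites, given fixed binary
    selections `values` at indices `idx_fixed`. For p(x) ∝ exp(z·x + x^T J x / 2),
    conditioning shifts the free biases by J[free, fixed] @ values and keeps
    the free-free couplings unchanged."""
    n = len(z)
    fixed = np.asarray(idx_fixed)
    free = np.array([i for i in range(n) if i not in set(idx_fixed)])
    z_cond = z[free] + J[np.ix_(free, fixed)] @ np.asarray(values, float)
    J_cond = J[np.ix_(free, free)]
    return z_cond, J_cond

# hypothetical 4-site model; condition on sites 0 and 1 both being selected
z = np.array([0.5, 0.2, -0.1, 0.3])
J = np.array([[0.0, -1.0, 0.4, 0.0],
              [-1.0, 0.0, 0.0, 0.2],
              [0.4, 0.0, 0.0, -0.3],
              [0.0, 0.2, -0.3, 0.0]])
z_c, J_c = condition_ising(z, J, [0, 1], [1.0, 1.0])
```

This is how the conditional activities of the nine candidate sites can be read off once all twelve “EHT+” sites are clamped to "selected."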
Swapping Sampling Strategies
In this section, we study the interaction of learned sampling strategies (encoders) and reconstruction methods (decoders) in order to demonstrate the advantage of co-design. In particular, we swap the sampling strategies learned for different types of noise (no noise vs. atmospheric phase noise) and science targets (Sgr A* vs. M87*), while keeping the reconstruction methods fixed. For instance, for two co-designed networks, we evaluate the quality of images reconstructed with the decoder of network 1 when the telescope measurements are sampled from the policy learned by network 2, and vice versa. To meaningfully compare networks, the regularization weights were tuned so that the expected number of sampled telescopes in the EHT+ array was nearly equal (approximately eight) across the different networks.
Table II reports the shift-invariant reconstruction losses (Eq. 23) of the various “recombined” autoencoders for a FWHM of 0.5× the nominal beam. Each row represents a learned reconstruction method and each column represents a learned sensor-sampling distribution. The reconstruction metric is computed over 1000 test images from the MNIST dataset. As expected, the optimal sampling strategy and reconstruction method are those that are learned jointly: a particular VLBI array sampling policy works best with its corresponding reconstruction method.
Fig. 12 shows the mean reconstruction and standard deviation resulting from 1000 sampling trials of the same truth image (see Fig. 7) for each “recombined” autoencoder. The standard deviation is significantly lower for the autoencoders that were co-designed. This simple test helps confirm that co-designed sensor-sampling and reconstruction strategies outperform independently learned ones.
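The swap experiment reduces to cross-evaluating every decoder against every sampling policy to fill a loss matrix like Table II. A minimal sketch, with hypothetical stand-in decoders, samplers, and loss (not the paper's networks):

```python
import numpy as np

def swap_matrix(decoders, samplers, test_images, loss_fn, n_trials=10):
    """Cross-evaluate every (decoder, sampler) pair: entry [i, j] is the mean
    reconstruction loss when decoder i is fed measurements drawn from the
    telescope-selection policy of sampler j."""
    L = np.zeros((len(decoders), len(samplers)))
    for i, dec in enumerate(decoders):
        for j, samp in enumerate(samplers):
            losses = [loss_fn(dec(img, samp()), img)
                      for img in test_images for _ in range(n_trials)]
            L[i, j] = np.mean(losses)
    return L

# toy stand-ins: a decoder maps (image, selection mask) to a reconstruction,
# and a sampler draws a binary telescope-selection mask
imgs = [np.ones(4), np.zeros(4)]
samp_a = lambda: np.array([1.0, 1.0, 0.0, 0.0])
samp_b = lambda: np.array([0.0, 0.0, 1.0, 1.0])
dec_good = lambda img, mask: img        # perfect reconstruction
dec_bad = lambda img, mask: 0.5 * img   # biased reconstruction
mse = lambda x, y: float(np.mean((x - y) ** 2))
L = swap_matrix([dec_good, dec_bad], [samp_a, samp_b], imgs, mse, n_trials=2)
```

With co-designed pairs, the smallest entries of such a matrix fall on the diagonal, as observed in Table II.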
5 Discussion
Optimal sensing is important for resource-limited image reconstruction. We presented an approach for learning an optimal sensor-sampling distribution for a computational imaging system. The method optimizes an Ising sensor-sampling distribution jointly with an image reconstruction method, implemented as a physics-constrained, fully differentiable autoencoder.
We demonstrated the proposed framework on a VLBI telescope array design task, where sensor correlations and atmospheric noise present unique challenges. Experiments show that the proposed technique can be helpful in planning future telescope array designs and developing observation strategies. We also believe the framework is applicable to a variety of other sensor selection and sampling problems in computational imaging, such as optimizing LED array illumination patterns in Fourier ptychography [17] and the k-space sampling pattern in fast MRI [2]. Changing the application simply requires changing the physics-based forward model and the decoder in Eq. 2, along with the training data. By co-designing the Ising model sampling distribution simultaneously with the reconstruction decoder, we can better optimize the design of future computational imaging systems.
Table II. Shift-invariant reconstruction losses (Eq. 23) of the recombined autoencoders; rows are learned reconstruction methods, columns are learned sampling distributions.

recon. \ samp.    | Sgr A* (no noise) | Sgr A* (atmos.) | M87* (no noise) | M87* (atmos.)
Sgr A* (no noise) | 0.0072            | 0.0088          | 0.0168          | 0.0165
Sgr A* (atmos.)   | 0.0199            | 0.0163          | 0.0252          | 0.0230
M87* (no noise)   | 0.0155            | 0.0164          | 0.0061          | 0.0079
M87* (atmos.)     | 0.0171            | 0.0152          | 0.0127          | 0.0116
Acknowledgments
The authors would like to thank Lindy Blackburn, Alexander Raymond, Michael Johnson, and Sheperd Doeleman for helpful discussions on the constraints of a nextgeneration EHT array, and Michael Kellman for helpful discussions on Fourier ptychography. This work was supported by NSF award 1935980: “Next Generation Event Horizon Telescope Design,” and Beyond Limits.
He Sun is a postdoctoral researcher in Computing and Mathematical Sciences (CMS) at the California Institute of Technology. He received his PhD from Princeton University in 2019 and his BS degree from Peking University in 2014. His research focuses on adaptive optics and computational imaging, especially their applications in astrophysical and biomedical sciences. 
Adrian V. Dalca is an assistant professor at Harvard Medical School, Massachusetts General Hospital, and a Research Scientist in CSAIL, Massachusetts Institute of Technology. His research focuses on machine learning techniques and probabilistic models for medical image analysis. Driven by clinical questions, he develops core learning algorithms, as well as registration, segmentation and imputation methods aimed at clinicalsourced datasets and broadly applicable in image analysis. 
Katherine L. Bouman is an assistant professor in Computing and Mathematical Sciences (CMS) and Electrical Engineering (EE) and a Rosenberg Scholar at the California Institute of Technology (Caltech) in Pasadena, California. Her group combines ideas from signal processing, computer vision, machine learning, and physics to design computational imaging systems. These systems tightly integrate algorithm and sensor design, making it possible to observe phenomena previously difficult or impossible to measure with traditional approaches. She was recognized as a co-recipient of the Breakthrough Prize in fundamental physics, and was named the 2020 imaging “Scientist of the Year” by the Society for Imaging Science and Technology for her work capturing the first picture of a black hole with the international Event Horizon Telescope Collaboration. 
Footnotes
 Our analysis conservatively assumes each new telescope in the “FUTURE” array has an SEFD of 10,000 Jy.
References
 (2019) First M87 Event Horizon Telescope results. IV. Imaging the central supermassive black hole. The Astrophysical Journal Letters 875 (1), pp. L4.
 (2019) Adaptive compressed sensing MRI with unsupervised learning. arXiv preprint arXiv:1907.11374.
 (2019) Studying black holes on horizon scales with VLBI ground arrays. arXiv preprint arXiv:1909.01411.
 (2001) Interferometric array design: optimizing the locations of the antenna pads. Astronomy & Astrophysics 377 (1), pp. 368–376.
 (1993) A generalized Gaussian image model for edge-preserving MAP estimation. IEEE Transactions on Image Processing 2 (3), pp. 296–310.
 (2018) Reconstructing video of time-varying sources from radio interferometric measurements. IEEE Transactions on Computational Imaging 4 (4), pp. 512–527.
 (2016) Computational imaging for VLBI image reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 913–922.
 (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory 52 (2), pp. 489–509.
 (2008) An introduction to compressive sampling. IEEE Signal Processing Magazine 25 (2), pp. 21–30.
 (2007) Sparsity and incoherence in compressive sampling. Inverse Problems 23 (3), pp. 969.
 (2018) Interferometric imaging directly with closure phases and closure amplitudes. The Astrophysical Journal 857 (1), pp. 23.
 (2016) Learning sensor multiplexing design through backpropagation. In Advances in Neural Information Processing Systems, pp. 3081–3089.
 (2018) Reblur2Deblur: deblurring videos via self-supervised learning. In 2018 IEEE International Conference on Computational Photography (ICCP), pp. 1–9.
 (2017) Unrolled optimization with deep priors. arXiv preprint arXiv:1705.08041.
 (2018) Learning a variational network for reconstruction of accelerated MRI data. Magnetic Resonance in Medicine 79 (6), pp. 3055–3071.
 (2017) Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing 26 (9), pp. 4509–4522.
 (2019) Data-driven design for Fourier ptychographic microscopy. In 2019 IEEE International Conference on Computational Photography (ICCP), pp. 1–8.
 (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
 (2018) Variational deep learning for low-dose computed tomography. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6687–6691.
 (2017) Deep residual learning for compressed sensing MRI. In 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pp. 15–18.
 (2008) Compressed sensing MRI. IEEE Signal Processing Magazine 25 (2), pp. 72.
 (2007) Sparse MRI: the application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine 58 (6), pp. 1182–1195.
 (2002) Optimal camera placement for accurate reconstruction. Pattern Recognition 35 (4), pp. 927–944.
 (2019) Metrics and motivations for Earth-space VLBI: time-resolving Sgr A* with the Event Horizon Telescope. arXiv preprint arXiv:1906.08828.
 (2017) Deep learning microscopy. Optica 4 (11), pp. 1437–1443.
 (2015) U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241.
 (1949) Communication in the presence of noise. Proceedings of the IRE 37 (1), pp. 10–21.
 (2017) Lensless computational imaging through deep learning. Optica 4 (9), pp. 1117–1125.
 (2018) End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging. ACM Transactions on Graphics (SIGGRAPH).
 (1984) Maximum entropy image reconstruction: general algorithm. Monthly Notices of the Royal Astronomical Society 211, pp. 111.
 (2016) Deep ADMM-Net for compressive sensing MRI. In Advances in Neural Information Processing Systems, pp. 10–18.
 (1986) Interferometry and Synthesis in Radio Astronomy. Wiley, New York.
 (2015) Computational illumination for high-speed in vitro Fourier ptychographic microscopy. Optica 2 (10), pp. 904–911.
 (2013) Plug-and-play priors for model based reconstruction. In 2013 IEEE Global Conference on Signal and Information Processing, pp. 945–948.
 (2001) Radio interferometer array point spread functions I. Theory and statistics. ALMA Memo Series 389.
 (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.
 (2018) Joint optimization for compressive video sensing and reconstruction under hardware constraints. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 634–649.