Deep Mesh Projectors for Inverse Problems
Abstract
We develop a new learning-based approach to ill-posed inverse problems. Instead of directly learning the complex mapping from the measured data to the reconstruction, we learn an ensemble of simpler mappings from data to projections of the unknown model into random low-dimensional subspaces. We form the reconstruction by combining the estimated subspace projections. Structured subspaces of piecewise-constant images on random Delaunay triangulations allow us to address inverse problems with extremely sparse data and still get good reconstructions of the unknown geometry. This choice also makes our method robust against arbitrary data corruptions not seen during training. Further, it marginalizes the role of the training dataset, which is essential for applications in geophysics where ground-truth datasets are exceptionally scarce.
1 Introduction
Deep neural networks produce impressive results on a variety of inverse problems, as documented by a whopping number of recent papers (cf. Section 2). Typically, a neural network is trained to remove artifacts due to sparse data or noise, for example in low-dose CT imaging Chen et al. [2017]. Classical approaches use universal regularization principles such as smoothness or sparsity. Although they give good results, deep neural networks often do better.
In certain domains, however, problems are so ill-posed and the data is so sparse that the artifact removal paradigm is not appropriate: even a coarse reconstruction of the unknown model is hard to get. Unlike in the typical biomedical setting where applying a regularized pseudoinverse of the imaging operator to the measurements (in the linear case) already brings out considerable structure, in applications of our interest standard techniques cannot produce a reasonable image. We illustrate this point in Figure 1. This highly unresolved regime is common in geophysics and it requires alternative, more involved strategies Galetti et al. [2017]. The sought reconstructions are accordingly much less detailed.
We propose a new way to regularize ill-posed inverse problems using convolutional neural networks that map measured data to low-dimensional projections of the unknown model.
Concretely, we are concerned with the following operator equation,
(1) $F(x) = y, \quad F : \mathcal{D}(F) \subseteq X \to Y,$
with $\mathcal{D}(F)$ the domain and $X$ and $Y$ being Hilbert spaces. In applications, $F$ models the data. (We refer to $y$ as the data, and $x$ as the model, as is common in the inverse problems literature.)
For ill-posed problems, attempting to learn $F^{-1}$, even when it formally exists, is a dubious proposal since the inverse mapping satisfies very poor stability estimates; learning it is not meaningful. A discretization of (1) might give rise to systems that are not singular in theory, but that are too ill-conditioned for practical computation. Discretizing the problem can be interpreted as projecting $x$ into some high (but finite) dimensional subspace $S_N$. Generally, one can show that the projected mapping $P_{S_N} F^{-1}$ is Lipschitz stable, but with a very poor constant Beretta et al. [2013], Mandache [2001]. Thus, even if we could learn $P_{S_N} F^{-1}$, it would lead to a brittle result.
Instead, we have to learn a regularized inverse. One possibility is to restrict the inversion to the model manifold, but this requires many ground truth training examples and leads to considerable model bias. A good universal strategy would be to learn the best $L$-Lipschitz approximation of the inverse for some favorable $L$. Alas, we do not know how to translate Lipschitzness into an optimization constraint on the network weights.
Main contributions.
We suggest an alternative strategy: one can show that the Lipschitz constant of $F^{-1}$ projected into a carefully chosen low-dimensional subspace $S$ is exponentially smaller than that of a high-dimensional projection $P_{S_N} F^{-1}$. Thus, instead of learning $P_{S_N} F^{-1}$, we learn $P_{S_\lambda} F^{-1}$ for a collection of projections onto low-dimensional subspaces, $\{S_\lambda\}_{\lambda=1}^{\Lambda}$. Each projection is easier to learn, for example, in terms of sample complexity Cooper [1995] or achievable sup-norm error, and each has a controlled Lipschitz constant Beretta et al. [2013]. We then construct a regularized approximation of $F^{-1}$ from estimates of $P_{S_\lambda} F^{-1}$.
We test our ideas on the problem of linearized seismic traveltime tomography Bording et al. [1987], Hole [1992] and show that the proposed method outperforms learned direct inversion in terms of achieved reconstructions, robustness to errors in the data, and independence of the training dataset. The latter is essential in domains with few available ground truth images. Finally, we propose a new architecture, the SubNet, which receives as input the low-dimensional subspace in which to compute the reconstruction. This dramatically shortens the training time and it allows us to quickly generate many projections to subspaces that were not seen at training time, instead of training a network for each subspace.
2 Related work
Although neural networks have long been used to address inverse problems Ogawa et al. [1998], Hoole [1993], Schiller and Doerffer [2010], the past few years have seen the number of related deep learning papers grow exponentially. The majority address biomedical imaging Güler and Übeylı [2005], Hudson and Cohen [2000] with several special issues (IEEE Transactions on Medical Imaging, May 2016 Greenspan et al. [2016]; IEEE Signal Processing Magazine, November 2017, January 2018 Porikli et al. [2017, 2018]) and review papers Lucas et al. [2018], McCann et al. [2017] dedicated to the topic. All these papers address reconstruction from subsampled or low-quality data, often motivated by reduced scanning time or lower radiation doses. Beyond biomedical imaging, machine learning techniques are emerging in geophysical imaging Araya-Polo et al. [2017], Lewis et al. [2017], Bianco and Gertoft [2017], though at a slower pace, perhaps partly due to the lack of standard open datasets.
Existing methods can be grouped into non-iterative methods that learn a feed-forward mapping from the measured data (or some standard manipulation such as adjoint or a pseudoinverse) to the model Jin et al. [2016], Pelt and Batenburg [2013], Zhu et al. [2018], Wang [2016], Antholzer et al. [2017], Han et al. [2016], Zhang et al. [2016]; and iterative energy minimization methods, with either the regularizer being a neural network Li et al. [2018], or neural networks replacing various iteration components such as gradients, projectors, or proximal mappings Kelly et al. [2017], Adler and Öktem [2017b, a], Rick Chang et al. [2017]. These are further related to the notion of plug-and-play regularization Tikhonov A.N. [2013], Chambolle [2004], Mallat [1999], as well as early uses of neural nets to unroll and adapt standard sparse reconstruction algorithms Gregor and LeCun [2010], Xin et al. [2016]. An advantage of the first group of methods is that they are fast; an advantage of the second group is that they are better at enforcing data consistency.
A rather different take was proposed by Bora et al. [2017, 2018] in the context of compressed sensing, where the reconstruction is constrained to lie in the range manifold of a pretrained generative network. Their scheme achieves impressive results on compressed sensing and comes with theoretical guarantees. However, training generative networks requires many examples of ground truth and the method is inherently subject to dataset bias.
Our work is further related to sketching Gribonval et al. [2017], Pilanci and Research [2016] where the learning problem is also simplified by random low-dimensional projections of some object, either the data or the unknown reconstruction itself Yurtsever et al. [2017]. This also exposes natural connections with learning via random features Ali Rahimi [2008, 2009]. Estimating projections and asking for consistency across the various subspaces can also be considered a variant of multitask learning Zhang et al. [2014], Collobert and Weston [2008], Seltzer and Droppo [2013].
3 Lipschitz Stability of Inverse Problems
In inverse problems, one is concerned with whether the data $y$ determines the model $x$, and whether it does so stably. We assume that $F$ is continuous and locally Fréchet differentiable. One way to analyze the uniqueness and stability of an inverse problem is to couple them to a construction of a local solution based on the Landweber iteration Landweber [1951]. The radius of convergence is then a quantitative measure of well-posedness.
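For a linear operator, the Landweber iteration reduces to gradient descent on the data misfit. The following is a minimal NumPy sketch under stated assumptions: the random matrix `A` is a hypothetical, well-conditioned stand-in for the forward operator, and the step size `mu` is simply chosen below $2/\|A\|^2$. None of these choices come from the paper.

```python
import numpy as np

def landweber(A, y, mu, n_iter=500, x0=None):
    """Landweber iteration x_{k+1} = x_k - mu * A^T (A x_k - y) for a linear operator A."""
    x = np.zeros(A.shape[1]) if x0 is None else x0.copy()
    for _ in range(n_iter):
        x = x - mu * A.T @ (A @ x - y)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))      # hypothetical toy operator (assumption)
x_true = rng.standard_normal(10)
y = A @ x_true                         # noiseless data
mu = 1.0 / np.linalg.norm(A, 2) ** 2   # step size below 2 / ||A||^2
x_rec = landweber(A, y, mu, n_iter=2000)
rel_err = np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true)
```

On this toy problem the iteration converges because the operator is well conditioned; for an ill-posed operator the same loop would need early stopping or another form of regularization.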
Let denote a closed ball centered at with radius , such that , . We let generate the data , that is,
(2) 
and we assume that .
Assumption 1.
Let denote the operator modelling the data. Then

The Fréchet derivative, , of is Lipschitz continuous locally in and
(3) Moreover
(4) 
is weakly sequentially closed, that is,

The inversion has the uniform Lipschitz stability, that is, there exists a constant, , such that
(5)
Then the inversion is stable in the sense that the Landweber iteration has some finite radius of convergence, say, given by de Hoop et al. [2012]
(6) 
With a twice Fréchet-differentiable operator $F$, the essence of the constants is more transparent. Let $D^2 F$ stand for the second-order Fréchet derivative of $F$; then
(7) 
In other words, the constant is a curvature to gradient condition, which degenerates for linear operators to zero. Thus any bound on restricts the nonconvexity of .
The Lipschitz stability condition (5) is implied by a lower bound of the Fréchet derivative . More precisely, if there exists a constant such that
(8) 
for sufficiently small, then it can be shown that
for some constant depending on and . We note that a very similar condition specialized to linear operators plays a central role in Bora et al. [2017].
While in the above discussion the stability estimate is assumed to hold on the entire ball , in realistic problems this estimate typically holds on a convex compact subset only. A projected gradient descent can still yield an approximate reconstruction in with error determined by the smallest such that , say, while the true model, , lies outside ; the expression for the radius of convergence depends on de Hoop et al. [2012].
Unfortunately, (8) fails for ill-posed problems. In fact, it can equally fail for both linear and nonlinear problems. On the other hand, a Lipschitz stability estimate commonly holds on finite-dimensional subspaces. Growth of the stability constant, typically exponential, reflects the ill-posedness of the inverse problem. This motivates the approach developed in this paper, which enforces the dimension to remain relatively small.
3.1 Decomposing Lipschitz Maps by Random Mesh Projections
We begin with a simple randomization argument. Suppose that we wish to reconstruct a high-resolution image with $N$ pixels. If $N$ is large, then the inverse mapping projected into this $N$-dimensional subspace, $P_{S_N} F^{-1}$, is Lipschitz, but with a poor constant $L_N$,
$\|P_{S_N} F^{-1}(y_1) - P_{S_N} F^{-1}(y_2)\| \le L_N \|y_1 - y_2\|.$
Consider instead the map from the data to a projection of the model into some $K$-dimensional subspace $S$, where $K \ll N$. Denote the projection by $P_S$ and assume $S$ is chosen uniformly at random. (One way to construct the corresponding projection matrix is as $P_S = W W^{\dagger}$, where $W \in \mathbb{R}^{N \times K}$ is a matrix with standard iid Gaussian entries.) We want to evaluate the expected Lipschitz constant of the map from $y$ to $P_S x$, noting that it can be written as $P_S \circ P_{S_N} F^{-1}$. Writing $x_i = P_{S_N} F^{-1}(y_i)$,
$\mathbb{E}\,\|P_S x_1 - P_S x_2\| \le \left(\mathbb{E}\,\|P_S (x_1 - x_2)\|^2\right)^{1/2} \le \sqrt{K/N}\; L_N \|y_1 - y_2\|,$
where the first inequality is Jensen’s inequality, and the second one follows from
$\mathbb{E}\,\|P_S x\|^2 = x^T\, \mathbb{E}[P_S^T P_S]\, x$
and the observation that $\mathbb{E}[P_S^T P_S] = \frac{K}{N} I_N$. In other words, random projections reduce the Lipschitz constant by a factor of $\sqrt{K/N}$ on average. This simple computation already suggests exponential gains in terms of sample complexity when learning the projected mapping Cooper [1995]. However, the inverse problem theory tells us that a careful choice of subspace family can give exponential improvements in Lipschitz stability. In particular, it is favorable to consider subspaces of piecewise-constant images, with each basis element being a characteristic function of some domain subset Beretta et al. [2013].
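The averaging fact behind this argument can be checked numerically: for a uniformly random $K$-dimensional subspace of $\mathbb{R}^N$, the expected squared norm of a projected vector is $K/N$ times the original squared norm. The sketch below is a Monte Carlo check under assumed, illustrative sizes (`N`, `K`, `trials`); the helper `random_projector` is hypothetical and not part of the paper's code.

```python
import numpy as np

def random_projector(N, K, rng):
    """Orthogonal projector onto the span of an N x K standard Gaussian matrix."""
    W = rng.standard_normal((N, K))
    Q, _ = np.linalg.qr(W)             # orthonormal basis of the random subspace
    return Q @ Q.T

rng = np.random.default_rng(0)
N, K, trials = 64, 4, 2000             # illustrative sizes (assumption)
v = rng.standard_normal(N)             # stands in for a model difference

sq_norms = np.array([np.sum((random_projector(N, K, rng) @ v) ** 2)
                     for _ in range(trials)])
ratio_sq = sq_norms.mean() / np.sum(v ** 2)            # close to K / N
ratio = np.sqrt(sq_norms).mean() / np.linalg.norm(v)   # no larger than sqrt(ratio_sq), by Jensen
```

Here `ratio_sq` estimates the expected squared shrinkage, while `ratio` estimates the expected shrinkage of the norm itself.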
The Case for Delaunay Triangulations.
Motivated by this observation, we use subspaces of piecewise-constant functions over random Delaunay triangle meshes. The Delaunay triangulations enjoy a number of desirable learning-theoretic properties. In the context of function learning it was shown that given a set of vertices, piecewise linear functions on Delaunay triangulations achieve the smallest sup-norm error among all triangulations Omohundro [1989].
Lipschitz Constant of the Composite Map.
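Fix a collection of $\Lambda$ $K$-dimensional subspaces $\{S_\lambda\}_{\lambda=1}^{\Lambda}$. Suppose that for each subspace we have an $L_K$-Lipschitz map $m_\lambda$ that maps the data $y$ to an estimate of the expansion coefficients of $x$ in some orthonormal basis for $S_\lambda$, ascribed to the columns of $B_\lambda$. Let $B = [B_1, \ldots, B_\Lambda]$, and $q(y) = [m_1(y)^T, \ldots, m_\Lambda(y)^T]^T$; then we can estimate $x$ as
$\hat{x}(y) = (B^T)^{\dagger} q(y).$
Denote the mapping $y \mapsto \hat{x}(y)$ by $G$. Then we have the following simple estimate:
$\|G(y_1) - G(y_2)\| \le \sigma_{\min}(B)^{-1} \sqrt{\Lambda}\, L_K \|y_1 - y_2\|,$
with $\sigma_{\min}(B)$ the smallest (nonzero) singular value of $B$. We observe empirically that $\sigma_{\min}(B)^{-1}$ grows exponentially with the number of meshes $\Lambda$, which is consistent with the theory. However, each individual mesh projection gives “correct” local information which can be used to form the final estimate.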
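To make the combination step concrete, the sketch below stacks orthonormal bases of several subspaces, recovers a vector from its stacked expansion coefficients with a pseudoinverse, and checks that a perturbation of the coefficients is amplified by at most the inverse of the smallest nonzero singular value of the stacked basis. The subspaces here are random Gaussian ones standing in for mesh subspaces, and all sizes and names (`bases`, `n_sub`, `dq`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, n_sub = 30, 5, 12                # illustrative sizes with K * n_sub >= N

# Orthonormal bases of random K-dimensional subspaces (stand-ins for mesh subspaces).
bases = [np.linalg.qr(rng.standard_normal((N, K)))[0] for _ in range(n_sub)]
B = np.hstack(bases)                   # N x (K * n_sub) stacked basis

x = rng.standard_normal(N)
q = B.T @ x                            # exact expansion coefficients, stacked
pinv_Bt = np.linalg.pinv(B.T)
x_hat = pinv_Bt @ q                    # combine the subspace estimates

sigma = np.linalg.svd(B, compute_uv=False)
sigma_min = sigma[sigma > 1e-10].min() # smallest nonzero singular value

# A perturbation dq of the coefficients moves the estimate by at most ||dq|| / sigma_min.
dq = 1e-3 * rng.standard_normal(q.shape)
shift = np.linalg.norm(pinv_Bt @ (q + dq) - x_hat)
```

With exact coefficients and enough subspaces to span the whole space, `x_hat` reproduces `x`; in practice the coefficients come from learned estimators and are only approximate.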
Learning Lipschitz Functions.
It is a standard result in statistical learning theory Cooper [1995] that the number of samples required to learn an $M$-variate $L$-Lipschitz function to some prescribed accuracy in the sup-norm is of the order $\mathcal{O}(L^M)$. While this result is proved for scalar-valued multivariate functions, it is reasonable to expect that the same scaling in $L$ should hold for vector-valued maps if we treat pixelized images as collections of scalar maps. Thus, a reduction of the Lipschitz constant by any factor allows us to work with exponentially fewer samples. Conversely, given a fixed training dataset, we obtain much more accurate estimates.
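As a rough arithmetic illustration, suppose the required number of samples at fixed accuracy grows like the Lipschitz constant raised to the number of input variables. This scaling is an assumption suggested by a covering argument; the exact statement and constants in Cooper [1995] are not reproduced here. The helper `sample_ratio` is hypothetical.

```python
def sample_ratio(lipschitz_reduction, n_variables):
    """Factor by which an assumed L**M sample count shrinks when L is divided by the given factor."""
    return lipschitz_reduction ** n_variables

# Halving the Lipschitz constant for a function of 10 variables (illustrative numbers).
ratio = sample_ratio(2.0, 10)
```

Under this assumed scaling, even a modest reduction of the Lipschitz constant compounds across the input dimensions.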
3.2 Summary of the Proposed Scheme
We decompose a hard learning task into an ensemble of easier problems of estimating projections of the unknown model in random piecewise-constant subspaces. The subspace estimates are then combined to get a higher-resolution reconstruction, as illustrated in Figure 2.
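Consider a set of $\Lambda$ random Delaunay triangulations with associated subspaces $\{S_\lambda\}_{\lambda=1}^{\Lambda}$, and let $T_\lambda$ be the map from $y$ to $P_{S_\lambda} x$. Instead of learning the hard inverse mapping $P_{S_N} F^{-1}$, with $S_N$ being the high-dimensional “pixel” space, we learn an ensemble of simpler mappings $\{T_\lambda\}_{\lambda=1}^{\Lambda}$. Each $T_\lambda$ is approximated by a convolutional neural network $\Gamma_{\theta(\lambda)}$ parameterized by a set of weights $\theta(\lambda)$. The weights are chosen by minimizing empirical risk:
$\theta(\lambda) = \arg\min_{\theta} \frac{1}{J} \sum_{j=1}^{J} \| \Gamma_{\theta}(y_j) - P_{S_\lambda} x_j \|_2^2,$
where $\{(x_j, y_j)\}_{j=1}^{J}$ is a set of $J$ training models and measurements.
Recall that $B_\lambda$ is an orthonormal basis for $S_\lambda$. We then compute an estimate of the expansion coefficients of $x$ in $S_\lambda$ as
$q_\lambda = B_\lambda^T\, \Gamma_{\theta(\lambda)}(y),$
and use those to get a final estimate as
(9) $\hat{x} = \arg\min_{x} \| q - B^T x \|_2^2 + \mu \| x \|_{\mathrm{TV}},$
with $q$ and $B$ the stacked coefficient estimates and bases. The total variation (TV) seminorm $\| x \|_{\mathrm{TV}}$ is used primarily for visualization purposes. Used directly on the data $y$, it fails to recover any geometry (Figure 1).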
4 Numerical Results and Discussion
4.1 Application: Traveltime Tomography
In this work, we restrict ourselves to linear ill-posed inverse problems with sparse data, $y = Ax$, $A \in \mathbb{R}^{M \times N}$, $M \ll N$. We discretize the domain into $N$ pixels so that $x \in \mathbb{R}^N$. Concretely, we consider linearized traveltime tomography Hole [1992], Bording et al. [1987], but we note that the method applies to any inverse problem.
In traveltime tomography, we measure wave travel times between sensors as in Figure 3. Travel times depend on the medium property called slowness (inverse of speed) and the task is to reconstruct the spatial slowness map. In the linearized regime, the problem becomes that of straight-line tomography with data modeled as
(10) $y(s_i, s_j) = \int_0^1 x\big(t s_i + (1 - t) s_j\big)\, \mathrm{d}t,$
where $x$ is the continuous slowness map and $s_i$, $s_j$ are sensor locations. We use a pixel grid with sensors placed uniformly on an inscribed circle, and corrupt the measurements with zero-mean iid Gaussian noise.
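A discrete straight-line forward operator of this kind can be sketched as follows. This is an illustrative construction, not the paper's code: the helpers `sensors_on_circle` and `ray_matrix`, the sensor count, the grid size, and the midpoint-rule sampling are all assumptions. Each row is weighted by the ray length so that a row times the slowness vector gives a travel time; dropping the `length` factor gives the average slowness along the ray instead.

```python
import numpy as np
from itertools import combinations

def sensors_on_circle(n_sensors):
    """Sensor positions spaced uniformly on the circle inscribed in [0, 1]^2."""
    ang = 2 * np.pi * np.arange(n_sensors) / n_sensors
    return 0.5 + 0.5 * np.stack([np.cos(ang), np.sin(ang)], axis=1)

def ray_matrix(n_pix, sensors, n_samples=400):
    """Rows approximate line integrals along straight rays between all sensor pairs."""
    pairs = list(combinations(range(len(sensors)), 2))
    A = np.zeros((len(pairs), n_pix * n_pix))
    t = (np.arange(n_samples) + 0.5) / n_samples          # midpoint rule on [0, 1]
    for row, (i, j) in enumerate(pairs):
        pts = np.outer(1 - t, sensors[i]) + np.outer(t, sensors[j])
        idx = np.clip((pts * n_pix).astype(int), 0, n_pix - 1)
        length = np.linalg.norm(sensors[i] - sensors[j])
        flat = idx[:, 1] * n_pix + idx[:, 0]              # row-major pixel index
        np.add.at(A[row], flat, length / n_samples)       # accumulate path length per pixel
    return A

sensors = sensors_on_circle(8)          # hypothetical sensor count
A = ray_matrix(16, sensors)             # hypothetical 16 x 16 pixel grid
slowness = np.ones(16 * 16)             # homogeneous medium
travel_times = A @ slowness             # equals the ray lengths in this case
```

With far fewer sensor pairs than pixels, `A` has many more columns than rows, which is the sparse-data regime discussed above.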
4.2 Architectures and Reconstruction
We generate random Delaunay meshes each with 50 triangles. The corresponding projector matrices compute average intensity over triangles to yield a piecewise-constant approximation of $x$. We test two distinct architectures: (i) ProjNet, tasked with estimating the projection into a single subspace; and (ii) SubNet, tasked with estimating the projection over multiple subspaces.
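One way to build such a projector is sketched below with `scipy.spatial.Delaunay`. The helper `mesh_basis`, the grid size, and the vertex count are illustrative assumptions rather than the paper's settings. With 24 random interior vertices plus the 4 corners of the square, Euler's formula gives 50 triangles; triangles that contain no pixel center are dropped from the basis.

```python
import numpy as np
from scipy.spatial import Delaunay

def mesh_basis(n_pix, n_points, rng):
    """Orthonormal piecewise-constant basis on a random Delaunay mesh of [0, 1]^2."""
    corners = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    pts = np.vstack([corners, rng.random((n_points, 2))])  # corners keep the square covered
    tri = Delaunay(pts)
    centers = (np.arange(n_pix) + 0.5) / n_pix
    gx, gy = np.meshgrid(centers, centers)
    labels = tri.find_simplex(np.column_stack([gx.ravel(), gy.ravel()]))
    used = np.unique(labels[labels >= 0])                  # triangles containing a pixel center
    B = np.zeros((n_pix * n_pix, len(used)))
    for col, t in enumerate(used):
        mask = labels == t
        B[mask, col] = 1.0 / np.sqrt(mask.sum())           # unit-norm indicator of one triangle
    return B

rng = np.random.default_rng(0)
B = mesh_basis(32, 24, rng)           # hypothetical 32 x 32 grid, 24 interior vertices
image = rng.random(32 * 32)
coeffs = B.T @ image                  # expansion coefficients in the mesh basis
projection = B @ coeffs               # per-triangle average intensity, as an image
```

Because the triangle indicators have disjoint supports, normalizing each column is enough to make the basis orthonormal, and `B @ B.T` acts as the averaging projector.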
The ProjNet architecture is inspired by the FBPConvNet Jin et al. [2016] and the U-Net Ronneberger et al. [2015] as shown in Figure 4a. Similar to Jin et al. [2016], we do not use the data directly as this would require the network to first learn to map back to the image domain; we rather warm-start the reconstruction by a non-negative least squares reconstruction. The network consists of a sequence of downsampling layers followed by upsampling layers, with skip connections He et al. [2016b, a] between the downsampling and upsampling layers. Crucially, we constrain the network output to live in $S_\lambda$ by fixing the last layer of the network to be a projector, $P_{S_\lambda}$ (Figure 4a). A similar trick in a different context was proposed in Sønderby et al. [2016].
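The effect of a fixed projector as the last layer can be illustrated in a few lines. This is only a schematic, assuming a random orthonormal basis `Bk` in place of a mesh basis and a plain vector in place of the output of the trainable layers; `forward` is a hypothetical name and no network is actually trained here.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 6                                        # illustrative sizes
Bk = np.linalg.qr(rng.standard_normal((N, K)))[0]    # stand-in orthonormal basis of the subspace
P = Bk @ Bk.T                                        # fixed, non-trainable projector

def forward(raw_features):
    """Final layer: whatever the trainable layers emit is projected onto the subspace."""
    return P @ raw_features

raw = rng.standard_normal(N)                         # stands in for the penultimate activations
out = forward(raw)
```

Whatever the earlier layers produce, the output is guaranteed to lie in the chosen subspace, so the loss only has to account for errors within that subspace.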
We combine projection estimates from many ProjNets by regularized linear least-squares (9) to get the reconstructed model (cf. Figure 2) with the regularization parameter determined on five held-out images. A drawback of this approach is that a separate ProjNet must be trained for each subspace. That is the motivation for the SubNet, shown in Figure 4b. Each input to SubNet is the concatenation of a non-negative least squares reconstruction and 50 basis functions, one for each triangle. This approach scales to any number of subspaces, which allows us to get visually smoother reconstructions without any further regularization as in (9). On the other hand, the projections are inexact, which can lead to slightly degraded performance. Both networks are trained using the Adam optimizer Kingma and Ba [2014]. (Code available at https://github.com/swingresearch/deepmesh)
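As a quantitative figure of merit we use the signal-to-noise ratio (SNR). The input SNR is defined as $10 \log_{10}(\sigma_{\mathrm{signal}}^2 / \sigma_{\mathrm{noise}}^2)$, where $\sigma_{\mathrm{signal}}^2$ and $\sigma_{\mathrm{noise}}^2$ are the signal and noise variance; the output SNR is defined as $\sup_{a, b} 20 \log_{10}(\|x\|_2 / \|x - a\hat{x} - b\|_2)$, with $x$ the ground truth and $\hat{x}$ the reconstruction.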
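A small NumPy sketch of these quantities follows. The function names are hypothetical. `snr_db` is the plain ratio of norms in decibels, `snr_db_affine` first fits the best scale and offset of the estimate by least squares (an assumption about how a rescaling-invariant output SNR could be computed), and `add_noise` scales zero-mean iid Gaussian noise to a requested input SNR.

```python
import numpy as np

def snr_db(reference, estimate):
    """SNR in dB: 20 log10(||reference|| / ||reference - estimate||)."""
    return 20 * np.log10(np.linalg.norm(reference) / np.linalg.norm(reference - estimate))

def snr_db_affine(reference, estimate):
    """SNR after the best least-squares affine fit a * estimate + b to the reference."""
    G = np.column_stack([estimate, np.ones_like(estimate)])
    coef, *_ = np.linalg.lstsq(G, reference, rcond=None)
    return snr_db(reference, G @ coef)

def add_noise(y, snr_in_db, rng):
    """Add zero-mean iid Gaussian noise so the signal-to-noise variance ratio matches snr_in_db."""
    noise_var = np.var(y) / 10 ** (snr_in_db / 10)
    return y + rng.normal(0.0, np.sqrt(noise_var), size=y.shape)

rng = np.random.default_rng(0)
y = rng.standard_normal(20000)                    # toy measurements
y_noisy = add_noise(y, 10.0, rng)                 # 10 dB input SNR
measured = 10 * np.log10(np.var(y) / np.var(y_noisy - y))
```

The affine-fit variant can never report a lower value than the plain one, since scale one and offset zero are among the candidates.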
130 ProjNets are trained with measurements at various SNRs; SubNet is trained with 350 different triangular meshes. We compare the ProjNet and SubNet reconstructions with a baseline convolutional neural network that was built to directly reconstruct images from their non-negative least squares reconstructions. We pick the best performing baseline network from multiple networks (inspired by Jin et al. [2016]) which were designed to have a comparable number of trainable parameters to SubNet. We test on patches from the BP2004 model (http://software.seg.org/datasets/2D/2004_BP_Vel_Benchmark/).
Robustness to Corruptions.
To demonstrate that our subspace regularization gives results that are robust against arbitrary assumptions made at training time, we consider two experiments. First, we corrupt the measured data with the same type of noise as the training data, but at a different SNR. In Figure 5a, we summarize the results with reconstructions of geo images with the network arbitrarily trained on the LSUN bridges dataset Yu et al. [2015]. In all cases our method reports better SNRs compared with the direct reconstruction network. In fact, when trained without noise and tested with a 10 dB input SNR, the direct method is unable to produce a workable result and instead hallucinates structures seen in training. For applications in geophysics it is essential that our method correctly captures the shape of the cavities unlike the direct inversion which produces sharp but wrong geometries (see annotations in the figure).
Second, we consider a different corruption where traveltime measurements are erased (set to zero) independently with probability $p$, and use networks trained with 10 dB input SNR to reconstruct. Figure 5b summarizes our findings. Unlike with Gaussian noise (Figure 5a) the direct method completely fails to recover coarse geometry in all test cases.
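The erasure corruption is simple to simulate. In the sketch below, the helper `erase` and the erasure probability `0.2` are illustrative assumptions, not values taken from the experiments.

```python
import numpy as np

def erase(measurements, p, rng):
    """Set each measurement to zero independently with probability p."""
    keep = rng.random(measurements.shape) >= p
    return measurements * keep

rng = np.random.default_rng(0)
y = np.ones(10000)                    # toy measurements
y_erased = erase(y, 0.2, rng)         # p = 0.2 is an illustrative value
frac_zero = np.mean(y_erased == 0)    # empirical erasure rate
```

Unlike additive Gaussian noise, this corruption is sparse and large in magnitude, which is why it is a useful test of behavior outside the training distribution.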
Robustness Against Dataset Overfitting.
Finally, in Figure 6 we show that the training dataset has only a marginal influence on reconstructions—a desirable property in applications where real ground truth is unavailable. Training with LSUN Yu et al. [2015], CelebA Liu et al. [2015] and a synthetic dataset of random overlapping shapes (as in Figure 1) all give comparable reconstructions.
5 Conclusion
We proposed a new way to solve ill-posed inverse problems based on decomposing a complex mapping which is hard to learn into a collection of simpler mappings. These simpler mappings correspond to reconstructions in low-dimensional subspaces of images that are piecewise-constant on Delaunay triangular meshes. Numerical experiments show that our method is consistently able to produce better reconstructions than a method trained to do the inversion directly, both in terms of output SNR and, more importantly, producing correct geometric features. When the data is corrupted in ways not seen at training time, our method still produces good results while the direct inversion breaks down altogether.
A simple intuitive argument can be made to explain this behavior. Instead of learning to estimate pixel values directly, we learn to estimate local averages of pixel values. Estimating averages is a much simpler task since they are considerably more invariant than pixels themselves. This statement can be made precise for many inverse problems in terms of Lipschitz stability estimates. The key is that estimating averages over triangles lets us robustly convert global traveltime measurements into local information. This has important consequences: robustness against overfitting the dataset, robustness against various unseen corruptions, and ability to get correct global geometric information without hallucinating sharp, but wrong structures.
Acknowledgement
This work utilizes resources supported by the National Science Foundation’s Major Research Instrumentation program, grant #1725729, as well as the University of Illinois at Urbana-Champaign.
References
 Adler and Öktem [2017a] Jonas Adler and Ozan Öktem. Solving ill-posed inverse problems using iterative deep neural networks. arXiv preprint arXiv:1704.04058v2, April 2017a.
 Adler and Öktem [2017b] Jonas Adler and Ozan Öktem. Learned Primal-dual Reconstruction. arXiv preprint arXiv:1707.06474v1, July 2017b.
 Ali Rahimi [2008] Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. Advances in Neural Information Processing Systems (NIPS), 2008.
 Ali Rahimi [2009] Ali Rahimi and Benjamin Recht. Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning. Advances in Neural Information Processing Systems (NIPS), pages 1313–1320, 2009.
 Antholzer et al. [2017] Stephan Antholzer, Markus Haltmeier, and Johannes Schwab. Deep Learning for Photoacoustic Tomography from Sparse Data. arXiv preprint arXiv:1704.04587v2, April 2017.
 Araya-Polo et al. [2017] Mauricio Araya-Polo, Joseph Jennings, Amir Adler, and Taylor Dahlke. Deep-learning tomography. The Leading Edge, December 2017.
 Beretta et al. [2013] Elena Beretta, Maarten V de Hoop, and Lingyun Qiu. Lipschitz Stability of an Inverse Boundary Value Problem for a Schrödinger-Type Equation. SIAM J. Math. Anal., 45(2):679–699, March 2013.
 Bianco and Gertoft [2017] Michael Bianco and Peter Gertoft. Sparse travel time tomography with adaptive dictionaries. arXiv preprint arXiv:1712.08655, 2017.
 Bora et al. [2017] Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G Dimakis. Compressed sensing using generative models. arXiv preprint arXiv:1703.03208, 2017.
 Bora et al. [2018] Ashish Bora, Eric Price, and Alexandros G Dimakis. AmbientGAN: Generative models from lossy measurements. In International Conference on Learning Representations (ICLR), 2018.
 Bording et al. [1987] R Phillip Bording, Adam Gersztenkorn, Larry R Lines, John A Scales, and Sven Treitel. Applications of seismic traveltime tomography. Geophysical Journal International, 90(2):285–303, 1987.
 Chambolle [2004] Antonin Chambolle. An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision, 20(1-2):89–97, 2004.
 Chen et al. [2017] Hu Chen, Yi Zhang, Weihua Zhang, Peixi Liao, Ke Li, Jiliu Zhou, and Ge Wang. Low-dose CT denoising with convolutional neural network. In 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pages 143–146. IEEE, 2017.
 Collobert and Weston [2008] Ronan Collobert and Jason Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, pages 160–167. ACM, 2008.
 Cooper [1995] Duane A Cooper. Learning Lipschitz functions. International Journal of Computer Mathematics, 59(1-2):15–26, 1995.
 de Hoop et al. [2012] Maarten V de Hoop, Lingyun Qiu, and Otmar Scherzer. Local analysis of inverse problems: Hölder stability and iterative reconstruction. Inverse Problems, 28(4):045001, April 2012.
 Galetti et al. [2017] Erica Galetti, Andrew Curtis, Brian Baptie, David Jenkins, and Heather Nicolson. Transdimensional Love-wave tomography of the British Isles and shear-velocity structure of the East Irish Sea Basin from ambient-noise interferometry. Geophys. J. Int., 208(1):36–58, January 2017.
 Greenspan et al. [2016] Hayit Greenspan, Bram van Ginneken, and Ronald M Summers. Deep Learning in Medical Imaging: Overview and Future Promise of an Exciting New Technique. IEEE Trans. Med. Imag., 35(5):1153–1159, may 2016.
 Gregor and LeCun [2010] Karol Gregor and Yann LeCun. Learning fast approximations of sparse coding. In Proceedings of the 27th International Conference on International Conference on Machine Learning, pages 399–406. Omnipress, 2010.
 Gribonval et al. [2017] Rémi Gribonval, Gilles Blanchard, Nicolas Keriven, and Yann Traonmilin. Compressive statistical learning with random feature moments. arXiv preprint arXiv:1706.07180, 2017.
 Güler and Übeylı [2005] İnan Güler and Elif Derya Übeylı. ECG beat classifier designed by combined neural network model. Pattern Recognition, 38(2):199–208, 2005.
 Han et al. [2016] Yo Seob Han, Jaejun Yoo, and Jong Chul Ye. Deep Residual Learning for Compressed Sensing CT Reconstruction via Persistent Homology Analysis. arXiv preprint arXiv:1611.06391, November 2016.
 He et al. [2016a] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016a.
 He et al. [2016b] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the Conference on Computer Vision and Pattern Recognition, pages 770–778. IEEE, 2016b.
 Hole [1992] John Hole. Nonlinear highresolution threedimensional seismic travel time tomography. Journal of Geophysical Research: Solid Earth, 97(B5):6553–6562, 1992.
 Hoole [1993] S R H Hoole. Artificial neural networks in the solution of inverse electromagnetic field problems. IEEE Trans. Magn., 29(2):1931–1934, March 1993.
 Hudson and Cohen [2000] Donna L Hudson and Maurice E Cohen. Neural networks and artificial intelligence for biomedical engineering. Wiley Online Library, 2000.
 Jin et al. [2016] Kyong Hwan Jin, Michael T McCann, Emmanuel Froustey, and Michael Unser. Deep Convolutional Neural Network for Inverse Problems in Imaging. arXiv preprint arXiv:1611.03679v1, November 2016.
 Kelly et al. [2017] Brendan Kelly, Thomas P Matthews, and Mark A Anastasio. Deep Learning-Guided Image Reconstruction from Incomplete Data. arXiv preprint arXiv:1709.00584, September 2017.
 Kingma and Ba [2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 Landweber [1951] Louis Landweber. An iteration formula for Fredholm integral equations of the first kind. American Journal of Mathematics, 73(3):615–624, 1951.
 Lewis et al. [2017] Winston Lewis, Denes Vigh, et al. Deep learning prior models from seismic images for full-waveform inversion. In SEG International Exposition and Annual Meeting. Society of Exploration Geophysicists, 2017.
 Li et al. [2018] Housen Li, Johannes Schwab, Stephan Antholzer, and Markus Haltmeier. NETT: Solving Inverse Problems with Deep Neural Networks. arXiv preprint arXiv:1803.00092v1, February 2018.
 Liu et al. [2015] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015.
 Lucas et al. [2018] Alice Lucas, Michael Iliadis, Rafael Molina, and Aggelos K Katsaggelos. Using Deep Neural Networks for Inverse Problems in Imaging: Beyond Analytical Methods. IEEE Signal Process. Mag., 35(1):20–36, 2018.
 Mallat [1999] Stéphane Mallat. A wavelet tour of signal processing. Academic Press, 1999.
 Mandache [2001] Niculae Mandache. Exponential instability in an inverse problem for the Schrodinger equation. Inverse Problems, 17(5):1435–1444, October 2001.
 McCann et al. [2017] Michael T McCann, Kyong Hwan Jin, and Michael Unser. Convolutional neural networks for inverse problems in imaging: A review. IEEE Signal Process. Mag., 34(6):85–95, 2017.
 Mery et al. [2015] Domingo Mery, Vladimir Riffo, Uwe Zscherpel, Germán Mondragón, Iván Lillo, Irene Zuccar, Hans Lobel, and Miguel Carrasco. GDXray: The Database of X-ray Images for Nondestructive Testing. Journal of Nondestructive Evaluation, 34, 11 2015.
 Ogawa et al. [1998] Takehiko Ogawa, Yukio Kosugi, and Hajime Kanada. Neural network based solution to inverse problems. In Neural Networks Proceedings, 1998. IEEE World Congress on Computational Intelligence. The 1998 IEEE International Joint Conference on, volume 3, pages 2471–2476. IEEE, 1998.
 Omohundro [1989] S M Omohundro. The Delaunay triangulation and function learning, 1989.
 Pelt and Batenburg [2013] Daniel Maria Pelt and Kees Joost Batenburg. Fast tomographic reconstruction from limited data using artificial neural networks. IEEE Trans. on Image Process., 22(12):5238–5251, 2013.
 Pilanci and Research [2016] M Pilanci and MJ Wainwright. Iterative Hessian sketch: Fast and accurate solution approximation for constrained least-squares. The Journal of Machine Learning Research, 2016.
 Porikli et al. [2017] Fatih Porikli, Shiguang Shan, Cees Snoek, Rahul Sukthankar, and Xiaogang Wang. Deep Learning for Visual Understanding [From the Guest Editors]. IEEE Signal Process. Mag., 34(6):24–25, nov 2017.
 Porikli et al. [2018] Fatih Porikli, Shiguang Shan, Cees Snoek, Rahul Sukthankar, and Xiaogang Wang. Deep Learning for Visual Understanding: Part 2 [From the Guest Editors]. IEEE Signal Process. Mag., 35(1):17–19, jan 2018.
 Rick Chang et al. [2017] JH Rick Chang, Chun-Liang Li, Barnabas Poczos, BVK Vijaya Kumar, and Aswin C Sankaranarayanan. One Network to Solve Them All–Solving Linear Inverse Problems Using Deep Projection Models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5888–5897, 2017.
 Ronneberger et al. [2015] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
 Schiller and Doerffer [2010] Helmut Schiller and Roland Doerffer. Neural network for emulation of an inverse model operational derivation of Case II water properties from MERIS data. International Journal of Remote Sensing, November 2010.
 Seltzer and Droppo [2013] Michael L Seltzer and Jasha Droppo. Multitask learning in deep neural networks for improved phoneme recognition. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pages 6965–6969. IEEE, 2013.
 Sønderby et al. [2016] Casper Kaae Sønderby, Jose Caballero, Lucas Theis, Wenzhe Shi, and Ferenc Huszár. Amortised MAP inference for image super-resolution. arXiv preprint arXiv:1610.04490, 2016.
 Tikhonov A.N. [2013] A.N. Tikhonov, A.V. Goncharsky, V.V. Stepanov, and Anatoly G Yagola. Numerical methods for the solution of ill-posed problems, volume 328. Springer Science & Business Media, 2013.
 Wang [2016] Ge Wang. A perspective on deep imaging. IEEE Access, 4:8914–8924, 2016.
 Xin et al. [2016] Bo Xin, Yizhou Wang, Wen Gao, David Wipf, and Baoyuan Wang. Maximal sparsity with deep networks? In Advances in Neural Information Processing Systems, pages 4340–4348, 2016.
 Yu et al. [2015] Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop. arXiv preprint arXiv:1506.03365, 2015.
 Yurtsever et al. [2017] Alp Yurtsever, Madeleine Udell, Joel A Tropp, and Volkan Cevher. Sketchy decisions: Convex low-rank matrix optimization with optimal storage. arXiv preprint arXiv:1702.06838, 2017.
 Zhang et al. [2016] Hanming Zhang, Liang Li, Kai Qiao, Linyuan Wang, Bin Yan, Lei Li, and Guoen Hu. Image Prediction for Limited-angle Tomography via Deep Learning with Convolutional Neural Network. arXiv preprint arXiv:1607.08707v1, July 2016.
 Zhang et al. [2014] Zhanpeng Zhang, Ping Luo, Chen Change Loy, and Xiaoou Tang. Facial landmark detection by deep multitask learning. In European Conference on Computer Vision, pages 94–108. Springer, 2014.
 Zhu et al. [2018] Bo Zhu, Jeremiah Z Liu, Stephen F Cauley, Bruce R Rosen, and Matthew S Rosen. Image reconstruction by domaintransform manifold learning. Nature, 555(7697):487, March 2018.