An Inner-loop Free Solution to Inverse Problems using Deep Neural Networks
Abstract
We propose a new method that uses deep learning techniques to accelerate the popular alternating direction method of multipliers (ADMM) solution for inverse problems. The ADMM updates consist of a proximity operator, a least squares regression that includes a big matrix inversion, and an explicit solution for updating the dual variables. Typically, inner loops are required to solve the first two sub-minimization problems due to the intractability of the prior and the matrix inversion. To avoid such drawbacks or limitations, we propose an inner-loop free update rule with two pretrained deep convolutional architectures. More specifically, we learn a conditional denoising autoencoder which imposes an implicit data-dependent prior/regularization on the ground-truth in the first sub-minimization problem. This design follows an empirical Bayesian strategy, leading to so-called amortized inference. For the matrix inversion in the second subproblem, we learn a convolutional neural network to approximate the matrix inversion, i.e., the inverse mapping is learned by feeding the input through the learned forward network. Note that training this neural network does not require ground-truth or measurements, i.e., it is data-independent. Extensive experiments on both synthetic data and real datasets demonstrate the efficiency and accuracy of the proposed method compared with the conventional ADMM solution using inner loops for solving inverse problems.
Kai Fan* (Duke University, Durham, NC 27710, kai.fan@stat.duke.edu), Qi Wei (Duke University, Durham, NC 27710, qi.wei@duke.edu), Lawrence Carin (Duke University, Durham, NC 27710, lcarin@duke.edu), Katherine A. Heller (Duke University, Durham, NC 27710, kheller@stat.duke.edu). *The authors contributed equally to this work.
31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.
1 Introduction
Most inverse problems are formulated directly as an optimization problem related to a forward model [24]. The forward model maps unknown signals, i.e., the ground-truth, to acquired information about them, which we call data or measurements. This mapping, or forward problem, generally depends on a physical theory that links the ground-truth to the measurements. Solving inverse problems involves learning the inverse mapping from the measurements to the ground-truth. Specifically, it recovers a signal from a small number of degraded or noisy measurements, which is usually ill-posed [25, 24]. Recently, deep learning techniques have emerged as excellent models and gained great popularity for their widespread success in allowing for efficient inference on applications including pattern analysis (unsupervised), classification (supervised), computer vision, and image processing [6]. Exploiting deep neural networks to help solve inverse problems has been explored recently [23, 1], and deep learning based methods have achieved state-of-the-art performance in many challenging inverse problems like super-resolution [3, 23], image reconstruction [19], and automatic colorization [12]. More specifically, massive datasets currently enable learning end-to-end mappings from the measurement domain to the target image/signal/data domain, to help deal with these challenging problems instead of solving the inverse problem by inference. The pairs (x, y) are used to learn the mapping function from y to x, where x is the ground-truth and y is its corresponding measurement. This mapping function has recently been characterized using sophisticated networks, e.g., deep neural networks.
A strong motivation to use neural networks stems from the universal approximation theorem [5], which states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate any continuous function on compact subsets of R^n, under mild assumptions on the activation function.
More specifically, in recent work [3, 23, 12, 19], an end-to-end mapping from measurements to ground-truth was learned from the training data and then applied to the testing data. Thus, the complicated inference scheme needed in the conventional inverse problem solver was replaced by feeding a new measurement through the pretrained network, which is much more efficient. One main problem with this strategy is that it requires task-specific training of the networks, i.e., different problems require different networks. Thus, it is very expensive to solve diverse sets of problems. To improve the scope of deep neural network models, more recently, in [4], a splitting strategy was proposed to decompose an inverse problem into two optimization problems, where one subproblem, related to regularization, can be solved efficiently using trained deep neural networks, leading to an alternating direction method of multipliers (ADMM) framework [2, 16]. This method involves training a deep convolutional autoencoder network for low-level image modeling, which explicitly imposes a regularization that spans the subspace in which the ground-truth images live. For the subproblem that requires inverting a big matrix, a conventional gradient descent algorithm was used, leading to an alternating update, iterating between feed-forward propagation through a network and iterative gradient descent. Thus, an inner loop of gradient descent is still necessary in this framework.
In this work, we propose an inner-loop free framework, in the sense that no iterative algorithm is required to solve subproblems, using a splitting strategy for inverse problems. The alternating updates for the two subproblems are derived by feeding through two pretrained deep neural networks: one an amortized-inference-based denoising convolutional autoencoder network for the proximity operation, and one a structured convolutional neural network for the huge matrix inversion related to the forward model. Thus, the computational complexity of each iteration in ADMM is linear with respect to (w.r.t.) the dimensionality of the signals. The network for the proximity operation imposes an implicit prior learned from the training data, including the measurements as well as the ground-truth, leading to amortized inference. The network for matrix inversion is independent of the training data and can be trained from noise, i.e., a random noise image and its output from the forward model. This independence from training data allows the proposed framework to accelerate almost all existing training-data/example-free solutions for inverse problems based on a splitting strategy. To make training the network for the proximity operation easier, three tricks have been employed: the first is to use a pixel shuffling technique to equalize the dimensionality of the measurements and ground-truth; the second is to optionally add an adversarial loss borrowed from the GAN (Generative Adversarial Nets) framework [10] for sharp image generation; the last is to introduce a perceptual measurement loss derived from pretrained networks, such as AlexNet [11] or the VGG-16 model [22]. Arguably, the speed of the proposed algorithm, which we term Inf-ADMM-ADNN (Inner-loop free ADMM with Auxiliary Deep Neural Network), comes from the fact that it uses two auxiliary pretrained networks to accelerate the updates of ADMM.
Contribution. The main contributions of this paper are: i) learning an implicit prior/regularizer using a denoising autoencoder neural network, based on amortized inference; ii) learning the inverse of a big matrix using structured convolutional neural networks, without using training data; iii) each of the above networks can be exploited to accelerate existing ADMM solvers for inverse problems.
2 Linear Inverse Problem
Notation: trainable networks are denoted by calligraphic font, and fixed networks by italic font.
As mentioned in the last section, the low-dimensional measurement is denoted as y, which is reduced from the high-dimensional ground truth x by a linear operator A such that y = Ax. Note that usually m ≤ n (with y of dimension m and x of dimension n), which makes the number of parameters to estimate no smaller than the number of data points in hand. This imposes an ill-posed problem for finding a solution x given a new observation y, since A is an underdetermined measurement matrix. For example, in a super-resolution setup, the matrix A might not be invertible, such as the strided Gaussian convolution in [20, 23]. To overcome this difficulty, several computational strategies, including Markov chain Monte Carlo (MCMC) and tailored variable splitting under the ADMM framework, have been proposed and applied to different kinds of priors, e.g., the empirical Gaussian prior [28, 31], the Total Variation prior [21, 29, 30], etc. In this paper, we focus on the popular ADMM framework due to its low computational complexity and recent success in solving large-scale optimization problems. More specifically, the optimization problem is formulated as
(1)  \min_{x,z} \tfrac{1}{2}\|y - Az\|_2^2 + \lambda R(x; y) \quad \text{subject to} \quad x = z
where the introduced auxiliary variable z is constrained to be equal to x, and R(x; y) captures the structure promoted by the prior/regularization. If we design the regularization in an empirical Bayesian way, by imposing an implicit data-dependent prior on x, i.e., R(x; y) for amortized inference [23], the augmented Lagrangian for (1) is
(2)  \mathcal{L}_\beta(x, z, u) = \tfrac{1}{2}\|y - Az\|_2^2 + \lambda R(x; y) + \tfrac{\beta}{2}\|x - z + u\|_2^2 - \tfrac{\beta}{2}\|u\|_2^2
where u is the (scaled) Lagrange multiplier, and β > 0 is the penalty parameter. The usual augmented Lagrange multiplier method is to minimize w.r.t. x and z simultaneously. This is difficult and does not exploit the fact that the objective function is separable. To remedy this issue, ADMM decomposes the minimization into two subproblems that are minimizations w.r.t. x and z, respectively. More specifically, the iterations are as follows:
(3)  x^{k+1} = \arg\min_x \; \lambda R(x; y) + \tfrac{\beta}{2}\|x - z^k + u^k\|_2^2
(4)  z^{k+1} = \arg\min_z \; \tfrac{1}{2}\|y - Az\|_2^2 + \tfrac{\beta}{2}\|x^{k+1} - z + u^k\|_2^2
(5)  u^{k+1} = u^k + x^{k+1} - z^{k+1}
If the prior R is appropriately chosen, e.g., R(x) = ‖x‖₁, a closed-form solution for (3), i.e., a soft-thresholding solution, is naturally available. However, for some more complicated regularizations, e.g., a patch-based prior [8], solving (3) is nontrivial and may require iterative methods. To solve (4), a matrix inversion is necessary, for which conjugate gradient (CG) descent is usually applied to update z [4]. Thus, solving (3) and (4) is in general cumbersome. Inner loops are required to solve these two sub-minimization problems due to the intractability of the prior and the inversion, resulting in large computational complexity. To avoid such drawbacks or limitations, we propose an inner-loop free update rule with two pretrained deep convolutional architectures.
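The inner-loop burden described above can be made concrete with a small sketch: a conventional z-update solves the normal equations of subproblem (4) with conjugate gradient, iterating to convergence inside every outer ADMM step. The matrix sizes, penalty β = 0.5 and starting point below are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def z_update_cg(A, y, x, u, beta):
    """Conventional inner-loop z-update: solve (A^T A + beta I) z = A^T y + beta (x + u)
    with conjugate gradient, as in CG-based ADMM solvers."""
    n = A.shape[1]
    rhs = A.T @ y + beta * (x + u)
    op = LinearOperator((n, n), matvec=lambda v: A.T @ (A @ v) + beta * v,
                        dtype=float)
    z, info = cg(op, rhs)   # the inner loop the paper seeks to remove
    assert info == 0        # 0 means CG converged
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100))   # underdetermined forward operator (m < n)
x_true = rng.standard_normal(100)
y = A @ x_true
z = z_update_cg(A, y, x=np.zeros(100), u=np.zeros(100), beta=0.5)
# z approximately satisfies the normal equations of subproblem (4)
residual = A.T @ (A @ z) + 0.5 * z - A.T @ y
```

Every outer ADMM iteration pays this CG cost again; the proposed method replaces it with a single feed-forward pass.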
3 Inner-loop free ADMM
3.1 Amortized inference for x using a conditional proximity operator
Solving subproblem (3) is equivalent to finding the solution of the proximity operator
(6)  x^{k+1} = \arg\min_x \; \lambda R(x; y) + \tfrac{1}{2}\|x - v\|_2^2, \quad v := z^k - u^k
where we incorporate the constant β into λ without loss of generality. If we impose the first-order necessary conditions [17], we have
(7)  0 \in \lambda\, \partial R(x; y) + (x - v) \;\Longleftrightarrow\; v \in (\mathcal{I} + \lambda\, \partial R)(x)
where ∂ is a partial (sub)derivative operator. For notational simplicity, we define another operator F := I + λ∂R. Thus, the last condition in (7) indicates that x = F⁻¹(v). Note that the inverse here represents the inverse of an operator, i.e., the inverse function of F. Thus our objective is to learn such an inverse operator, which projects v into the prior subspace. For simple priors like the ℓ1 or ℓ2 norm, the projection can be efficiently computed. In this work, we propose an implicit example-based prior, which does not have a truly Bayesian interpretation, but aids in model optimization. In line with this prior, we define an implicit proximity operator parameterized by φ to approximate the unknown F⁻¹. More specifically, we propose a neural network architecture referred to as conditional Pixel Shuffling Denoising Auto-Encoders (cPSDAE) as the operator, where pixel shuffling [20] means periodically reordering the pixels in each channel, mapping a high-resolution image to a low-resolution image with scale r and increasing the number of channels to r² times the original (see [20] for more details). This allows us to transform x so that it is at the same scale as y, and concatenate it with y as the input of the cPSDAE easily. The architecture of the cPSDAE is shown in Fig. 1 (d).
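As a minimal sketch of the pixel shuffling operation [20], the NumPy routines below reorder an (H, W, C) image into an (H/r, W/r, C·r²) tensor and back; the function names and the channels-last layout are our assumptions, not the paper's implementation.

```python
import numpy as np

def pixel_shuffle_down(x, r):
    """Periodic pixel shuffling (space-to-depth): map an (H, W, C) image to
    (H/r, W/r, C*r*r) by gathering each r x r grid of pixels into channels."""
    H, W, C = x.shape
    assert H % r == 0 and W % r == 0
    x = x.reshape(H // r, r, W // r, r, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H // r, W // r, C * r * r)

def pixel_shuffle_up(x, r):
    """Inverse operation (depth-to-space), restoring the original resolution."""
    h, w, c = x.shape
    C = c // (r * r)
    x = x.reshape(h, w, r, r, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(h * r, w * r, C)

img = np.arange(8 * 8 * 3, dtype=float).reshape(8, 8, 3)
low = pixel_shuffle_down(img, r=2)   # (4, 4, 12): same spatial scale as a 2x-downsampled y
```

Because the shuffle is a lossless permutation, the down-shuffled x can be concatenated with y channel-wise and later inverted exactly.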
3.2 Inversion-free update of z
While it is straightforward to write down the closed-form solution for subproblem (4) w.r.t. z, as shown in (8), explicitly computing this solution is nontrivial.
(8)  z^{k+1} = (A^\top A + \beta I_n)^{-1} \left( A^\top y + \beta (x^{k+1} + u^k) \right)
In (8), Aᵀ is the transpose of the matrix A. As we mentioned, the term on the right-hand side involves an expensive matrix inversion with computational complexity O(n³). Under some specific assumptions, e.g., A being a circulant matrix, this matrix inversion can be accelerated with a fast Fourier transform, which has a complexity of order O(n log n). Usually, the gradient-based update has linear complexity in each iteration and thus an overall complexity of order O(Kn), where K is the number of iterations. In this work, we will learn this matrix inversion explicitly by designing a neural network. Note that the matrix to invert depends only on A (and the penalty parameter β), and thus can be computed in advance for future use. This problem can be reduced to a smaller-scale matrix inversion by applying the Sherman-Morrison-Woodbury formula:
(9)  (A^\top A + \beta I_n)^{-1} = \beta^{-1} I_n - \beta^{-1} A^\top (\beta I_m + A A^\top)^{-1} A
Therefore, we only need to solve the matrix inversion in dimension m, i.e., estimating (βI_m + AAᵀ)⁻¹. We propose to approximate it by a trainable deep convolutional neural network, which we denote B_θ with parameters θ. Note that the fixed operator C := βI_m + AAᵀ can be considered as a two-layer fully-connected or convolutional network as well, but with a fixed kernel. This inspires us to design two autoencoders with shared weights, and to minimize the sum of two reconstruction losses to learn the inversion B_θ ≈ C⁻¹:
(10)  \min_\theta \; \mathbb{E}_{x \sim \mathcal{N}(0, I_m)} \left[ \|x - \mathcal{B}_\theta(Cx)\|_2^2 + \|x - C(\mathcal{B}_\theta(x))\|_2^2 \right], \quad C := \beta I_m + A A^\top
where x is sampled from a standard Gaussian distribution. The loss in (10) is depicted in Fig. 1 (a), with the structure of C in Fig. 1 (b) and the structure of B_θ in Fig. 1 (c). Since the matrix βI_m + AAᵀ is symmetric, we can reparameterize B_θ as C̃_θᵀ C̃_θ, where C̃_θ represents a multilayer convolutional network and C̃_θᵀ is a symmetric convolution-transpose architecture using kernels shared with C̃_θ, as shown in Fig. 1 (c) (the blocks with the same colors share the same network parameters). By plugging the learned B_θ into (9), we obtain a reusable deep neural network as a surrogate for the exact inverse matrix (AᵀA + βI_n)⁻¹. The update of z at each iteration can then be done by applying the same B_θ as follows:
(11)  z^{k+1} = \beta^{-1} \left( b^{k+1} - A^\top \mathcal{B}_\theta(A\, b^{k+1}) \right), \quad b^{k+1} := A^\top y + \beta (x^{k+1} + u^k)
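The Woodbury-based update can be checked numerically. In this sketch, the exact m × m inverse stands in for the learned network (`B` below); the sizes and β are arbitrary choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, beta = 20, 80, 0.5
A = rng.standard_normal((m, n))

# Sherman-Morrison-Woodbury:
#   (beta I_n + A^T A)^{-1} = beta^{-1} I_n - beta^{-1} A^T (beta I_m + A A^T)^{-1} A
# so only the small m x m inverse is needed; the paper approximates it with a
# trained network, replaced here by the exact inverse for verification.
B = np.linalg.inv(beta * np.eye(m) + A @ A.T)     # stand-in for the learned inverse
lhs = np.linalg.inv(beta * np.eye(n) + A.T @ A)   # direct (expensive) n x n inverse
rhs = (np.eye(n) - A.T @ B @ A) / beta            # Woodbury right-hand side

# z-update as in (11): apply the identity to b = A^T y + beta (x + u)
x = rng.standard_normal(n)
u = rng.standard_normal(n)
y = rng.standard_normal(m)
b = A.T @ y + beta * (x + u)
z = (b - A.T @ (B @ (A @ b))) / beta              # only m x m work plus mat-vecs
```

Replacing `B @ v` with a feed-forward pass through the trained network gives the inner-loop free z-update.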
3.3 Adversarial training of cPSDAE
In this section, we describe the proposed adversarial training scheme for the cPSDAE used to update x. Suppose that we have the paired training dataset {(x_i, y_i)}; a single cPSDAE with the input pair (x̃, y) tries to minimize the reconstruction error L_r(x, G(x̃, y)), where G denotes the cPSDAE and x̃ is a corrupted version of x, i.e., x̃ = x + ε where ε is random noise. Notice that L_r in a traditional DAE is commonly defined as the ℓ2 loss; however, the ℓ1 loss is an alternative in practice. Additionally, we follow the idea in [18, 7] by introducing a discriminator and a comparator to help train the cPSDAE, and find that it can produce sharper or higher quality images than merely optimizing the reconstruction. This wraps our conditional generative model into the conditional GAN [10] framework with an extra feature matching network (the comparator). Recent advances in representation learning have shown that features extracted from neural networks well pretrained on supervised classification problems can be successfully transferred to other tasks, such as zero-shot learning [14] and style transfer learning [9]. Thus, we can simply use a pretrained AlexNet [11] or VGG-16 model [22] on ImageNet as the comparator without fine-tuning, in order to extract features that capture complex and perceptually important properties. The feature matching loss L_c is usually the ℓ2 distance of high-level image features Φ(x) and Φ(G(x̃, y)), where Φ represents the pretrained network. Since Φ is fixed, the gradient of this loss can be backpropagated to the parameters of G.
For the adversarial training, the discriminator is a trainable convolutional network. We can keep the standard discriminator loss as in a traditional GAN, and add the generator loss of the GAN to the previously defined DAE loss and comparator loss. Thus, we can write down our two objectives as follows,
(12)  \mathcal{L}_D = -\mathbb{E}_x[\log D(x)] - \mathbb{E}_{\tilde{x}, y}[\log(1 - D(G(\tilde{x}, y)))]
(13)  \mathcal{L}_G = \mathcal{L}_r(x, G(\tilde{x}, y)) + \lambda_a\, \mathbb{E}_{\tilde{x}, y}[-\log D(G(\tilde{x}, y))] + \lambda_c\, \mathcal{L}_c(x, G(\tilde{x}, y))
The optimization involves iteratively updating the discriminator by minimizing the discriminator loss with the generator fixed, and then updating the generator by minimizing the generator loss with the discriminator fixed. The proposed method, including training and inference, is summarized in Algorithm 1. Note that each update of x or z using neural networks in an ADMM iteration has a complexity of linear order w.r.t. the data dimensionality.
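A minimal sketch of the two objectives, assuming the standard GAN cross-entropy form; the weighting constants for the adversarial and comparator terms are illustrative, not the paper's values, and these functions only evaluate losses on given network outputs, they do not train anything.

```python
import numpy as np

def bce(p, label):
    """Binary cross-entropy on discriminator outputs p in (0, 1)."""
    return -np.mean(label * np.log(p) + (1 - label) * np.log(1 - p))

def discriminator_loss(d_real, d_fake):
    """Standard GAN discriminator objective: push real -> 1, generated -> 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(x, x_hat, d_fake, feat, feat_hat, lam_adv=1e-3, lam_cmp=1e-2):
    """cPSDAE-style objective: reconstruction + adversarial + comparator
    feature matching (lam_adv, lam_cmp are hypothetical weights)."""
    rec = np.mean((x - x_hat) ** 2)          # DAE l2 reconstruction loss
    adv = bce(d_fake, 1.0)                   # fool the discriminator
    cmp_ = np.mean((feat - feat_hat) ** 2)   # pretrained-network feature distance
    return rec + lam_adv * adv + lam_cmp * cmp_
```

A confident discriminator (real near 1, fake near 0) yields a lower discriminator loss than an uninformative one, which is what the alternating updates exploit.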
3.4 Discussion
A critical point for learningbased methods is whether the method generalizes to other problems. More specifically, how does a method that is trained on a specific dataset perform when applied to another dataset? To what extent can we reuse the trained network without retraining?
In the proposed method, two deep neural networks are trained to infer x and z. For the network w.r.t. z, the training only requires the forward model A to generate the training pairs (x, Ax) from random noise x. The trained network for z can be applied to any other dataset as long as it shares the same A. Thus, this network can be adapted easily to accelerate inference for inverse problems without training data. However, for inverse problems that depend on a different A, a retrained network is required. It is worth mentioning that the forward model A can be easily learned using the training dataset {(x_i, y_i)}, leading to a fully blind estimator associated with the inverse problem. An example of learning A can be found in the supplementary materials (see Section 1). For the network w.r.t. x, training requires the data pairs (x_i, y_i) because of the amortized inference. Note that this is different from training a prior for x using only the training data {x_i}. Thus, the trained network for x is confined to the specific tasks defined by the pairs (x_i, y_i). To extend the generality of the trained network, the amortized setting can be removed, i.e., the measurement y is removed from the training, leading to a solution of the proximity operator in (6). This proximity operation can be regarded as a denoiser which projects the noisy version of x into the subspace imposed by R. The trained network (for the proximity operator) can then be used as a plug-and-play prior [26] to regularize other inverse problems for datasets that share similar statistical characteristics. However, a significant change in the training dataset, e.g., different modalities like MRI and natural images (e.g., ImageNet [11]), would require retraining.
Another interesting point to mention is the scalability of the proposed method to data of different dimensions. The method can be adapted using patch-based techniques without loss of generality. For example, suppose a neural network is trained for images of a given size but the test image is twice that size in each dimension. To use this pretrained network, the full image can be decomposed into four images of the training size and fed to the network. To overcome possible blocking artifacts, eight overlapping patches can be drawn from the full image and fed to the network. The outputs of these eight patches are then averaged (unweighted or weighted) over the overlapping parts. A similar patch-stitching strategy can be exploited to feed small patches to the network for higher-dimensional datasets.
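The overlap-and-average stitching described above can be sketched as follows; the patch size and stride are illustrative, and `net` stands for any patch-to-patch operator such as the trained cPSDAE.

```python
import numpy as np

def apply_patchwise(img, net, patch=32, stride=16):
    """Run a patch-trained operator `net` over a larger 2D image, averaging
    the outputs on overlapping regions to suppress blocking artifacts."""
    H, W = img.shape
    out = np.zeros_like(img, dtype=float)
    weight = np.zeros_like(img, dtype=float)
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            out[i:i + patch, j:j + patch] += net(img[i:i + patch, j:j + patch])
            weight[i:i + patch, j:j + patch] += 1.0
    return out / weight   # unweighted average over overlaps

# sanity check with an identity "network": stitching must reproduce the image
img = np.random.default_rng(2).standard_normal((64, 64))
restored = apply_patchwise(img, net=lambda p: p)
```

A weighted variant (e.g., tapered window weights) would replace the unit increments of `weight`.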
4 Experiments
In this section, we provide experimental results and analysis of the proposed Inf-ADMM-ADNN and compare the results with a conventional ADMM using inner loops for inverse problems. Experiments on synthetic data demonstrate the fast convergence of our method, which comes from the efficient feed-forward propagation through pretrained neural networks. Real applications of the proposed Inf-ADMM-ADNN have been explored, including single image super-resolution, motion deblurring, and joint super-resolution and colorization.
4.1 Synthetic data
To evaluate the performance of the proposed Inf-ADMM-ADNN, we first test the neural network approximating the matrix inversion on synthetic data. More specifically, we assume that the ground-truth x is drawn from a Laplace distribution Laplace(μ, b), where μ is the location parameter and b is the scale parameter. The forward model A is a sparse matrix representing a strided convolution. The architecture of the inversion network is available in the supplementary materials (see Section 2). The noise ε is drawn from a standard Gaussian distribution N(0, σ²) with σ = 1. Thus, the observed data is generated as y = Ax + ε. Following Bayes' theorem, the maximum a posteriori estimate of x given y, i.e., maximizing p(x|y) ∝ p(y|x)p(x), can be equivalently formulated as min_x (1/2)‖y − Ax‖₂² + λ‖x − μ‖₁, where λ = σ²/b in this setting. Following (3), (4), (5), this problem is reduced to the following three subproblems:
(14)  x^{k+1} = \mu + \mathcal{S}_{\lambda/\beta}(z^k - u^k - \mu)
(15)  z^{k+1} = (A^\top A + \beta I_n)^{-1} \left( A^\top y + \beta (x^{k+1} + u^k) \right)
(16)  u^{k+1} = u^k + x^{k+1} - z^{k+1}
where the soft-thresholding operator is defined as S_t(v) = sgn(v) max(|v| − t, 0) and sgn(v) extracts the sign of v. The update of x has a closed-form solution, i.e., a (shifted) soft thresholding of z^k − u^k. The update of z requires the inversion of a big matrix, which is usually solved using a gradient descent based algorithm. The update of u is straightforward. Thus, we compare the gradient descent based update, a closed-form solution for the matrix inversion (note that this inversion can be explicitly computed due to its small size in this toy experiment; in practice, this matrix is not built explicitly) and the proposed inner-loop free update using a pretrained neural network. The evolution of the objective function w.r.t. the number of iterations and the time is plotted in the left and middle of Fig. 2. While all three methods perform similarly from iteration to iteration (left of Fig. 2), the proposed inner-loop free and closed-form inversion based methods converge much faster than the gradient based method (middle of Fig. 2). Considering the fact that the closed-form solution, i.e., a direct matrix inversion, is usually not available in practice, the learned neural network allows us to approximate the matrix inversion in a very accurate and efficient way.
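The toy problem admits a compact inner-loop free sketch: soft thresholding for x, the Woodbury identity with a small direct inverse (standing in for the trained network) for z, and the explicit dual update for u. The specific λ, β, sizes and sparsity level below are illustrative, and μ = 0 is assumed for simplicity.

```python
import numpy as np

def soft_threshold(v, t):
    """S_t(v) = sgn(v) * max(|v| - t, 0), the closed-form x-update."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_l1(A, y, lam, beta, iters=100):
    """Inner-loop free toy ADMM for min_x 0.5 ||y - Ax||^2 + lam ||x||_1."""
    m, n = A.shape
    B = np.linalg.inv(beta * np.eye(m) + A @ A.T)  # stand-in for the learned inverse
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(iters):
        x = soft_threshold(z - u, lam / beta)       # x-update (soft thresholding)
        b = A.T @ y + beta * (x + u)
        z = (b - A.T @ (B @ (A @ b))) / beta        # z-update via Woodbury
        u = u + x - z                               # explicit dual update
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[:5] = 3.0                                    # sparse ground truth
y = A @ x_true + 0.01 * rng.standard_normal(40)
x_hat = admm_l1(A, y, lam=0.1, beta=1.0)
```

Every iteration costs only matrix-vector products plus the precomputed small inverse, matching the linear per-iteration complexity claimed for the proposed method.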
4.2 Image super-resolution and motion deblurring
In this section, we apply the proposed Inf-ADMM-ADNN to solve the popular image super-resolution problem. We have tested our algorithm on the MNIST dataset [13] and on the Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset [27]. In the first two rows of Fig. 3, high-resolution images, shown in the last column, have been blurred (convolved) with a Gaussian kernel and downsampled in both vertical and horizontal directions to generate the corresponding low-resolution images shown in the first column. The bicubic interpolation of the LR images and the results using the proposed Inf-ADMM-ADNN on a held-out test set are displayed in the middle columns. Visually, the proposed Inf-ADMM-ADNN gives much better results than the bicubic interpolation, recovering more details including colors and edges. A task similar to super-resolution is motion deblurring, in which the convolution kernel is a directional kernel and there is no downsampling. The motion deblurring results using Inf-ADMM-ADNN are displayed at the bottom of Fig. 3 and are compared with the Wiener-filtered deblurring result (the performance of the Wiener filter has been tuned to the best by adjusting the regularization parameter). Clearly, Inf-ADMM-ADNN gives visually much better results than the Wiener filter. Due to space limitations, more simulation results are available in the supplementary materials (see Sections 3.1 and 3.2).
To explore the convergence speed w.r.t. the ADMM regularization parameter β, we have plotted the normalized mean square error (NMSE), defined as NMSE = ‖x̂ − x‖₂²/‖x‖₂², of super-resolved MNIST images w.r.t. ADMM iterations using different values of β in the right of Fig. 2. It is interesting to note that when β is large, the NMSE of the ADMM updates converges rapidly to a stable value within a few iterations. Reducing the value of β slows down the decay of NMSE over iterations but reaches a lower stable value. When the value of β is small enough, the NMSE converges to an identical value. This fits well with the claim in Boyd's book [2] that a too-large β does not put enough emphasis on minimizing the objective function, causing coarser estimation; thus a relatively small β is encouraged in practice. Note that the selection of this regularization parameter is still an open problem.
4.3 Joint super-resolution and colorization
While image super-resolution tries to enhance spatial resolution from spatially degraded images, a related application exists in the spectral domain, i.e., enhancing spectral resolution from a spectrally degraded image. One interesting example is so-called automatic colorization, i.e., hallucinating a plausible color version of a colorless photograph. To the best of the authors' knowledge, this is the first time both spectral and spatial resolutions have been enhanced from one single-band image. In this section, we have tested the ability to perform joint super-resolution and colorization from a single colorless LR image on the CelebA dataset [15]. The LR colorless image, its bicubic interpolation and the HR image are displayed in the top row of Fig. 4. The ADMM updates at successive iterations (on a held-out test set) are displayed in the bottom row, showing that the updated image evolves towards higher quality. More results are in the supplementary materials (see Section 3.3).
5 Conclusion
In this paper we have proposed an accelerated alternating direction method of multipliers, namely Inf-ADMM-ADNN, to solve inverse problems using two pretrained deep neural networks. Each ADMM update consists of feed-forward propagation through these two networks, with a complexity of linear order with respect to the data dimensionality. More specifically, a conditional pixel shuffling denoising autoencoder has been learned to perform amortized inference for the proximity operator. This autoencoder leads to an implicit prior learned from training data. A data-independent structured convolutional neural network has been learned from noise to explicitly invert the big matrix associated with the forward model, getting rid of any inner loop in an ADMM update, in contrast to the conventional gradient based method. This network can also be combined with existing proximity operators to accelerate existing ADMM solvers. Experiments and analysis on both synthetic and real datasets demonstrate the efficiency and accuracy of the proposed method. In future work we hope to extend the proposed method to inverse problems related to nonlinear forward models.
Acknowledgments
The authors would like to thank NVIDIA for the GPU donations.
Appendix A Learning A from training data
The objective to estimate A is formulated as
(17)  \hat{A} = \arg\min_A \sum_i \|y_i - A x_i\|_2^2 + \tau \|A\|_F^2
where (x_i, y_i) are the training pairs and τ‖A‖_F² corresponds to a regularization of A. Empirically, when the number of training pairs is large enough, the regularization plays a less important role. The learned and real kernels for A are visually very similar, as shown in Fig. 5.
Appendix B Structure of matrix A in Section 4.1
The degradation matrix A in a strided convolution can be decomposed as the product of S and H, i.e., A = SH, where H is a square matrix corresponding to 2D convolution and S represents the regular 2D downsampling. In general, the blurring matrix H is a block Toeplitz matrix with Toeplitz blocks. If the convolution is implemented with periodic boundary conditions, i.e., the pixels outside an image are padded with a periodic extension of the image itself, the matrix H is a block circulant matrix with circulant blocks (BCCB). Note that in the 1D case, this matrix reduces to a circulant matrix. For illustration purposes, a 1D example is given below.
An example of the matrix H for the 2D convolution of a small kernel with an image is given in the top of Fig. 6. Clearly, in this huge matrix, a circulant structure is present at the block scale as well as within each block, which demonstrates the self-similar pattern of a BCCB matrix.
The downsampling matrix S corresponds to downsampling the original signal, and its transpose Sᵀ interpolates the decimated signal with zeros. Similarly, a 1D example of the downsampling matrix S is shown in (18) for illustrative purposes. An example of the matrix S for 2D downsampling is displayed in the middle of Fig. 6. The resulting degradation matrix A, which is the product of S and H, is shown in the bottom of Fig. 6.
(18)  S = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}  (a factor-2 downsampling of a length-6 signal)
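For concreteness, the 1D degradation matrices described in this appendix can be built explicitly; the kernel weights and sizes below are illustrative.

```python
import numpy as np

def circulant_blur_1d(kernel, n):
    """1D analogue of the BCCB blurring matrix: circulant matrix H for
    convolution with periodic boundary conditions."""
    H = np.zeros((n, n))
    k = len(kernel)
    for i in range(n):
        for j, w in enumerate(kernel):
            H[i, (i + j - k // 2) % n] = w   # periodic (circulant) indexing
    return H

def downsample_matrix(n, d):
    """S keeps every d-th sample; S^T interpolates with zeros, as in (18)."""
    S = np.zeros((n // d, n))
    S[np.arange(n // d), np.arange(0, n, d)] = 1.0
    return S

n, d = 8, 2
H = circulant_blur_1d(np.array([0.25, 0.5, 0.25]), n)
S = downsample_matrix(n, d)
A = S @ H                     # degradation matrix A = S H
x = np.arange(n, dtype=float)
```

Applying A is identical to blurring periodically and then keeping every d-th sample, which is the strided-convolution forward model used in Section 4.1.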
Appendix C More experimental results
C.1 Motion deblurring
C.2 Super-resolution
C.3 Joint super-resolution and colorization
More results of joint super-resolution and colorization on held-out testing data for the CelebA dataset are displayed in Fig. 12.
Appendix D Network Settings
D.1 Network for updating x
For the MNIST dataset, we did not use the pixel shuffling strategy, since each data point is a 28 × 28 grayscale image, which is relatively small. Instead, we used a standard denoising autoencoder with the architecture specifications in Table 1.
Input Dim  Layer  Output Dim 

Conv()Stride()‘SAME’Relu  
Conv()Stride()‘SAME’Relu  
Conv()Stride()‘VALID’Relu  
Conv()Stride()‘VALID’None  
Conv_trans()Stride()‘VALID’Relu  
Conv_trans()Stride()‘VALID’Relu  
Conv_trans()Stride()‘SAME’Relu  
Conv_trans()Stride()‘SAME’Sigmoid 
For the CUB-200-2011 dataset, we applied a periodic pixel shuffling layer to the input image. Note that we did not use any stride here since the image scale in each layer is kept identical. The architecture of the cPSDAE is given in Table 2. For the CelebA dataset, we applied the periodic pixel shuffling layer similarly, and the rest of the settings are the same as for CUB-200-2011, as shown in Table 3. In terms of the discriminator, we fed it the pixel-shuffled images. The architecture of the discriminator is the same as the one in DCGAN.
Input Dim  Layer  Output Dim 

periodical pixel shuffling  
Conv()‘SAME’Batch_NormRelu  
Conv()‘SAME’Batch_NormRelu  
Conv()‘SAME’Batch_NormRelu  
Concatenate in Channel  
Conv()‘SAME’Batch_NormRelu  
Conv()‘SAME’Batch_NormRelu  
Conv()‘SAME’Batch_NormRelu  
periodical pixel shuffling 
Input Dim  Layer  Output Dim 

periodical pixel shuffling  
Conv()‘SAME’Batch_NormRelu  
Conv()‘SAME’Batch_NormRelu  
Conv()‘SAME’Batch_NormRelu  
Concatenate in Channel  
Conv()‘SAME’Batch_NormRelu  
Conv()‘SAME’Batch_NormRelu  
Conv()‘SAME’Batch_NormRelu  
periodical pixel shuffling 
D.2 Network for updating z
As described in Section 3.2, the neural network used to update z was designed to have a symmetric architecture. The details of this architecture are given in Table 4. Note that the spatial dimension corresponds to the width and height of the measurement y.
Input Dim | Layer | Output Dim
3 | Conv_trans(4,4,3,32), 'SAME', ReLU | 32
32 | Conv_trans(4,4,32,64), 'SAME', ReLU | 64
64 | Conv(4,4,3,32), 'SAME', ReLU | 32
32 | Conv(4,4,32,64), 'SAME' | 3
References
 [1] Jonas Adler and Ozan Öktem. Solving ill-posed inverse problems using iterative deep neural networks. arXiv preprint arXiv:1704.04058, 2017.
 [2] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122, 2011.
 [3] Joan Bruna, Pablo Sprechmann, and Yann LeCun. Super-resolution with deep convolutional sufficient statistics. arXiv preprint arXiv:1511.05666, 2015.
 [4] J. H. Chang, Chun-Liang Li, Barnabas Poczos, B. V. K. Kumar, and Aswin C. Sankaranarayanan. One network to solve them all—solving linear inverse problems using deep projection models. arXiv preprint arXiv:1703.09912, 2017.
 [5] Balázs Csanád Csáji. Approximation with artificial neural networks. Faculty of Sciences, Eötvös Loránd University, Hungary, 24:48, 2001.
 [6] Li Deng, Dong Yu, et al. Deep learning: methods and applications. Foundations and Trends® in Signal Processing, 7(3–4):197–387, 2014.
 [7] Alexey Dosovitskiy and Thomas Brox. Generating images with perceptual similarity metrics based on deep networks. In Advances in Neural Information Processing Systems, pages 658–666, 2016.
 [8] Michael Elad and Michal Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process., 15(12):3736–3745, 2006.
 [9] Leon A Gatys, Alexander S Ecker, and Matthias Bethge. Image style transfer using convolutional neural networks. In Proc. IEEE Int. Conf. Comp. Vision and Pattern Recognition (CVPR), pages 2414–2423, 2016.
 [10] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
 [11] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
 [12] Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. Learning representations for automatic colorization. In Proc. European Conf. Comp. Vision (ECCV), pages 577–593. Springer, 2016.
 [13] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11):2278–2324, 1998.
 [14] Jimmy Lei Ba, Kevin Swersky, Sanja Fidler, et al. Predicting deep zero-shot convolutional neural networks using textual descriptions. In Proc. IEEE Int. Conf. Comp. Vision (ICCV), pages 4247–4255, 2015.
 [15] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proc. IEEE Int. Conf. Comp. Vision (ICCV), pages 3730–3738, 2015.
 [16] Songtao Lu, Mingyi Hong, and Zhengdao Wang. A nonconvex splitting method for symmetric nonnegative matrix factorization: Convergence analysis and optimality. IEEE Transactions on Signal Processing, 65(12):3120–3135, June 2017.
 [17] Helmut Maurer and Jochem Zowe. First and second-order necessary and sufficient optimality conditions for infinite-dimensional programming problems. Math. Program., 16(1):98–110, 1979.
 [18] Anh Nguyen, Jason Yosinski, Yoshua Bengio, Alexey Dosovitskiy, and Jeff Clune. Plug & play generative networks: Conditional iterative generation of images in latent space. arXiv preprint arXiv:1612.00005, 2016.
 [19] Jo Schlemper, Jose Caballero, Joseph V Hajnal, Anthony Price, and Daniel Rueckert. A deep cascade of convolutional neural networks for MR image reconstruction. arXiv preprint arXiv:1703.00555, 2017.
 [20] Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P Aitken, Rob Bishop, Daniel Rueckert, and Zehan Wang. Realtime single image and video superresolution using an efficient subpixel convolutional neural network. In Proc. IEEE Int. Conf. Comp. Vision and Pattern Recognition (CVPR), pages 1874–1883, 2016.
 [21] M. Simoes, J. Bioucas-Dias, L.B. Almeida, and J. Chanussot. A convex formulation for hyperspectral image super-resolution via subspace-based regularization. IEEE Trans. Geosci. Remote Sens., 53(6):3373–3388, Jun. 2015.
 [22] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
 [23] Casper Kaae Sønderby, Jose Caballero, Lucas Theis, Wenzhe Shi, and Ferenc Huszár. Amortised MAP inference for image super-resolution. arXiv preprint arXiv:1610.04490, 2016.
 [24] Albert Tarantola. Inverse problem theory and methods for model parameter estimation. SIAM, 2005.
 [25] A.N. Tikhonov and V.I.A. Arsenin. Solutions of ill-posed problems. Scripta series in mathematics. Winston, 1977.
 [26] Singanallur V Venkatakrishnan, Charles A Bouman, and Brendt Wohlberg. Plug-and-play priors for model based reconstruction. In Proc. IEEE Global Conf. Signal and Information Processing (GlobalSIP), pages 945–948. IEEE, 2013.
 [27] Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. The Caltech-UCSD Birds-200-2011 dataset. 2011.
 [28] Q. Wei, N. Dobigeon, and Jean-Yves Tourneret. Bayesian fusion of multiband images. IEEE J. Sel. Topics Signal Process., 9(6):1117–1127, Sept. 2015.
 [29] Qi Wei, Nicolas Dobigeon, and Jean-Yves Tourneret. Fast fusion of multiband images based on solving a Sylvester equation. IEEE Trans. Image Process., 24(11):4109–4121, Nov. 2015.
 [30] Qi Wei, Nicolas Dobigeon, Jean-Yves Tourneret, J. M. Bioucas-Dias, and Simon Godsill. R-FUSE: Robust fast fusion of multiband images based on solving a Sylvester equation. IEEE Signal Process. Lett., 23(11):1632–1636, Nov. 2016.
 [31] N. Zhao, Q. Wei, A. Basarab, N. Dobigeon, D. Kouamé, and J. Y. Tourneret. Fast single image super-resolution using a new analytical solution for ℓ2–ℓ2 problems. IEEE Trans. Image Process., 25(8):3683–3697, Aug. 2016.