Accelerating Optimization Algorithms With Dynamic Parameter Selections Using Convolutional Neural Networks For Inverse Problems In Image Processing
Abstract
Recent advances using deep neural networks (DNNs) for solving inverse problems in image processing have significantly outperformed conventional optimization-based methods. Most works train DNNs to learn 1) forward models and image priors implicitly for direct mappings from given measurements to solutions, 2) data-driven priors as proximal operators in conventional iterative algorithms, or 3) forward models, priors and/or static stepsizes in unfolded structures of optimization iterations. Here we investigate another way of utilizing convolutional neural networks (CNNs) for empirically accelerating conventional optimization for solving inverse problems in image processing. We propose a CNN to yield parameters in optimization algorithms that have been chosen heuristically, but have been shown to be crucial for good empirical performance. Our CNN-incorporated scaled gradient projection methods, without compromising theoretical properties, significantly improve the empirical convergence rate over conventional optimization-based methods in large-scale inverse problems such as image inpainting, compressive image recovery with partial Fourier samples, deblurring and sparse-view CT. During testing, our proposed methods dynamically select parameters at every iteration to speed up convergence robustly for different degradation levels, noise, or regularization parameters as compared to the direct mapping approach.
1 Introduction
Optimization-based solvers for inverse problems have been widely investigated for image processing applications such as denoising [23], inpainting [28], deblurring [36], image recovery from incomplete Fourier samples [26], and image reconstruction from noisy Radon transformed measurements [35]. A typical pipeline is to construct an objective function with accurate forward modeling of image degradation processes and with reasonable image priors such as minimum total variation (TV) and/or sparsity in the wavelet (or learned transform) domain, and then to optimize the objective function to yield a solution, a recovered image, using theoretically well-grounded optimization algorithms such as the iterative shrinkage-thresholding algorithm (ISTA) [12], fast ISTA (FISTA) [4], approximate message passing (AMP) based algorithms [11], or the alternating direction method of multipliers (ADMM) [7]. Many related works have been proposed to improve theoretical convergence rates of algorithms and/or to design good image priors to regularize ill-posed inverse problems.
Deep neural networks (DNNs) have revolutionized the ways of solving inverse problems in image processing. These methods have significantly outperformed conventional optimization-based methods in both the image quality of solutions and computation speed. There are largely three ways of using DNNs for inverse problems in image processing: 1) direct mapping DNNs from measurements (or analytic reconstructions) to solutions by implicitly learning forward models and image priors [32, 34, 21, 18, 17], 2) DNN-based proximal operators for iterative algorithms by explicitly learning image priors [9, 24, 31, 16, 29], or 3) unfolded structure DNNs inspired by conventional optimization iterations by learning forward models, priors and/or stepsizes in optimizations [14, 30, 25, 13, 33, 10, 22]. Recently, it has been theoretically shown that unfolded LISTA has an asymptotic linear convergence property for compressive sensing (CS) recovery problems [10, 22]. However, these methods have been applied only to small-scale CS recovery problems due to the large number of parameters to train or to determine.
Most previous works determine static stepsizes via a Lipschitz constant or a training process, or dynamic stepsizes via backtracking. However, we argue that for an empirically fast convergence rate, stepsizes in optimization iterations must be determined dynamically for different problems at every iteration. Here we investigate an alternative way of utilizing CNNs for empirically accelerating optimization for solving large-scale inverse problems in image processing. We propose CNNs that estimate a near-optimal stepsize (or a diagonal matrix) per iteration for given problems to accelerate empirical convergence rates of conventional optimizations. These parameters in optimization algorithms have been selected heuristically, but have been shown to be crucial for good empirical performance. Our CNN-incorporated scaled gradient projection (SGP) methods, without compromising theoretical properties, significantly improve the empirical convergence rate over conventional optimization-based methods such as ISTA [12] / FISTA [4] with backtracking in large-scale inverse problems such as image inpainting, CS image recovery with partial Fourier samples, image deblurring and sparse-view CT reconstruction. Our way of using DNNs selects parameters at every iteration to speed up empirical convergence robustly for different degradation levels, noise, or regularization parameters as compared to a typical direct mapping CNN (e.g., U-Net [27]).
Here are our contributions: We 1) propose small CNNs to dynamically determine stepsizes in optimizations at every iteration, 2) propose CNN-incorporated SGP methods without compromising convergence properties, and 3) demonstrate the performance and robustness of our methods for large-scale inverse problems in image processing.
2 Related Works
DNN-based direct mapping approaches have yielded state-of-the-art performance in image quality and computation speed for inverse problems in image processing such as image inpainting [32], image denoising [34], single image super resolution [21], sparse image recovery [18] and medical image reconstruction [17]. However, they also have limitations: there is no mechanism to ensure that a solution corresponds to a given measurement through forward models by correcting for intermediate errors in a current solution. Moreover, DNN-based direct mapping methods are based on black-box models with limited interpretability of the solutions for inverse problems. In contrast, conventional optimization-based approaches often have theoretical guarantees for exact recovery [8] or have interpretable converged solutions such as K-sparse images in the wavelet domain with minimum distances to given measurements through forward models.
DNNs inspired by unfolded optimization iterations were also investigated with a finite number of iterations. Learned ISTA (LISTA) [14] was the first work of this type to propose DNNs that implicitly learn forward models, image priors and stepsizes in optimizations from data. LISTA has been extended to unfolded LISTA with learned weights [10] and ALISTA with analytically determined weights [22]. The original LISTA compromised ISTA's theoretical convergence properties, but there have been efforts to understand convergence properties in unfolded structures of DNNs [25, 13]. Recently, it has been theoretically shown that unfolded LISTA has an asymptotic linear convergence property for CS recovery problems [10]. However, these methods have been applied to small-scale CS recovery problems due to the large number of parameters to determine. If the image size is 256×256 and the compression ratio is about 20%, LISTA [14] and LISTA-CPSS [10] require about 4,465M and 2,727M parameters to train, respectively, while our proposed U-Net based CNN requires only 7M parameters for 256×256 images. Recently, ALISTA [22] proposed methods for determining analytical weights using dictionary learning or convolutional sparse coding, but it does not seem to work for general forward models yet.
ADMM-Net [30] and ISTA-Net [33] are also based on unfolded optimization iterations of ADMM and ISTA, respectively, but unlike LISTA approaches, they utilized forward models in their networks and trained convolutional neural networks (CNNs) for image priors such as transformations and parametrized nonlinear functions, and for optimization parameters such as the stepsize. Using forward models explicitly allows these methods to deal with large-scale inverse problems. Similarly, there have been works using DNNs only for proximal operators in iterative algorithms [9, 24, 31, 16, 29]. Unlike ADMM-Net and ISTA-Net with a fixed number of iterations, these methods have the flexibility of running any number of iterations for different cases. However, they focused on using DNNs as image priors within a conventional optimization framework, rather than investigating acceleration of convergence. Most works in this category have static stepsizes that have been determined heuristically, selected conservatively to ensure convergence (e.g., 1/Lipschitz constant), or learned from data.
Lastly, there have been a few recent attempts to learn optimization algorithms as DNN-based functions of gradients [1] and as policies for selecting algorithms using reinforcement learning [20]. These are similar to our proposed methods in that they dynamically determine algorithms at every iteration. However, our methods are within the framework of SGP methods with theoretical convergence properties.
3 Background
3.1 Proximal gradient method
Consider an optimization problem of the form
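(1) $\min_{x} F(x) := f(x) + g(x)$
where $f$ is convex, differentiable and $g$ is convex, subdifferentiable for $x \in \mathbb{R}^n$. Then, the following update equation at the $k$th iteration is called the proximal gradient method (PGM):
(2) $x^{(k+1)} = \mathrm{prox}_{\alpha_k g}\big(x^{(k)} - \alpha_k \nabla f(x^{(k)})\big)$
where $\mathrm{prox}_{\alpha g}(z) = \arg\min_{x} \tfrac{1}{2}\|x - z\|^2 + \alpha g(x)$ and $\|\cdot\|$ denotes the $\ell_2$ norm. PGM guarantees the convergence of $F(x^{(k)})$ to $F(x^*)$ at the solution $x^*$ with the rate of $O(1/k)$.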
One way to determine the stepsize is based on a majorization-minimization technique for the cost function and its quadratic surrogate function at each iteration. The gradient of the differentiable term is usually assumed to be Lipschitz continuous on a given domain and the reciprocal of its Lipschitz constant is used as the stepsize. Another popular way to choose the stepsize is a backtracking method. Note that neither method seeks the largest possible stepsize, since it is often more efficient to calculate the next iteration with a conservative suboptimal stepsize than to perform time-consuming stepsize optimization. Thus, if there is a way to quickly calculate near-optimal stepsizes, it could help to accelerate empirical convergence rates.
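The two stepsize rules described above can be illustrated on a small synthetic problem. The following is only a minimal sketch, assuming a least-squares data term with an $\ell_1$ penalty; the function names, the problem sizes and the shrink factor of the backtracking loop are illustrative choices and are not taken from our implementation.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def objective(A, y, lam, x):
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

def pgm_lasso(A, y, lam, n_iter=200, backtracking=False, shrink=0.5):
    """Proximal gradient method for 0.5*||A x - y||^2 + lam*||x||_1."""
    x = np.zeros(A.shape[1])
    lipschitz = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    alpha = 1.0 if backtracking else 1.0 / lipschitz
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        f_x = 0.5 * np.sum((A @ x - y) ** 2)
        while True:
            x_new = soft_threshold(x - alpha * grad, alpha * lam)
            if not backtracking:
                break
            diff = x_new - x
            f_new = 0.5 * np.sum((A @ x_new - y) ** 2)
            # Accept once the quadratic surrogate majorizes f at x_new.
            if f_new <= f_x + grad @ diff + np.sum(diff ** 2) / (2 * alpha) + 1e-12:
                break
            alpha *= shrink                        # otherwise shrink the stepsize
        x = x_new
    return x

# Illustrative synthetic problem (sizes and values are arbitrary).
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60)) / np.sqrt(30)
x_true = np.zeros(60)
x_true[:5] = 1.0
y = A @ x_true
x_fixed = pgm_lasso(A, y, lam=0.01)                      # stepsize 1 / Lipschitz constant
x_bt = pgm_lasso(A, y, lam=0.01, backtracking=True)      # stepsize from backtracking
```

Both rules accept only stepsizes for which the quadratic surrogate is an upper bound, which is why neither attempts to find the largest stepsize that would still decrease the cost.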
3.2 Scaled gradient projection method
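Recently, scaled gradient projection (SGP) methods with the convergence rate $O(1/k)$ have been proposed and have empirically demonstrated general convergence speed improvements over FISTA with $O(1/k^2)$ [6, 5]. The problem (1) can be seen as a constrained optimization problem
(3) $\min_{x \in \Omega} f(x)$
where $\Omega = \{x : g(x) \le \epsilon\}$ is a convex set due to the convexity of $g$ and $\epsilon$ is determined by $g$. Then, an iterative algorithm can be formulated as the PGM given by
(4) $x^{(k+1)} = x^{(k)} + \gamma_k d^{(k)}$
where $d^{(k)} = P_{\Omega}\big(x^{(k)} - \alpha_k \nabla f(x^{(k)})\big) - x^{(k)}$ and $P_{\Omega}$ denotes the projection onto $\Omega$. Whenever $d^{(k)} \neq 0$, $d^{(k)}$ is a descent direction at $x^{(k)}$ for the problem (3) and thus its inner product with $\nabla f(x^{(k)})$ becomes negative. $d^{(k)} = 0$ implies that $x^{(k)}$ is a stationary point. Since $d^{(k)}$ is a descent direction at $x^{(k)}$, Armijo line search can generate a convergent sequence of $x^{(k)}$ that satisfy the Armijo condition [3]:
(5) $f(x^{(k)} + \gamma_k d^{(k)}) \le f(x^{(k)}) + \beta \gamma_k \nabla f(x^{(k)})^{\top} d^{(k)}$
where $\gamma_k = \theta^{m_k}$ and $\beta, \theta \in (0, 1)$ for $m_k = 0, 1, 2, \ldots$.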
In the problem (3), SGP methods introduce an additional symmetric positive definite matrix $D_k$ in front of the gradient $\nabla f(x^{(k)})$. Symmetry and positive definiteness are necessary conditions for a Hessian matrix of $f$ in Newton's method and they are also important conditions for quasi-Newton methods. Newton-type methods usually converge with fewer iterations than first-order optimization methods, but they are computationally demanding, especially for large-scale input data. SGP methods exploit symmetry and positive definiteness with the aim of a smaller computational burden, while they can refine the direction vector through the diagonal elements of $D_k$ to accelerate the convergence rate. SGP methods are based on Armijo line search since the search direction remains a descent direction at $x^{(k)}$ under these conditions on $D_k$. They are also applicable to proximal operators.
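For the convergence of SGP methods, an additional condition on $D_k$ is required. Define $\mathcal{D}_{\delta}$ for $\delta \ge 1$ as the set of all symmetric positive definite matrices whose eigenvalues are in the interval $[1/\delta, \delta]$. Then, for $\{\delta_k\}$ such that $D_k \in \mathcal{D}_{\delta_k}$, the condition $\delta_k^2 = 1 + \zeta_k$, $\sum_{k} \zeta_k < \infty$ with $\zeta_k \ge 0$ should be satisfied. As $k$ grows, $D_k$ becomes an identity matrix and an iteration becomes similar to PGM. Appropriate choices of $D_k$ [5] accelerated empirical convergence over fast PGMs such as FISTA.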
For a given proximal operator, the SGP method is summarized in Algorithm 1. Finding a scaling matrix that can accelerate convergence still remains an open problem. We propose to replace these heuristic decisions with DNNs.
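To make the roles of the scaling matrix and the Armijo line search concrete, the following is a minimal sketch of a scaled gradient projection iteration. It is not a transcription of Algorithm 1: it assumes a nonnegativity constraint (so that the scaled projection with a diagonal matrix is an elementwise clip), a least-squares cost, and a hypothetical clipped diagonal scaling whose bounds shrink toward the identity.

```python
import numpy as np

def sgp_nonneg_ls(A, y, scaling, alpha=1.0, beta=1e-4, theta=0.5, n_iter=100):
    """Scaled gradient projection for min 0.5*||A x - y||^2 subject to x >= 0."""
    f = lambda v: 0.5 * np.sum((A @ v - y) ** 2)
    x = np.zeros(A.shape[1])
    history = [f(x)]
    for k in range(n_iter):
        grad = A.T @ (A @ x - y)
        d_diag = scaling(k, x)                     # diagonal of the scaling matrix, > 0
        # For a diagonal scaling and a box constraint the scaled projection is a clip.
        d = np.maximum(x - alpha * d_diag * grad, 0.0) - x
        slope = grad @ d                           # nonpositive for a descent direction
        step = 1.0
        while f(x + step * d) > f(x) + beta * step * slope and step > 1e-12:
            step *= theta                          # Armijo backtracking
        x = x + step * d
        history.append(f(x))
    return x, history

rng = np.random.default_rng(0)
A = rng.random((50, 20))
y = A @ rng.random(20)

def identity(k, x):
    return np.ones_like(x)

def clipped_scaling(k, x):
    # Hypothetical scaling: inverse column energies, clipped to [1/delta_k, delta_k]
    # with delta_k^2 = 1 + zeta_k and summable zeta_k, so it tends to the identity.
    delta = np.sqrt(1.0 + 10.0 / (k + 1) ** 2)
    return np.clip(1.0 / np.sum(A ** 2, axis=0), 1.0 / delta, delta)

x_id, hist_id = sgp_nonneg_ls(A, y, identity)
x_sc, hist_sc = sgp_nonneg_ls(A, y, clipped_scaling)
```

With the identity scaling the iteration is a projected gradient method with line search; any other admissible scaling changes only the search direction, while the Armijo test keeps the cost from increasing.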
4 Learning-based stepsize selection
We conjecture that a DNN can be trained to generate a near-optimal stepsize per iteration if a current estimate and the gradient of a cost function at that estimate are given. Since no ground truth is available for the optimal sequence of stepsizes over all iterations, we propose to train a stepsize DNN to yield a greedy near-optimal stepsize per iteration by minimizing the distance between the estimated vector at the next iteration and the converged solution at each iteration.
4.1 Learning a stepsize for an iteration
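To learn stepsizes by a DNN, a set of solution vectors $\{x_i^*\}_{i=1}^{N}$ of optimization problems was generated and used as ground truth data. Solution vectors can be obtained by optimizing the original problems using any convex optimization algorithm (e.g., FISTA with 1200 iterations). Suppose that the estimates at the $k$th iteration form a set of training data $\{x_i^{(k)}\}_{i=1}^{N}$ that will be fed into the DNN. We denote the output of the DNN as a set of positive real numbers $\{\alpha_i^{(k)}\}_{i=1}^{N}$ for stepsizes. Then, a set of vectors $\{\tilde{x}_i^{(k+1)}\}_{i=1}^{N}$ can be obtained at the next iteration:
(6) $\tilde{x}_i^{(k+1)} = S_{\lambda \alpha_i^{(k)}}\big(x_i^{(k)} - \alpha_i^{(k)} \nabla f_i(x_i^{(k)})\big)$
where $[S_{\tau}(z)]_j = \mathrm{sign}(z_j) \max(|z_j| - \tau, 0)$ for the $j$th element in the vector $z$ (soft threshold), $f_i$ is the differentiable term of the $i$th problem and $\lambda$ is the regularization parameter.
The desired stepsizes for the $i$th images at the $k$th iteration can be obtained by training the DNN to minimize the following loss function with respect to the DNN parameters $\Theta$:
(7) $\ell(\Theta) = \sum_{i=1}^{N} \big\|\tilde{x}_i^{(k+1)} - x_i^*\big\|^2$
where $\tilde{x}_i^{(k+1)}$ depends on $\Theta$ through the stepsizes in (6), and then by evaluating the trained DNN on the training inputs. After $\tilde{x}_i^{(k+1)}$ are evaluated by the learned stepsizes using (6), we propose to generate another set of vectors for the next iteration by using a conventional stepsize based on a Lipschitz constant $L$:
(8) $x_i^{(k+1)} = S_{\lambda / L}\big(\tilde{x}_i^{(k+1)} - \tfrac{1}{L} \nabla f_i(\tilde{x}_i^{(k+1)})\big)$
where $L$ is a Lipschitz constant of $\nabla f_i$. This additional step was necessary since $\tilde{x}_i^{(k+1)}$ were often not improved over $x_i^{(k)}$ when the DNN training was not done yet.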
In summary, one iteration of our proposed method consists of two steps: 1) the first operation moves a current estimate towards its solution, and 2) the second operation is applied for the initial training of DNNs. In our simulations, our proposed training method worked well to reduce the loss quickly.
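The training target of this section can be illustrated without a DNN. In the sketch below, a grid search over candidate stepsizes stands in for the network output, which is purely an assumption made for illustration: it picks the stepsize whose update lands closest to the converged solution, i.e., the greedy stepsize that the loss encourages the DNN to predict. The problem sizes, the regularization value and the candidate grid are arbitrary.

```python
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_step(x, grad, alpha, lam):
    """One proximal gradient update with stepsize alpha."""
    return soft_threshold(x - alpha * grad, alpha * lam)

def greedy_stepsize(x, grad, x_star, lam, candidates):
    """Candidate stepsize whose update is closest to the converged solution."""
    losses = [np.sum((prox_step(x, grad, a, lam) - x_star) ** 2) for a in candidates]
    best = int(np.argmin(losses))
    return candidates[best], losses[best]

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20)) / np.sqrt(40)
y = A @ rng.standard_normal(20)
lam = 0.05
lipschitz = np.linalg.norm(A, 2) ** 2

# Reference solution from many conservative iterations (stand-in for the
# converged solutions used as ground truth).
x_star = np.zeros(20)
for _ in range(3000):
    x_star = prox_step(x_star, A.T @ (A @ x_star - y), 1.0 / lipschitz, lam)

x0 = np.zeros(20)
grad0 = A.T @ (A @ x0 - y)
candidates = np.concatenate(([1.0 / lipschitz], np.linspace(0.1, 3.0, 30) / lipschitz))
alpha_best, loss_best = greedy_stepsize(x0, grad0, x_star, lam, candidates)
loss_lipschitz = np.sum((prox_step(x0, grad0, 1.0 / lipschitz, lam) - x_star) ** 2)
```

Because the conservative stepsize is one of the candidates, the greedy choice can never be farther from the converged solution than the Lipschitz-based update for this single iteration.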
The same training method can be applied with a diagonal matrix by replacing the stepsize in (6) with a diagonal matrix. The output dimension of the DNN must then be changed from a single number to the dimension of the estimate, and the backpropagation must be changed accordingly.
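A short sketch of the diagonal-matrix variant is given below, assuming the $\ell_1$ prior so that the scaled proximal operator becomes an elementwise soft threshold whose threshold is scaled by the corresponding diagonal entry. The function names and the numerical values are illustrative only.

```python
import numpy as np

def soft_threshold(z, tau):
    """Elementwise soft threshold; tau may be a scalar or a vector."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def scalar_step(x, grad, alpha, lam):
    return soft_threshold(x - alpha * grad, alpha * lam)

def diagonal_step(x, grad, diag, lam):
    """Update with a diagonal matrix: one stepsize and one threshold per element."""
    diag = np.asarray(diag, dtype=float)
    return soft_threshold(x - diag * grad, lam * diag)

x = np.array([0.5, -1.0, 2.0, 0.0])
grad = np.array([1.0, -2.0, 0.5, 0.3])
same = diagonal_step(x, grad, np.full(4, 0.2), lam=0.1)          # constant diagonal
mixed = diagonal_step(x, grad, np.array([0.2, 0.4, 0.1, 0.05]), lam=0.1)
```

A constant diagonal reproduces the scalar-stepsize update, so the single-stepsize DNN is a special case of the diagonal one.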
4.2 Learning stepsizes for further iterations
Now we propose to further train DNNs to generate stepsizes for multiple iterations. Inspired by the training strategy in [15], we define the following cumulative loss function:
(9) 
where is defined in (6), and new input datasets as well as ground truth labels are defined as and where contains duplicated sets.
Suppose that the DNN is to learn stepsizes of the first $K$ iterations. Initially, the DNN is trained with the input data set and the ground truth label at the initial iteration using the procedure in Section 4.1. Then, in the next iteration, the DNN is retrained with the input data set and the ground truth label at the first iteration. This training process is repeated $K$ times so that the DNN can be trained cumulatively as summarized in Algorithm 2.
We expect that our trained DNN should yield near-optimal stepsizes for the first $K$ iterations, but may not be able to yield good stepsizes in later iterations beyond $K$. Thus, $K$ should be selected based on the trade-off between image quality and computation time.
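The data flow of the cumulative training can be sketched as follows. This is not Algorithm 2: the DNN is replaced by a toy model that fits a single constant stepsize by grid search, one forward matrix is shared by all problems, and all names and sizes are hypothetical. The sketch only illustrates that inputs from earlier iterations are kept, that the same converged solutions are duplicated as labels, and that each stage applies a learned update followed by a conservative Lipschitz update.

```python
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def fit_constant_stepsize(inputs, labels, lam, candidates):
    """Toy 'training': one stepsize minimizing the summed distance to the labels."""
    def loss(alpha):
        return sum(np.sum((soft_threshold(x - alpha * g, alpha * lam) - t) ** 2)
                   for (x, g), t in zip(inputs, labels))
    return min(candidates, key=loss)

def cumulative_training(A, ys, x_stars, lam, n_stages, candidates):
    lipschitz = np.linalg.norm(A, 2) ** 2
    xs = [np.zeros(A.shape[1]) for _ in ys]
    inputs, labels, alphas = [], [], []
    for _ in range(n_stages):
        grads = [A.T @ (A @ x - y) for x, y in zip(xs, ys)]
        inputs += list(zip(xs, grads))      # earlier stages stay in the training set
        labels += list(x_stars)             # the same solutions are duplicated per stage
        alpha = fit_constant_stepsize(inputs, labels, lam, candidates)
        alphas.append(alpha)
        # Update with the learned stepsize, then a conservative Lipschitz update.
        xs = [soft_threshold(x - alpha * g, alpha * lam) for x, g in zip(xs, grads)]
        xs = [soft_threshold(x - A.T @ (A @ x - y) / lipschitz, lam / lipschitz)
              for x, y in zip(xs, ys)]
    return alphas, len(inputs)

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20)) / np.sqrt(40)
lam = 0.05
L_const = np.linalg.norm(A, 2) ** 2
ys = [A @ rng.standard_normal(20) for _ in range(3)]
x_stars = []
for y in ys:                                # converged solutions used as labels
    x = np.zeros(20)
    for _ in range(3000):
        x = soft_threshold(x - A.T @ (A @ x - y) / L_const, lam / L_const)
    x_stars.append(x)

candidates = np.linspace(0.2, 3.0, 15) / L_const
alphas, n_pairs = cumulative_training(A, ys, x_stars, lam, n_stages=4, candidates=candidates)
```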
5 DNN-incorporated convergent algorithms
5.1 SGP method as framework
The SGP method is described in Algorithm 1. If the scaling matrix is an identity matrix and the line-search parameter is equal to 1 at every iteration, the SGP method reduces to the PGM. Thus, the SGP method is a generalized version of the PGM that additionally multiplies the gradient of the loss function by a symmetric positive definite matrix, which guarantees that the search direction is a descent direction, and that enforces the Armijo condition for convergence. However, there is no known method to determine a scaling matrix that can both accelerate and guarantee convergence. We propose a DNN to determine the scaling matrix, trained using the learning procedure in Section 4. Since it is also possible for the DNN to yield a matrix that does not satisfy the necessary conditions, we propose to relax the SGP method to selectively use the DNN-based stepsize (or diagonal matrix) estimation or the conservative Lipschitz-constant-based stepsize to guarantee convergence, as summarized in Algorithm 3. We call this the direction relaxation scheme (DRS).
5.2 Proposed relaxation algorithms with DNN
For the DRS in Algorithm 3, note that when the prior is the $\ell_1$ norm and the scaling matrix is a diagonal matrix, the scaled proximal operator reduces to elementwise soft thresholding with thresholds proportional to the diagonal elements. One candidate represents a wild search direction generated by the trained DNN and the other is a conservative search direction obtained from the conventional Lipschitz-constant-based stepsize. Then, depending on the relationship between the two, the final search direction will be either a linear combination of both or the conservative direction alone. For the DNN that generates a single stepsize, the scaling matrix will be an identity matrix multiplied by that stepsize.
We propose to incorporate the DNN-based DRS method into the SGP algorithm as detailed in Algorithm 4. The final search direction yields the estimated vector for the next iteration. As in Algorithm 3, it is either a weighted average of the DNN-based direction and the conservative direction, or the conservative direction itself. The weight starts from its initial value and either remains the same or decreases by a fixed factor over iterations, depending on the Armijo condition at each iteration. The ratio of the weight for the DNN-based direction to the weight for the conservative direction is evaluated at each iteration. Initially, the direction from the trained DNN is dominant in the final search direction, but its weight vanishes in later iterations and the DNN will eventually not be used. Thus, our proposed algorithm is initially the SGP with DNN search directions and becomes the PGM in later iterations.
Note that the proposed DRS method with relaxation only determines a search direction for the next estimate that is a descent direction for inverse problems. The line-search parameter is the final stepsize parameter: it starts from its initial value and is decreased by a fixed factor until the Armijo condition is satisfied. Therefore, our proposed method in Algorithms 3 and 4 using the trained DNN converges theoretically.
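The interplay between a learned direction, a conservative direction and the Armijo test can be sketched as below. This is a hedged illustration rather than Algorithms 3 and 4: the acceptance rule (fall back to the conservative direction whenever the combined direction does not predict a decrease), the weight update, the constants, and the fixed aggressive diagonal that stands in for the DNN output are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def cost(A, y, lam, x):
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

def relaxed_step(A, y, lam, x, learned_diag, weight, beta=1e-4, theta=0.5):
    """One iteration mixing a learned and a conservative search direction."""
    g = lambda v: lam * np.sum(np.abs(v))
    grad = A.T @ (A @ x - y)
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2
    d_safe = soft_threshold(x - alpha * grad, alpha * lam) - x
    d_wild = soft_threshold(x - learned_diag * grad, lam * learned_diag) - x
    # Predicted decrease of the cost along d (negative for a useful direction).
    decrease = lambda d: grad @ d + g(x + d) - g(x)
    d = weight * d_wild + (1.0 - weight) * d_safe
    if decrease(d) >= 0.0:
        d = d_safe                                  # fall back to the safe direction
    delta = decrease(d)
    step, full_step_ok = 1.0, True
    while (cost(A, y, lam, x + step * d) > cost(A, y, lam, x) + beta * step * delta
           and step > 1e-12):
        step *= theta                               # Armijo backtracking
        full_step_ok = False
    if not full_step_ok:
        weight *= theta                             # trust the learned direction less
    return x + step * d, weight

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 80)) / np.sqrt(40)
y = A @ (rng.standard_normal(80) * (rng.random(80) < 0.1))
lam = 0.05
wild = np.full(80, 3.0 / np.linalg.norm(A, 2) ** 2)   # stand-in for the DNN output
x, weight = np.zeros(80), 1.0
values = []
for _ in range(30):
    values.append(cost(A, y, lam, x))
    x, weight = relaxed_step(A, y, lam, x, wild, weight)
values.append(cost(A, y, lam, x))
```

In this sketch the cost never increases, regardless of how poor the stand-in for the DNN output is, because the accepted direction always predicts a decrease and the step is accepted only under the Armijo test.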
6 Simulation results
6.1 Inverse problem settings
We performed various image processing simulations such as image inpainting, CS image recovery with partial Fourier samples, image deblurring and large-scale sparse-view CT to evaluate our proposed methods. An optimization problem for inverse problems in image processing has the following form:
(10) $\hat{x}_i = \arg\min_{x} \tfrac{1}{2}\|y_i - A_i x\|^2 + \lambda \|x\|_1$
where $i$ is an index for the image, $A_i$ is a matrix that describes an image degradation forward process, $y_i$ is a measurement vector and $\lambda$ is a regularization parameter to balance between data fidelity and image prior. An image is modeled in the wavelet domain (three-level symlet-4) so that $x$ is a wavelet coefficient vector for an image. So, the linear operator $A_i$ is a measurement matrix (different from image to image) combined with an inverse sparsifying transform. For normalized measurement matrices, their Lipschitz constants are bounded, and normalized gradients helped to yield better results. Images from the BSDS500 dataset [2] were used for all simulations, where 450 / 50 images were used for training / testing, respectively. was set to be 0.1.
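As a toy illustration of this problem setup, the sketch below builds a measurement matrix from a sampling matrix and an inverse sparsifying transform and evaluates the objective. A random orthonormal matrix stands in for the wavelet transform, the signal is one-dimensional, and the names and sizes are assumptions made for the example only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
W, _ = np.linalg.qr(rng.standard_normal((n, n)))   # stand-in orthonormal transform
mask = rng.random(n) < 0.5
M = np.eye(n)[mask]                                 # sampling matrix (about 50% of pixels)
A = M @ W.T                                         # acts on transform coefficients
image = rng.standard_normal(n)
y = M @ image                                       # measurement vector

def objective(x, lam):
    """Data fidelity plus l1 prior on the coefficient vector x."""
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))

# Lipschitz constant of the gradient of the data fidelity term.
lipschitz = np.linalg.norm(A, 2) ** 2
coefficients = W @ image                            # coefficients of the true image
```

In this toy setup the rows of the measurement matrix are orthonormal, so the Lipschitz constant does not exceed one and a unit stepsize is already conservative.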
We implemented our stepsize (or diagonal matrix) DNN based on the U-Net architecture [27] and a modified FBPConvNet [17] using MatConvNet on MATLAB. Note that the input and output of the DNN are in a sparsifying transform domain, which improved the overall performance for inverse problems. The same convolution filter size was used for all convolutional layers, and batch normalization and a rectified linear unit (ReLU) were used after each convolution layer. Max pooling was applied in the first half of the DNN and deconvolution layers with skip connections were used in the second half of the DNN. We reduced the number of layers in the original FBPConvNet to lower computation time (7M parameters). For stepsize learning, one fully connected layer was added at the end of the DNN to generate a single number. All simulations were run on an NVIDIA Titan X.
We compared our proposed SGP methods with the stepsize DNN (called Step-learned) and the diagonal matrix DNN (called Diag-learned) to conventional algorithms such as ISTA and FISTA with backtracking. We also compared our proposed methods with a U-Net that was trained to yield ground truth converged images for (10) from input measurements. We chose the U-Net for comparison since it has been shown to yield good results in various image processing problems with large-scale inputs and with different forward models, including inpainting and compressive image recovery [19]. Unfortunately, we were not able to compare our methods with ReconNet [18] or LISTA-CPSS [10], which were limited to small-scale CS image recovery problems on small image patches and, empirically, to one fixed forward model.
NMSE was used as the evaluation criterion [10]. Note that all DNNs were trained with converged solutions while the evaluations were done with the oracle solutions (the original BSDS500 dataset) to better evaluate the robustness of all methods under significantly different forward models and additional measurement noise. All DNNs for our Step-learned / Diag-learned methods, as well as the U-Net that yields solutions directly, were trained for the case of recovering from noiseless 50% samples with different forward models for different training images. Then, all methods were tested on noiseless 30%, 50%, and 70% samples of the test dataset to evaluate performance as well as robustness. For image recovery with partial Fourier samples, noisy 50% samples (Gaussian with standard deviation 5) were also used for further evaluation. We also performed image deblurring for a Gaussian blur kernel and sparse-view CT reconstruction with 144 views using CT images to show the feasibility for other applications.
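For reference, a common definition of NMSE in decibels is sketched below. It is an assumption that this matches the exact formula and sign convention behind the reported numbers; the function name is illustrative.

```python
import numpy as np

def nmse_db(estimate, reference):
    """Normalized mean squared error in dB: 10*log10(||est - ref||^2 / ||ref||^2)."""
    error = np.sum(np.abs(estimate - reference) ** 2)
    return 10.0 * np.log10(error / np.sum(np.abs(reference) ** 2))

reference = np.ones(4)
estimate = 1.1 * reference          # 10% amplitude error on every element
value = nmse_db(estimate, reference)
```

Under this definition a 10% amplitude error corresponds to about -20 dB, and a smaller (more negative) value indicates a more accurate estimate.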
6.2 Image inpainting
Our proposed methods were applied to inpainting problems with different sampling rates. The measurement matrix was built from a sampling matrix that is different for each image. All DNNs were trained with 50% samples (called U-Net50%, Step-learned50%, Diag-learned50%). All results tested on different sampling rates are reported in Figures 1, 2 and Table 1.
When all methods were tested on the same sampling rate (50%), the non-iterative U-Net instantly yielded the best image quality among all methods, including our proposed methods and even the ground truth converged images (FISTA at the 1200th iteration yielded 18.15 dB). Both of our proposed methods at the 20th iteration yielded image quality comparable to the converged images, while FISTA at the 100th iteration performed much worse than our proposed methods at the 20th iteration.
However, the U-Net did not show robust performance and yielded substantial artifacts for the tests with different sampling rates such as 30% and 70%. In contrast, our proposed methods yielded robust acceleration without retraining for the different test cases, as illustrated in Figure 2.





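Table 1. NMSE (dB, mean ± standard deviation) for image inpainting.
Method                  Test30%         Test50%         Test70%
FISTA@100               5.20 ± 1.01     10.74 ± 2.29    17.69 ± 2.85
U-Net50%                7.05 ± 0.97     19.40 ± 2.91    9.11 ± 2.38
Step-learned50%@20      13.77 ± 2.86    17.73 ± 3.01    21.65 ± 3.05
Diag-learned50%@20      13.85 ± 2.71    17.50 ± 2.97    21.35 ± 3.05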
6.3 Image recovery with partial Fourier samples
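Table 2. NMSE (dB, mean ± standard deviation) for image recovery with partial Fourier samples.
Method                  Test30%         Test50%         Test70%
FISTA@100               20.46 ± 3.49    24.84 ± 3.99    32.31 ± 4.74
U-Net50%                18.22 ± 3.43    20.17 ± 4.08    19.62 ± 4.57
Step-learned50%@20      20.80 ± 3.67    25.49 ± 4.20    33.04 ± 4.79
Diag-learned50%@20      20.80 ± 3.65    25.52 ± 4.17    33.06 ± 4.78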
Similar simulations were performed for image recovery with partial Fourier samples. Note that the input image of the DNN has four channels such that the first two channels are the real and imaginary parts of the estimated image. Initial images in Figure 3 were obtained using the inverse Fourier transform with zero padding. All results are reported in Figures 3, 4 and Tables 2, 3.
For all test cases, the non-iterative U-Net yielded the worst results among all methods, including FISTA at the 100th iteration. Forward models for partial Fourier sampling are much more complicated than forward models for inpainting, and thus a more complicated DNN with a much larger dataset seems desirable for better performance. Our proposed methods yielded robust and excellent performance at early iterations without retraining for all test cases, including the same sampling rate (50%), different sampling rates (30%, 70%), and the additional measurement noise case (noisy 50%). Thus, our proposed DNNs do seem robust to different models and noise in compressive sensing recovery with partial Fourier samples, a property inherited from conventional optimization-based algorithms.





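Table 3. NMSE (dB, mean ± standard deviation) for image recovery with noiseless and noisy partial Fourier samples.
Method                  Test50%         Test50% noisy
FISTA@100               24.84 ± 3.99    22.79 ± 2.85
U-Net50%                20.17 ± 4.08    19.85 ± 3.79
Step-learned50%@20      25.49 ± 4.20    23.16 ± 2.78
Diag-learned50%@20      25.52 ± 4.17    23.18 ± 2.76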
6.4 Robustness to regularization parameters
We investigated the robustness of our proposed methods by training them with the original regularization parameter and then running them on the test set with regularization parameters that are half and twice the original value.
Figures 2 (d), 4 (d) illustrate that our proposed methods were robust to small changes in regularization parameters such as half or twice the original value. However, large changes such as 10 times smaller or larger than the original parameter seem to break the fast empirical convergence properties of our proposed methods. These phenomena were expected since changing regularization parameters leads to changing ground truth images, and thus our DNNs, whose inputs depend on the current estimate and its corresponding gradient, should behave in a different way (e.g., if the current estimate is the same as the converged solution with the original regularization parameter, then a zero stepsize should be obtained for the problem with the same regularization parameter, but a nonzero stepsize should be obtained for a different problem with a different regularization parameter). Thus, large changes in the regularization parameter may require retraining the DNN.
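The argument above can be checked numerically on a toy problem. The sketch below, with illustrative sizes and values, computes a converged solution for one regularization parameter and then measures how much a single proximal gradient update changes that solution under the same and under a doubled regularization parameter.

```python
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def fixed_point_residual(A, y, x, lam, alpha):
    """Norm of the change produced by one proximal gradient update at x."""
    grad = A.T @ (A @ x - y)
    return np.linalg.norm(soft_threshold(x - alpha * grad, alpha * lam) - x)

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20)) / np.sqrt(40)
x_true = np.zeros(20)
x_true[:4] = 1.0
y = A @ x_true
lam = 0.05
alpha = 1.0 / np.linalg.norm(A, 2) ** 2

# Converged solution for the original regularization parameter.
x_conv = np.zeros(20)
for _ in range(5000):
    x_conv = soft_threshold(x_conv - alpha * (A.T @ (A @ x_conv - y)), alpha * lam)

res_same = fixed_point_residual(A, y, x_conv, lam, alpha)
res_double = fixed_point_residual(A, y, x_conv, 2 * lam, alpha)
```

The converged solution is left unchanged by the update for the original parameter, whereas the same estimate and gradient call for a nonzero move once the parameter is doubled, which is why identical DNN inputs can require different outputs.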
6.5 Image deblurring
The proposed methods were applied to image deblurring problems. Images were blurred using a Gaussian kernel. Then, image deblurring was performed with the regularization parameter 0.00001. Note that the initial data fidelity term for the deblurring problem is usually much larger than for other inverse problems such as inpainting. Unlike for the other inverse problems in image processing, the learned diagonal matrix based relaxed SGP yielded the best image quality among all compared methods, as shown in Figure 5 qualitatively and quantitatively. It seems that the large discrepancy in the data fidelity term was quickly compensated when using the learned diagonal matrix in SGP.
6.6 Sparse-view medical image reconstruction
Lastly, our proposed method was investigated for sparse-view CT image reconstruction. The initial image in Figure 6 (a) was obtained by filtered backprojection from 144 views of projections and had streaking artifacts. With the regularization parameter 0.0005, we ran FISTA-b and the proposed Diag-learned SGP. At the 10th iteration, our proposed method yielded a visually better image than FISTA, as illustrated in Figure 6 (b) and (c). Figure 6 (d) shows that our proposed method achieved a faster convergence rate than FISTA-b.
6.7 Limited robustness to other measurements
We also investigated the robustness of our trained DNN for determining a stepsize or diagonal matrix when different forward models are used. Our proposed methods have shown robustness for problems such as image inpainting, where the trained DNN still achieved much faster convergence than FISTA. A similar tendency was observed for image inpainting with 70% sampling. The trained DNN also yielded robust performance for lower or higher (70%) sampling in partial Fourier image recovery. For image deblurring problems with different blur levels, the trained DNN yielded suboptimal performance compared to FISTA. However, note that Step-learned SGP was more robust than Diag-learned SGP and yielded better performance than FISTA for early iterations. For sparse-view image reconstruction, the DNN trained with 144 views did not yield good performance for the test with 45 views. Thus, the robustness of the trained DNN to other forward models seems application-dependent. We note, however, that many DNN-based algorithms are not robust to other measurement models [17].
7 Conclusion
We proposed a new way of using CNNs for empirically accelerating convergence for inverse problems in image processing, with dynamic parameter selection over iterations and with different forward models, without breaking theoretical properties such as convergence and robustness. Our trained DNN enabled SGP to empirically outperform FISTA, which is theoretically faster than SGP, and to yield robust performance compared to direct mapping DNNs.
Acknowledgments
This work was supported partly by Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education(NRF2017R1D1A1B05035810), the Technology Innovation Program or Industrial Strategic Technology Development Program (10077533, Development of robotic manipulation algorithm for grasping/assembling with the machine learning using visual and tactile sensing information) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea), and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI18C0316).
References
[1] (2016) Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems, pp. 3981–3989.
[2] (2011) Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (5), pp. 898–916.
[3] (1966) Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16 (1), pp. 1–3.
[4] (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2 (1), pp. 183–202.
[5] (2015) New convergence results for the scaled gradient projection method. Inverse Problems 31 (9), pp. 095008.
[6] (2008) A scaled gradient projection method for constrained image deblurring. Inverse Problems 25 (1), pp. 015002.
[7] (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3 (1), pp. 1–122.
[8] (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory 52 (2), pp. 489–509.
[9] (2017) One network to solve them all - solving linear inverse problems using deep projection models. In IEEE International Conference on Computer Vision (ICCV), pp. 5889–5898.
[10] (2018) Theoretical Linear Convergence of Unfolded ISTA and Its Practical Weights and Thresholds. In Advances in Neural Information Processing Systems (NeurIPS), pp. 9061–9071.
[11] (2009) Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences 106 (45), pp. 18914–18919.
[12] (2003) An EM algorithm for wavelet-based image restoration. IEEE Transactions on Image Processing 12 (8), pp. 906–916.
[13] (2018) Tradeoffs Between Convergence Speed and Reconstruction Accuracy in Inverse Problems. IEEE Transactions on Signal Processing 66 (7), pp. 1676–1690.
[14] (2010) Learning fast approximations of sparse coding. In International Conference on Machine Learning (ICML), pp. 399–406.
[15] (2018) CNN-based projected gradient descent for consistent CT image reconstruction. IEEE Transactions on Medical Imaging 37 (6), pp. 1440–1453.
[16] (2019) Optimizing a parameterized plug-and-play ADMM for iterative low-dose CT reconstruction. IEEE Transactions on Medical Imaging 38 (2), pp. 371–382.
[17] (2017) Deep Convolutional Neural Network for Inverse Problems in Imaging. IEEE Transactions on Image Processing 26 (9), pp. 4509–4522.
[18] (2016) ReconNet: Non-iterative reconstruction of images from compressively sensed measurements. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 449–458.
[19] (2018) Noise2Noise: learning image restoration without clean data. In International Conference on Machine Learning (ICML), pp. 2965–2974.
[20] (2017) Learning to optimize. In International Conference on Learning Representations (ICLR).
[21] (2017) Enhanced Deep Residual Networks for Single Image Super-Resolution. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1132–1140.
[22] (2019) ALISTA: Analytic Weights Are As Good As Learned Weights in LISTA. In International Conference on Learning Representations (ICLR).
[23] (2009) Non-local sparse models for image restoration. In IEEE International Conference on Computer Vision (ICCV), pp. 2272–2279.
[24] (2017) Learned D-AMP: Principled neural network based compressive image recovery. In Advances in Neural Information Processing Systems (NIPS), pp. 1772–1783.
[25] (2017) Understanding Trainable Sparse Coding via Matrix Factorization. In International Conference on Learning Representations (ICLR).
[26] (2012) Gradient-based image recovery methods from incomplete Fourier measurements. IEEE Transactions on Image Processing 21 (1), pp. 94–105.
[27] (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234–241.
[28] (2005) Fields of experts: a framework for learning image priors. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 860–867.
[29] (2019) Plug-and-play methods provably converge with properly trained denoisers. In International Conference on Machine Learning (ICML), pp. 5546–5557.
[30] (2016) Deep ADMM-Net for compressive sensing MRI. In Advances in Neural Information Processing Systems (NIPS), pp. 10–18.
[31] (2018) Image restoration by iterative denoising and backward projections. IEEE Transactions on Image Processing 28 (3), pp. 1220–1234.
[32] (2012) Image denoising and inpainting with deep neural networks. In Advances in Neural Information Processing Systems (NIPS), pp. 341–349.
[33] (2018) ISTA-Net: Interpretable Optimization-Inspired Deep Network for Image Compressive Sensing. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1828–1837.
[34] (2017) Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing 26 (7), pp. 3142–3155.
[35] (2013) Handling noise in single image deblurring using directional filters. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 612–619.
[36] (2011) From learning models of natural image patches to whole image restoration. In IEEE International Conference on Computer Vision (ICCV), pp. 479–486.