Accelerating Optimization Algorithms With Dynamic Parameter Selections Using Convolutional Neural Networks For Inverse Problems In Image Processing

Byung Hyun Lee     Se Young Chun
School of Electrical and Computer Engineering, UNIST, Republic of Korea
Corresponding author: sychun@unist.ac.kr
Abstract

Recent advances using deep neural networks (DNNs) for solving inverse problems in image processing have significantly outperformed conventional optimization algorithm based methods. Most works train DNNs to learn 1) forward models and image priors implicitly for direct mappings from given measurements to solutions, 2) data-driven priors as proximal operators in conventional iterative algorithms, or 3) forward models, priors and/or static stepsizes in unfolded structures of optimization iterations. Here we investigate another way of utilizing a convolutional neural network (CNN) for empirically accelerating conventional optimization when solving inverse problems in image processing. We propose a CNN that yields parameters of optimization algorithms that have typically been chosen heuristically but have been shown to be crucial for good empirical performance. Our CNN-incorporated scaled gradient projection methods, without compromising theoretical properties, significantly improve empirical convergence rates over conventional optimization based methods in large-scale inverse problems such as image inpainting, compressive image recovery with partial Fourier samples, deblurring and sparse-view CT. During testing, our proposed methods dynamically select parameters at every iteration to speed up convergence robustly for different degradation levels, noise, or regularization parameters, as compared to direct mapping approaches.

1 Introduction

Optimization based solvers for inverse problems have been widely investigated for image processing applications such as denoising [23], inpainting [28], deblurring [36], image recovery from incomplete Fourier samples [26], and image reconstruction from noisy Radon transformed measurements [35]. A typical pipeline is to construct an objective function with accurate forward modeling of the image degradation process and with reasonable image priors such as minimum total variation (TV) and/or sparsity in a wavelet (or learned transform) domain, and then to optimize the objective function to yield a solution, a recovered image, using theoretically well-grounded optimization algorithms such as the iterative shrinkage-thresholding algorithm (ISTA) [12], fast ISTA (FISTA) [4], approximate message passing (AMP) based algorithms [11], or the alternating direction method of multipliers (ADMM) [7]. Many related works have been proposed to improve the theoretical convergence rates of algorithms and/or to design good image priors to regularize ill-posed inverse problems.

Deep neural networks (DNNs) have revolutionized the ways of solving inverse problems in image processing. These methods have significantly outperformed conventional optimization based methods in both image quality and computation speed. There are largely three ways of using DNNs for inverse problems in image processing: 1) direct mapping DNNs from measurements (or analytic reconstructions) to solutions by implicitly learning forward models and image priors [32, 34, 21, 18, 17], 2) DNN based proximal operators for iterative algorithms by explicitly learning image priors [9, 24, 31, 16, 29], or 3) unfolded structure DNNs inspired by conventional optimization iterations that learn forward models, priors and/or stepsizes in optimizations [14, 30, 25, 13, 33, 10, 22]. Recently, it has been theoretically shown that unfolded LISTA has an asymptotic linear convergence property for compressive sensing (CS) recovery problems [10, 22]. However, these methods have been applied only to small-scale CS recovery problems due to the large number of parameters to train or to determine.

Most previous works determine static stepsizes with Lipschitz constants or training processes, or dynamic stepsizes with backtracking. However, we argue that for empirically fast convergence, stepsizes in optimization iterations should be determined dynamically for each problem at every iteration. Here we investigate an alternative way of utilizing CNNs for empirically accelerating optimization when solving large-scale inverse problems in image processing. We propose CNNs that estimate a near-optimal stepsize (or a diagonal matrix) per iteration for given problems to accelerate the empirical convergence rates of conventional optimizations. These parameters in optimization algorithms have been selected heuristically, but have been shown to be crucial for good empirical performance. Our CNN-incorporated scaled gradient projection (SGP) methods, without compromising theoretical properties, significantly improve empirical convergence rates over conventional optimization based methods such as ISTA [12] / FISTA [4] with backtracking in large-scale inverse problems such as image inpainting, CS image recovery with partial Fourier samples, image deblurring and sparse-view CT reconstruction. Our way of using DNNs selects parameters at every iteration to speed up empirical convergence robustly for different degradation levels, noise, or regularization parameters, as compared to a typical direct mapping CNN (e.g., U-Net [27]).

Here are our contributions: we 1) propose small CNNs to dynamically determine stepsizes in optimizations at every iteration, 2) propose CNN-incorporated SGP methods without compromising convergence properties, and 3) demonstrate the performance and robustness of our methods for large-scale inverse problems in image processing.

2 Related Works

DNN-based direct mapping approaches have yielded state-of-the-art performance in image quality and computation speed for inverse problems in image processing such as image inpainting [32], image denoising [34], single image super resolution [21], sparse image recovery [18] and medical image reconstruction [17]. However, they also have limitations: there is no mechanism to ensure that a solution corresponds to a given measurement through the forward model by correcting for intermediate errors in a current solution. Moreover, DNN-based direct mapping methods are black-box models with limited interpretability of the solutions to inverse problems. In contrast, conventional optimization based approaches often have theoretical guarantees for exact recovery [8] or have interpretable converged solutions such as K-sparse images in the wavelet domain with minimum distances to given measurements through forward models.

DNNs inspired by unfolded optimization iterations were also investigated with a finite number of iterations. Learned ISTA (LISTA) [14] was the first work of this type to propose DNNs that implicitly learn forward models, image priors and stepsizes in optimizations from data. LISTA has been extended to unfolded LISTA with learned weights [10] and ALISTA with analytically determined weights [22]. The original LISTA compromised ISTA's theoretical convergence properties, but there have been efforts to understand convergence properties in unfolded structures of DNNs [25, 13]. Recently, it has been theoretically shown that unfolded LISTA has an asymptotic linear convergence property for CS recovery problems [10]. However, these methods have been applied only to small-scale CS recovery problems due to the large number of parameters to determine. If the image size is 256×256 and the compression ratio is about 20%, LISTA [14] and LISTA-CPSS [10] require about 4,465M and 2,727M parameters to train, respectively, while our proposed U-Net based CNN requires only 7M parameters for 256×256 images. Recently, ALISTA [22] proposed methods for determining analytical weights using dictionary learning or convolutional sparse coding, but it does not yet seem to work for general forward models.

ADMM-net [30] and ISTA-Net [33] are also based on unfolded optimization iterations of ADMM and ISTA, respectively, but unlike LISTA approaches, they utilize forward models in their networks and train convolutional neural networks (CNNs) for image priors, such as transformations and parametrized non-linear functions, and for optimization parameters such as stepsizes. Using forward models explicitly allows these methods to deal with large-scale inverse problems. Similarly, there have been works using DNNs only for proximal operators in iterative algorithms [9, 24, 31, 16, 29]. Unlike ADMM-net and ISTA-Net with a fixed number of iterations, these methods have the flexibility of running any number of iterations for different cases. However, they focused on using DNNs as image priors within conventional optimization frameworks, rather than investigating acceleration of convergence. Most works in this category use static stepsizes that have been determined heuristically, selected conservatively to ensure convergence (e.g., 1/Lipschitz constant), or learned from data.

Lastly, there have been a few recent attempts to learn optimization algorithms as DNN based functions of gradients [1] and as policies for selecting algorithms using reinforcement learning [20]. These are similar to our proposed methods in dynamically determining algorithms at every iteration. However, our methods stay within the framework of SGP methods with theoretical convergence properties.

3 Background

3.1 Proximal gradient method

Consider an optimization problem of the form

$$\min_{\mathbf{x} \in \mathbb{R}^{n}} \; F(\mathbf{x}) := f(\mathbf{x}) + g(\mathbf{x}) \qquad\qquad (1)$$

where $f$ is convex and differentiable and $g$ is convex and subdifferentiable on $\mathbb{R}^{n}$. Then, the following update equation at the $k$th iteration is called the proximal gradient method (PGM):

$$\mathbf{x}^{(k+1)} = \mathrm{prox}_{\alpha_k g}\!\left(\mathbf{x}^{(k)} - \alpha_k \nabla f(\mathbf{x}^{(k)})\right) = \arg\min_{\mathbf{x}} \; g(\mathbf{x}) + \frac{1}{2\alpha_k}\left\| \mathbf{x} - \left(\mathbf{x}^{(k)} - \alpha_k \nabla f(\mathbf{x}^{(k)})\right) \right\|_2^2 \qquad\qquad (2)$$

where $\alpha_k > 0$ is a stepsize and $\|\cdot\|_2$ denotes the $\ell_2$-norm. The PGM guarantees the convergence of $F(\mathbf{x}^{(k)})$ to $F(\mathbf{x}^{*})$ at the solution $\mathbf{x}^{*}$ with the rate of $O(1/k)$.

One way to determine $\alpha_k$ is based on a majorization-minimization technique for the cost function and its quadratic surrogate function at each iteration. $\nabla f$ is usually assumed to be Lipschitz continuous on a given domain, and the reciprocal of a Lipschitz constant of $\nabla f$ is used for $\alpha_k$. Another popular method to choose $\alpha_k$ is backtracking. Note that neither method seeks the largest possible stepsize, since it is often more efficient to compute the next iterate with a conservative, sub-optimal stepsize than to perform a time-consuming stepsize optimization. Thus, if there were a way to quickly compute near-optimal stepsizes, it could help accelerate empirical convergence rates.
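To make the update (2) concrete, the following is a minimal NumPy sketch of PGM (ISTA) iterations for an $\ell_1$-regularized least-squares problem, where the proximal operator reduces to an element-wise soft threshold. The function names, the toy problem, and the fixed stepsize $\alpha = 1/L$ are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def soft_threshold(z, tau):
    """Element-wise soft threshold: proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def pgm_step(x, grad_f, alpha, lam):
    """One proximal gradient (ISTA) update for f(x) + lam * ||x||_1."""
    return soft_threshold(x - alpha * grad_f(x), alpha * lam)

# Toy usage: f(x) = 0.5 * ||y - A x||_2^2 with a random forward matrix A.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100)) / np.sqrt(30)
y = A @ rng.standard_normal(100)
grad_f = lambda x: A.T @ (A @ x - y)
L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad_f
x = np.zeros(100)
for _ in range(50):
    x = pgm_step(x, grad_f, alpha=1.0 / L, lam=0.1)
```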

3.2 Scaled gradient projection method

Recently, scaled gradient projection (SGP) methods with the convergence rate $O(1/k)$ have been proposed and have empirically demonstrated general convergence speed improvements over FISTA with the rate $O(1/k^2)$ [6, 5]. The problem (1) can be seen as a constrained optimization problem

$$\min_{\mathbf{x} \in \Omega} \; f(\mathbf{x}) \qquad\qquad (3)$$

where $\Omega$ is a convex set due to the convexity of $g$ and is determined by $g$. Then, an iterative algorithm can be formulated as the PGM given by

$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \lambda_k \mathbf{d}^{(k)}, \qquad \mathbf{d}^{(k)} = P_{\Omega}\!\left(\mathbf{x}^{(k)} - \alpha_k \nabla f(\mathbf{x}^{(k)})\right) - \mathbf{x}^{(k)} \qquad\qquad (4)$$

where $\lambda_k \in (0,1]$ is determined by a line search and $P_{\Omega}$ denotes the projection onto $\Omega$. Whenever $\mathbf{d}^{(k)} \neq \mathbf{0}$, $\mathbf{d}^{(k)}$ is a descent direction at $\mathbf{x}^{(k)}$ for the problem (3) and thus its inner product with $\nabla f(\mathbf{x}^{(k)})$ is negative. $\mathbf{d}^{(k)} = \mathbf{0}$ implies that $\mathbf{x}^{(k)}$ is a stationary point. Since $\mathbf{d}^{(k)}$ is a descent direction at $\mathbf{x}^{(k)}$, Armijo line search can generate a convergent sequence $\{\mathbf{x}^{(k)}\}$ that satisfies the Armijo condition [3]:

$$f\!\left(\mathbf{x}^{(k)} + \lambda_k \mathbf{d}^{(k)}\right) \leq f\!\left(\mathbf{x}^{(k)}\right) + \beta \lambda_k \nabla f\!\left(\mathbf{x}^{(k)}\right)^{\top} \mathbf{d}^{(k)} \qquad\qquad (5)$$

where $\beta \in (0,1)$ and $\lambda_k = \delta^{m_k}$ for some $\delta \in (0,1)$, with $m_k$ the smallest non-negative integer for which (5) holds.

In the problem (3), SGP methods introduce an additional symmetric positive definite matrix $D_k$ in front of $\nabla f(\mathbf{x}^{(k)})$. Symmetry and positive definiteness are necessary conditions for the Hessian matrix of $f$ in Newton's method, and they are also important conditions for quasi-Newton methods. Newton-type methods usually converge in fewer iterations than first-order optimization methods, but they are computationally demanding, especially for large-scale input data. SGP methods exploit symmetry and positive definiteness with the aim of a lower computational burden, while the diagonal elements of $D_k$ can refine the direction vector to accelerate the convergence rate. SGP methods are based on Armijo line search since $\mathbf{d}^{(k)}$ remains a descent direction under the conditions on $D_k$ at $\mathbf{x}^{(k)}$. They are also applicable to proximal operators.

For the convergence of SGP methods, an additional condition on $D_k$ is required. Define $\mathcal{D}_{L_k}$ for $L_k \geq 1$ as the set of all symmetric positive definite matrices whose eigenvalues are in the interval $[1/L_k, L_k]$. Then, for a sequence $\{L_k\}$ such that $L_k \to 1$ sufficiently fast, the condition $D_k \in \mathcal{D}_{L_k}$ should be satisfied. Asymptotically $D_k$ becomes an identity matrix and an iteration becomes similar to the PGM. Appropriate $D_k$ [5] accelerated empirical convergence over fast PGMs such as FISTA.

For a given proximal operator, the SGP method is summarized in Algorithm 1. Finding $\alpha_k$ and $D_k$ that can accelerate convergence still remains an open problem. We propose to replace these heuristic decisions with DNNs.

  Given $\mathbf{x}^{(0)}$, $\beta, \delta \in (0,1)$, and $\{L_k\}$ with $L_k \geq 1$,
  for $k = 0, 1, 2, \dots$ do
     Set $\lambda_k = 1$
     Choose $\alpha_k > 0$, $D_k \in \mathcal{D}_{L_k}$, and
     $\mathbf{y}^{(k)} = \mathrm{prox}_{\alpha_k g}\!\left(\mathbf{x}^{(k)} - \alpha_k D_k \nabla f(\mathbf{x}^{(k)})\right)$,  $\mathbf{d}^{(k)} = \mathbf{y}^{(k)} - \mathbf{x}^{(k)}$
     while (5) is not satisfied do
        $\lambda_k \leftarrow \delta \lambda_k$
     end while
     $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \lambda_k \mathbf{d}^{(k)}$
  end for
Algorithm 1 Scaled Gradient Projection (SGP) [6, 5]
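A minimal NumPy sketch of Algorithm 1 for a smooth objective over a convex set given by a projection operator is shown below; the function names, the backtracking cap, and the toy non-negativity constraint are assumptions for illustration only.

```python
import numpy as np

def sgp(x0, f, grad_f, project, scaling, alpha=1.0, beta=1e-4, delta=0.5, n_iter=100):
    """Scaled gradient projection sketch (cf. Algorithm 1).

    project : projection onto the convex feasible set Omega
    scaling : callable returning positive diagonal entries of D_k
    """
    x = x0.copy()
    for k in range(n_iter):
        g = grad_f(x)
        d_k = scaling(x, k)                          # diagonal of the SPD matrix D_k
        y = project(x - alpha * d_k * g)             # scaled gradient projection step
        d = y - x                                    # search direction
        lam = 1.0
        for _ in range(50):                          # Armijo backtracking, cf. (5)
            if f(x + lam * d) <= f(x) + beta * lam * (g @ d):
                break
            lam *= delta
        x = x + lam * d
    return x

# Toy usage: minimize 0.5 * ||A x - y||^2 over the non-negative orthant.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
y = A @ np.abs(rng.standard_normal(20))
f = lambda x: 0.5 * np.sum((A @ x - y) ** 2)
grad_f = lambda x: A.T @ (A @ x - y)
project = lambda z: np.maximum(z, 0.0)
identity_scaling = lambda x, k: np.ones_like(x)      # D_k = I reduces SGP to gradient projection
x_hat = sgp(np.zeros(20), f, grad_f, project, identity_scaling)
```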

4 Learning-based stepsize selection

We conjecture that a DNN can be trained to generate a near-optimal stepsize per iteration if a current estimate and the gradient of the cost function at that estimate are given. Since no ground truth is available for the optimal sequence of stepsizes over all iterations, we propose to train a stepsize DNN to yield a greedy near-optimal stepsize per iteration by minimizing, at each iteration, the distance between the estimated vector at the next iteration and the converged solution.

4.1 Learning a stepsize for an iteration

To learn stepsizes by a DNN, a set of solution vectors $\{\mathbf{x}_i^{*}\}$ of optimization problems was generated and used as ground truth data. Solution vectors can be obtained by optimizing the original problems using any convex optimization algorithm (e.g., FISTA with 1200 iterations). Suppose that the estimates $\{\mathbf{x}_i^{(k)}\}$ at the $k$th iteration, along with their gradients $\{\nabla f(\mathbf{x}_i^{(k)})\}$, form a set of training data that will be fed into the DNN. We denote the output of the DNN, parametrized by $\theta$, as a set of positive real numbers $\{\alpha_i^{(k)}\}$ for stepsizes. Then, a set of vectors $\{\mathbf{x}_i^{(k+1)}\}$ can be obtained at the next iteration:

$$\mathbf{x}_i^{(k+1)}(\theta) = \mathcal{S}_{\alpha_i^{(k)} \lambda}\!\left(\mathbf{x}_i^{(k)} - \alpha_i^{(k)} \nabla f\!\left(\mathbf{x}_i^{(k)}\right)\right) \qquad\qquad (6)$$

where $\mathcal{S}_{\tau}(\mathbf{z})_j = \mathrm{sign}(z_j)\max(|z_j| - \tau, 0)$ for the $j$th element of the vector $\mathbf{z}$ (soft threshold) and $\lambda$ is the regularization parameter of the $\ell_1$ prior.

The desired stepsizes for the $i$th image at the $k$th iteration can be obtained by training the DNN to minimize the following loss function with respect to the DNN parameters $\theta$:

$$\mathcal{L}^{(k)}(\theta) = \sum_{i} \left\| \mathbf{x}_i^{(k+1)}(\theta) - \mathbf{x}_i^{*} \right\|_2^2 \qquad\qquad (7)$$

where $\mathbf{x}_i^{*}$ is the converged solution of the $i$th problem, and then by evaluating $\alpha_i^{(k)}$ with the trained DNN. After $\{\mathbf{x}_i^{(k+1)}\}$ are evaluated with the learned stepsizes using (6), we propose to generate another set of vectors $\{\mathbf{x}_i^{(k+2)}\}$ for the next iteration by using a conventional stepsize based on a Lipschitz constant:

$$\mathbf{x}_i^{(k+2)} = \mathcal{S}_{\lambda / L}\!\left(\mathbf{x}_i^{(k+1)} - \tfrac{1}{L} \nabla f\!\left(\mathbf{x}_i^{(k+1)}\right)\right) \qquad\qquad (8)$$

where $L$ is a Lipschitz constant of $\nabla f$. This additional step was necessary since $\{\mathbf{x}_i^{(k+1)}\}$ were often not improved over $\{\mathbf{x}_i^{(k)}\}$ when the DNN training was not yet done.

In summary, one iteration of our proposed training procedure consists of two steps: 1) the first operation (6) moves a current estimate towards its solution using the learned stepsize, and 2) the second operation (8) is applied to keep making progress during the initial training of the DNN. In our simulations, the proposed training method worked well to reduce the loss quickly. A sketch of this training step is given below.
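A hedged PyTorch sketch of this greedy per-iteration training step follows. The network `net` is assumed to map the concatenated estimate and gradient to one positive stepsize per sample (the paper uses a modified U-Net/FBPConvNet with a final fully connected layer); the helper names and tensor shapes are illustrative assumptions.

```python
import torch

def soft_threshold(z, tau):
    return torch.sign(z) * torch.clamp(z.abs() - tau, min=0.0)

def train_stepsize_one_iteration(net, opt, x_k, grad_f, x_star, lam, L, n_epochs=10):
    """Greedy stepsize training for one outer iteration k, cf. (6)-(8).

    net    : CNN mapping (estimate, gradient) -> positive stepsize per sample
    x_k    : current estimates, shape (N, 1, H, W)
    x_star : converged (ground truth) solutions, same shape as x_k
    """
    for _ in range(n_epochs):
        g = grad_f(x_k)
        alpha = net(torch.cat([x_k, g], dim=1)).view(-1, 1, 1, 1)   # learned stepsizes
        x_next = soft_threshold(x_k - alpha * g, alpha * lam)       # DNN step, cf. (6)
        loss = ((x_next - x_star) ** 2).sum()                       # greedy loss, cf. (7)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        g = grad_f(x_k)
        alpha = net(torch.cat([x_k, g], dim=1)).view(-1, 1, 1, 1)
        x_next = soft_threshold(x_k - alpha * g, alpha * lam)       # step (6) with trained net
        x_next = soft_threshold(x_next - grad_f(x_next) / L,        # conventional 1/L step,
                                lam / L)                            # cf. (8)
    return x_next
```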

The same training method can be applied to a diagonal matrix by replacing the stepsize $\alpha_i^{(k)}$ in (6) with a diagonal matrix $D_i^{(k)}$. The output dimension of the DNN must then be changed from 1 to the dimension of $\mathbf{x}_i^{(k)}$, with the corresponding backpropagation.

4.2 Learning stepsizes for further iterations

Now we propose to further train DNNs to generate stepsizes for multiple iterations. Inspired by the training strategy in [15], we define the following cumulative loss function:

$$\mathcal{L}_{\mathrm{cum}}^{(k)}(\theta) = \sum_{j=0}^{k} \sum_{i} \left\| \mathbf{x}_i^{(j+1)}(\theta) - \mathbf{x}_i^{*} \right\|_2^2 \qquad\qquad (9)$$

where $\mathbf{x}_i^{(j+1)}(\theta)$ is defined as in (6), and the new input datasets as well as the ground truth labels are defined cumulatively over iterations, so that the label set contains duplicated copies of the converged solutions $\{\mathbf{x}_i^{*}\}$.

Suppose that the DNN is to learn stepsizes for the first $K$ iterations. Initially, the DNN is trained with the input data set and the ground truth labels at the 0th iteration using the procedure in Section 4.1. Then, in the next iteration, the DNN is re-trained with the cumulative input data set and the ground truth labels up to the first iteration. This training process is repeated $K$ times so that the DNN is trained cumulatively, as summarized in Algorithm 2.

  Given initial estimates $\{\mathbf{x}_i^{(0)}\}$, converged solutions $\{\mathbf{x}_i^{*}\}$, and the number of iterations $K$,
  for $k = 0, 1, \dots, K-1$ do
     Train the DNN with the cumulative input set up to the $k$th iteration and the corresponding label set
  end for
Algorithm 2 Stepsize Learning for Multiple Iterations

We expect that our trained DNN should yield near-optimal stepsizes for the first $K$ iterations, but may not be able to yield good stepsizes at iterations later than $K$. Thus, $K$ should be selected based on the trade-off between image quality and computation time.
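A short sketch of the cumulative schedule of Algorithm 2 is given below; it reuses the hypothetical `train_stepsize_one_iteration` helper from the previous sketch, and the exact handling of the duplicated label sets is an assumption.

```python
import torch

def train_multiple_iterations(net, opt, x0, grad_f, x_star, lam, L, K):
    """Cumulative stepsize learning over the first K iterations, cf. Algorithm 2."""
    inputs, labels = [x0], [x_star]
    for k in range(K):
        x_cum = torch.cat(inputs, dim=0)       # cumulative inputs {x^(0), ..., x^(k)}
        y_cum = torch.cat(labels, dim=0)       # duplicated ground-truth labels
        x_next = train_stepsize_one_iteration(net, opt, x_cum, grad_f, y_cum, lam, L)
        inputs.append(x_next[-x0.shape[0]:])   # keep estimates stemming from the newest iterate
        labels.append(x_star)
    return net
```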

5 DNN-incorporated convergent algorithms

5.1 SGP method as framework

The SGP method is described in Algorithm 1. If $D_k$ is an identity matrix and $\lambda_k$ is equal to 1 for all $k$, the SGP method reduces to the PGM. Thus, the SGP method is a generalized version of the PGM that additionally multiplies a symmetric positive definite matrix $D_k$ with the gradient of the loss function, which keeps the search direction a descent direction, and that enforces the Armijo condition for convergence. However, there is no known method to determine $D_k$ that can both accelerate and guarantee convergence. We propose a DNN to determine $D_k$ that can be trained using the learning procedure in Section 4. Since it is also possible for the DNN to yield a $D_k$ that does not satisfy the necessary conditions, we propose to relax the SGP method to selectively use the DNN based stepsize (or diagonal matrix) estimate or a conservative Lipschitz constant based stepsize to guarantee convergence, as summarized in Algorithm 3. We call this the direction relaxation scheme (DRS).

  Given $\mathbf{x}^{(k)}$, $\nabla f(\mathbf{x}^{(k)})$, $\gamma_k \in [0,1]$, $\lambda$, a Lipschitz constant $L$ of $\nabla f$, and a trained DNN function $h_\theta$
  if $\gamma_k > 0$ then
     $D_k \leftarrow \mathrm{diag}\!\left(h_\theta\!\left(\mathbf{x}^{(k)}, \nabla f(\mathbf{x}^{(k)})\right)\right)$
     $\mathbf{d}_{\mathrm{DNN}}^{(k)} \leftarrow \mathrm{prox}_{g}^{D_k}\!\left(\mathbf{x}^{(k)} - D_k \nabla f(\mathbf{x}^{(k)})\right) - \mathbf{x}^{(k)}$
     $\mathbf{d}_{L}^{(k)} \leftarrow \mathrm{prox}_{g/L}\!\left(\mathbf{x}^{(k)} - \tfrac{1}{L}\nabla f(\mathbf{x}^{(k)})\right) - \mathbf{x}^{(k)}$
     $\mathbf{d}^{(k)} \leftarrow \gamma_k \mathbf{d}_{\mathrm{DNN}}^{(k)} + (1-\gamma_k)\,\mathbf{d}_{L}^{(k)}$
     if $\mathbf{d}^{(k)}$ is a descent direction at $\mathbf{x}^{(k)}$ then
        keep $\mathbf{d}^{(k)}$
     else
        $\mathbf{d}^{(k)} \leftarrow \mathbf{d}_{L}^{(k)}$,  decrease $\gamma_k$
     end if
  else
     $\mathbf{d}^{(k)} \leftarrow \mathbf{d}_{L}^{(k)}$
  end if
Algorithm 3 Direction Relaxation Scheme (DRS)
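Since the original layout of Algorithm 3 is only partially recoverable, the following NumPy sketch is one plausible reading of the relaxation step for the $\ell_1$ prior: combine the DNN-suggested direction with the conservative $1/L$ direction using the weight $\gamma_k$, and fall back to the conservative direction when the combination is not a descent direction. The function names are assumptions.

```python
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def drs_direction(x, grad, dnn_step, lam, L, gamma):
    """Direction relaxation scheme (cf. Algorithm 3), one plausible reading.

    dnn_step : per-element stepsizes (diagonal of D_k) predicted by the trained DNN
    gamma    : relaxation weight in [0, 1] for the DNN direction
    """
    d_lip = soft_threshold(x - grad / L, lam / L) - x                   # conservative direction
    if gamma <= 0.0:
        return d_lip
    d_dnn = soft_threshold(x - dnn_step * grad, dnn_step * lam) - x     # wild DNN direction
    d = gamma * d_dnn + (1.0 - gamma) * d_lip                           # relaxed combination
    if np.dot(grad.ravel(), d.ravel()) < 0.0:                           # keep only if descent
        return d
    return d_lip
```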

5.2 Proposed relaxation algorithms with DNN

  Given $\mathbf{x}^{(0)}$, $\gamma_0 = 1$, $\beta, \delta \in (0,1)$, a Lipschitz constant $L$ of $\nabla f$, and a trained DNN
  for k = 0, 1, 2, …, K do
     Generate $\mathbf{d}^{(k)}$ by DRS in Algorithm 3
     Set $\lambda_k = 1$ and $\gamma_{k+1} = \gamma_k$
     while (5) is not satisfied with $\lambda_k$ do
        $\lambda_k \leftarrow \delta \lambda_k$,  $\gamma_{k+1} \leftarrow \delta \gamma_{k+1}$
     end while
     $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \lambda_k \mathbf{d}^{(k)}$
  end for
Algorithm 4 Proposed SGP Algorithm with Stepsize DNN

For the DRS in Algorithm 3, note that the proximal step reduces to an element-wise soft threshold when $g$ is the $\ell_1$-norm and $D_k$ is a diagonal matrix whose diagonal elements form the vector produced by the DNN. $\mathbf{d}_{\mathrm{DNN}}^{(k)}$ represents a wild search direction generated by the trained DNN and $\mathbf{d}_{L}^{(k)}$ is a conservative search direction from the conventional Lipschitz constant based stepsize. Then, depending on the relationship between $\mathbf{d}_{\mathrm{DNN}}^{(k)}$ and $\nabla f(\mathbf{x}^{(k)})$, the final search direction will be either a linear combination of both of them or $\mathbf{d}_{L}^{(k)}$ alone. For the DNN to generate a single stepsize, $D_k$ will be an identity matrix multiplied by that stepsize.

We propose to incorporate the DNN based DRS method into the SGP algorithm as detailed in Algorithm 4. $\mathbf{d}^{(k)}$ is a search direction to yield the estimated vector for the next iteration. As in Algorithm 3, $\mathbf{d}^{(k)}$ is either the weighted average of $\mathbf{d}_{\mathrm{DNN}}^{(k)}$ and $\mathbf{d}_{L}^{(k)}$ with the weight $\gamma_k$ or $\mathbf{d}_{L}^{(k)}$ itself. The value of $\gamma_k$ was initially set to 1 and it remains the same or decreases by a factor $\delta$ over iterations depending on the Armijo condition at each iteration $k$. The ratio of the weight for $\mathbf{d}_{\mathrm{DNN}}^{(k)}$ to the weight for $\mathbf{d}_{L}^{(k)}$ in $\mathbf{d}^{(k)}$ is evaluated at each iteration. Initially, $\mathbf{d}_{\mathrm{DNN}}^{(k)}$ using the trained DNN is dominant in $\mathbf{d}^{(k)}$, but for later iterations $\gamma_k \to 0$ and the DNN is eventually not used. Thus, our proposed algorithm is initially the SGP with DNN search directions and becomes the PGM for later iterations.

Note that the proposed DRS method with relaxation only determines a search direction for the next estimate, which is a descent direction for the inverse problems. $\lambda_k$ is the final stepsize parameter, starting from 1 and decreasing by a factor $\delta$ until the Armijo condition is satisfied. Therefore, our proposed method in Algorithms 3 and 4 using the trained DNN converges theoretically.
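A sketch of the resulting loop (Algorithm 4) is shown below, reusing the `drs_direction` sketch above: Armijo backtracking shrinks the stepsize $\lambda_k$, and every backtracking step also shrinks the relaxation weight $\gamma$ so that the method gradually falls back to the PGM. The initial $\gamma = 1$ follows the text, while the capped backtracking loop and the use of the full objective in the sufficient-decrease test are simplifying assumptions.

```python
import numpy as np

def relaxed_sgp(x0, f_smooth, grad_f, dnn, lam, L, n_iter=40, beta=1e-4, delta=0.5):
    """Proposed SGP with a stepsize/diagonal DNN (cf. Algorithm 4), as a sketch."""
    x, gamma = x0.copy(), 1.0
    F = lambda z: f_smooth(z) + lam * np.abs(z).sum()          # full objective f + g
    for k in range(n_iter):
        g = grad_f(x)
        d = drs_direction(x, g, dnn(x, g), lam, L, gamma)      # direction from Algorithm 3
        lam_k = 1.0
        for _ in range(30):                                    # backtracking on lambda_k and gamma
            if F(x + lam_k * d) <= F(x) + beta * lam_k * np.dot(g.ravel(), d.ravel()):
                break
            lam_k *= delta
            gamma *= delta
        x = x + lam_k * d
    return x
```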

6 Simulation results

6.1 Inverse problem settings

We performed various image processing simulations such as image inpainting, CS image recovery with partial Fourier samples, image deblurring and large-scale sparse-view CT to evaluate our proposed methods. An optimization problem for inverse problems in image processing has the following form:

$$\hat{\mathbf{x}}_i = \arg\min_{\mathbf{x}} \ \frac{1}{2}\left\| \mathbf{y}_i - A_i \mathbf{x} \right\|_2^2 + \lambda \left\| \mathbf{x} \right\|_1 \qquad\qquad (10)$$

where $i$ is an index for the image, $A_i$ is a matrix that describes an image degradation forward process, $\mathbf{y}_i$ is a measurement vector and $\lambda$ is a regularization parameter that balances data fidelity and the image prior. An image is modeled to be sparse in the wavelet domain (three-level symlet-4) so that $\mathbf{x}$ is a wavelet coefficient vector for an image. Thus, the linear operator $A_i = M_i W^{-1}$ consists of a measurement matrix $M_i$ (different from image to image) and an inverse sparsifying transform $W^{-1}$. For normalized measurement matrices, their Lipschitz constants are at most 1 and normalized gradients helped to yield better results. The BSDS500 dataset [2] was used for all simulations, where 450 / 50 images were used for training / testing, respectively. $\lambda$ was set to 0.1.
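As a small illustration of the data-fidelity term in (10) and its gradient, the sketch below uses an inpainting mask acting directly on the image (i.e., the sparsifying transform $W$ is dropped for brevity; in the paper $\mathbf{x}$ is a wavelet coefficient vector and $A_i = M_i W^{-1}$ with a three-level symlet-4 transform). The image size and sampling pattern are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (64, 64)
x_true = rng.random(shape)
mask = rng.random(shape) < 0.5              # M_i: keep 50% of the pixels
y = mask * x_true                           # measurements y_i = M_i x

def f(x):
    """Data fidelity 0.5 * ||y - M x||_2^2."""
    return 0.5 * np.sum((y - mask * x) ** 2)

def grad_f(x):
    """Gradient M^T (M x - y); the Lipschitz constant is at most 1 for a sampling mask."""
    return mask * (mask * x - y)
```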

We implemented our stepsize (or diagonal matrix) DNN based on the U-Net architecture [27] and a modified FBPConvNet [17] using MatConvNet on MATLAB. Note that the input and output of the DNN are in a sparsifying transform domain, which improved the overall performance on inverse problems. 3×3 convolution filters are used for all convolutional layers, and batch normalization and a rectified linear unit (ReLU) follow each convolution layer. 2×2 max pooling was applied in the first half of the DNN, and deconvolution layers / skip connections were used in the second half of the DNN. We reduced the number of layers of the original FBPConvNet to lower computation time (7M parameters). For stepsize learning, one fully connected layer was added at the end of the DNN to generate a single number. All simulations were run on an NVIDIA Titan X.

We compared our proposed SGP methods with a stepsize DNN (called Step-learned) and a diagonal matrix DNN (called Diag-learned) to conventional algorithms such as ISTA and FISTA with backtracking. We also compared our proposed methods with a U-Net trained to directly yield ground truth converged images for (10) from input measurements. We chose U-Net for comparison since it has been shown to yield good results in various image processing problems with large-scale inputs and with different forward models, including inpainting and compressive image recovery [19]. Unfortunately, we were not able to compare ours with ReconNet [18] or LISTA-CPSS [10], which were limited to small-scale CS image recovery problems with small image patch sizes and, empirically, a single fixed forward model $A$.

NMSE was used as the evaluation criterion [10], computed in dB as $10\log_{10}\!\left(\|\hat{\mathbf{x}} - \mathbf{x}^{*}\|_2^2 / \|\mathbf{x}^{*}\|_2^2\right)$. Note that all DNNs were trained with converged solutions, while the evaluations were done against the oracle solutions (the original BSDS500 images) to better evaluate the robustness of all methods under significantly different forward models and additional measurement noise. All DNNs for our Step-learned / Diag-learned methods, as well as the U-Net that directly yields solutions, were trained for the case of recovering from noiseless 50% samples with different forward models for different training images. Then, all methods were tested on noiseless 30%, 50%, and 70% samples from the test dataset to evaluate performance as well as robustness. For image recovery with partial Fourier samples, noisy 50% samples (Gaussian noise with standard deviation 5) were also used for further evaluation. We also performed image deblurring with a Gaussian blur kernel and sparse-view CT reconstruction with 144 views using CT images to show the feasibility for other applications.
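For reference, a small helper for the NMSE criterion in dB as written above (a sketch; averaging over the test set is done outside this function):

```python
import numpy as np

def nmse_db(x_hat, x_ref):
    """NMSE in dB: 10 * log10(||x_hat - x_ref||^2 / ||x_ref||^2); lower is better."""
    return 10.0 * np.log10(np.sum((x_hat - x_ref) ** 2) / np.sum(x_ref ** 2))
```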

6.2 Image inpainting

Our proposed methods were applied to inpainting problems with different sampling rates. $A_i = M_i W^{-1}$ was used, where $M_i$ is a sampling matrix that is different for each image. All DNNs were trained with 50% samples (called U-Net-50%, Step-learned-50%, Diag-learned-50%). All results tested on different sampling rates are reported in Figures 1, 2 and Table 1.

When all methods were tested on the same sampling rate (50%), the non-iterative U-Net instantly yielded the best image quality among all methods, including our proposed methods and even the ground truth converged images (FISTA at 1200 iterations yielded -18.15 dB). Both of our proposed methods at 20 iterations yielded image quality comparable to the converged images, while FISTA at 100 iterations performed much worse than our proposed methods at 20 iterations.

However, U-Net did not show robust performance and yielded substantial artifacts for the tests with different sampling rates such as 30% and 70%. In contrast, our proposed methods without re-training yielded robust acceleration for different test cases, as illustrated in Figure 2.

(a) Input 30, 50, 70% samplings for inpainting
(b) FISTA-b for 30, 50, 70% samplings
(c) U-Net trained on 50%, tested on 30, 50, 70% samplings
(d) Step-learned SGP trained on 50%, tested on 30, 50, 70% samplings
(e) Diag-learned SGP trained on 50%, tested on 30, 50, 70% samplings
Figure 1: Recovered images for inpainting using DNNs trained on 50% and tested on 30, 50, 70% samplings, respectively. FISTA-b, Step-learned and Diag-learned SGPs were run with 40 iterations.
(a) Trained on 50%, Tested on 50%
(b) Trained on 50%, Tested on 30%
(c) Trained on 50%, Tested on 70%
(d) Trained on $\lambda$, Tested on $0.5\lambda$, $2\lambda$
Figure 2: NMSEs over iterations for the methods in inpainting trained on 50% sampling with a regularization parameter $\lambda$ and tested on 50, 30, 70% samplings (a-c) or with $0.5\lambda$, $2\lambda$ (d).
Method               Test-30%       Test-50%       Test-70%
FISTA@100            -5.20±1.01     -10.74±2.29    -17.69±2.85
U-Net-50%            -7.05±0.97     -19.40±2.91    -9.11±2.38
Step-learn-50%@20    -13.77±2.86    -17.73±3.01    -21.65±3.05
Diag-learn-50%@20    -13.85±2.71    -17.50±2.97    -21.35±3.05
Table 1: Averaged NMSE (dB) of all methods trained with 50%, tested on various cases for inpainting.

6.3 Image recovery with partial Fourier samples

Method               Test-30%       Test-50%       Test-70%
FISTA@100            -20.46±3.49    -24.84±3.99    -32.31±4.74
U-Net-50%            -18.22±3.43    -20.17±4.08    -19.62±4.57
Step-learn-50%@20    -20.80±3.67    -25.49±4.20    -33.04±4.79
Diag-learn-50%@20    -20.80±3.65    -25.52±4.17    -33.06±4.78
Table 2: Averaged NMSE (dB) of methods trained with 50%, tested on various partial Fourier cases.

Similar simulations were performed for image recovery with partial Fourier samples. Note that the input of the DNN has four channels, where the first two channels are the real and imaginary parts of the estimated image. Initial images in Figure 3 were obtained using the inverse Fourier transform with zero padding. All results are reported in Figures 3, 4 and Tables 2, 3.
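A minimal sketch of a partial Fourier measurement operator and the zero-filled initialization described above is given below; the random sampling mask and the orthonormal FFT normalization are illustrative assumptions (the paper's sampling pattern may differ).

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (64, 64)
x_true = rng.random(shape)
mask = rng.random(shape) < 0.5                      # keep 50% of the Fourier samples

def forward(x):
    """Partial Fourier sampling: mask * F x (orthonormal FFT)."""
    return mask * np.fft.fft2(x, norm="ortho")

def adjoint(y):
    """Adjoint: F^H (mask * y); real part kept for real-valued images."""
    return np.fft.ifft2(mask * y, norm="ortho").real

y = forward(x_true)
x_zero_filled = adjoint(y)                          # initial image via zero-filled inverse FFT
grad_f = lambda x: adjoint(forward(x) - y)          # gradient of 0.5 * ||y - forward(x)||^2
```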

For all test cases, the non-iterative U-Net yielded the worst results among all methods, including FISTA at 100 iterations. Forward models for partial Fourier sampling are much more complicated than forward models for inpainting, and thus a more complicated DNN with a much larger dataset seems desirable for better performance. Our proposed methods without re-training yielded robust and excellent performance at early iterations for all test cases, including the same sampling rate (50%), different sampling rates (30%, 70%), and the additional measurement noise case (noisy 50%). Thus, our proposed DNNs do seem robust to different models and noise in compressive sensing recovery with partial Fourier samples, a robustness inherited from conventional optimization based algorithms.

(a) Input images from 30, 50, and noisy 50% partial Fourier samples
(b) FISTA-b on 30, 50, noisy 50% partial Fourier samples
(c) U-Net trained with 50%, tested on 30, 50, noisy 50% partial Fourier samples
(d) Step-learned SGP trained with 50%, tested on 30, 50, noisy 50%
(e) Diag-learned SGP trained with 50%, tested on 30, 50, noisy 50%
Figure 3: Recovered images from partial Fourier sampling. DNNs were trained with 50% and tested on 30, 50, noisy 50% samples. 40 iterations were run for FISTA-b and our proposed methods.
(a) Train-50%,Test-30%
(b) Train-50%,Test-50%
(c) Train-50%,Test-50% noise
(d) Train-50%,Test-70%
Figure 4: NMSE over iterations for all methods in partial Fourier recovery trained on 50% sampling, tested on 30% (a), 50% (b), and noisy 50% (c) samplings, or with regularization parameters $0.5\lambda$, $2\lambda$ (d).
Method               Test-50%       Test-50% noisy
FISTA@100            -24.84±3.99    -22.79±2.85
U-Net-50%            -20.17±4.08    -19.85±3.79
Step-learn-50%@20    -25.49±4.20    -23.16±2.78
Diag-learn-50%@20    -25.52±4.17    -23.18±2.76
Table 3: Averaged NMSE (dB) of methods trained with 50%, tested on various partial Fourier cases.

6.4 Robustness to regularization parameters

We investigated the robustness of our proposed methods by running them, trained with the original regularization parameter, on the test set with different regularization parameters that are half (0.5x) and twice (2x) the original value.

Figures 2 (d) and 4 (d) illustrate that our proposed methods were robust to small changes in the regularization parameter such as half or twice the original value. However, large changes, such as 10 times smaller or larger than the original parameter, seem to break the fast empirical convergence of our proposed methods. These phenomena were expected since changing the regularization parameter changes the ground truth image, so our DNNs, whose inputs depend on the current estimate and its corresponding gradient, should behave differently (e.g., if the current estimate is the same as the converged solution for the original regularization parameter, then a zero stepsize should be obtained for the problem with the same regularization parameter, but a non-zero stepsize should be obtained for a different problem with a different regularization parameter). Thus, large changes in the regularization parameter may require re-training the DNN.

6.5 Image deblurring

(a) Blurred image
(b) FISTA-b
(c) Step-learned SGP
(d) Diag-learned SGP
(e) Iteration vs. NMSE (dB)
Figure 5: Reconstructed images for deblurring from (a) the input image blurred with a Gaussian kernel: (b) FISTA with backtracking, (c) step-learned SGP, (d) diag-learned SGP, all at the 10th iteration. Proposed methods yielded faster initial convergence rates than FISTA-b; (e) average convergence for 50 images.

Proposed methods were applied to image deblurring problems. Images were blurred using a Gaussian kernel. Then, image deblurring was performed with the regularization parameter 0.00001. Note that the initial data fidelity term for the deblurring problem is usually much larger than for other inverse problems such as inpainting. Unlike for the other inverse problems in image processing, the learned diagonal matrix based relaxed SGP yielded the best image quality among all compared methods, as shown in Figure 5 qualitatively and quantitatively. It seems that the large discrepancy in the data fidelity term was quickly compensated when using the learned diagonal matrix in SGP.
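For completeness, a small sketch of a Gaussian blur forward operator and the corresponding data-fidelity gradient is given below; the kernel width is a placeholder since the paper's value is not reproduced here, and periodic boundary handling is assumed so that the blur is self-adjoint.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

SIGMA = 2.0   # placeholder kernel width, not the paper's value

def blur(x):
    """Gaussian blur forward operator; symmetric kernel with periodic boundaries."""
    return gaussian_filter(x, sigma=SIGMA, mode="wrap")

def grad_f(x, y):
    """Gradient of 0.5 * ||y - blur(x)||^2, i.e., blur^T (blur(x) - y) = blur(blur(x) - y)."""
    return blur(blur(x) - y)
```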

6.6 Sparse-view medical image reconstruction

(a) Input image from 144 views
(b) FISTA-b
(c) Diag-learned SGP
(d) Iteration vs. NMSE (dB)
Figure 6: Reconstructed images for sparse-view CT from (a) Input image (144 views) : (b) FISTA with backtracking, (c) Proposed diag-learned SGP, all at the 10th iteration. (d) Proposed method yielded faster convergence rate than FISTA-b.

Lastly, our proposed method was investigated for sparse-view CT image reconstruction. The initial image in Figure 6 (a) was obtained by filtered back-projection from 144 projection views and had streaking artifacts. With the regularization parameter 0.0005, we ran FISTA-b and the proposed diag-learned SGP. At the 10th iteration, our proposed method yielded a visually better image than FISTA-b, as illustrated in Figure 6 (b) and (c). Figure 6 (d) shows that our proposed method achieved a faster convergence rate than FISTA-b.

6.7 Limited robustness to other measurements

We also examined the robustness of our trained DNN for determining the stepsize or diagonal matrix when different forward models are used. Our proposed methods have shown robustness for problems such as image inpainting, where the trained DNN still achieved much faster convergence than FISTA. A similar tendency was observed for image inpainting with 70% sampling. The trained DNN also yielded robust performance for lower (30%) or higher (70%) sampling in partial Fourier image recovery. For image deblurring problems with different blur levels, the trained DNN yielded sub-optimal performance compared to FISTA. However, note that the step-learned SGP yielded relatively robust performance compared to the diag-learned SGP and better performance than FISTA at early iterations. For sparse-view image reconstruction, the DNN trained with 144 views did not yield good performance for the test with 45 views. Thus, the robustness of the trained DNN to other forward models seems application-dependent. However, note that many DNN based algorithms are not robust to other measurement models either [17].

7 Conclusion

We proposed a new way of using CNNs to empirically accelerate convergence for inverse problems in image processing through dynamic parameter selection over iterations, applicable to different forward models and without breaking theoretical properties such as convergence. Our trained DNN enabled SGP to empirically outperform FISTA, which is theoretically faster than SGP, and to yield robust performance compared to direct mapping DNNs.

Acknowledgments

This work was supported partly by Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education(NRF-2017R1D1A1B05035810), the Technology Innovation Program or Industrial Strategic Technology Development Program (10077533, Development of robotic manipulation algorithm for grasping/assembling with the machine learning using visual and tactile sensing information) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea), and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI18C0316).

References

  • [1] M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. De Freitas (2016) Learning to learn by gradient descent by gradient descent. In Advances in neural information processing systems, pp. 3981–3989. Cited by: §2.
  • [2] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik (2011-05) Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (5), pp. 898–916. Cited by: §6.1.
  • [3] L. Armijo (1966-01) Minimization of functions having lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16 (1), pp. 1–3. Cited by: §3.2.
  • [4] A. Beck and M. Teboulle (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2 (1), pp. 183–202. Cited by: §1, §1.
  • [5] S. Bonettini and M. Prato (2015-09) New convergence results for the scaled gradient projection method. Inverse Problems 31 (9), pp. 095008. Cited by: §3.2, §3.2, Algorithm 1.
  • [6] S. Bonettini, R. Zanella, and L. Zanni (2008) A scaled gradient projection method for constrained image deblurring. Inverse Problems 25 (1), pp. 015002. Cited by: §3.2, Algorithm 1.
  • [7] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al. (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine learning 3 (1), pp. 1–122. Cited by: §1.
  • [8] E. J. Candes, J. Romberg, and T. Tao (2006-01) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory 52 (2), pp. 489–509. Cited by: §2.
  • [9] J. H. R. Chang, C. Li, B. Poczos, and B. V. K. V. Kumar (2017) One network to solve them all - solving linear inverse problems using deep projection models. In IEEE International Conference on Computer Vision (ICCV), pp. 5889–5898. Cited by: §1, §2.
  • [10] X. Chen, J. Liu, Z. Wang, and W. Yin (2018) Theoretical Linear Convergence of Unfolded ISTA and Its Practical Weights and Thresholds. In Advances in Neural Information Processing Systems (NeurIPS), pp. 9061–9071. Cited by: §1, §2, §6.1, §6.1.
  • [11] D. L. Donoho, A. Maleki, and A. Montanari (2009) Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences 106 (45), pp. 18914–18919. Cited by: §1.
  • [12] M. A. Figueiredo and R. D. Nowak (2003) An EM algorithm for wavelet-based image restoration. IEEE Transactions on Image Processing 12 (8), pp. 906–916. Cited by: §1, §1.
  • [13] R. Giryes, Y. C. Eldar, A. M. Bronstein, and G. Sapiro (2018-02) Tradeoffs Between Convergence Speed and Reconstruction Accuracy in Inverse Problems. IEEE Transactions on Signal Processing 66 (7), pp. 1676–1690. Cited by: §1, §2.
  • [14] K. Gregor and Y. LeCun (2010) Learning fast approximations of sparse coding. In International Conference on Machine Learning (ICML), pp. 399–406. Cited by: §1, §2.
  • [15] H. Gupta, K. H. Jin, H. Q. Nguyen, M. T. McCann, and M. Unser (2018) CNN-based projected gradient descent for consistent CT image reconstruction. IEEE Transactions on Medical Imaging 37 (6), pp. 1440–1453. Cited by: §4.2.
  • [16] J. He, Y. Yang, Y. Wang, D. Zeng, Z. Bian, H. Zhang, J. Sun, Z. Xu, and J. Ma (2019-02) Optimizing a parameterized plug-and-play admm for iterative low-dose ct reconstruction. IEEE Transactions on Medical Imaging 38 (2), pp. 371–382. Cited by: §1, §2.
  • [17] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser (2017-09) Deep Convolutional Neural Network for Inverse Problems in Imaging. IEEE Transactions on Image Processing 26 (9), pp. 4509–4522. Cited by: §1, §2, §6.1, §6.7.
  • [18] K. Kulkarni, S. Lohit, P. Turaga, R. Kerviche, and A. Ashok (2016) ReconNet: Non-iterative reconstruction of images from compressively sensed measurements. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 449–458. Cited by: §1, §2, §6.1.
  • [19] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila (2018) Noise2Noise: learning image restoration without clean data. In International Conference on Machine Learning (ICML), pp. 2965–2974. Cited by: §6.1.
  • [20] K. Li and J. Malik (2017) Learning to optimize. In International Conference on Learning Representations (ICLR), Cited by: §2.
  • [21] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee (2017) Enhanced Deep Residual Networks for Single Image Super-Resolution. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1132–1140. Cited by: §1, §2.
  • [22] J. Liu, X. Chen, Z. Wang, and W. Yin (2019) ALISTA: Analytic Weights Are As Good As Learned Weights in LISTA. In International Conference on Learning Representations (ICLR), Cited by: §1, §2.
  • [23] J. Mairal, F. R. Bach, J. Ponce, G. Sapiro, and A. Zisserman (2009) Non-local sparse models for image restoration. In IEEE International Conference on Computer Vision (ICCV), pp. 2272–2279. Cited by: §1.
  • [24] C. Metzler, A. Mousavi, and R. Baraniuk (2017) Learned D-AMP: Principled neural network based compressive image recovery. In Advances in Neural Information Processing Systems (NIPS), pp. 1772–1783. Cited by: §1, §2.
  • [25] T. Moreau and J. Bruna (2017) Understanding Trainable Sparse Coding via Matrix Factorization. In International Conference on Learning Representations (ICLR), Cited by: §1, §2.
  • [26] V. M. Patel, R. Maleh, A. C. Gilbert, and R. Chellappa (2012-01) Gradient-based image recovery methods from incomplete Fourier measurements. IEEE Transactions on Image Processing 21 (1), pp. 94–105. Cited by: §1.
  • [27] O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234–241. Cited by: §1, §6.1.
  • [28] S. Roth and M. J. Black (2005) Fields of experts: a framework for learning image priors. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 860–867. Cited by: §1.
  • [29] E. Ryu, J. Liu, S. Wang, X. Chen, Z. Wang, and W. Yin (2019) Plug-and-play methods provably converge with properly trained denoisers. In International Conference on Machine Learning, pp. 5546–57. Cited by: §1, §2.
  • [30] J. Sun, H. Li, Z. Xu, et al. (2016) Deep ADMM-Net for compressive sensing MRI. In Advances in Neural Information Processing Systems (NIPS), pp. 10–18. Cited by: §1, §2.
  • [31] T. Tirer and R. Giryes (2018) Image restoration by iterative denoising and backward projections. IEEE Transactions on Image Processing 28 (3), pp. 1220–1234. Cited by: §1, §2.
  • [32] J. Xie, L. Xu, and E. Chen (2012) Image denoising and inpainting with deep neural networks. In Advances in Neural Information Processing Systems (NIPS), pp. 341–349. Cited by: §1, §2.
  • [33] J. Zhang and B. Ghanem (2018) ISTA-Net: Interpretable Optimization-Inspired Deep Network for Image Compressive Sensing. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1828–1837. Cited by: §1, §2.
  • [34] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang (2017) Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing 26 (7), pp. 3142–3155. Cited by: §1, §2.
  • [35] L. Zhong, S. Cho, D. Metaxas, S. Paris, and J. Wang (2013) Handling noise in single image deblurring using directional filters. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 612–619. Cited by: §1.
  • [36] D. Zoran and Y. Weiss (2011) From learning models of natural image patches to whole image restoration. In IEEE International Conference on Computer Vision (ICCV), pp. 479–486. Cited by: §1.