Discriminative Transfer Learning for General Image Restoration

Lei Xiao
University of British Columbia
Felix Heide
Stanford University
Wolfgang Heidrich
Bernhard Schölkopf
MPI for Intelligent Systems
Michael Hirsch
MPI for Intelligent Systems

Recently, several discriminative learning approaches have been proposed for effective image restoration, achieving a convincing trade-off between image quality and computational efficiency. However, these methods require separate training for each restoration task (e.g., denoising, deblurring, demosaicing) and problem condition (e.g., noise level of input images). This makes it time-consuming and difficult to encompass all tasks and conditions during training. In this paper, we propose a discriminative transfer learning method that incorporates formal proximal optimization and discriminative learning for general image restoration. The method requires only a single training pass and allows for reuse across various problems and conditions while achieving an efficiency comparable to previous discriminative approaches. Furthermore, after being trained, our model can be easily transferred to new likelihood terms to solve untrained tasks, or be combined with existing priors to further improve image restoration quality.


1 Introduction

Low-level vision problems, such as denoising, deconvolution and demosaicing, have to be addressed as part of most imaging and vision systems. Although a large body of work covers these classical problems, low-level vision is still a very active area. The reason is that, from a Bayesian perspective, solving them as statistical estimation problems does not only rely on models for the likelihood (i.e. the reconstruction task), but also on natural image priors as a key component.

A variety of models for natural image statistics have been explored in the past. Traditionally, models for gradient statistics [27, 17], including total variation, have been a popular choice. Another line of work explores patch-based image statistics, either as per-patch sparse models [11, 35] or by modeling non-local similarity between patches [9, 10, 13]. These prior models are general in the sense that they can be applied to various likelihoods, with the image formation and noise setting as parameters. However, the resulting optimization problems are prohibitively expensive, rendering them impractical for many real-time tasks, especially on mobile platforms.

Recently, a number of works [29, 8] have addressed this issue by truncating the iterative optimization and learning discriminative image priors, tailored to a specific reconstruction task (likelihood) and optimization approach. While these methods make it possible to trade off quality against the computational budget for a given application, the learned models are highly specialized to the image formation model and noise parameters, in contrast to optimization-based approaches. Since each individual problem instantiation requires costly learning and storing of the model coefficients, current proposals for learned models are impractical for vision applications with dynamically changing (often continuous) parameters. This is a common scenario in most real-world vision settings, as well as in engineering and scientific imaging applications that rely on the ability to rapidly prototype methods.

In this paper, we combine discriminative learning techniques with formal proximal optimization methods to learn generic models that can be truly transferred across problem domains while achieving efficiency comparable to that of previous discriminative approaches. Using proximal optimization methods [12, 23, 3] allows us to decouple the likelihood and prior, which is key to learning such shared models. It also means that we can rely on well-researched, physically motivated models for the likelihood, while learning priors from example data. We verify our technique using the same model for a variety of diverse low-level image reconstruction tasks and problem conditions, demonstrating the effectiveness and versatility of our approach. After training, our approach benefits from the proximal splitting techniques, and can be naturally transferred to new likelihood terms for untrained restoration tasks, or it can be combined with existing state-of-the-art priors to further improve the reconstruction quality. This is impossible with previous discriminative methods. In particular, we make the following contributions:

  • We propose a discriminative transfer learning technique for general image restoration. It requires a single-pass training and transfers across different restoration tasks and problem conditions.

  • We show that our approach is general by demonstrating its robustness for diverse low-level problems, such as denoising, deconvolution, inpainting, and for varying noise settings.

  • We show that, while being general, our method achieves computational efficiency comparable to that of previous discriminative approaches, making it suitable for processing high-resolution images on mobile imaging systems.

  • We show that our method can naturally be combined with existing likelihood terms and priors after being trained. This allows our method to process untrained restoration tasks and take advantage of previous successful work on image priors (e.g., color and non-local similarity priors).

2 Related work

Image restoration aims at computationally enhancing the quality of images by undoing the adverse effects of image degradation such as noise and blur. As a key area of image and signal processing it is an extremely well studied problem and a plethora of methods exists, see for example [22] for a recent survey. Through the successful application of machine learning and data-driven approaches, image restoration has seen revived interest and much progress in recent years. Broadly speaking, recently proposed methods can be grouped into three classes: classical approaches that make no explicit use of machine learning, generative approaches that aim at probabilistic models of undegraded natural images and discriminative approaches that try to learn a direct mapping from degraded to clean images. Unlike classical methods, methods belonging to the latter two classes depend on the availability of training data.

Classical models focus on local image statistics and aim at maintaining edges. Examples include total variation [27], bilateral filtering [32] and anisotropic diffusion models [34]. More recent methods exploit the non-local statistics of images [1, 9, 21, 10, 13, 31]. In particular the highly successful BM3D method [9] searches for similar patches within the same image and combines them through a collaborative filtering step.

Generative learning models seek to learn probabilistic models of undegraded natural images. A simple yet powerful subclass includes models that approximate the sparse gradient distribution of natural images [19, 17, 18]. More expressive generative models include the fields of experts (FoE) model [26], KSVD [11] and the EPLL model [35]. While both FoE and KSVD learn a set of filters whose responses are assumed to be sparse, EPLL models natural images through Gaussian Mixture Models. All of these models have in common that they are agnostic to the image restoration task, i.e. they are transferable to any image degradation and can be combined in a modular fashion with any likelihood and additional priors at test time.

Discriminative learning models have recently become increasingly popular for image restoration due to their attractive tradeoff between high image restoration quality and efficiency at test time. Methods include trainable random field models such as cascaded shrinkage fields (CSF) [29], regression tree fields (RTF) [16], trainable nonlinear reaction diffusion (TRD) models [8], as well as deep convolutional networks [15] and other multi-layer perceptrons [4].

Discriminative approaches owe their computational efficiency at run-time to a particular feed-forward structure whose trainable parameters are optimized for a particular task during training. Those learned parameters are then kept fixed at test-time resulting in a fixed computational cost. On the downside, discriminative models do not generalize across tasks and typically necessitate separate feed-forward architectures and separate training for each restoration task (denoising, demosaicing, deblurring, etc.) as well as every possible image degradation (noise level, Bayer pattern, blur kernel, etc.).

In this work, we propose a discriminative transfer learning technique that combines the strengths of both generative and discriminative models: it maintains the flexibility of generative models, but at the same time enjoys the computational efficiency of discriminative models. While in spirit our approach is akin to the recently proposed method of Rosenbaum and Weiss [25], who equipped the successful EPLL model with a discriminative prediction step, the key idea in our approach is to use proximal optimization techniques [12, 23, 3] that allow the decoupling of likelihood and prior and thereby retain the full advantages of a Bayesian generative modeling approach.

Table 1: Analysis of state-of-the-art methods. The properties compared include runtime efficiency and ease of parallelization. In the table, “Transferable” means the model can be used for different restoration tasks and problem conditions; “Modular” means the method can be combined with other existing priors at test time.

Table 1 summarizes the properties of the most prominent state-of-the-art methods and puts our own proposed approach into perspective.

3 Proposed method

3.1 Diversity of data likelihood

Figure 1: The architecture of our method. Input images are drawn from various restoration tasks and problem conditions. Each iteration uses the same model parameters, forming a recurrent network.

The seminal work on fields of experts (FoE) [26] generalizes the form of filter-response-based regularizers in the objective function given in Eq. 1:

$$\frac{\lambda}{2}\|b - Ax\|_2^2 + \sum_{i=1}^{N}\phi_i(F_i x) \tag{1}$$

The vectors $b$ and $x$ represent the observed and latent (desired) image respectively, the matrix $A$ is the sensing operator, $F_i$ represents 2D convolution with filter $f_i$, and $\phi_i$ represents the penalty function on the corresponding filter responses $F_i x$. The positive scalar $\lambda$ controls the relative weight between the data fidelity (likelihood) and the regularization term.

The well-known anisotropic total-variation regularizer can be viewed as a special case of the FoE model where $F_i$ is the derivative operator $\nabla$, and $\phi_i$ the $\ell_1$ norm.

While there are various types of restoration tasks (e.g., denoising, deblurring, demosaicing) and problem parameters (e.g., noise level of input images), each problem has its own sensing matrix $A$ and optimal fidelity weight $\lambda$. For example, $A$ is an identity matrix for denoising, a convolution operator for deblurring, a binary diagonal matrix for demosaicing, and a random matrix for compressive sensing [5]. The optimal $\lambda$ depends on both the task and its parameters in order to produce the best quality results.
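To make the roles of the sensing operator and the fidelity weight concrete, the following minimal sketch evaluates the objective of Eq. 1 for the anisotropic total-variation special case. It assumes NumPy, forward differences as the derivative filters, and a caller-supplied function `apply_A` standing in for the sensing operator; all function and argument names are illustrative and not taken from the paper.

```python
import numpy as np

def tv_objective(x, b, apply_A, lam):
    """(lam/2)*||b - A x||^2 + ||D_h x||_1 + ||D_v x||_1 (anisotropic TV prior)."""
    fidelity = 0.5 * lam * np.sum((b - apply_A(x)) ** 2)
    horizontal = np.diff(x, axis=1)  # responses of the horizontal derivative filter
    vertical = np.diff(x, axis=0)    # responses of the vertical derivative filter
    prior = np.abs(horizontal).sum() + np.abs(vertical).sum()
    return fidelity + prior

def identity(x):
    """Sensing operator for denoising: A is the identity."""
    return x

def make_mask_operator(mask):
    """Sensing operator for masked sampling: A is a binary diagonal matrix."""
    return lambda x: mask * x

step = np.zeros((4, 4))
step[:, 2:] = 1.0  # a vertical step edge, one unit jump per row
print(tv_objective(step, step, identity, lam=10.0))  # fidelity is 0, TV term is 4
```

Only `apply_A` and `lam` change between tasks in this sketch; the prior term is the same function of `x` in every case.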

The state-of-the-art discriminative learning methods (CSF [29], TRD [8]) derive an end-to-end feed-forward model from Eq. 1 for each specific restoration task, and train this model to map the degraded input images directly to the output. These methods have demonstrated a good trade-off between image quality and run-time efficiency. However, as an inherent problem of the discriminative learning procedure, they require separate training for each restoration task and problem condition. Given the diversity of data likelihoods in image restoration, this fundamental drawback of discriminative models makes it time-consuming and difficult to encompass all tasks and conditions during training.

3.2 Decoupling likelihood and prior

It is difficult to directly minimize Eq. 1 when the penalty function $\phi_i$ is non-linear and/or non-smooth (e.g., an $\ell_p$ norm with $0 < p \le 1$). Proximal algorithms [3, 12, 6] instead relax Eq. 1 and split the original problem into several easier subproblems that are solved alternately until convergence.

In this paper we employ the half-quadratic-splitting (HQS) algorithm [12] to relax Eq. 1, as it typically requires far fewer iterations to converge than other proximal methods such as ADMM [3] and PD [6]. The relaxed objective function is given in Eq. 2:

$$\frac{\lambda}{2}\|b - Ax\|_2^2 + \frac{\rho}{2}\|z - x\|_2^2 + \sum_{i=1}^{N}\phi_i(F_i z) \tag{2}$$

where a slack variable $z$ is introduced to approximate $x$, and $\rho$ is a positive scalar.

With the HQS algorithm, Eq. 2 is iteratively minimized by solving for the slack variable $z$ and the latent image $x$ alternately as in Eq. 3 and 4 ($t = 1, 2, \dots, T$).

Prior proximal operator:
$$z^t = \arg\min_z \left(\frac{\rho^t}{2}\|z - x^{t-1}\|_2^2 + \sum_{i=1}^{N}\phi_i(F_i z)\right) \tag{3}$$

Data proximal operator:
$$x^t = \arg\min_x \left(\frac{\lambda}{2}\|b - Ax\|_2^2 + \frac{\rho^t}{2}\|z^t - x\|_2^2\right) \tag{4}$$

where $\rho^t$ increases as the iteration continues. This forces $z$ to become an increasingly good approximation of $x$, thus making Eq. 2 an increasingly good proxy for Eq. 1.

Note that, while most related approaches including CSF [29] relax Eq. 1 by splitting on $F_i x$, we split on $x$ instead. This is critical for deriving our approach. With this new splitting strategy, the prior term and the data likelihood term in the original objective Eq. 1 are now separated into two subproblems that we call the “prior proximal operator” (Eq. 3) and the “data proximal operator” (Eq. 4), respectively.
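The alternation between the two subproblems can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the trained model: the prior step is replaced by a 3×3 mean filter as a crude stand-in denoiser, the data step uses the closed form for the denoising case (sensing operator equal to the identity), and the schedule of coupling weights is a placeholder. The names `hqs`, `mean_filter3` and `make_denoising_data_prox` are hypothetical.

```python
import numpy as np

def mean_filter3(x):
    """3x3 mean filter with edge padding; a crude stand-in for a learned denoiser."""
    h, w = x.shape
    padded = np.pad(x, 1, mode="edge")
    acc = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            acc += padded[i:i + h, j:j + w]
    return acc / 9.0

def hqs(b, prior_prox, data_prox, rhos):
    """Alternate the prior step and the data step for each coupling weight rho."""
    x = b.astype(float)
    for rho in rhos:            # rho is expected to grow over the iterations
        z = prior_prox(x, rho)  # involves only the image prior
        x = data_prox(z, rho)   # involves only the measurements and fidelity weight
    return x

def make_denoising_data_prox(b, lam):
    """Minimizer of lam/2*||b - x||^2 + rho/2*||z - x||^2 (identity sensing operator)."""
    return lambda z, rho: (lam * b + rho * z) / (lam + rho)

rng = np.random.default_rng(0)
clean = np.full((16, 16), 0.5)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
restored = hqs(noisy, lambda x, rho: mean_filter3(x),
               make_denoising_data_prox(noisy, lam=1.0), rhos=[1.0, 2.0, 4.0])
```

Because the two steps only communicate through `x` and `z`, swapping the task means swapping `data_prox` while leaving `prior_prox` untouched.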

3.3 Discriminative transfer learning

We observed that, while the data proximal operator in Eq. 4 is task-dependent because both the sensing matrix $A$ and the fidelity weight $\lambda$ are problem-specific as explained in Sec. 3.1, the prior proximal operator (i.e. the $z^t$-update step in Eq. 3) is independent of the original restoration tasks and problem conditions.

This leads to our main insight: Discriminative learned models can be made transferable by using them in place of the prior proximal operator, embedded in a proximal optimization algorithm. This allows us to generalize a single discriminative learned model to a very large class of problems, i.e. any linear inverse imaging problem, while simultaneously overcoming the need for problem-specific retraining. Moreover, it enables learning the task-dependent parameter $\lambda$ in the data proximal operator for each problem in a single training pass, eliminating tedious hand-tuning at test time.

We also observed that, benefiting from our new splitting strategy, the prior proximal operator in Eq. 3 can be interpreted as a Gaussian denoiser on the intermediate image $x^{t-1}$, since the least-squares consensus term $\frac{\rho^t}{2}\|z - x^{t-1}\|_2^2$ is equivalent to a Gaussian denoising term. This inspires us to utilize existing discriminative models that have been successfully used for denoising (e.g. CSF, TRD).
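The Gaussian-denoiser reading of the prior proximal operator can be spelled out as a short worked equation. The notation below is an assumption made for this illustration: $x^{t-1}$ is the current image estimate, $z$ the slack image, $\rho^t$ the coupling weight, and $\phi_i$ the penalties on the filter responses $F_i z$.

```latex
% Treat x^{t-1} as a noisy observation of z under white Gaussian noise
% of variance 1/\rho^t, with a filter-based prior on z.
\begin{aligned}
&x^{t-1} = z + n, \qquad n \sim \mathcal{N}\!\left(0,\ \tfrac{1}{\rho^t} I\right), \qquad
p(z) \propto \exp\!\Big(-\sum_i \phi_i(F_i z)\Big) \\
&-\log p(z \mid x^{t-1}) = \frac{\rho^t}{2}\,\|z - x^{t-1}\|_2^2 + \sum_i \phi_i(F_i z) + \text{const}
\end{aligned}
```

Under this reading the maximum a posteriori estimate of $z$ coincides with the minimizer of the prior subproblem, and a larger $\rho^t$ corresponds to a smaller assumed noise standard deviation $1/\sqrt{\rho^t}$.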

For convenience, we denote the prior proximal operator as $\mathrm{prox}_\Theta$, i.e.

$$z^t := \mathrm{prox}_\Theta(x^{t-1}, \rho^t) \tag{5}$$

where the model parameter $\Theta$ includes a number of filters $f_i$ and corresponding penalty functions $\phi_i$. Inspired by the state-of-the-art discriminative methods [29, 8], we propose to learn the model $\mathrm{prox}_\Theta$, and the fidelity weight scalar $\lambda$, from training data. Recall that with our new splitting strategy introduced in Sec. 3.2, the image prior and data-fidelity term in the original objective (Eq. 1) are contained in two separate subproblems (Eq. 3 and 4). This makes it possible to jointly train on an ensemble of diverse tasks (e.g., denoising, deblurring, or with different noise levels), each of which has its own data proximal operator, while learning a single prior proximal operator that is shared across tasks. This is in contrast to state-of-the-art discriminative methods such as CSF [29] and TRD [8], which train separate models for each task.

For clarity, in Fig. 1 we visualize the architecture of our method. The input images may represent various restoration tasks and problem conditions. At each HQS iteration, each image $x_p^t$ from problem $p$ is updated by its own data proximal operator in Eq. 4, which contains a separate trainable fidelity weight $\lambda_p$ and a pre-defined sensing matrix $A_p$; then each slack image $z_p^t$ is updated by the same, shared prior proximal operator $\mathrm{prox}_\Theta$ implemented by a learned, discriminative model.

Recurrent network. Note that in Fig. 1 each HQS iteration uses exactly the same model parameters, forming a recurrent network. This is in contrast to previous discriminative learning methods including CSF and TRD, which form feed-forward networks. Our recurrent network architecture maintains the convergence property of the proximal optimization algorithm (HQS), and is critical for our method to transfer between various tasks and problem conditions.

Shared prior proximal operator. While any discriminative Gaussian denoising model could be used as $\mathrm{prox}_\Theta$ in our framework, we specifically propose to use the multi-stage non-linear diffusion process that is modified from the TRD [8] model, for its efficiency. The model is given in Eq. 6:

$$z_k^t = z_{k-1}^t - \sum_{i=1}^{N} F_i^{k\,T}\,\psi_i^k\!\left(F_i^k z_{k-1}^t\right), \quad k = 1, \dots, K, \qquad z_0^t = x^{t-1} \tag{6}$$

where $k$ is the stage index, the filters $F_i^k$ and functions $\psi_i^k$ are trainable model parameters at each stage, and $z_0^t$ is the initial value of $z_k^t$. Note that, different from TRD, our model does not contain the reaction term, which would be $-\rho^t \alpha_k (z_{k-1}^t - x^{t-1})$ with step size $\alpha_k$. The main reasons for this modification are:

  • The data constraint is contained in the $x^t$ update in Eq. 4;

  • More importantly, by dropping the reaction term our model gets rid of the weight $\rho^t$, which changes at each HQS iteration. Therefore, our proximal operator is simplified to be:

$$z^t := \mathrm{prox}_\Theta(x^{t-1}) \tag{7}$$

The parameters to learn in our method, $\Omega$, include the $\lambda_p$'s for each problem class $p$ (restoration task and problem condition), and the model $\Theta$ in the prior proximal operator shared across different classes, i.e. $\Omega = \{\lambda_p, \Theta\}$. Even though the scalar parameters $\lambda_p$ are trained, our method allows users to override them at test time to handle non-trained problem classes or specific inputs, as we will show in Sec. 4. This contrasts with previous discriminative approaches, whose model parameters are all fixed at test time. The subscript $p$ indicating the problem class in $\lambda_p$ is omitted below for convenience. The values of $\rho^t$ are pre-selected and increase with $t$.
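A single pass through the multi-stage diffusion process can be sketched as follows. This is a minimal illustration rather than the trained model: it assumes circular boundary conditions so that each filter and its transpose can be applied with the FFT, it uses two derivative filters instead of learned ones, and the nonlinearity `shrink` is a hand-picked placeholder for the trainable 1D functions. The name `diffusion_prox` and its arguments are hypothetical.

```python
import numpy as np

def diffusion_prox(x_prev, stage_filters, stage_psis):
    """Apply z_k = z_{k-1} - sum_i F_i^T psi_i(F_i z_{k-1}) once per stage."""
    z = np.asarray(x_prev, dtype=float).copy()
    for filters, psis in zip(stage_filters, stage_psis):  # one entry per stage
        z_hat = np.fft.fft2(z)
        step = np.zeros_like(z)
        for f, psi in zip(filters, psis):
            otf = np.fft.fft2(f, s=z.shape)                # zero-padded filter spectrum
            response = np.real(np.fft.ifft2(otf * z_hat))  # F_i z (circular convolution)
            back = np.fft.ifft2(np.conj(otf) * np.fft.fft2(psi(response)))
            step += np.real(back)                          # F_i^T psi_i(F_i z)
        z = z - step                                       # no reaction term
    return z

dx = np.array([[1.0, -1.0]])                 # horizontal derivative filter
dy = np.array([[1.0], [-1.0]])               # vertical derivative filter
shrink = lambda u: 0.2 * u / (1.0 + u ** 2)  # placeholder nonlinearity
num_stages = 3
image = np.random.default_rng(0).random((8, 8))
z = diffusion_prox(image, [[dx, dy]] * num_stages, [[shrink, shrink]] * num_stages)
```

The function takes only the previous image estimate as input, which mirrors the simplified operator that no longer depends on the coupling weight.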

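Algorithm 1 Proposed algorithm
Input: degraded image $b$
Output: recovered image $x^T$
  $x^0 = b$ (initialization)
  for $t = 1$ to $T$ do
    (Update $z^t$ by Eq. 6 below)
    $z_0^t = x^{t-1}$
    for $k = 1$ to $K$ do
      $z_k^t = z_{k-1}^t - \sum_{i=1}^{N} F_i^{k\,T}\,\psi_i^k(F_i^k z_{k-1}^t)$
    end for
    $z^t = z_K^t$
    (Update $x^t$ by Eq. 4 below)
    $x^t = \arg\min_x \left(\frac{\lambda}{2}\|b - Ax\|_2^2 + \frac{\rho^t}{2}\|z^t - x\|_2^2\right)$
  end for

Note that a multi-stage model as in Eq. 6 is not possible if we split on $F_i x$ instead of $x$ in Eq. 1 and 2. For clarity, an overview of the proposed algorithm is given in Algorithm 1.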

3.4 Training

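We consider denoising and deconvolution tasks at training, where the sensing operator $A$ is an identity matrix, or a block circulant matrix with circulant blocks that represents 2D convolution with randomly drawn blur kernels, respectively. In denoising tasks, the $x^t$ update in Eq. 4 has a closed-form solution:

$$x^t = \frac{\lambda b + \rho^t z^t}{\lambda + \rho^t} \tag{8}$$

In deconvolution tasks, the $x^t$ update in Eq. 4 has a closed-form solution in the Fourier domain:

$$x^t = \mathcal{F}^{-1}\!\left(\frac{\lambda\,\overline{\mathcal{F}(h)}\,\mathcal{F}(b) + \rho^t\,\mathcal{F}(z^t)}{\lambda\,|\mathcal{F}(h)|^2 + \rho^t}\right) \tag{9}$$

where $\mathcal{F}$ and $\mathcal{F}^{-1}$ represent the Fourier and inverse Fourier transform respectively, $h$ is the blur kernel that defines $A$, the bar denotes complex conjugation, and the products and division are element-wise. Note that, compared to CSF [29], our method does not require FFT computations for denoising tasks. We use the L-BFGS solver [28] with analytic gradient computation for training. The training loss function $\ell$ is defined as the negative average Peak Signal-to-Noise Ratio (PSNR) of reconstructed images. The gradient of $\ell$ w.r.t. the model parameters $\Omega = \{\lambda_p, \Theta\}$ is computed by accumulating gradients at all HQS iterations, i.e.

$$\frac{\partial \ell}{\partial \lambda} = \sum_{t=1}^{T}\frac{\partial x^t}{\partial \lambda}\frac{\partial \ell}{\partial x^t}, \qquad \frac{\partial \ell}{\partial \Theta} = \sum_{t=1}^{T}\frac{\partial z^t}{\partial \Theta}\frac{\partial \ell}{\partial z^t} \tag{10}$$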
The 1D functions $\psi_i^k$ in Eq. 6 are parameterized as a linear combination of equidistant-positioned Gaussian kernels whose weights are trainable.
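The parameterization of a 1D function as a weighted sum of equidistant Gaussian kernels can be sketched as below. The number of kernels, their range and their width are assumptions chosen for illustration, as are the names `rbf_function`, `centers` and `width`; only the weights would be trained.

```python
import numpy as np

def rbf_function(u, weights, centers, width):
    """Evaluate sum_j weights[j] * exp(-(u - centers[j])^2 / (2 * width^2)) element-wise."""
    u = np.asarray(u, dtype=float)
    diffs = u[..., None] - centers             # distance of each input to each center
    basis = np.exp(-0.5 * (diffs / width) ** 2)
    return basis @ weights                     # linear combination over the kernels

centers = np.linspace(-1.0, 1.0, 9)            # equidistant centers (assumed range/count)
width = centers[1] - centers[0]                # kernel width tied to the spacing (assumed)
weights = 0.1 * centers                        # example weights; trainable in practice
responses = np.random.default_rng(0).uniform(-1.0, 1.0, size=(4, 4))
values = rbf_function(responses, weights, centers, width)
```

Since the output is linear in `weights`, the derivative of the function value with respect to each weight is just the corresponding entry of `basis`, which is what makes analytic gradients straightforward.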

Progressive training. A progressive scheme is proposed to make the training more effective. First, we set the number of HQS iterations to 1, and train the $\lambda$'s and the model of each stage in $\mathrm{prox}_\Theta$ in a greedy fashion. Then, we gradually increase the number of HQS iterations from 1 to $T$, where at each step the model is refined from the result of the previous step. The number of L-BFGS iterations is set to 200 for the greedy training steps, and 100 for the refining steps. Fig. 2 shows examples of learned filters in $\mathrm{prox}_\Theta$.

(a) Filters at stage 1.
(b) Filters at stage 2.
(c) Filters at stage 3.
Figure 2: Trained filters at each stage ($f_i^k$ in Eq. 6) of the proximal operator $\mathrm{prox}_\Theta$ in our model (3 stages, each with 24 filters of size 5×5).

4 Results

Figure 3: Analysis of model generality on image denoising. In this plot, “TRD15” denotes the TRD model trained at noise level $\sigma = 15$, and “TRD25” the TRD model trained at noise level $\sigma = 25$. Our model DTL is trained with mixed noise levels in a single pass.

Denoising and generality analysis. We compare the proposed discriminative transfer learning (DTL) method with state-of-the-art image denoising techniques, including KSVD [11], FoE [26], BM3D [9], LSSC [21], WNNM [13], EPLL [35], opt-MRF [7], ARF [2], CSF [29] and TRD [8]. The subscript in CSF and TRD indicates the number of cascaded stages (each stage has different model parameters). The subscript and superscript in our method DTL indicate the number of diffusion stages ($K$ in Algorithm 1) in our proximal operator $\mathrm{prox}_\Theta$, and the number of HQS iterations ($T$ in Algorithm 1), respectively. Note that the complexity (size) of our model is linear in $K$, but independent of $T$. CSF, TRD and DTL use 24 filters of size 5×5 pixels at all stages in this section.

The compared discriminative methods, CSF and TRD, are both trained at a single noise level that is the same as that of the test images. In contrast, our model is trained on 400 images (100×100 pixels) cropped from [26] with random and discrete noise levels (standard deviation $\sigma$) varying between 5 and 25. The images with the same noise level share the same data fidelity weight $\lambda$ at training.

To verify the generality of our method on varying noise levels, we test our model DTL (trained with varying noise levels in a single pass) and two TRD models (trained at specific noise levels 15 and 25) on 3 sets of 68 images, each set with a different noise level $\sigma$. The average PSNR values are shown in Fig. 3. Although performing slightly below the TRD model trained for the exact noise level used at test time, our method is more generic and works robustly for various noise levels. The performance of the discriminative TRD method drops quickly as the problem condition (i.e. noise level) at test time departs from that of its training data. In sharp contrast to discriminative methods (CSF, TRD, etc.), which are inherently specialized for a given problem setting, i.e. noise level, the proposed approach transfers across different problem settings. More analysis can be found in the supplementary material.

All compared methods are evaluated on the 68 test images from [26] and the averaged PSNR values are reported in Table 2. The compared discriminative methods (CSF, TRD, etc) were trained for exactly the same noise level as the test images (i.e. the best case for them), while our model was trained with mixed noise levels and works robustly for arbitrary noise levels. Our results are comparable to generic methods such as KSVD, FoE and BM3D, and very close to discriminative methods such as CSF, while at the same time being much more time-efficient.

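Method      KSVD     FoE     BM3D    LSSC    WNNM             EPLL
PSNR (dB)   30.87    30.99   31.08   31.27   31.37            31.19
Method      opt-MRF  ARF     CSF     TRD     DTL (K=3, T=3)   DTL (K=3, T=5)
PSNR (dB)   31.18    30.70   31.14   31.30   30.91            31.00
Table 2: Average PSNR (dB) on 68 images from [26] for denoising.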
Method         Image size
WNNM           157.73   657.75   2759.79   -       -
EPLL           29.21    111.52   463.71    -       -
BM3D           0.78     3.45     15.24     62.81   275.39
CSF            1.23     2.22     7.35      27.08   93.66
TRD            0.39     0.71     2.01      7.57    29.09
DTL            0.60     1.19     3.45      12.97   56.19
DTL (Halide)   0.11     0.26     1.60      5.61    20.85
Table 3: Runtime (seconds) comparison for image denoising.

Run-time comparison. In Table 3 we compare the run-time of our method and state-of-the-art methods. The experiments were performed on a laptop computer with an Intel i7-4720HQ CPU and 16GB RAM. WNNM and EPLL ran out of memory for images over 4 megapixels in our experiments. CSF, TRD and DTL all use the “parfor” setting in Matlab. DTL is significantly faster than all compared generic methods (WNNM, EPLL, BM3D) and even the discriminative method CSF. The run-time of DTL is about 1.5 times that of TRD, which is expected as TRD uses 5 diffusion steps in total versus 9 for DTL. In addition, we implement our method in the Halide language [24], which has recently become popular for high-performance image processing applications, and report the run-time on the same CPU as mentioned above.

Deconvolution. In this experiment, we train a model with an ensemble of denoising and deconvolution tasks on 400 images (100×100 pixels) cropped from [26], in which 250 images are generated for denoising tasks with random noise levels $\sigma$ varying between 5 and 25, and the other 150 images are generated by blurring the images with random 25×25 kernels (PSFs) and then adding Gaussian noise with $\sigma$ ranging between 1 and 5. All images are quantized to 8 bits.
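For the deconvolution tasks, the data step of Eq. 4 can be evaluated in the Fourier domain, as stated in Sec. 3.4. The sketch below illustrates this under the assumption of circular boundary conditions, so that the blur is diagonalized by the 2D FFT; the function name `data_prox_deconv`, its arguments and the example values of the fidelity and coupling weights are hypothetical.

```python
import numpy as np

def data_prox_deconv(b, z, kernel, lam, rho):
    """Minimize lam/2*||b - A x||^2 + rho/2*||z - x||^2 for circular convolution A."""
    otf = np.fft.fft2(kernel, s=b.shape)  # spectrum of the zero-padded blur kernel
    numerator = lam * np.conj(otf) * np.fft.fft2(b) + rho * np.fft.fft2(z)
    denominator = lam * np.abs(otf) ** 2 + rho   # strictly positive because rho > 0
    return np.real(np.fft.ifft2(numerator / denominator))

rng = np.random.default_rng(0)
kernel = rng.random((5, 5))
kernel /= kernel.sum()                    # normalized random blur kernel
sharp = rng.random((32, 32))
blurred = np.real(np.fft.ifft2(np.fft.fft2(kernel, s=sharp.shape) * np.fft.fft2(sharp)))
estimate = data_prox_deconv(blurred, z=blurred, kernel=kernel, lam=50.0, rho=1.0)
```

With a single-pixel kernel equal to 1 the expression reduces to the weighted average used for denoising, so no FFT is needed in that case.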

Figure 4: Our results with different fidelity weight $\lambda$ for the non-blind deconvolution experiment reported in Table 4.

We compare our method with state-of-the-art non-blind deconvolution methods including Levin et al. [19], Schmidt et al. [30] and CSF [29]. Note that TRD [8] does not support non-blind deconvolution. We test the methods on the benchmark dataset from [20] which contains 32 real-captured images and report the average PSNR values in Table 4. The results of compared methods are quoted from [29].

As stated in Sec. 3.3, while the scalar weight $\lambda$ is trained, our method allows users to override it at test time for untrained problem classes or specific inputs. Fig. 4 shows our results with different $\lambda$ on the experiments compared in Table 4. Within a fairly wide range of $\lambda$, our method outperforms all previous methods.

We further test the above model trained with ensemble tasks on the denoising experiment in Table 2. The resulting average PSNR is 30.98dB, which is comparable to the result with the model trained only on the denoising task.

Input   Levin [19]   Schmidt [30]   CSF     DTL
22.86   32.73        33.97          33.48   34.34
Table 4: Average PSNR (dB) on 32 images from [20] for non-blind deconvolution.

Modularity with existing priors. As shown above, even though the fidelity weight $\lambda$ is trainable, our method allows users to override its value at test time. This property also makes it possible to combine our model (after being trained) with existing state-of-the-art priors at test time, in which case $\lambda$ typically needs to be adjusted. This allows our method to take advantage of previous successful work on image priors. Again, this is not possible with previous discriminative methods (CSF, TRD).

(a) Input (20.17dB)
(b) BM3D (29.62dB)
(c) DTL (29.48dB)
(d) DTL + BM3D (29.74dB)
Figure 5: Experiment on incorporating a non-local patch similarity prior (BM3D) into our model after training. The input image is corrupted with noise of level $\sigma$. Please zoom in for a better view.
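The PSNR of a noisy input is tied directly to its noise level, which helps when reading the input values quoted in the figure panels. The sketch below assumes additive zero-mean noise with standard deviation `sigma` on an 8-bit intensity range (peak value 255) and ignores clipping and quantization, so the mean squared error equals `sigma ** 2`; the function name is hypothetical.

```python
import math

def noisy_input_psnr(sigma, peak=255.0):
    """PSNR in dB of an image whose only degradation is noise with std sigma."""
    mse = sigma ** 2                      # expected squared error of unclipped noise
    return 10.0 * math.log10(peak ** 2 / mse)

for sigma in (5, 15, 25):
    print(sigma, round(noisy_input_psnr(sigma), 2))
```

Under these assumptions, a standard deviation of 25 gives roughly 20.17 dB; measured values can deviate slightly because real inputs are clipped and quantized.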

In Fig. 5 we show an example of incorporating a non-local patch similarity prior (BM3D [9]) into our method to further improve the denoising quality. BM3D performs well in removing noise, especially in smooth regions, but usually over-smooths edges and textures. Our original model (DTL) preserves sharp edges well, but sometimes introduces artifacts in smooth regions when the input noise level is high. By combining those two methods, which is easy with our HQS framework, the result is improved both visually and quantitatively.

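We give the derivation of the proposed hybrid method below. Let $R(x)$ represent the non-local patch similarity prior. The objective function is:

$$\frac{\lambda}{2}\|b - Ax\|_2^2 + \sum_{i=1}^{N}\phi_i(F_i x) + R(x) \tag{11}$$

Applying the HQS technique described in Sec. 3, we relax the objective to be:

$$\frac{\lambda}{2}\|b - Ax\|_2^2 + \frac{\rho}{2}\|z - x\|_2^2 + \sum_{i=1}^{N}\phi_i(F_i z) + \frac{\rho_s}{2}\|v - x\|_2^2 + R(v) \tag{12}$$

Then we minimize Eq. 12 by alternately solving the following 3 subproblems:

$$\begin{aligned}
z^t &= \mathrm{prox}_\Theta(x^{t-1}) \\
v^t &= \arg\min_v \left(\frac{\rho_s}{2}\|v - x^{t-1}\|_2^2 + R(v)\right) \\
x^t &= \arg\min_x \left(\frac{\lambda}{2}\|b - Ax\|_2^2 + \frac{\rho^t}{2}\|z^t - x\|_2^2 + \frac{\rho_s}{2}\|v^t - x\|_2^2\right)
\end{aligned} \tag{13}$$

where $\mathrm{prox}_\Theta$ is from our previous training, and the $v^t$ subproblem is approximated by running BM3D software on $x^{t-1}$ with the noise parameter set according to $\rho_s$, following [33, 14].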
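For the denoising case, the image update of the hybrid scheme has a simple closed form, sketched below. The sketch assumes the update minimizes a sum of three quadratics, `lam/2*||b - x||^2 + rho/2*||z - x||^2 + rho_s/2*||v - x||^2`, where `z` is the output of the learned prior step, `v` the output of the additional denoiser, and `rho_s` its coupling weight. These symbol names, the callables `learned_prox` and `extra_denoiser`, and the loop structure are assumptions for illustration; no BM3D implementation is called here.

```python
import numpy as np

def hybrid_data_prox_denoise(b, z, v, lam, rho, rho_s):
    """Closed-form minimizer of the three-term quadratic for identity sensing."""
    return (lam * b + rho * z + rho_s * v) / (lam + rho + rho_s)

def hybrid_hqs_denoise(b, learned_prox, extra_denoiser, lam, rhos, rho_s):
    """Alternate the learned prior step, an extra denoiser step and the data step."""
    x = np.asarray(b, dtype=float).copy()
    for rho in rhos:
        z = learned_prox(x)      # stand-in for the trained prior proximal operator
        v = extra_denoiser(x)    # stand-in for an external denoiser such as BM3D
        x = hybrid_data_prox_denoise(b, z, v, lam, rho, rho_s)
    return x

rng = np.random.default_rng(0)
noisy = rng.random((8, 8))
smooth = lambda u: 0.5 * (u + np.roll(u, 1, axis=1))   # toy denoiser for the demo
result = hybrid_hqs_denoise(noisy, smooth, smooth, lam=1.0, rhos=[1.0, 2.0], rho_s=0.5)
```

The update is a weighted average of the measurement and the two prior outputs, so raising `rho_s` pulls the result toward the extra denoiser, which is one reason the fidelity weight may need adjusting when a prior is added.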

Similarly, our method can incorporate color image priors (e.g., cross-channel edge-concurrence prior [14]) to improve test results on color images, despite our model being trained on gray-scale images. An example is shown in Fig. 6. The hybrid method shares the advantages of our original model that effectively preserves edges and textures and the cross-channel prior that reduces color artifacts.

(a) Ground truth
(b) Input (20.18dB)
(c) TRD (28.06dB)
(d) DTL (27.80dB)
(e) TV + cross (26.89dB)
(f) DTL + cross (28.69dB)
Figure 6: Experiment on incorporating a color prior [14] into our model after training. The input image is corrupted with noise of level $\sigma$. (e,f) show the results of combining total variation (TV) denoising with a cross-channel prior, and our method with a cross-channel prior, respectively. Please zoom in for a better view.
(a) Ground truth
(b) Noisy input (20.18dB)
(c) Iter 1 (22.85dB)
(d) Iter 2 (25.93dB)
(e) Iter 3 (28.14dB)
Figure 7: Results at each HQS iteration of our method on image denoising with noise level $\sigma$. PSNR values are shown in parentheses.
(a) Ground truth
(b) Blurry input (23.37dB)
(c) Iter 1 (27.32dB)
(d) Iter 2 (28.48dB)
(e) Iter 3 (29.36dB)
Figure 8: Results at each HQS iteration of our method on non-blind deconvolution with a 25×25 PSF and noise level $\sigma$.

Transferability to unseen tasks. Our method allows for new data-fidelity terms that are not contained in training, with no need for re-training. We demonstrate this flexibility with an experiment on the joint denoising and inpainting task shown in Fig. 9. In this experiment, 60% of the pixels of the input image are missing, and the measured 40% of the pixels are corrupted with Gaussian noise of standard deviation $\sigma$. Let the vector $m$ be the binary mask for measured pixels. The sensing matrix $A$ in Eq. 1, assumed to be known, is a binary diagonal matrix (hence $A = A^T = A^T A$) with diagonal elements $m$. To reuse our model trained on denoising/deconvolution tasks, we only need to specify $A$ and $\lambda$. The subproblems of our HQS framework are given in Eq. 14:

$$z^t = \mathrm{prox}_\Theta(x^{t-1}), \qquad x^t = \frac{\lambda\, m \odot b + \rho^t z^t}{\lambda\, m + \rho^t} \tag{14}$$

where $\odot$ and the division are element-wise.
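The data step for a binary-mask sensing matrix can be sketched in a few lines. The sketch assumes the data subproblem is the quadratic `lam/2*||b - m*x||^2 + rho/2*||z - x||^2` with a 0/1 mask `m` applied element-wise; the function name, the random mask and the weight values are illustrative only.

```python
import numpy as np

def data_prox_inpaint(b, z, mask, lam, rho):
    """Element-wise minimizer of lam/2*(b - mask*x)^2 + rho/2*(z - x)^2."""
    return (lam * mask * b + rho * z) / (lam * mask + rho)

rng = np.random.default_rng(0)
image = rng.random((8, 8))
mask = (rng.random((8, 8)) < 0.4).astype(float)   # roughly 40% of pixels measured
measured = mask * image                           # missing pixels are stored as 0
prior_output = rng.random((8, 8))                 # stand-in for the prior step output
x = data_prox_inpaint(measured, prior_output, mask, lam=10.0, rho=1.0)
```

At missing pixels the result simply copies the prior output, while at measured pixels it is the same weighted average as in denoising, so the trained prior step is reused unchanged.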

(a) Input
(b) Delaunay interp.(23.19dB)
(c) DTL (25.10dB)
(d) Ground truth
Figure 9: Experiment on the joint denoising and inpainting task. The input image (a) is missing 60% of its pixels, and is corrupted with noise of level $\sigma$. Our method takes the result of Delaunay interpolation (b) as the initial estimate $x^0$. Please zoom in for a better view.

Analysis of convergence and model complexity. To better understand the convergence of our method, in Fig. 7 and 8 we show the results of each HQS iteration of our method on denoising and non-blind deconvolution.

To understand the effect of model complexity and the number of HQS iterations on the results, in Table 5 we report test results of our method using models trained with different numbers of HQS iterations ($T$ in Algorithm 1), and with different numbers of stages in $\mathrm{prox}_\Theta$ ($K$ in Algorithm 1).

                 # HQS iterations (T)
# stages (K)     1               3               5
1                29.80 / 26.81   30.89 / 28.12   30.96 / 28.28
3                30.54 / 27.82   30.91 / 28.19   31.00 / 28.42
5                30.54 / 27.83   30.92 / 28.18   -
Table 5: Test with different HQS iterations ($T$) and model stages ($K$) for image denoising. Average PSNR (dB) results on 68 images from [26] at two noise levels $\sigma$ are reported (before and after “/” in each cell, respectively).

5 Conclusion

In this paper, we proposed a discriminative transfer learning framework for general image restoration. By combining advanced proximal optimization algorithms and discriminative learning techniques, a single training pass leads to a transferable model useful for a variety of image restoration tasks and problem conditions. Furthermore, our method is flexible and can be combined with existing priors and likelihood terms after being trained, allowing us to improve image quality on a task at hand. In spite of this generality, our method achieves run-time efficiency comparable to that of previous discriminative approaches, making it suitable for high-resolution image restoration and mobile vision applications.

We believe that in future work, our framework incorporating advanced optimization with discriminative learning techniques can be extended to deep learning, for training more compact and shareable models, and to solve high-level vision problems.


  • [1] A. Buades, B. Coll, and J. M. Morel. A review of image denoising algorithms, with a new one. Multiscale Modeling and Simulation, 4(2):490–530, 2005.
  • [2] A. Barbu. Training an active random field for real-time image denoising. IEEE Transactions on Image Processing, 18(11):2451–2462, 2009.
  • [3] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122, 2011.
  • [4] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with BM3D? In CVPR 2012.
  • [5] E. J. Candès and M. B. Wakin. An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2):21–30, 2008.
  • [6] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.
  • [7] Y. Chen, T. Pock, R. Ranftl, and H. Bischof. Revisiting loss-specific training of filter-based MRFs for image restoration. In German Conference on Pattern Recognition 2013.
  • [8] Y. Chen, W. Yu, and T. Pock. On learning optimized reaction diffusion processes for effective image restoration. In CVPR 2015.
  • [9] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095, 2007.
  • [10] W. Dong, L. Zhang, G. Shi, and X. Li. Nonlocally centralized sparse representation for image restoration. IEEE Transactions on Image Processing, 22(4):1620–1630, 2013.
  • [11] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, 2006.
  • [12] D. Geman and C. Yang. Nonlinear image recovery with half-quadratic regularization. IEEE Transactions on Image Processing, 4(7):932–946, 1995.
  • [13] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In CVPR 2014.
  • [14] F. Heide, M. Steinberger, Y.-T. Tsai, M. Rouf, D. Pajak, D. Reddy, O. Gallo, J. Liu, W. Heidrich, K. Egiazarian, et al. FlexISP: a flexible camera image processing framework. ACM Transactions on Graphics (TOG), 33(6):231, 2014.
  • [15] V. Jain and H. Seung. Natural image denoising with convolutional networks.
  • [16] J. Jancsary, S. Nowozin, T. Sharp, and C. Rother. Regression tree fields - an efficient, non-parametric approach to image labeling problems. In CVPR 2012.
  • [17] D. Krishnan and R. Fergus. Fast image deconvolution using hyper-laplacian priors. In NIPS 2009.
  • [18] D. Krishnan, T. Tay, and R. Fergus. Blind deconvolution using a normalized sparsity measure. In CVPR 2011.
  • [19] A. Levin, R. Fergus, F. Durand, and W. T. Freeman. Image and depth from a conventional camera with a coded aperture. ACM Transactions on Graphics (TOG), 26(3):70, 2007.
  • [20] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Efficient marginal likelihood optimization in blind deconvolution. In CVPR 2011.
  • [21] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Non-local sparse models for image restoration. In ICCV 2009.
  • [22] P. Milanfar. A tour of modern image filtering: New insights and methods, both practical and theoretical. IEEE Signal Processing Magazine, 30(1):106–128, 2013.
  • [23] N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1(3):123–231, 2013.
  • [24] J. Ragan-Kelley, C. Barnes, A. Adams, S. Paris, F. Durand, and S. Amarasinghe. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. ACM SIGPLAN Notices, 48(6):519–530, 2013.
  • [25] D. Rosenbaum and Y. Weiss. The return of the gating network: Combining generative models and discriminative training in natural image priors. In NIPS 2015.
  • [26] S. Roth and M. Black. Fields of experts. International Journal of Computer Vision, 82(2):205–229, 2009.
  • [27] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4):259–268, 1992.
  • [28] M. Schmidt. minFunc: unconstrained differentiable multivariate optimization in MATLAB. http://www.cs.ubc.ca/~schmidtm/Software/minFunc.html.
  • [29] U. Schmidt and S. Roth. Shrinkage fields for effective image restoration. In CVPR 2014.
  • [30] U. Schmidt, C. Rother, S. Nowozin, J. Jancsary, and S. Roth. Discriminative non-blind deblurring. In CVPR 2013.
  • [31] H. Talebi and P. Milanfar. Global image denoising. IEEE Transactions on Image Processing, 23(2):755–768, 2014.
  • [32] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In ICCV 1998.
  • [33] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg. Plug-and-play priors for model based reconstruction. In GlobalSIP 2013.
  • [34] J. Weickert. Anisotropic diffusion in image processing. ECMI Series, Teubner-Verlag, Stuttgart, Germany, 1998.
  • [35] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In ICCV 2011.