Toward Designing Convergent Deep Operator Splitting Methods
for Task-specific Nonconvex Optimization
Abstract
Operator splitting methods have been successfully used in computational science, statistics, learning, and vision to reduce complex problems into a series of simpler subproblems. However, prevalent splitting schemes are mostly established only on the mathematical properties of some general optimization models. It is thus a laborious process, often requiring many rounds of ideation and validation, to obtain practical and task-specific optimal solutions, especially for nonconvex problems in real-world scenarios. To break through the above limits, we introduce a new algorithmic framework, called Learnable Bregman Splitting (LBS), to perform deep-architecture-based operator splitting for nonconvex optimization based on a specific task model. Thanks to its data-dependent (i.e., learnable) nature, LBS can not only speed up convergence, but also avoid unwanted trivial solutions in real-world tasks. Even with inexact deep iterations, we can still establish the global convergence and estimate the asymptotic convergence rate of LBS under fairly loose assumptions. Extensive experiments on different applications (e.g., image completion and deblurring) verify our theoretical results and show the superiority of LBS against existing methods.
Risheng Liu† (†Corresponding Author), Shichao Cheng, Yi He, Xin Fan, Zhongxuan Luo. International School of Information Science & Engineering, Dalian University of Technology; Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province; School of Mathematical Science, Dalian University of Technology. {rsliu, xin.fan, zxluo}@dlut.edu.cn, {shichao.cheng, heyiking}@outlook.com
1 Introduction
In this work, we consider the optimization problem
(1)   min_x F(x) := f(x_1, …, x_N) + ∑_{i=1}^{N} g_i(x_i),
where x has N blocks (x = (x_1, …, x_N)), f is continuously differentiable, but each g_i is not necessarily differentiable. Notice that convexity is not assumed for f, g_i, or F. By considering g_i as an extended-value function (i.e., taking the value +∞), we can incorporate any set constraint on x_i into g_i, since this is equivalent to minimizing the indicator function of the constraint set. Therefore, we will not include such set constraints in the following analysis. Typically, f captures the loss of data fitting from the specific task modeling and g_i is the regularization that promotes desired structures on the variable x_i. Problems appearing in many learning and vision applications, such as sparse coding [?], tensor factorization [?], image restoration [?], and data clustering [?], can all be (re)formulated in the form of Eq. (1).
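As a concrete illustration of the composite structure in Eq. (1), the toy sketch below (our own example, not the paper's code) evaluates a smooth least-squares data-fitting term f plus a nonsmooth ℓ1 regularizer g; all names here are hypothetical.

```python
# Hypothetical instance of the composite model in Eq. (1):
# a smooth least-squares fit f plus a nonsmooth l1 regularizer g.
def f_smooth(x, a, b):
    # f(x) = 0.5 * sum_j (a_j . x - b_j)^2  (data-fitting term)
    r = [sum(ai * xi for ai, xi in zip(row, x)) - bj for row, bj in zip(a, b)]
    return 0.5 * sum(v * v for v in r)

def g_nonsmooth(x, lam=0.1):
    # g(x) = lam * ||x||_1  (nondifferentiable regularizer)
    return lam * sum(abs(v) for v in x)

def objective(x, a, b, lam=0.1):
    return f_smooth(x, a, b) + g_nonsmooth(x, lam)

A = [[1.0, 0.0], [0.0, 2.0]]
b = [1.0, 2.0]
print(objective([0.0, 0.0], A, b))  # 2.5 at the origin: 0.5*(1+4) + 0
```

Convexity plays no role in this evaluation; the same structure covers the nonconvex regularizers (e.g., ℓ_p quasi-norms) considered later in the paper.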
1.1 Related Works
One of the most prevalent algorithms for solving Eq. (1) is the operator splitting approach. The main idea behind such schemes is to reduce a complex problem built from simple pieces into a series of smaller subproblems which can be solved sequentially or in parallel. In the past several decades, a variety of splitting methods have been designed and analyzed. For example, [?] provided a prototype of Forward-Backward Splitting (FBS) and proved its ergodic convergence. [?] presented the convergence rate of proximal gradient (PG) and accelerated proximal gradient (APG, also known as FISTA). Recently, [?] provided a unified way to analyze the convergence rates of Peaceman-Rachford Splitting (PRS) and Douglas-Rachford Splitting (DRS). It is also known that the widely used Alternating Direction Method of Multipliers (ADMM) can be reformulated within the operator splitting (e.g., DRS) framework in the dual space [?]. Though equipped with mathematically proved convergence properties, these generally designed algorithms may still fail on particular nonconvex optimization models in real scenarios. This is mainly because, with their fixed updating schemes, it is hard to escape unwanted saddle points during the iterations.
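The forward (gradient) and backward (proximal) steps that FBS alternates can be sketched on a one-dimensional toy problem; this is a generic textbook illustration under our own choice of objective, not any of the cited implementations.

```python
# A minimal sketch of Forward-Backward Splitting (FBS) on
# min_x 0.5*(x - c)^2 + lam*|x|: a forward gradient step on the
# smooth part, then a backward (proximal) step on the nonsmooth part.
def soft_threshold(v, t):
    # Proximal map of t*|.| (the "backward" operator).
    return max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0 if v < 0 else 0.0)

def fbs(c, lam, step=0.5, iters=100):
    x = 0.0
    for _ in range(iters):
        grad = x - c            # forward step: gradient of the smooth term
        x = soft_threshold(x - step * grad, step * lam)
    return x

# The closed-form minimizer is soft_threshold(c, lam); FBS converges to it.
print(fbs(3.0, 1.0))  # ≈ 2.0
```

FISTA adds a momentum extrapolation between such iterations, and PRS/DRS recombine the same two building blocks through reflections, as discussed in Sec. 2.2.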
To improve performance in practical real-world applications, some studies have tried to parameterize existing iteration schemes and learn the parameters of the resulting propagation models. For example, [?] learned the parameters of a parameterized partial differential equation for various image and video processing tasks. Similarly, [?] introduced a higher-order diffusion system to perform data-dependent gradient descent for image denoising and super-resolution. The studies in [?] and [?] respectively parameterized half-quadratic splitting and ADMM for practical applications, such as non-blind deconvolution and MRI imaging. Very recently, inspired by the success of deep networks in different application fields, some works have also tried to replace the standard iterations with existing network architectures. By considering convolutional neural networks (CNNs) as special image priors, [?] proposed an iterative CNN scheme to address image restoration problems.
However, we have to point out that, although these methods achieve relatively better performance on some specific tasks, the nice convergence properties proved on the theoretical side are completely missing. That is, neither the adaptive parameterization nor the CNN replacement mentioned above can preserve the convergence results proved for the original iteration schemes. Moreover, it is even impossible to investigate and control the iterative behaviors (e.g., descent) of these methods, since their learned iterations no longer actually solve the original optimization model.
1.2 Contributions
In this work, we propose Learnable Bregman Splitting (LBS), a novel deep operator splitting algorithm for nonconvex optimization in real-world scenarios. Specifically, we first introduce a Bregman distance function to penalize the variables at each iteration. Then the basic LBS updating scheme is established based on a relaxed Krasnoselskii-Mann iteration [?]. By introducing a novel triple operator splitting strategy, we can successfully combine task-model-inspired and data-learning-driven operators within the LBS algorithmic framework. In summary, our contributions mainly include:

LBS provides a novel learning strategy that extends prevalent mathematically designed operator splitting schemes to task-specific nonconvex optimization. Thanks to the learnable deep architectures, we can train our iterations on collected data to avoid unwanted solutions in particular applications.

Different from most existing learning-based optimization algorithms (e.g., the iteration parameterization and CNN incorporation methods mentioned above), which come with no theoretical guarantee, we provide rich investigations of the iterative behaviors, prove the global convergence, and estimate the convergence rate of our LBS.

We also demonstrate how to apply our algorithm to different computer vision applications, and extensive results verify that LBS outperforms state-of-the-art methods on all the compared problems.
2 Learnable Bregman Splitting Method
In this section, a learningbased operator splitting method, named Learnable Bregman Splitting (LBS), is developed for the nonconvex optimization model in Eq. (1).
2.1 Bregman Distance Penalization
As a fundamental proximity measure, the Bregman distance (the use of the Bregman distance in optimization is widespread within various contexts; many interesting properties of this function can be found in the comprehensive work [?]) plays important roles in various iterative algorithms. However, since it satisfies neither the triangle inequality nor symmetry, this function is not a true metric. Given a convex differentiable function φ, the associated Bregman distance can be written as
(2)   D_φ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩.
Clearly, D_φ(x, y) is strictly convex with respect to its first argument. Moreover, D_φ(x, y) ≥ 0 for all x, y, and it is equal to zero if and only if x = y. So D_φ actually provides a natural (asymmetric) proximity measure between points in the domain of φ.
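The defining identity above can be checked numerically in the scalar case; the sketch below (our own illustration, with hypothetical function names) shows both the familiar Euclidean special case and the asymmetry of the measure.

```python
# Sketch of the Bregman distance D(x, y) = phi(x) - phi(y) - phi'(y)*(x - y)
# in the scalar case, illustrating its nonnegativity and asymmetry.
import math

def bregman(phi, dphi, x, y):
    return phi(x) - phi(y) - dphi(y) * (x - y)

# phi(x) = 0.5*x^2 recovers the (symmetric) squared Euclidean distance.
sq = lambda x: 0.5 * x * x
d1 = bregman(sq, lambda y: y, 3.0, 1.0)   # 0.5*(3-1)^2 = 2.0

# phi(x) = x*log(x) yields the (asymmetric) generalized KL divergence.
ent = lambda x: x * math.log(x)
d2 = bregman(ent, lambda y: math.log(y) + 1.0, 3.0, 1.0)
d3 = bregman(ent, lambda y: math.log(y) + 1.0, 1.0, 3.0)
print(d1, d2, d3)  # d2 != d3: the Bregman distance is not symmetric
```

Both examples are nonnegative and vanish only at x = y, matching the properties stated above.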
In this work, we introduce D_φ(x, x^k) as a penalty term for x at the k-th iteration. That is, we actually minimize the following energy to update x^{k+1}:
(3)   x^{k+1} ∈ arg min_x E(x, x^k) := F(x) + μ D_φ(x, x^k),
where F is the objective of Eq. (1), E denotes the penalized energy, and μ > 0 is the penalty parameter. It will be demonstrated that D_φ brings nice convergence properties to the proposed optimization model when φ is strongly convex [?].
2.2 Uniform Coordinate Updating Scheme
In this work, we consider the following general coordinate update scheme to minimize the energy function in Eq. (3):
(4)   x_i^{k+1} = x_i^k + α_k d_i^k,   i = 1, …, N,
where d^k denotes the update direction (with regard to the problem) at x^k, α_k is a step size, and x_i denotes the i-th block of the given variable. It should be pointed out that by formulating d = (T − I)(x) (here I denotes the identity mapping), Eq. (4) can be further recognized as a relaxed Krasnoselskii-Mann iteration [?] with the operator T (i.e., x^{k+1} = x^k + α_k(T − I)(x^k)), and then various existing first-order schemes can be reformulated in the form of Eq. (4).
Specifically, by defining F_h = (I + γ∂h)^{-1} (resolvent) and R_h = 2F_h − I (reflection) for a given function h, we can obtain a variety of prevalent splitting schemes, such as FBS, PRS, and DRS. As for the operator T in our work, if setting T = F_g ∘ (I − γ∇f) and α_k = 1, we obtain FBS from Eq. (4), i.e., x^{k+1} = (F_g ∘ (I − γ∇f))(x^k), where ∘ denotes the operator composition. By considering F_g = prox_{γg} (the proximal operation with respect to g, defined as prox_{γg}(y) = arg min_x g(x) + (1/(2γ))‖x − y‖²), with g possibly an indicator function, we further have the well-known proximal (or projected) gradient scheme from FBS. Setting T = R_g ∘ R_f and α_k = 1, Eq. (4) reduces to x^{k+1} = (R_g ∘ R_f)(x^k), which is just the standard PRS iteration. Similarly, with the same T as in PRS and α_k = 1/2, we can also deduce DRS.
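These resolvent and reflection constructions can be sketched on a one-dimensional toy problem; the closed forms below (quadratic resolvent, soft-thresholding, DRS as the half-relaxed reflection composition) are standard textbook instances under our own parameter choices, not the paper's implementation.

```python
# Sketch of the resolvent/reflection operators behind FBS, PRS, and DRS,
# on min_x f(x)+g(x) with f(x)=0.5*(x-c)^2 and g(x)=lam*|x| (1-D case).
c, lam, gamma = 3.0, 1.0, 1.0

def resolvent_f(y):
    # (I + gamma*df)^{-1}: solve x + gamma*(x - c) = y in closed form.
    return (y + gamma * c) / (1.0 + gamma)

def resolvent_g(y):
    # (I + gamma*dg)^{-1} is soft-thresholding for g = lam*|.|.
    s = abs(y) - gamma * lam
    return (s if s > 0 else 0.0) * (1.0 if y >= 0 else -1.0)

def reflect(res, y):  # R = 2*F - I
    return 2.0 * res(y) - y

# DRS: z^{k+1} = z^k + alpha*(R_g(R_f(z^k)) - z^k) with alpha = 1/2.
z = 0.0
for _ in range(200):
    z = z + 0.5 * (reflect(resolvent_g, reflect(resolvent_f, z)) - z)
x_star = resolvent_f(z)   # recover the solution from the fixed point
print(round(x_star, 6))   # ≈ 2.0, matching soft-thresholding of c by lam
```

Setting the relaxation to 1 instead of 1/2 in the loop gives the PRS iteration on the same operators.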
Additionally, it should be pointed out that the well-known ADMM [?] can also be deduced by applying DRS in its Lagrange dual space [?]. Therefore, although the original ADMM is designed for linearly constrained models, we can still reformulate it as a special case of Eq. (4) in the dual variable space. Thus Eq. (4) can actually also be utilized to address constrained problems.
2.3 Splitting with Learnable Architecture
As discussed above, most existing splitting algorithms (e.g., FBS, PRS, and DRS) specify the operator only based on the optimization model. However, due to the nonconvex nature of the model, it is hard for these schemes to escape undesired local minima. Moreover, the complex data distributions in real applications will also slow down such fixed, pre-designed iterations.
To partially address these issues, we provide a new splitting strategy, in which a learnable operator is introduced to extract information from the data. That is, we consider the following triple splitting scheme:
(5)   x^{k+1} = (A ∘ T_2 ∘ T_1)(x^k),
where T_1 and T_2 are operators related to the energy in Eq. (3). Here we just follow an FBS-like strategy to define T_1 and T_2. As for A, we build it as a learnable network architecture and train its parameters on a collected training data set (see Sec. 4 for the details of this operator and its training strategy). In this way, we can successfully incorporate data information to improve the iterative performance of the proposed algorithm.
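The structure of this triple scheme, a forward (gradient) operator, a backward (proximal) operator, and a learnable operator applied last, can be sketched as follows. The "learned" operator here is a hypothetical stand-in (a fixed contraction toward the previous iterate), not a trained network; all function names are our own.

```python
# A structural sketch of the triple splitting scheme: a forward (gradient)
# operator, a backward (proximal) operator, and a learnable operator.
def t1_forward(x, c, step=0.5):
    return x - step * (x - c)          # gradient step on f(x)=0.5*(x-c)^2

def t2_backward(x, lam=1.0, step=0.5):
    s = abs(x) - step * lam            # prox of step*lam*|.|
    return (s if s > 0 else 0.0) * (1.0 if x >= 0 else -1.0)

def a_learned(x, x_prev, beta=0.3):
    # Placeholder for the trained network: blends the model update with
    # the previous iterate (a real implementation would apply a CNN here).
    return (1.0 - beta) * x + beta * x_prev

x = 0.0
for _ in range(100):
    x = a_learned(t2_backward(t1_forward(x, 3.0)), x)
print(round(x, 4))  # ≈ 2.0 for this toy problem
```

In the actual algorithm the learnable operator is a deep network whose output is additionally vetted by the condition introduced in Sec. 2.4.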
Notice that it is challenging to analyze the convergence of existing network-incorporated iterations (e.g., [?]), since all their schemes are built in heuristic manners. In contrast, we will demonstrate in the following section that the convergence of our LBS can be strictly proved.
2.4 The Complete Algorithm
Since the learnable operator is not deduced from a strict optimization rule, there may exist iteration errors when calculating the updates at each stage. Thus we introduce a new condition to control the inexactness of our updating scheme at each iteration. Specifically, we define the optimality error of a given variable at the k-th iteration based on the first-order subdifferential of the energy, i.e.,
where (here we denote as the limiting Fréchet subdifferential of [?]) and . Then we consider the following so-called Relaxed Optimality Condition (ROC) for the given .
Condition 1.
(Relaxed Optimality Condition) Given any , we define the relaxed optimality condition of for () as , where is a fixed positive constant.
Based on the above condition, we are ready to present our LBS algorithm for solving Eq. (1) in Alg. 1. Notice that the UCUS (Uniform Coordinate Updating Scheme) iteration is independently stated in Alg. 2. It can be seen that as long as ROC is satisfied, the LBS iterations are fully based on the learnable network operator. For iterations that do not satisfy ROC, we instead perform the model-based operators to guarantee the final convergence. For convenience, hereafter the subvectors and are denoted as and for short, respectively. We also denote , in which .
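The accept-or-fall-back control flow described above can be sketched as follows. This is a hedged toy illustration of the logic only: the "network" step, the error measure, and all constants are invented stand-ins, not the operators of Alg. 1.

```python
# Sketch of the control logic: accept the network-based update when the
# Relaxed Optimality Condition (ROC) holds, otherwise fall back to the
# convergent model-based update (toy objective f(x) = 0.5*(x - 3)^2).
def model_step(x, c=3.0, step=0.5):
    return x - step * (x - c)          # guaranteed-descent model update

def network_step(x):
    return 0.05 * x + 3.05             # hypothetical learned operator (biased)

def roc_holds(x_new, x_old, const=0.9):
    err = abs(x_new - 3.0)             # stand-in optimality error (|grad|)
    return err <= const * abs(x_new - x_old)

x, fallbacks = 0.0, 0
for _ in range(60):
    cand = network_step(x)
    if roc_holds(cand, x):
        x = cand                       # deep step accepted under ROC
    else:
        x = model_step(x)              # model-based fallback
        fallbacks += 1
print(round(x, 4), fallbacks)
```

As in the paper's experiments, the learned step dominates early iterations, while the model-based fallback takes over near the solution to secure convergence.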
3 Convergence Analysis
In this section, we provide a strict analysis of the convergence behaviors of LBS. The following assumptions on the functions , , and are necessary for our analysis. Notice that all these assumptions are fairly loose in the optimization area and are satisfied in most vision and learning problems.
Assumption 1.
1) f is Lipschitz smooth and each g_i is proximable (a function g is proximable if it is easy to obtain the minimizer of g(x) + (1/(2γ))‖x − y‖² for any given y and γ > 0). 2) F is coercive.
The roadmap of our analysis is summarized as follows: we first prove the non-increase of the objective, the boundedness of the variable sequence, and the convergence of subsequences in Propositions 1, 2, and 3, respectively. Then Theorem 1 proves that LBS generates Cauchy sequences, which converge to critical points of the model in Eq. (1). The convergence rate of the sequences is also analyzed in Corollary 1. The detailed proofs are presented in the arXiv report ().
Proposition 1.
Remark 1.
The inequalities in Proposition 1 build the relationship between and , thus we can obtain a series of useful inequalities:
where . This implies the non-increasing property of .
Proposition 2.
(Square summable). If , , are the sequences generated by Alg. 1, we have
Proposition 3.
(Subsequence convergence). Let be the sequence generated by Alg. 1 and let be any accumulation point of . Then we have
when .
Remark 2.
Theorem 1.
(Critical point and Cauchy sequence). Let Assumption 1 hold for Eq. (1); then the sequence generated by Alg. 1 has critical points of , i.e., if is the limit of the sequence , we have
If is a Kurdyka–Łojasiewicz function (it should be pointed out that many functions arising in learning and vision areas, including the ℓ_0 norm and rational ℓ_p norms, are all Kurdyka–Łojasiewicz functions [?]), we can further prove that is a Cauchy sequence, and thus globally converges to a critical point of .
Based on the above theorem, we can estimate the convergence rate as follows.
Corollary 1.
Let be a desingularizing function with a constant and a parameter . Then the sequence generated by Alg. 1 converges after finitely many iterations if . Linear and sublinear rates can be obtained by choosing and , respectively.
(Figure: visual comparison among the Input and the results of FBS, FISTA, ADMM, and LBS.)
4 Numerical Results
To verify the convergence and performance of LBS for nonconvex optimization, we apply it to two widely researched vision problems, i.e., image completion and deblurring. In our algorithm, we adopt a residual network as the learnable network architecture, which can well describe sparse priors. Specifically, there are 19 layers in our network, including 7 convolution layers, 6 ReLU layers, 5 batch normalization layers, and one loss layer. Every convolution layer has 64 kernels of size , and possesses the dilation attribute. In the training stage, we randomly select 800 natural images from the ImageNet database [?]. The chosen pictures are cropped into small patches of size 35×35 and Gaussian noise is imposed on these patches. As for the Bregman distance, we choose the Mahalanobis distance in our applications [?]. All experiments are performed on a PC with an Intel Core i7 CPU @ 3.4 GHz, 32GB RAM, and an NVIDIA GeForce GTX 1050 Ti GPU.
4.1 Sparse Coding for Image Completion
We first consider solving a sparse coding model to address the problem of image completion (also known as image inpainting). The purpose of this task is to restore a visually plausible image in which data are missing due to damage or occlusion. This problem can be formulated as:
(6) 
where b is the observed image, M denotes a mask, A is the dictionary, y is its corresponding sparse coefficient vector, and λ is a trade-off parameter. Following [?], we consider A as an inverse wavelet basis (i.e., multiplying by A corresponds to performing an inverse wavelet transform) and thus Ay is just the latent image (denoted as u). To enforce the sparsity of y, we set φ with a unit matrix in the Bregman distance for the above coding model.
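A hedged sketch of how Eq. (6) instantiates Eq. (1): proximal-gradient (FBS) iterations for a masked sparse recovery problem. For illustration the inverse wavelet basis is replaced by the identity, and all names and parameter values are our own assumptions.

```python
# Sketch of masked sparse coding, min_y 0.5*||M*A*y - b||^2 + lam*||y||_1,
# solved by proximal-gradient (FBS) steps, with A taken as the identity.
def ista_inpaint(b, mask, lam=0.05, step=1.0, iters=200):
    x = [0.0] * len(b)
    for _ in range(iters):
        # forward step: gradient of the masked data-fitting term
        grad = [m * (xi - bi) for m, xi, bi in zip(mask, x, b)]
        z = [xi - step * gi for xi, gi in zip(x, grad)]
        # backward step: soft-thresholding (prox of the l1 regularizer)
        x = [max(abs(v) - step * lam, 0.0) * (1 if v >= 0 else -1) for v in z]
    return x

b    = [1.0, 0.5, 2.0, 0.0]   # observation (garbage where mask == 0)
mask = [1,   0,   1,   1]     # 0 marks missing pixels
print(ista_inpaint(b, mask))  # [0.95, 0.0, 1.95, 0.0]
```

Missing pixels receive no data-fitting gradient, so the sparsity prior alone determines them; in LBS the learnable operator additionally injects learned image statistics at each iteration.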
It is easy to check that Eq. (6) is just a specific case of Eq. (1) with a single variable. In the following, we first verify the theoretical results proposed in this work, and then test the performance of LBS on challenging benchmark datasets.
Iteration Behaviors Analysis: We first choose example images from the CBSD68 dataset [?] to demonstrate the iterative behaviors of LBS together with some other widely used splitting schemes (e.g., FBS, FISTA, and ADMM). For fair comparison, the stopping criterion of all the compared methods is set in the same manner. That is, we denote and consider as the stopping criterion in all these methods.
Fig. 1 shows the convergence curves from different aspects, including the iteration error (“”, defined as ) and the reconstruction error (“”, defined as ). Our LBS shows superiority over traditional FBS, FISTA, and ADMM in both convergence rate and final reconstruction quality. LBS needs only about a dozen steps to achieve the convergence precision, while FBS and FISTA need a few hundred steps and ADMM needs about four dozen steps. Owing to the introduced network operator, our strategy obviously attains a lower reconstruction error than the others. The PSNR and SSIM of the final results also verify that LBS has better performance. Concretely, our PSNR is approximately 3dB higher than that of the compared methods.
We also compare the curves of the objective function value error (“”, based on ) for different settings of LBS, including naive LBS (nLBS, which does not check the ROC and monotone conditions) and the complete LBS in Alg. 1. From the left subfigure of Fig. 2, it is easy to observe that the proposed criteria lead to very fast convergence, while there are severe oscillations on the curves of nLBS. Furthermore, we plot the bars of ROC (i.e., the error and the threshold ) on the right part of Fig. 2. It can be seen that the ROC condition is always satisfied except at the last two iterations. Thus deep networks are used in most of our iterations. Only at the last stages does LBS tend to perform model-inspired iterations (i.e., Step 8 in Alg. 1) to obtain an accurate solution for the given optimization model.
%    Metric    FoE      VNL      ISDSB    JSM      Ours
20   PSNR      38.23    28.87    35.20    37.55    38.77
     SSIM      0.95     0.95     0.96     0.98     0.98
40   PSNR      34.01    27.55    31.32    33.54    34.54
     SSIM      0.90     0.91     0.91     0.94     0.95
60   PSNR      30.81    26.13    28.23    29.96    31.27
     SSIM      0.81     0.85     0.83     0.81     0.90
80   PSNR      27.64    24.23    24.92    27.32    27.71
     SSIM      0.65     0.75     0.70     0.79     0.80
–    TIME (s)  34.85    1515.49  28.00    207.57   1.40
Comparisons on Benchmarks: To further demonstrate the superiority of LBS, we generated random masks of different levels (including 20%, 40%, 60%, and 80% missing pixels) on the CBSD68 dataset [?], which contains 68 images of size 481×321, for comparison. We then compared LBS with four state-of-the-art methods, namely FoE [?], VNL [?], ISDSB [?], and JSM [?]. Tab. 1 reports the averaged quantitative results, including PSNR, SSIM, and time (in seconds). It can be seen that, regardless of the proportion of masks, LBS achieves better performance than the state-of-the-art approaches. This is mainly due to our strategy of using a learnable network operator.
We then compare the visual performance of LBS with all these methods. Fig. 3 presents the comparison on an image from the ImageNet database [?] with 60% missing pixels. It can be seen that LBS outperforms all the compared methods in both visual quality and metrics (PSNR and SSIM). The edges of the motorcycle wheels are restored more smoothly and clearly by LBS, while the other approaches leave noise and mask artifacts that degrade the visual quality.
(Fig. 3: Input, FoE (24.45 / 0.86), VNL (24.92 / 0.86), ISDSB (23.27 / 0.83), JSM (25.40 / 0.87), LBS (26.11 / 0.88); PSNR / SSIM.)
4.2 Nonconvex TV for Image Deblurring
We further evaluate LBS on image deblurring, which is a challenging problem in the computer vision area. Here we consider the following widely used total variation (TV) based formulation:
(7) 
where denote the blur kernel, latent image, and blurry observation, respectively. is the nonconvex TV regularization with gradient matrices and (here we also set for the norm). is the indicator function of the set . Following the half-quadratic splitting technique, Eq. (7) (with auxiliary variables and ) can be reformulated as
(8) 
Obviously, Eq. (8) is a special case of Eq. (1) with three blocks. Thus it can be efficiently addressed by LBS. We adopt φ in the Bregman distance such that the corresponding matrices are unit matrices, , and .
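The half-quadratic alternation behind Eq. (8) can be sketched on a one-dimensional TV denoising surrogate (identity blur for simplicity). Here the ℓ1 shrinkage is a convex stand-in for the nonconvex ℓ_p shrinkage used in the paper, and all parameter values are our own assumptions.

```python
# Sketch of half-quadratic splitting for min_u 0.5*||u - b||^2 + lam*||D u||_1,
# alternating a shrinkage step on the auxiliary variable w ≈ D u and a
# linear solve for u (the quadratic subproblem).
import numpy as np

def hqs_tv(b, lam=0.5, beta=10.0, iters=50):
    n = len(b)
    D = np.diff(np.eye(n), axis=0)             # forward-difference operator
    A = np.eye(n) + beta * D.T @ D             # normal equations of the u-step
    u = b.copy()
    for _ in range(iters):
        du = D @ u
        w = np.sign(du) * np.maximum(np.abs(du) - lam / beta, 0.0)  # w-step
        u = np.linalg.solve(A, b + beta * D.T @ w)                  # u-step
    return u

b = np.array([0.0, 0.1, -0.1, 2.0, 1.9, 2.1])  # noisy two-level signal
u = hqs_tv(b)
print(np.round(u, 2))  # piecewise-constant: edge kept, in-group noise flattened
```

The u-step preserves the signal mean exactly (the difference operator annihilates constants), while the shrinkage step removes small gradients and keeps the large jump, the edge-preserving behavior TV regularization is chosen for.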
(Fig. 5: Input, RTF (22.78 / 0.75), IRCNN (25.00 / 0.77), FDN (23.79 / 0.78), PADNet (24.39 / 0.81), LBS (25.34 / 0.83); PSNR / SSIM.)
Metric   TV      HL      CSF     IDDBM3D  EPLL    RTF     MLP     IRCNN   FDN     PADNet  Ours
PSNR     30.67   31.03   31.55   30.79    32.44   32.45   31.47   32.61   32.65   32.69   32.90
SSIM     0.85    0.85    0.87    0.87     0.88    0.89    0.86    0.89    0.89    0.89    0.90
Fig. 4 demonstrates the convergence behaviors of LBS on the three blocks. It can be seen from the left subfigure that the “” of all blocks quickly decreases to ; notice that the “” of is even less than . In the right subfigure, the “” of and also shows a dramatic declining trend, plotted against the right vertical axis. Due to the different ranges of values, we plot the curve of with respect to the left vertical axis. We can see that it still obtains the least “”.
We then report results on the challenging image deblurring benchmark dataset collected by Sun et al. [?] (which includes 640 blurry images with 1% Gaussian noise) for quantitative evaluation. We compare LBS with a number of competitive approaches, including TV [?], HL [?], CSF [?], IDDBM3D [?], EPLL [?], RTF [?], MLP [?], IRCNN [?], FDN [?], and PADNet [?].
It is known that learning-based methods (e.g., CSF, RTF, MLP, IRCNN, FDN, and PADNet) can achieve better performance than conventional approaches in terms of quantitative metrics (e.g., PSNR and SSIM). However, due to their weak theoretical guarantees, they are still inferior to LBS (see Tab. 2). Fig. 5 shows the qualitative results of LBS against the other methods (the top 4 in Tab. 2) on an example blurry image, which is generated with a large-scale blur kernel (75×75 pixels) on an image from ImageNet [?]. It can be seen that LBS restores the text and windows more distinctly than the others. Although IRCNN has a relatively higher PSNR than the other compared methods (but lower than LBS), its visual quality and SSIM are not satisfactory.
5 Conclusions
This paper proposed Learnable Bregman Splitting (LBS), a novel deep-architecture-based operator splitting algorithm for task-specific nonconvex optimization. We demonstrated that both model-based operators and data-dependent networks can be used in our iterations. We also provided solid theoretical analysis to guarantee the convergence of LBS. The experimental results verified that LBS obtains better performance than state-of-the-art approaches.
6 Acknowledgments
This work is partially supported by the National Natural Science Foundation of China (Nos. 61672125, 61733002, 61572096, 61432003 and 61632019), and the Fundamental Research Funds for the Central Universities.
References
 [Arias et al., 2011] Pablo Arias, Gabriele Facciolo, Vicent Caselles, and Guillermo Sapiro. A variational framework for exemplar-based image inpainting. IJCV, 93(3):319–347, 2011.
 [Bauschke et al., 1997] Heinz H Bauschke, Jonathan M Borwein, et al. Legendre functions and the method of random Bregman projections. Journal of Convex Analysis, 4(1):27–67, 1997.
 [Beck and Teboulle, 2009] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
 [Chen et al., 2015] Yunjin Chen, Wei Yu, and Thomas Pock. On learning optimized reaction diffusion processes for effective image restoration. In CVPR, 2015.
 [Danielyan et al., 2012] Aram Danielyan, Vladimir Katkovnik, and Karen Egiazarian. BM3D frames and variational image deblurring. IEEE TIP, 21(4):1715–1728, 2012.
 [Davis and Yin, 2016] Damek Davis and Wotao Yin. Convergence rate analysis of several splitting schemes. In Splitting Methods in Communication, Imaging, Science, and Engineering, pages 115–163. 2016.
 [Deng et al., 2009] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
 [He and Wang, 2014] Liangtian He and Yilun Wang. Iterative support detection-based split Bregman method for wavelet frame-based image inpainting. IEEE TIP, 23(12):5470–5485, 2014.
 [Krishnan and Fergus, 2009] Dilip Krishnan and Rob Fergus. Fast image deconvolution using hyper-Laplacian priors. In NIPS, pages 1033–1041, 2009.
 [Kruse et al., 2017] Jakob Kruse, Carsten Rother, and Uwe Schmidt. Learning to push the limits of efficient FFT-based image deconvolution. In ICCV, pages 4596–4604, 2017.
 [Lin et al., 2011] Zhouchen Lin, Risheng Liu, and Zhixun Su. Linearized alternating direction method with adaptive penalty for lowrank representation. In NIPS, 2011.
 [Lin et al., 2015] Zhouchen Lin, Risheng Liu, and Huan Li. Linearized alternating direction method with parallel splitting and adaptive penalty for separable convex programs in machine learning. Machine Learning, 99(2):287, 2015.
 [Liu et al., 2016] Risheng Liu, Guangyu Zhong, Junjie Cao, Zhouchen Lin, Shiguang Shan, and Zhongxuan Luo. Learning to diffuse: A new perspective to design PDEs for visual analysis. IEEE TPAMI, 38(12):2457–2471, 2016.
 [Liu et al., 2018] Risheng Liu, Xin Fan, Shichao Cheng, Xiangyu Wang, and Zhongxuan Luo. Proximal alternating direction network: A globally converged deep unrolling framework. In AAAI, 2018.
 [Passty, 1979] Gregory B Passty. Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. Journal of Mathematical Analysis and Applications, 72(2):383–390, 1979.
 [Roth and Black, 2009] Stefan Roth and Michael J. Black. Fields of experts. IJCV, 82(2):205–229, 2009.
 [Schmidt et al., 2016] Uwe Schmidt, Jeremy Jancsary, Sebastian Nowozin, Stefan Roth, and Carsten Rother. Cascades of regression tree fields for image restoration. IEEE TPAMI, 38(4):677–689, 2016.
 [Schuler et al., 2013] Christian J Schuler, Harold Christopher Burger, Stefan Harmeling, and Bernhard Scholkopf. A machine learning approach for nonblind image deconvolution. In CVPR, pages 1067–1074, 2013.
 [Shi et al., 2016] HaoJun Michael Shi, Shenyinying Tu, Yangyang Xu, and Wotao Yin. A primer on coordinate descent algorithms. arXiv preprint arXiv:1610.00040, 2016.
 [Sun et al., 2013] Libin Sun, Sunghyun Cho, Jue Wang, and James Hays. Edgebased blur kernel estimation using patch priors. In ICCP, 2013.
 [Uwe and Stefan, 2014] Schmidt Uwe and Roth Stefan. Shrinkage fields for effective image restoration. In CVPR, pages 2774–2781, 2014.
 [Wang et al., 2008] Yilun Wang, Junfeng Yang, Wotao Yin, and Yin Zhang. A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences, 1(3):248–272, 2008.
 [Xu and Yin, 2017] Yangyang Xu and Wotao Yin. A globally convergent algorithm for nonconvex optimization based on block coordinate update. Journal of Scientific Computing, pages 1–35, 2017.
 [Yang et al., 2016] Yingzhen Yang, Jiashi Feng, Nebojsa Jojic, Jianchao Yang, and Thomas S Huang. ℓ^0-sparse subspace clustering. In ECCV, pages 731–747, 2016.
 [Yang et al., 2017] Yan Yang, Jian Sun, Huibin Li, and Zongben Xu. ADMM-Net: A deep learning approach for compressive sensing MRI. In NIPS, 2017.
 [Zhang et al., 2014] Jian Zhang, Debin Zhao, Ruiqin Xiong, Siwei Ma, and Wen Gao. Image restoration using joint statistical modeling in a space-transform domain. IEEE TCSVT, 24(6):915–928, 2014.
 [Zhang et al., 2017] Kai Zhang, Wangmeng Zuo, Shuhang Gu, and Lei Zhang. Learning deep CNN denoiser prior for image restoration. In CVPR, 2017.
 [Zoran and Weiss, 2011] Daniel Zoran and Yair Weiss. From learning models of natural image patches to whole image restoration. In ICCV, pages 479–486, 2011.