# ISTA-Net: Iterative Shrinkage-Thresholding Algorithm Inspired Deep Network for Image Compressive Sensing

###### Abstract

Traditional methods for image compressive sensing (CS) reconstruction solve a well-defined inverse problem (a convex optimization problem in many cases) that is based on a predefined CS model, which defines the underlying structure of the problem and is generally solved by employing convergent iterative solvers. These optimization-based CS methods face the challenge of choosing optimal transforms and tuning parameters in their solvers, while also suffering from high computational complexity in most cases. Recently, some deep network based CS algorithms have been proposed to improve CS reconstruction performance, while dramatically reducing time complexity as compared to optimization-based methods. Despite their impressive results, the proposed networks (either with fully-connected or repetitive convolutional layers) lack any structural diversity and they are trained as a black box, void of any insights from the CS domain. In this paper, we combine the merits of both types of CS methods: the structure insights of optimization-based methods and the performance/speed of network-based ones. We propose a novel structured deep network, dubbed ISTA-Net, which is inspired by the Iterative Shrinkage-Thresholding Algorithm (ISTA) for optimizing a general $\ell_1$ norm CS reconstruction model. ISTA-Net essentially implements a truncated form of ISTA, where all ISTA-Net parameters (e.g. transforms, shrinkage thresholds, step size, etc.) are learned end-to-end to minimize a reconstruction error in training. Borrowing more insights from the optimization realm, we propose an accelerated version of ISTA-Net, dubbed FISTA-Net, which is inspired by the fast iterative shrinkage-thresholding algorithm (FISTA). Interestingly, this acceleration naturally leads to skip connections in the underlying network design. Extensive CS experiments demonstrate that the proposed ISTA-Net and FISTA-Net outperform existing optimization-based and network-based CS methods by large margins, while maintaining a fast runtime of 20 fps on a Quadro 6000 GPU.

## 1 Introduction

Compressive Sensing (CS) has drawn much attention as an alternative to the classical paradigm of sampling followed in data compression [1]. From much fewer acquired measurements than determined by Nyquist sampling theory, CS theory demonstrates that a signal can be reconstructed with high probability when it exhibits sparsity in some transform domain [2]. This novel acquisition strategy is much more hardware-friendly, which allows image or video capturing with a sub-Nyquist sampling rate. In addition, by exploiting the redundancy existing in a signal, CS conducts sampling and compression at the same time, which greatly alleviates the need for high transmission bandwidth and large storage space, enabling low-cost on-sensor data compression. CS has been applied in many practical applications, e.g. accelerating magnetic resonance imaging (MRI) [3].

In the past decade, many algorithms have been developed to solve the image CS reconstruction problem [4, 5, 8, 7, 6, 9]. Most of these methods exploit some structured sparsity as image prior and consequently solve a sparsity-regularized optimization problem in an iterative fashion. As a consequence, they usually require seconds to minutes to reconstruct/recover a single image on commodity hardware. It is clear that this high computational cost restricts the application of these methods for image CS. Another drawback of these optimization-based algorithms is the challenge they face in selecting the optimal image prior (e.g. optimal transforms) and the best tuning parameters for the underlying iterative optimization (e.g. step size and regularization coefficients).

Fueled by the powerful learning ability of deep networks and the availability of massive training data, several deep network-based image CS reconstruction algorithms have been recently proposed [10, 22, 11]. Unlike optimization-based methods, they directly learn the inverse mapping from the CS measurement domain to the original signal domain. Being non-iterative, these algorithms not only dramatically reduce time complexity (compared with optimization-based algorithms), but they also achieve impressive reconstruction performance. However, existing network-based CS algorithms either comprise fully-connected or repetitive convolutional network layers, which are trained without any underlying structure. They neglect the domain specific characteristics of the CS reconstruction problem. We believe that the lack of structural diversity in existing network-based image CS algorithms has been the bottleneck for performance improvement.

With the aim of developing a fast yet accurate algorithm for image CS reconstruction, we combine the merits of both types of approaches above. Specifically, we propose a novel structured deep network, dubbed ISTA-Net, which is inspired by the Iterative Shrinkage-Thresholding Algorithm (ISTA) [12] for optimizing a general $\ell_1$ norm CS reconstruction model. Implementing a truncated form of ISTA, ISTA-Net is composed of a fixed number of phases, each of which strictly corresponds to an ISTA-like iteration. However, all the parameters involved in ISTA-Net (e.g. transforms, shrinkage thresholds, step size, etc.) are learned end-to-end with the back-propagation algorithm by minimizing a reconstruction loss function in training, instead of being hand-crafted. The truncated implementation of ISTA coupled with all the learned parameters allows ISTA-Net to enjoy the advantages of the previous two types of CS reconstruction algorithms, namely fast and accurate reconstruction with well-defined interpretability. Inspired by the fast iterative shrinkage-thresholding algorithm (FISTA) [12] and by introducing a single extra parameter, an accelerated version of ISTA-Net, dubbed FISTA-Net, can be developed with better performance and faster convergence. Experiments demonstrate that the proposed ISTA-Net and FISTA-Net outperform existing optimization-based and network-based image CS methods by large margins, while maintaining a fast computational speed.

#### Contributions.

(1) A novel structured deep network, dubbed ISTA-Net, is proposed by mapping the truncated ISTA to a deep network for image CS reconstruction, while all the parameters, including transforms, thresholds, step size, etc., are discriminatively trained without any hand-crafted setting. (2) We demonstrate that the proposed ISTA-Net greatly enhances the results of current existing image CS reconstruction algorithms and achieves state-of-the-art performance with fast computational speed.

## 2 Related Work

The purpose of compressive sensing is to reconstruct the original signal $\mathbf{x} \in \mathbb{R}^{N}$ from its randomized CS measurements $\mathbf{y} = \mathbf{\Phi}\mathbf{x} \in \mathbb{R}^{M}$. Here, $\mathbf{\Phi} \in \mathbb{R}^{M \times N}$ is a random projection. Because $M \ll N$, this problem is typically ill-posed. However, CS theory demonstrates that a signal can be reconstructed with high probability when it exhibits sparsity in some domain. In the past decade, many algorithms were proposed to solve the image CS reconstruction problem and we generally divide them into two categories: traditional optimization-based CS methods and recent network-based CS methods.
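To make the sampling model concrete, the following minimal sketch draws a random projection and measures one vectorized image block. The i.i.d. Gaussian sampling matrix, its $1/\sqrt{M}$ scaling, and the helper name `sample_measurements` are assumptions made for illustration; the text only states that the projection is random.

```python
import numpy as np

def sample_measurements(x, ratio, seed=0):
    """Return (phi, y) where y = phi @ x and phi has M = round(ratio * N) rows."""
    rng = np.random.default_rng(seed)
    n = x.size
    m = int(round(ratio * n))
    # Assumed sampling matrix: i.i.d. Gaussian entries scaled by 1/sqrt(M).
    phi = rng.standard_normal((m, n)) / np.sqrt(m)
    return phi, phi @ x

# A vectorized 33x33 block (N = 1089) filled with placeholder values.
x = np.random.default_rng(1).random(33 * 33)
phi, y = sample_measurements(x, ratio=0.25)
print(phi.shape, y.shape)
```

Because the matrix has far fewer rows than columns, infinitely many signals are consistent with the same measurements, which is why a prior such as sparsity is needed to pick one.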

Traditional Optimization-based CS Reconstruction. A signal $\mathbf{x}$ is said to be sparse with respect to some transform $\mathbf{\Psi}$ if its transform coefficients $\mathbf{\Psi}\mathbf{x}$ are mostly zeros, or nearly sparse if the dominant portion of coefficients are either zeros or very close to zeros. The sparsity of $\mathbf{x}$ in $\mathbf{\Psi}$ is quantified by the number of significant elements within the coefficient vector $\mathbf{\Psi}\mathbf{x}$. Given the linear measurements $\mathbf{y}$ and according to CS theory, traditional image CS methods usually reconstruct the original image $\mathbf{x}$ by solving the following convex optimization problem:

$$\min_{\mathbf{x}} \; \frac{1}{2}\|\mathbf{\Phi}\mathbf{x} - \mathbf{y}\|_2^2 + \lambda \|\mathbf{\Psi}\mathbf{x}\|_1, \tag{1}$$

where the sparsity of the vector $\mathbf{\Psi}\mathbf{x}$ is characterized by the $\ell_1$ norm, which sums the absolute values of the entries of a vector.

Many classic domains (e.g. DCT, wavelet [4], and gradient domain [7]) have been applied in modeling Eq. (1). Since natural images are typically non-stationary, these fixed domains are signal independent or not adaptive, which result in poor reconstruction performance. To rectify the problem, many works incorporated additional prior knowledge about transform coefficients (e.g. statistical dependencies [5], structure [17], etc.) into the CS recovery framework. Furthermore, some elaborate priors exploiting the non-local self-similarity properties of natural images, such as the collaborative sparsity prior [6] or low-rank prior [8], have been proposed to improve CS reconstruction performance. However, all these traditional image CS reconstruction algorithms require hundreds of iterations to solve Eq. (1) by means of some iterative solvers (e.g. ISTA [14], ADMM [13], or AMP [9]), which inevitably gives rise to high computational cost and restricts the application of CS. In addition, the selected image prior (e.g. optimal transform) and all the optimization parameters (e.g. step size) are always hand-crafted, which is usually very challenging to determine. Fig. 1(a) illustrates the optimization process in network form solved by ISTA for CS reconstruction of Eq. (1) in the wavelet domain.

Network-based CS Reconstruction. Deep learning has achieved great success in many computer vision and image processing tasks, for instance, classification [18], semantic segmentation [19], image super-resolution [15], and image denoising and inpainting [20]. Recently, some deep network based algorithms have been proposed for image CS reconstruction. The basic idea is to take advantage of the powerful learning ability and the availability of massive training data to directly infer the inverse mapping from the CS measurement domain to the original signal domain. Mousavi et al. [10] first propose to apply a stacked denoising autoencoder (SDA) to learn the representation from training data and to reconstruct test data from their CS measurements. Iliadis et al. [22] propose to utilize a fully-connected neural network for video CS reconstruction. Kulkarni et al. [11] further propose a convolutional neural network based CS algorithm, dubbed ReconNet, which in training takes each image block and its CS measurement as the output and input, respectively. ReconNet is not only robust to sensor noise, but it also achieves promising reconstruction results, especially at low compression ratios. Fig. 1(b) illustrates the ReconNet framework, which includes one fully-connected layer and six convolutional layers. Obviously, network-based image CS algorithms are non-iterative, which dramatically reduces the time complexity compared with their optimization-based counterparts. However, we believe that their lack of structural diversity originating from the absence of CS domain specific insights is the bottleneck for further performance improvement.

## 3 Proposed ISTA-Net Framework for Image CS Reconstruction

In this section, we will first briefly review traditional ISTA optimization for image CS reconstruction, and then provide details on how ISTA-Net is designed accordingly.

### 3.1 ISTA Optimization for CS Reconstruction

The iterative shrinkage-thresholding algorithm (ISTA) is a popular first order proximal method. It has the advantage of being a simple gradient based algorithm involving simple computations like matrix multiplications followed by a soft thresholding function, and thus is adequate for solving large-scale linear inverse problems. Specifically, ISTA can solve the CS reconstruction problem in Eq. (1) by iterating between the following two update steps:

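$$\mathbf{r}^{(k)} = \mathbf{x}^{(k-1)} - \rho\,\mathbf{\Phi}^{\top}\left(\mathbf{\Phi}\mathbf{x}^{(k-1)} - \mathbf{y}\right), \tag{2}$$

$$\mathbf{x}^{(k)} = \arg\min_{\mathbf{x}} \; \frac{1}{2}\|\mathbf{x} - \mathbf{r}^{(k)}\|_2^2 + \lambda \|\mathbf{\Psi}\mathbf{x}\|_1. \tag{3}$$

Here, $k$ denotes the $k$-th ISTA iteration, and $\rho$ is the step size. It is easy to see that $\mathbf{r}^{(k)}$ in Eq. (2) is simple to calculate, while $\mathbf{x}^{(k)}$ in Eq. (3) seems difficult. However, for some choices of $\mathbf{\Psi}$, the corresponding update also has a simple closed form. For instance, choosing $\mathbf{\Psi} = \mathbf{I}$ (identity matrix) leads to $\mathbf{x}^{(k)} = \mathrm{soft}(\mathbf{r}^{(k)}, \lambda)$, while choosing $\mathbf{\Psi} = \mathbf{W}$ (wavelet transform matrix) leads to $\mathbf{x}^{(k)} = \mathbf{W}^{\top}\mathrm{soft}(\mathbf{W}\mathbf{r}^{(k)}, \lambda)$, due to the orthogonality of $\mathbf{W}$. Here, $\mathrm{soft}(\cdot, \lambda)$ denotes the soft thresholding operator and $\lambda$ is its threshold.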
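The two update steps can be sketched in a few lines for the identity-transform case. This is a minimal illustration rather than the authors' implementation: the step size $1/L$ (with $L$ the largest eigenvalue of $\mathbf{\Phi}^{\top}\mathbf{\Phi}$), the iteration count, and the helper names `soft`, `objective`, and `ista` are assumptions. The sketch uses the standard proximal-gradient threshold `lam * rho`; if the threshold is treated as a free parameter, as in the text, this is only a rescaling of `lam`.

```python
import numpy as np

def soft(v, theta):
    """Element-wise soft thresholding: sign(v) * max(|v| - theta, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def objective(phi, y, x, lam):
    """Value of 0.5 * ||phi x - y||^2 + lam * ||x||_1 (transform = identity)."""
    return 0.5 * np.sum((phi @ x - y) ** 2) + lam * np.sum(np.abs(x))

def ista(phi, y, lam, num_iters=500):
    """Plain ISTA with the identity as sparsifying transform."""
    rho = 1.0 / np.linalg.norm(phi, 2) ** 2    # assumed step size 1/L
    x = np.zeros(phi.shape[1])
    for _ in range(num_iters):
        r = x - rho * phi.T @ (phi @ x - y)    # gradient step on the data term
        x = soft(r, lam * rho)                 # shrinkage (proximal) step
    return x

# Toy problem: a 3-sparse signal of length 50 observed through 30 measurements.
rng = np.random.default_rng(0)
phi = rng.standard_normal((30, 50)) / np.sqrt(30)
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.0, -2.0, 1.5]
y = phi @ x_true
x_hat = ista(phi, y, lam=0.01)
print(objective(phi, y, np.zeros(50), 0.01), objective(phi, y, x_hat, 0.01))
```

The loop makes the cost structure visible: every iteration needs two matrix-vector products, and hundreds of iterations are typical, which is the computational burden that a truncated, learned variant aims to avoid.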

Fig. 1(a) illustrates the iterations of the ISTA optimization process used to solve Eq. (1), where $\mathbf{\Psi}$ is the wavelet matrix $\mathbf{W}$, and Fig. 2(a) further illustrates the $k$-th iteration, corresponding to Eq. (2) and Eq. (3). For typically sized images, ISTA usually requires hundreds of iterations to obtain a satisfactory reconstruction result, and thus suffers from extensive computation. Moreover, the optimal transform $\mathbf{\Psi}$ and all the parameters such as $\rho$ and $\lambda$ are always hand-crafted, and they are usually very challenging to choose and tune. It is also not trivial to obtain $\mathbf{x}^{(k)}$ in Eq. (3) for a complex non-orthogonal (or even non-linear) transform $\mathbf{\Psi}$.

### 3.2 ISTA-Net Framework

To design a fast yet accurate algorithm for image CS reconstruction, we design a fixed size deep network (appropriately called ISTA-Net) whose functionality is inspired by traditional ISTA, but it overcomes its aforementioned drawbacks. ISTA-Net learns an effective inverse mapping from the CS measurement domain to the original signal domain in a straightforward manner. The basic idea of ISTA-Net is to map a truncated version of ISTA to a deep network while taking full advantages of both merits of ISTA and network-based CS methods. ISTA-Net is composed of a fixed number of phases, each of which strictly corresponds to one iteration in traditional ISTA. Moreover, the above two steps in each ISTA iteration, i.e. Eq. (2) and Eq. (3), correspond to two separate modules within ISTA-Net. We will discuss them in detail next.

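$\mathbf{r}^{(k)}$ Module: It corresponds to Eq. (2) and is used to generate the immediate reconstruction result $\mathbf{r}^{(k)}$. Note that $\mathbf{\Phi}^{\top}(\mathbf{\Phi}\mathbf{x}^{(k-1)} - \mathbf{y})$ is essentially the gradient of the data-fidelity term $\frac{1}{2}\|\mathbf{\Phi}\mathbf{x} - \mathbf{y}\|_2^2$, computed at $\mathbf{x}^{(k-1)}$. To preserve the ISTA structure while increasing the network flexibility, the output of this module with input $\mathbf{x}^{(k-1)}$ is defined as

$$\mathbf{r}^{(k)} = \mathbf{x}^{(k-1)} - \rho^{(k)}\,\mathbf{\Phi}^{\top}\left(\mathbf{\Phi}\mathbf{x}^{(k-1)} - \mathbf{y}\right), \tag{4}$$

where $\rho^{(k)}$ defines the step size for the $k$-th iteration. In ISTA-Net, this step size is a learnable parameter rather than the fixed value used in ISTA.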

$\mathbf{x}^{(k)}$ Module: It aims to compute $\mathbf{x}^{(k)}$, corresponding to Eq. (3) with the input $\mathbf{r}^{(k)}$. Note that the output of Eq. (3) is actually the denoised version of $\mathbf{r}^{(k)}$ subject to the sparse regularization induced by the hand-crafted transform $\mathbf{\Psi}$. In order to improve the reconstruction performance and increase network capacity, we define the transform as a general non-linear transform function, denoted by $\mathcal{F}(\cdot)$, whose parameters are learnable. Inspired by the powerful representation power of the convolutional neural network (CNN), $\mathcal{F}(\cdot)$ is designed as a combination of two convolutional layers and a non-linear ReLU (rectified linear unit) operator (refer to Fig. 3; $N_f$ denotes the number of feature maps and is set to 32 by default). Then, replacing $\mathbf{\Psi}$ in Eq. (3) with $\mathcal{F}(\cdot)$, we obtain the following update step:

$$\mathbf{x}^{(k)} = \arg\min_{\mathbf{x}} \; \frac{1}{2}\|\mathbf{x} - \mathbf{r}^{(k)}\|_2^2 + \lambda \|\mathcal{F}(\mathbf{x})\|_1. \tag{5}$$

With its learnable and non-linear characteristics, $\mathcal{F}(\cdot)$ is expected to achieve a better representation for natural images. Now, the remaining problem relates to efficiently solving Eq. (5). Note that $\mathbf{r}^{(k)}$ is the immediate reconstruction result of $\mathbf{x}^{(k)}$ at the $k$-th iteration. In image inverse problems, one general and reasonable assumption is that each element of $(\mathbf{x}^{(k)} - \mathbf{r}^{(k)})$ follows an independent normal distribution with common zero mean and variance $\sigma^2$ [21]. Here, we also make this assumption and we prove the following theorem:

###### Theorem 1

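Let $X_1, \ldots, X_n$ be independent normal random variables with common zero mean and variance $\sigma^2$. If $X = [X_1, \ldots, X_n]^{\top}$ and given any matrices $\mathbf{A} \in \mathbb{R}^{m \times n}$, $\mathbf{B} \in \mathbb{R}^{s \times m}$, define a new random variable $Y = \mathbf{B}\,\mathrm{ReLU}(\mathbf{A}X) = \mathbf{B}\max(\mathbf{0}, \mathbf{A}X)$. Then, $\mathbb{E}[\|Y - \mathbb{E}[Y]\|_2^2]$ and $\mathbb{E}[\|X - \mathbb{E}[X]\|_2^2]$ are linearly related, i.e. $\mathbb{E}[\|Y - \mathbb{E}[Y]\|_2^2] = \alpha\,\mathbb{E}[\|X - \mathbb{E}[X]\|_2^2]$, where $\alpha$ is only related with $\mathbf{A}$ and $\mathbf{B}$. (Please refer to the supplementary material for the proof and more details.)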
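The claim of the theorem can be checked numerically. The sketch below is a Monte Carlo sanity check, not a proof: for fixed random matrices it estimates the spread of a linear-ReLU-linear output for several input standard deviations and divides by the input spread $n\sigma^2$. If the relation is linear with a factor that depends only on the two matrices, the ratios should agree up to sampling noise. The matrix sizes, sample count, and the helper name `output_spread` are assumptions chosen for illustration.

```python
import numpy as np

def output_spread(a, b, sigma, num_samples=200_000, seed=0):
    """Monte Carlo estimate of E||Y - E[Y]||^2 for Y = B ReLU(A X), X ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    x = sigma * rng.standard_normal((num_samples, a.shape[1]))
    y = np.maximum(x @ a.T, 0.0) @ b.T          # each row is one sample of Y
    return float(np.sum(np.var(y, axis=0)))      # trace of the sample covariance of Y

rng = np.random.default_rng(42)
a = rng.standard_normal((6, 4))                  # plays the role of the first linear map
b = rng.standard_normal((3, 6))                  # plays the role of the second linear map
n = a.shape[1]
sigmas = (0.5, 1.0, 2.0)
# E||X - E[X]||^2 equals n * sigma^2, so each ratio below is an estimate of alpha.
alphas = [output_spread(a, b, s, seed=i) / (n * s ** 2) for i, s in enumerate(sigmas)]
print(alphas)
```

Independent samples are drawn for each standard deviation, so the agreement of the three ratios is not an artifact of reusing the same random numbers.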

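Since each convolutional filter is a linear operator (and thus can be formulated in matrix form), $\mathcal{F}(\mathbf{x})$ has an equivalent matrix form $\mathcal{F}(\mathbf{x}) = \mathbf{B}\,\mathrm{ReLU}(\mathbf{A}\mathbf{x})$, where $\mathbf{A}$ and $\mathbf{B}$ correspond to the two convolutional filters, respectively. According to Theorem 1, we can make the following approximation:

$$\|\mathcal{F}(\mathbf{x}) - \mathcal{F}(\mathbf{r}^{(k)})\|_2^2 \approx \alpha\,\|\mathbf{x} - \mathbf{r}^{(k)}\|_2^2, \tag{6}$$

where $\alpha$ is a scalar that is only related with the parameters of $\mathcal{F}(\cdot)$. Incorporating this linear relationship into Eq. (5) yields the following optimization:

$$\mathbf{x}^{(k)} = \arg\min_{\mathbf{x}} \; \frac{1}{2}\|\mathcal{F}(\mathbf{x}) - \mathcal{F}(\mathbf{r}^{(k)})\|_2^2 + \theta\,\|\mathcal{F}(\mathbf{x})\|_1, \tag{7}$$

where $\theta = \lambda\alpha$. Therefore, we get a closed form for $\mathcal{F}(\mathbf{x}^{(k)})$ as follows:

$$\mathcal{F}(\mathbf{x}^{(k)}) = \mathrm{soft}\left(\mathcal{F}(\mathbf{r}^{(k)}), \theta\right). \tag{8}$$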

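Due to its invertibility, the wavelet transform enables a closed form solution for Eq. (5). Accordingly, we design the transform $\widetilde{\mathcal{F}}(\cdot)$ such that it is the inverse of $\mathcal{F}(\cdot)$, i.e. $\widetilde{\mathcal{F}}(\mathcal{F}(\mathbf{x})) = \mathbf{x}$, where $\mathbf{x}$ is any input image. $\widetilde{\mathcal{F}}(\cdot)$ has a structure symmetric to that of $\mathcal{F}(\cdot)$, as shown in Fig. 3, and $\widetilde{\mathcal{F}}(\mathcal{F}(\mathbf{x})) = \mathbf{x}$ is referred to as the symmetric constraint. Because $\mathcal{F}(\cdot)$ and $\widetilde{\mathcal{F}}(\cdot)$ both contain learnable parameters, this symmetric constraint can be satisfied by incorporating it into the network training loss function (refer to subsection 3.5). Given both $\mathcal{F}(\cdot)$ and $\widetilde{\mathcal{F}}(\cdot)$, $\mathbf{x}^{(k)}$ can be efficiently computed as follows:

$$\mathbf{x}^{(k)} = \widetilde{\mathcal{F}}\left(\mathrm{soft}\left(\mathcal{F}(\mathbf{r}^{(k)}), \theta\right)\right). \tag{9}$$

It is worth emphasizing that $\theta$, as a shrinkage threshold, is a learnable parameter in this module. Similarly, to increase the network capacity, we do not constrain $\mathcal{F}(\cdot)$, $\widetilde{\mathcal{F}}(\cdot)$, and $\theta$ to be the same at each phase. That is, each phase has its own $\{\mathcal{F}^{(k)}(\cdot), \widetilde{\mathcal{F}}^{(k)}(\cdot), \theta^{(k)}\}$, as illustrated in Fig. 1(c). Therefore, with all the learnable parameters, the output of this module is updated as:

$$\mathbf{x}^{(k)} = \widetilde{\mathcal{F}}^{(k)}\left(\mathrm{soft}\left(\mathcal{F}^{(k)}(\mathbf{r}^{(k)}), \theta^{(k)}\right)\right). \tag{10}$$

Fig. 2(b) illustrates the $k$-th phase of the proposed ISTA-Net, corresponding to Eq. (4) and Eq. (10).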
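The data flow of a single phase, a gradient step followed by a forward transform, soft thresholding, and a backward transform, can be sketched as follows. To stay self-contained, the sketch replaces each convolution with a small dense matrix, so it is a structural analogue only and not the network described in the text. The weight names `A`, `B`, `C`, `D`, the toy sizes, and the random (untrained) values are all hypothetical.

```python
import numpy as np

def soft(v, theta):
    """Element-wise soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def two_layer(w_in, w_out, v):
    """Linear -> ReLU -> linear: a dense stand-in for conv -> ReLU -> conv."""
    return w_out @ np.maximum(w_in @ v, 0.0)

def ista_net_phase(x_prev, y, phi, params):
    """One phase: gradient step with step size rho, then learned shrinkage."""
    r = x_prev - params["rho"] * phi.T @ (phi @ x_prev - y)
    code = two_layer(params["A"], params["B"], r)      # forward transform of r
    code = soft(code, params["theta"])                  # shrinkage in the transform domain
    return two_layer(params["C"], params["D"], code)    # backward transform

rng = np.random.default_rng(0)
n, m, d = 16, 4, 32          # signal size, number of measurements, hidden width (toy values)
phi = rng.standard_normal((m, n)) / np.sqrt(m)
params = {
    "rho": 0.1,
    "theta": 0.01,
    "A": 0.1 * rng.standard_normal((d, n)),
    "B": 0.1 * rng.standard_normal((d, d)),
    "C": 0.1 * rng.standard_normal((d, d)),
    "D": 0.1 * rng.standard_normal((n, d)),
}
y = phi @ rng.standard_normal(n)
x1 = ista_net_phase(np.zeros(n), y, phi, params)
print(x1.shape)
```

Stacking several such calls, each with its own parameter dictionary, mirrors the idea that every phase owns its step size, threshold, and pair of transforms.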

Parameters in ISTA-Net. From the previous descriptions, one can clearly see that each module in each phase of ISTA-Net strictly corresponds to an update step in each ISTA iteration. The learnable parameter set in ISTA-Net includes the step size $\rho^{(k)}$ in the $\mathbf{r}^{(k)}$ module, the parameters of the forward transform $\mathcal{F}^{(k)}(\cdot)$, the parameters of the backward transform $\widetilde{\mathcal{F}}^{(k)}(\cdot)$, and the shrinkage threshold $\theta^{(k)}$ in the $\mathbf{x}^{(k)}$ module. As such, $\mathbf{\Theta} = \{\rho^{(k)}, \theta^{(k)}, \mathcal{F}^{(k)}, \widetilde{\mathcal{F}}^{(k)}\}_{k=1}^{N_p}$, where $k$ is the phase number and $N_p$ is the total number of ISTA-Net phases. Note that all the parameters are phase-dependent and will be learned as neural network parameters.

With the learned network parameters, feeding CS measurements into ISTA-Net yields the corresponding reconstructed image as output. To make this clear, an example of the reconstruction process of ISTA-Net with three phases is shown in Fig. 4. Here, the CS measurements, denoted by heptagonal number 0, are successively processed by the three-phase ISTA-Net in order from heptagonal number 1 to heptagonal number 16, which generates the reconstructed image as output.

### 3.3 Initialization

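Given the training data pairs including the massive image blocks and their corresponding CS measurements, i.e. $\{(\mathbf{y}_i, \mathbf{x}_i)\}_{i=1}^{N_b}$ with $\mathbf{x}_i \in \mathbb{R}^{N}$ and $\mathbf{y}_i \in \mathbb{R}^{M}$, we propose to directly use a linear mapping to compute the initialization. Specifically, the linear mapping matrix $\mathbf{Q}_{init}$ can be efficiently obtained from the following least squares problem:

$$\mathbf{Q}_{init} = \arg\min_{\mathbf{Q}} \; \|\mathbf{Q}\mathbf{Y} - \mathbf{X}\|_F^2, \tag{11}$$

where $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_{N_b}]$ and $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_{N_b}]$. Then, the closed form of $\mathbf{Q}_{init}$ is simply obtained by solving the underlying linear matrix system:

$$\mathbf{Q}_{init} = \mathbf{X}\mathbf{Y}^{\top}\left(\mathbf{Y}\mathbf{Y}^{\top}\right)^{-1}. \tag{12}$$

Hence, given any input CS measurement $\mathbf{y}$, its corresponding initialization $\mathbf{x}^{(0)}$ for ISTA-Net can be efficiently calculated with $\mathbf{Q}_{init}$ as:

$$\mathbf{x}^{(0)} = \mathbf{Q}_{init}\,\mathbf{y}. \tag{13}$$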
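The initialization described in this subsection amounts to an ordinary least squares fit from measurements to blocks. The sketch below assumes that training blocks and their measurements are stored as columns of two matrices; the helper name `fit_q_init`, the toy sizes, and the random data are hypothetical. It uses `numpy.linalg.lstsq` instead of forming a matrix inverse explicitly, which solves the same problem.

```python
import numpy as np

def fit_q_init(x_blocks, y_meas):
    """Least squares fit of Q in Q @ Y ~ X; columns are training samples."""
    # lstsq solves Y^T Q^T ~ X^T, the transposed form of the same problem.
    q_t, *_ = np.linalg.lstsq(y_meas.T, x_blocks.T, rcond=None)
    return q_t.T

rng = np.random.default_rng(0)
n, m, nb = 20, 8, 200                        # block size, measurements, number of blocks (toy values)
phi = rng.standard_normal((m, n))
x_blocks = rng.standard_normal((n, nb))      # columns are training blocks
y_meas = phi @ x_blocks                      # columns are the corresponding measurements
q_init = fit_q_init(x_blocks, y_meas)
x0 = q_init @ y_meas[:, 0]                   # initialization for the first measurement
print(q_init.shape, x0.shape)
```

The fit is computed once from the training set, so at test time the initialization costs a single matrix-vector product per block.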

### 3.4 Enhanced Version of ISTA-Net: FISTA-Net

The well-known accelerated version of ISTA, called FISTA [12], has a significantly better global convergence rate, while preserving the computational simplicity of ISTA. The main difference between FISTA and ISTA is that $\mathbf{r}^{(k)}$ in Eq. (2) is not computed using the previous point $\mathbf{x}^{(k-1)}$, but rather at an intermediate point that is a specific linear combination of the previous two points $\mathbf{x}^{(k-1)}$ and $\mathbf{x}^{(k-2)}$. Inspired by the FISTA idea and by introducing one extra learnable parameter for each ISTA-Net phase, as illustrated in Fig. 2(c), we further propose an enhanced version of ISTA-Net, dubbed FISTA-Net, whose learnable parameter set consists of the ISTA-Net parameters together with this extra parameter for every phase.
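The extrapolation idea can be sketched as follows. The first function generates the classical FISTA weights from the usual recursion on $t_k$; the second forms the intermediate point from the two previous iterates; the third takes the gradient step at that point. In a learned variant the weight would be a free per-phase parameter; the name `eta` used for it here is hypothetical, as are the function names.

```python
import numpy as np

def fista_momentum_weights(num_iters):
    """Classical FISTA extrapolation weights (t_k - 1) / t_{k+1}, with t_1 = 1."""
    t = 1.0
    weights = []
    for _ in range(num_iters):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        weights.append((t - 1.0) / t_next)
        t = t_next
    return weights

def extrapolate(x_prev, x_prev2, eta):
    """Intermediate point: a linear combination of the two previous iterates."""
    return x_prev + eta * (x_prev - x_prev2)

def gradient_step_at(z, y, phi, rho):
    """Gradient step on the data-fidelity term, evaluated at the point z."""
    return z - rho * phi.T @ (phi @ z - y)

w = fista_momentum_weights(5)
z = extrapolate(np.ones(3), np.zeros(3), w[1])
print(w)
```

Because the intermediate point reuses the output of the phase before the previous one, an unrolled network built this way contains connections that skip over a phase.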

### 3.5 Loss Function Design

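Given the training data pairs $\{(\mathbf{y}_i, \mathbf{x}_i)\}_{i=1}^{N_b}$, ISTA-Net first takes the CS measurement $\mathbf{y}_i$ as input and generates the reconstructed result, denoted by $\mathbf{x}_i^{(N_p)}$, as output. Note that the purpose is to reduce the discrepancy between $\mathbf{x}_i$ and $\mathbf{x}_i^{(N_p)}$ while satisfying the symmetric constraint $\widetilde{\mathcal{F}}^{(k)}(\mathcal{F}^{(k)}(\mathbf{x}_i)) = \mathbf{x}_i$. Therefore, our end-to-end loss function for ISTA-Net is defined as follows:

$$\mathcal{L}(\mathbf{\Theta}) = \frac{1}{N_b N}\sum_{i=1}^{N_b}\|\mathbf{x}_i^{(N_p)} - \mathbf{x}_i\|_2^2 + \gamma\,\frac{1}{N_b N}\sum_{i=1}^{N_b}\sum_{k=1}^{N_p}\|\widetilde{\mathcal{F}}^{(k)}(\mathcal{F}^{(k)}(\mathbf{x}_i)) - \mathbf{x}_i\|_2^2, \tag{14}$$

where $N_p$, $N_b$, $N$, and $\gamma$ are the total number of ISTA-Net phases, the total number of training blocks, the size of each block $\mathbf{x}_i$, and a regularization parameter, respectively. In this paper, $\gamma$ is set to 0.01. Considering that the output of each phase in ISTA-Net or FISTA-Net can be regarded as the output of each iteration in ISTA or FISTA, we further propose another optimization-inspired loss function to reduce all the discrepancies between $\mathbf{x}_i^{(k)}$ and $\mathbf{x}_i$ at the same time: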

(15) |
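The structure of the two losses described in this subsection, a reconstruction discrepancy plus a weighted symmetric-constraint penalty, and a variant that accumulates the discrepancy over all phases, can be sketched as below. This is an illustration under assumptions: the normalization by the number of blocks times the block size, the array layout, and the function names are hypothetical choices, and the exact form of the phase-wise variant in particular is a plausible reading rather than a confirmed one.

```python
import numpy as np

def ista_net_loss(x_true, x_final, sym_outputs, gamma=0.01):
    """Final-output discrepancy plus gamma times the symmetric-constraint penalty.

    x_true, x_final: arrays of shape (num_blocks, block_size).
    sym_outputs: one array per phase holding backward(forward(x_true)).
    """
    nb, n = x_true.shape
    discrepancy = np.sum((x_final - x_true) ** 2) / (nb * n)
    constraint = sum(np.sum((s - x_true) ** 2) for s in sym_outputs) / (nb * n)
    return discrepancy + gamma * constraint

def phasewise_discrepancy(x_true, x_phases):
    """Discrepancy accumulated over the outputs of all phases (assumed form)."""
    nb, n = x_true.shape
    return sum(np.sum((xk - x_true) ** 2) for xk in x_phases) / (nb * n)

x_true = np.zeros((2, 3))
loss = ista_net_loss(x_true, x_true + 1.0, [x_true + 1.0, x_true])
print(loss, phasewise_discrepancy(x_true, [x_true + 1.0, x_true]))
```

Penalizing every phase output gives each phase a direct training signal, in the same way that every iterate of an iterative solver is expected to move toward the solution.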

## 4 Experimental Results

Training setting: We use the same set of 91 images used in [11] to generate the training data pairs. From the luminance component of all the training images, we first randomly extract 88,912 image blocks of size 33×33, i.e. $N_b = 88912$ and $N = 1089$. We use TensorFlow to implement and train ISTA-Net and FISTA-Net for each CS ratio. The optimization algorithm is Adam [16] with a learning rate of 0.001 (300 epochs) and a batch size of 64. (The source code and trained models of ISTA-Net and FISTA-Net will be published after this paper is accepted.) The experiments are also conducted on the same eleven test images as [11]. The reconstruction results are reported as the average Peak Signal-to-Noise Ratio (PSNR) over the test images.
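For reference, the evaluation metric can be computed as in the following sketch. The peak value of 255 assumes 8-bit luminance images, and the function names are hypothetical; the text does not state the peak value or any rounding of the reconstruction before evaluation.

```python
import numpy as np

def psnr(reference, reconstruction, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstruction, dtype=float)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")                  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

def average_psnr(references, reconstructions, peak=255.0):
    """Mean PSNR over a list of (reference, reconstruction) image pairs."""
    return float(np.mean([psnr(a, b, peak) for a, b in zip(references, reconstructions)]))

ref = np.zeros((4, 4))
rec = np.full((4, 4), 16.0)                  # constant error of 16 gray levels
print(psnr(ref, rec))
```

Since PSNR is logarithmic in the mean squared error, a gain of 3 dB corresponds to roughly halving the error.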

Choice of phase number $N_p$: To determine $N_p$ for ISTA-Net and FISTA-Net, we train both networks with different phase numbers and report the average PSNR on the test images for ratio=0.25 in Table 1. From Table 1, one can clearly see that FISTA-Net matches or exceeds the performance of ISTA-Net at every phase number tested. Furthermore, the PSNR increases quickly for small phase numbers and only marginally for larger ones. As a trade-off between network complexity and reconstruction performance, the default phase number in this paper is set to 9 for ISTA-Net and FISTA-Net.

Table 1: Average PSNR (dB) on the test images for different phase numbers (ratio=0.25).

Phase Number | | | 9 | | |
---|---|---|---|---|---|
ISTA-Net | 30.45 | 31.17 | 31.53 | 31.73 | 31.81 |
FISTA-Net | 31.03 | 31.57 | 31.83 | 31.90 | 31.90 |

Table 2: Average PSNR (dB) on the test images at two CS ratios, and average time to reconstruct a 256×256 image.

Algorithm | PSNR (dB), ratio=0.25 | PSNR (dB), ratio=0.1 | Computational Time (s) |
---|---|---|---|
TVAL3 [7] | 27.84 | 22.84 | 35.8 (CPU) |
NLR-CS [8] | 28.05 | 14.19 | 136.3 (CPU) |
D-AMP [9] | 28.17 | 21.14 | 47.5 (CPU) |
SDA [10] | 24.72 | 22.43 | 0.001 (GPU) |
ReconNet [11] | 25.54 | 22.68 | 0.011 (GPU) |
Enhanced ReconNet [11] | 27.64 | 24.06 | 0.011 (GPU) |
Proposed ISTA-Net | 31.53 | 25.80 | 0.049 (GPU) |
Proposed FISTA-Net | 31.83 | 26.01 | 0.052 (GPU) |

Comparison with state-of-the-art methods: We compare our algorithms with five representative methods, i.e. TVAL3 [7], NLR-CS [8], D-AMP [9], SDA [10], and ReconNet [11]. The first three are traditional optimization-based methods, while the last two are recent network-based methods. Note that D-AMP and ReconNet are the state-of-the-art optimization-based and network-based methods, respectively. The PSNR reconstruction performance in the high-ratio case (ratio=0.25) and the low-ratio case (ratio=0.1) is summarized in Table 2. To be fair, as in [11], all the competing methods reconstruct each image block independently from its CS measurements. It is worth emphasizing that, inspired by the structure of our proposed ISTA-Net, we also develop an enhanced version of ReconNet, named Enhanced ReconNet, by simply removing the ReLU layer (marked by the red square in Fig. 1(b)) before the layer that generates one feature map in the middle. From Table 2, one can observe that Enhanced ReconNet gains more than 1 dB over the original ReconNet. Hence, we only show the results of Enhanced ReconNet in the following. The proposed ISTA-Net and FISTA-Net outperform the existing algorithms in both the low-ratio and high-ratio cases by a large margin (more than 3 dB over the best competing method at ratio=0.25 and more than 1.7 dB at ratio=0.1), which demonstrates their effectiveness. The visual comparisons for ratio=0.25 in Fig. 5 show that the proposed ISTA-Net and FISTA-Net are able to reconstruct more details and sharper edges without obvious blocking artifacts. The last column of Table 2 shows the average time to reconstruct a 256×256 image with an Intel(R) Core(TM) i7 CPU and a Quadro 6000 GPU, which indicates that the proposed ISTA-Net and FISTA-Net maintain a fast computational speed.

Advantage of loss function: By default, ISTA-Net and FISTA-Net are trained with the loss function of Eq. (14). To verify the advantage of the optimization-inspired loss function of Eq. (15), we denote the corresponding versions trained with it as ISTA-Net* and FISTA-Net*. We then conduct experiments using these four types of networks at two phase numbers. The results are provided in Table 3. ISTA-Net* and FISTA-Net* achieve better results than ISTA-Net and FISTA-Net, which demonstrates the superiority of the loss in Eq. (15) over the loss in Eq. (14).

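Table 3: Average PSNR (dB) of networks trained with the loss of Eq. (14) (ISTA-Net, FISTA-Net) and with the loss of Eq. (15) (ISTA-Net*, FISTA-Net*).

| Phase Number | ISTA-Net | FISTA-Net | ISTA-Net* | FISTA-Net* |
|---|---|---|---|---|
| | 31.17 | 31.57 | 31.78 | 31.79 |
| 9 | 31.53 | 31.83 | 31.92 | 31.93 |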

Table 4: Average PSNR (dB) on the test images for CS ratios from 0.2 to 0.6.

Algorithm | ratio=0.2 | ratio=0.3 | ratio=0.4 | ratio=0.5 | ratio=0.6 |
---|---|---|---|---|---|
TVAL3 [7] | 25.37 | 28.39 | 29.76 | 31.51 | 33.16 |
D-AMP [9] | 27.94 | 30.36 | 33.53 | 35.83 | 37.55 |
Enhanced ReconNet [11] | 27.18 | 29.11 | 30.49 | 31.39 | 32.44 |
ISTA-Net | 30.06 | 32.91 | 35.36 | 37.43 | 39.11 |
ISTA-Net (trained at ratio=0.4 only) | 29.77 | 32.90 | 35.36 | 37.31 | 38.93 |

Effect of symmetric constraint: In subsection 3.2, the symmetric constraint $\widetilde{\mathcal{F}}(\mathcal{F}(\mathbf{x})) = \mathbf{x}$ is developed in order to correspond to ISTA optimization and to efficiently solve Eq. (5). Here, we also show the results of various networks trained without this constraint, as illustrated in Fig. 6. Interestingly, in most cases, the reconstruction results with the symmetric constraint are better and more stable than the ones without it.

Effect of feature map number $N_f$: We also present the results obtained with different numbers of feature maps $N_f$ in the learnable transforms $\mathcal{F}(\cdot)$ and $\widetilde{\mathcal{F}}(\cdot)$. From Fig. 6, one can observe that, beyond a certain number of feature maps, a larger $N_f$ does not bring any gain and is instead prone to overfitting.

Generalization ability: Similar to traditional optimization-based CS methods, our proposed ISTA-Net can apply a network learned at one CS ratio to other ratios. It is worth emphasizing that existing deep network based CS methods, such as ReconNet [11] and SDA [10], do not have this ability, because their end-to-end networks use neither an initialization nor CS domain information. Table 4 shows the average PSNR (dB) performance on the test images for CS ratios from 0.2 to 0.6 by different methods, including TVAL3 [7], D-AMP [9], Enhanced ReconNet [11] and the proposed ISTA-Net. ISTA-Net works well over a wide ratio range and outperforms the current state-of-the-art algorithm D-AMP [9] in average PSNR by a large margin in all cases. Note that ISTA-Net is trained separately for each ratio from 0.2 to 0.6, and the same goes for Enhanced ReconNet [11]. To test the generalization ability, we use only the ISTA-Net learned at the middle ratio, i.e. 0.4, for CS reconstruction at all the ratios; its results are given in the last row of Table 4. One can clearly see that this single network still achieves competitive reconstruction performance at all the ratios, indicating a good generalization ability. In particular, its gap to the separately trained ISTA-Net becomes smaller as the ratio gets closer to 0.4.

## 5 Conclusion

To develop a fast yet accurate CS reconstruction algorithm, in this paper, we make full use of the merits of the structure insights of optimization-based methods and the performance/speed of network-based ones, and propose a novel structured deep network, dubbed ISTA-Net, which is inspired by the Iterative Shrinkage-Thresholding Algorithm (ISTA), and its enhanced version FISTA-Net. The proposed ISTA-Net and FISTA-Net not only maintain a fast runtime, but also greatly improve the results of current optimization-based and network-based CS methods. One interesting direction of our future work is to extend ISTA-Net and FISTA-Net for other applications.

## References

- [1] Duarte, M. F., Davenport, M. A., Takhar, D., Laska, J. N., Sun, T., Kelly, K. F., and Baraniuk, R. G. (2008). Single-pixel imaging via compressive sampling. IEEE Signal Processing Magazine, 25(2), 83-91.
- [2] Candes, E. J., and Tao, T. (2006). Near-optimal signal recovery from random projections: Universal encoding strategies?. IEEE transactions on Information Theory, 52(12), 5406-5425.
- [3] Lustig, M., Donoho, D., and Pauly, J. M. (2007). Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine, 58(6), 1182-1195.
- [4] Mun, S., and Fowler, J. E. (2009, November). Block compressed sensing of images using directional transforms. In 2009 16th IEEE International Conference on Image Processing, (pp. 3021-3024).
- [5] Kim, Y., Nadar, M. S., and Bilgin, A. (2010, September). Compressed sensing using a Gaussian scale mixtures model in wavelet domain. In 2010 17th IEEE International Conference on Image Processing, (pp. 3365-3368).
- [6] Zhang, J., Zhao, D., Zhao, C., Xiong, R., Ma, S., and Gao, W. (2012). Image compressive sensing recovery via collaborative sparsity. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2(3), 380-391.
- [7] Li, C., Yin, W., Jiang, H., and Zhang, Y. (2013). An efficient augmented Lagrangian method with applications to total variation minimization. Computational Optimization and Applications, 56(3), 507-530.
- [8] Dong, W., Shi, G., Li, X., Ma, Y., and Huang, F. (2014). Compressive sensing via nonlocal low-rank regularization. IEEE Transactions on Image Processing, 23(8), 3618-3632.
- [9] Metzler, C. A., Maleki, A., and Baraniuk, R. G. (2016). From denoising to compressed sensing. IEEE Transactions on Information Theory, 62(9), 5117-5144.
- [10] Mousavi, A., Patel, A. B., and Baraniuk, R. G. (2015, September). A deep learning approach to structured signal recovery. In 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), (pp. 1336-1343).
- [11] Kulkarni, K., Lohit, S., Turaga, P., Kerviche, R., and Ashok, A. (2016). ReconNet: Non-iterative reconstruction of images from compressively sensed measurements. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 449-458).
- [12] Beck, A., and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on Imaging Sciences, 2(1), 183-202.
- [13] Afonso, M. V., Bioucas-Dias, J. M., and Figueiredo, M. A. (2011). An augmented Lagrangian approach to the constrained optimization formulation of imaging inverse problems. IEEE Transactions on Image Processing, 20(3), 681-695.
- [14] Blumensath, T., and Davies, M. E. (2009). Iterative hard thresholding for compressed sensing. Applied and computational harmonic analysis, 27(3), 265-274.
- [15] Dong, C., Loy, C. C., He, K., and Tang, X. (2014, September). Learning a deep convolutional network for image super-resolution. In European Conference on Computer Vision (pp. 184-199).
- [16] Kingma, D., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- [17] He, L., and Carin, L. (2009). Exploiting structure in wavelet-based Bayesian compressive sensing. IEEE Transactions on Signal Processing, 57(9), 3488-3497.
- [18] Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).
- [19] Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3431-3440).
- [20] Xie, J., Xu, L., and Chen, E. (2012). Image denoising and inpainting with deep neural networks. In Advances in Neural Information Processing Systems, (pp. 341-349).
- [21] Zhang, J., Zhao, D., and Gao, W. (2014). Group-based sparse representation for image restoration. IEEE Transactions on Image Processing, 23(8), 3336-3351.
- [22] Iliadis, M., Spinoulas, L., and Katsaggelos, A. K. (2016). Deep fully-connected networks for video compressive sensing. arXiv preprint arXiv:1603.04930.

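## Supplementary Material

###### Theorem 1

Let $X_1, \ldots, X_n$ be independent normal random variables with common zero mean and variance $\sigma^2$. If $X = [X_1, \ldots, X_n]^{\top}$ and given any matrices $\mathbf{A} \in \mathbb{R}^{m \times n}$, $\mathbf{B} \in \mathbb{R}^{s \times m}$, define a new random variable $Y = \mathbf{B}\,\mathrm{ReLU}(\mathbf{A}X) = \mathbf{B}\max(\mathbf{0}, \mathbf{A}X)$. Then, $\mathbb{E}[\|Y - \mathbb{E}[Y]\|_2^2]$ and $\mathbb{E}[\|X - \mathbb{E}[X]\|_2^2]$ are linearly related, i.e. $\mathbb{E}[\|Y - \mathbb{E}[Y]\|_2^2] = \alpha\,\mathbb{E}[\|X - \mathbb{E}[X]\|_2^2]$, where $\alpha$ is only related with $\mathbf{A}$ and $\mathbf{B}$.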

Proof: Let , since then we have , where . Note that , and denotes the identity matrix. Let , we first discuss the relationship between and .

Obviously, the variance of can be expressed as , is related with , then the probability density function of , denoted by , is expressed as

According to , we have the mean and the variance of as below:

Our purpose is to compute the covariance matrix of , that is . After computing , which is on the diagonal position of , we now calculate , by

Assume that the joint probability density function of and is written as

where is the correlation coefficient between and . Then,

Furthermore, define , the computation of is transformed to be

where .

Hence,

According to the expressions of and , it is clear to see that can be formulated as

where is only determined by the matrix .

Therefore, the covariance matrix of , i.e. is written as

Due to ( denotes the trace of a matrix) and , then we have

where . That means and are linearly related.