Intelligent Parameter Tuning in Optimization-based Iterative CT Reconstruction via Deep Reinforcement Learning
Abstract
A number of image-processing problems can be formulated as optimization problems. The objective function typically contains several terms specifically designed for different purposes. Parameters in front of these terms are used to control the relative weights among them. It is of critical importance to tune these parameters, as the quality of the solution depends on their values. Tuning parameters is a relatively straightforward task for a human, as one can intelligently determine the direction of parameter adjustment based on the solution quality. Yet manual parameter tuning is not only tedious in many cases, but also becomes impractical when a large number of parameters exist in a problem. Aiming at solving this problem, this paper proposes an approach that employs deep reinforcement learning to train a system that can automatically adjust parameters in a human-like manner. We demonstrate our idea in an example problem of optimization-based iterative CT reconstruction with a pixel-wise total-variation regularization term. We set up a parameter tuning policy network (PTPN), which maps a CT image patch to an output that specifies the direction and amplitude by which the parameter at the patch center is adjusted. We train the PTPN via an end-to-end reinforcement learning procedure. We demonstrate that under the guidance of the trained PTPN for parameter tuning at each pixel, reconstructed CT images attain quality similar to or better than that of images reconstructed with manually tuned parameters.
I Introduction
A number of medical image-processing problems can be formulated as optimization problems. In such problems, the objective function typically contains several terms carefully designed for different purposes. A set of parameters is used to control the relative weights of these terms in order to achieve a satisfactory solution quality. Take a typical problem of iterative Computed Tomography (CT) reconstruction as an example. It can be formulated as
$$f^* = \arg\min_f \frac{1}{2}\|Pf - g\|_2^2 + \lambda R[f], \tag{1}$$
where $f$ is the image vector to be reconstructed by solving the optimization problem, $P$ stands for the x-ray projection operator, and $g$ is the measured projection data. The first term is a data-fidelity term, minimization of which ensures agreement between $f$ and the measurement $g$. $R[f]$ stands for a regularization term specifically designed to enforce quality of the solution image from a certain aspect, e.g. piecewise smoothness. $\lambda$ is the parameter that controls the trade-off between this regularization term and the data-fidelity term. Over the years, a number of regularization terms have been developed to successfully restore a solution using undersampled or noisy measurement $g$. Examples include, but are not limited to, total variation (TV)[1, 2, 3], tight frame (TF)[4, 5], and nonlocal means (NLM)[6, 7, 8].
Despite the success, parameter tuning in these optimization-based image processing problems is inevitable. Manually adjusting the parameters for the best image quality is not uncommon in the literature[2, 3, 5, 7, 8]. Yet this is a tedious approach, as one has to carefully navigate through the parameter space to find the optimal value. The required effort and human time impede clinical applications of those novel image-processing methods. Moreover, manual parameter tuning becomes an increasingly challenging task in problems with multiple regularization terms. An extreme example is CT reconstruction with a weighting parameter freely adjustable at each pixel[9, 10]. Clearly, the large number of parameters makes manual parameter tuning infeasible. Therefore, it is highly desirable to develop a method for automatic parameter tuning. Over the years, this problem has attracted considerable research interest. For instance, generalized cross validation and L-curve methods have been used to choose the regularization parameter[11, 12, 13]. It has also been proposed to develop a method to assess image quality, which can be used to guide parameter adjustment towards the direction of improving the quality[14, 15]. In certain contexts, such as the CT reconstruction problem, it may even be possible to estimate the level of data contamination based on physics or mathematical principles. This can provide valuable information to set the parameter values[16]. Despite these efforts, a practical solution that is applicable to general problems still does not exist, calling for further investigations.
Although it is quite difficult for a computer to automate the parameter tuning process, this task seems to be less of a problem for humans. One typically has a strong intuition about the direction in which the parameter should be adjusted based on the observed image quality. Again, let us take the iterative CT reconstruction problem in Eq. (1) as an example. By looking at the solution image, one knows that the regularization term needs to be strengthened if the solution appears to be noisy, or relaxed otherwise. Based on this fact, it is of interest and importance to model this remarkable intuition and capability in an intelligent system, which can then be used to solve the parameter tuning problem from a new angle.
Only recently has the tremendous success of deep learning shed light in this direction. In the past few years, deep learning has clearly demonstrated its power in numerous medical image processing problems[17, 18, 19, 20, 21, 22]. More importantly, it was found that human-level intelligence can be spontaneously generated via deep-learning schemes, which enables a system to perform a certain task in a human-like fashion, or even better than humans. In a pioneering work, an artificial intelligence system was developed to realize human-level control of Atari computer games[23, 24]. Employing a deep Q-network approach, the system was trained through the framework of deep reinforcement learning to learn how to interact with the environment, i.e. play an Atari game. The results were remarkable: the trained system was able to achieve a level comparable to that of a professional human player in a set of 49 Atari games.
Motivated by this fact, we propose in this paper to develop an intelligent system to accomplish the parameter tuning task in optimization-based image-processing problems. We take a CT reconstruction problem as an example to demonstrate our idea. Specifically, we will develop a parameter tuning policy network (PTPN), which can intelligently determine the direction and magnitude of parameter tuning by observing an input image patch. The rest of this paper is organized as follows. Sec. II will introduce the example problem of TV-based CT reconstruction with pixel-wise regularization. We will also describe the PTPN structure and how to train it to develop the skill of parameter tuning. Sec. III will present our validation studies and results. Finally, we will discuss the study in Sec. IV and conclude in Sec. V.
II Methods
II-A An example CT reconstruction problem
In this paper, we consider the following iterative CT reconstruction problem as an example to demonstrate our idea:
$$f^* = \arg\min_f \frac{1}{2}\|Pf - g\|_2^2 + \sum_i \lambda_i\,|(\nabla f)_i|. \tag{2}$$
This approach falls into the regime of TV-based regularization[1], which penalizes the $\ell_1$ norm of the image gradient to ensure image smoothness while preserving edges. In the second term of the objective function, we consider a general case that extends $\lambda$ into a vector. Each entry $\lambda_i$ of $\lambda$ controls the weight of an image pixel $i$. The substantially larger number of parameters in this example problem than in a typical single-parameter TV model highlights the need for an automatic parameter tuning system.
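To make the structure of the objective in Eq. (2) concrete, the following is a minimal NumPy sketch that evaluates a data-fidelity term plus a pixel-wise weighted TV term on a tiny 2D image. The function names, the forward-difference gradient with a replicated boundary, the isotropic gradient magnitude, and the random toy projection matrix are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def image_gradient(f):
    """Forward differences; the last row/column is replicated (assumed boundary rule)."""
    gx = np.diff(f, axis=1, append=f[:, -1:])
    gy = np.diff(f, axis=0, append=f[-1:, :])
    return gx, gy

def weighted_tv_objective(f, P, g, lam):
    """0.5 * ||P f - g||^2 + sum_i lam_i * |grad f|_i for a 2D image f."""
    residual = P @ f.ravel() - g
    gx, gy = image_gradient(f)
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)
    return 0.5 * residual @ residual + np.sum(lam * grad_mag)

# Toy usage: a 4x4 image with one vertical edge and noise-free data.
rng = np.random.default_rng(0)
f = np.zeros((4, 4))
f[:, 2:] = 1.0
P = rng.standard_normal((6, 16))   # stand-in for a projection matrix
g = P @ f.ravel()
lam = np.full((4, 4), 0.1)         # one weight per pixel
print(weighted_tv_objective(f, P, g, lam))
```

In this toy case the data term vanishes, so the value is entirely the weighted TV of the single edge. Lowering the entries of `lam` along that edge would reduce the penalty there, which is the kind of freedom the pixel-wise parameterization provides.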
There are a number of numerical algorithms to solve this optimization problem for a fixed parameter $\lambda$[25, 26, 27]. In this study, we use the alternating direction method of multipliers (ADMM)[27]. It introduces an auxiliary variable $d$ and adds a constraint $d = \nabla f$. The problem can be handled by tackling the augmented Lagrangian:
$$L(f, d, \Gamma) = \frac{1}{2}\|Pf - g\|_2^2 + \sum_i \lambda_i\,|d_i| + \Gamma^T(\nabla f - d) + \frac{\beta}{2}\|\nabla f - d\|_2^2, \tag{3}$$
where $\Gamma$ is the Lagrange multiplier and $\beta$ is a parameter in the algorithm. Major steps of the ADMM algorithm are outlined in Fig. 1. Due to the large scale of the reconstruction problem, the matrix inverse operation in Line 2 is carried out using the conjugate gradient algorithm[28].
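As an illustration of how such an ADMM iteration can be organized, the sketch below solves a small 1D analogue of the weighted TV problem. It is a generic ADMM written under stated assumptions, not a transcription of the algorithm in Fig. 1: the symbols `beta` and `Gamma`, the 1D difference matrix `D`, the identity "projection" matrix, and all numerical values are chosen here for illustration only.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise shrinkage: the minimizer of t*|d| + 0.5*(d - v)^2."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def conjugate_gradient(A, b, x0, n_iter=100, tol=1e-12):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(n_iter):
        if rs < tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def admm_weighted_tv(P, g, lam, D, beta=1.0, n_iter=200):
    """Minimize 0.5*||P f - g||^2 + sum_i lam_i*|(D f)_i| with ADMM."""
    f = np.zeros(P.shape[1])
    d = np.zeros(D.shape[0])
    Gamma = np.zeros(D.shape[0])
    A = P.T @ P + beta * D.T @ D                 # system matrix of the f-update
    for _ in range(n_iter):
        rhs = P.T @ g + D.T @ (beta * d - Gamma)
        f = conjugate_gradient(A, rhs, f)                      # f-update (linear solve)
        d = soft_threshold(D @ f + Gamma / beta, lam / beta)   # d-update (shrinkage)
        Gamma = Gamma + beta * (D @ f - d)                     # multiplier update
    return f

def objective(f, P, g, lam, D):
    return 0.5 * np.sum((P @ f - g) ** 2) + np.sum(lam * np.abs(D @ f))

# Toy 1D denoising example (P is the identity; all values are illustrative).
rng = np.random.default_rng(0)
n = 20
f_true = np.concatenate([np.zeros(10), np.ones(10)])
P = np.eye(n)
g = P @ f_true + 0.1 * rng.standard_normal(n)
D = np.eye(n, k=1) - np.eye(n)
D[-1, :] = 0.0                 # no difference taken across the boundary
lam = np.full(n, 0.5)          # one weight per pixel
f_rec = admm_weighted_tv(P, g, lam, D)
print(objective(g, P, g, lam, D), objective(f_rec, P, g, lam, D))
```

Because the weights enter only through the shrinkage threshold `lam / beta`, a pixel-wise parameter map changes the d-update alone; the linear solve in the f-update is unaffected by the values of `lam`.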
II-B System setup
Our system tunes parameters in an iterative manner. Specifically, at iteration step $k$, it observes the result generated by the image reconstruction system using the ADMM algorithm at its convergence, $f^{(k)}$. Note that $f^{(k)}$ here is the solution at the convergence of the ADMM algorithm, rather than an intermediate image during the ADMM iteration. For each pixel $i$, the image patch centered at this pixel, denoted as $S_i(f^{(k)})$, is fed to the parameter tuning system. The system then outputs the direction and magnitude by which the parameter $\lambda_i^{(k)}$ is adjusted. Here, we explicitly associate $\lambda$ with the index $k$, as it will vary from step to step. This process continues until a stopping criterion is met.
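The per-pixel patch extraction described above can be sketched as follows. The patch size of 9 and the edge-replication padding are placeholders assumed for this example; neither is specified in the text, and the helper names are hypothetical.

```python
import numpy as np

def extract_patch(image, row, col, patch_size=9):
    """Return the patch_size x patch_size patch centered at (row, col).

    patch_size must be odd. Edge replication is an assumed boundary treatment.
    """
    half = patch_size // 2
    padded = np.pad(image, half, mode="edge")
    return padded[row:row + patch_size, col:col + patch_size]

def all_patches(image, patch_size=9):
    """Stack one patch per pixel, in row-major pixel order."""
    rows, cols = image.shape
    return np.stack([extract_patch(image, r, c, patch_size)
                     for r in range(rows) for c in range(cols)])

# Toy usage on a 5x5 image with 3x3 patches.
image = np.arange(25.0).reshape(5, 5)
patches = all_patches(image, patch_size=3)
print(patches.shape)
```

Each row of `patches` would then serve as one input state for the tuning system, so that every pixel receives its own adjustment decision.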
We would like to achieve the parameter tuning capability using the optimal action-value function $Q^*(s, a)$ in the Q-learning regime[29]. This function is defined as
$$Q^*(s, a) = \max_\pi \mathbb{E}\left[r_k + \gamma r_{k+1} + \gamma^2 r_{k+2} + \dots \mid s_k = s,\ a_k = a,\ \pi\right], \tag{4}$$
where $r_k$ is the reward at iteration step $k$, $\gamma$ is a discount factor, and $\pi$ stands for the parameter tuning policy: taking an action $a$ after observing a state $s$. Here, we consider a deterministic policy that generates a unique action $a$ based on the observed state $s$. Specifically, we follow a greedy strategy that selects the action maximizing the $Q^*$ value under the input $s$, i.e. $a = \arg\max_{a'} Q^*(s, a')$. In the particular problem of interest here, the state $s$ is an image patch $S_i(f^{(k)})$. We consider five possible output actions: keeping the parameter unchanged, increasing or decreasing it by a smaller amount, and increasing or decreasing it by a larger amount. We fix these two amounts of change in advance, as we expect that their exact values will not critically affect the parameter tuning capability of our system.
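A minimal sketch of the greedy policy over five actions is given below. The multiplicative factors in `ACTION_FACTORS` are assumed values standing in for the two amounts of change, and the function names are hypothetical.

```python
import numpy as np

# Assumed multiplicative factors for the five actions: keep, small increase,
# small decrease, large increase, large decrease. Illustrative values only.
ACTION_FACTORS = np.array([1.0, 1.1, 0.9, 1.5, 0.5])

def greedy_actions(q_values):
    """Pick, for each pixel, the action with the highest Q value.

    q_values has shape (n_pixels, 5): one row of action values per patch.
    """
    return np.argmax(q_values, axis=1)

def update_parameters(lam, actions):
    """Apply the selected action to each pixel's regularization weight."""
    return lam * ACTION_FACTORS[actions]

# Toy usage with two pixels.
q = np.array([[0.1, 0.9, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 2.0]])
actions = greedy_actions(q)
print(actions, update_parameters(np.array([1.0, 2.0]), actions))
```

Using multiplicative rather than additive changes keeps the weights positive regardless of how many steps are taken; this is a design choice of the sketch, not a statement about the original implementation.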
Under this framework, we parameterize the $Q^*$ function using a convolutional neural network $Q(s, a; W)$, where $W$ are network parameters. This network is referred to as the Parameter-Tuning Policy Network (PTPN) from here on. The structure of the network is depicted in Fig. 2. $W$ will be determined through a reinforcement learning process, as will be described in the next section.
II-C PTPN training via deep reinforcement learning
II-C1 General deep reinforcement learning idea
One particular property of the $Q^*$ function is the Bellman equation[30]:
$$Q^*(s, a) = r + \gamma \max_{a'} Q^*(s', a'), \tag{5}$$
where $s'$ is the state of the imaging system that follows after taking the action $a$. With this identity, for a function $Q$, it is possible to define the loss function as the square of the deviation from this identity in order to quantify the deviation of $Q$ from $Q^*$. When the function is approximated with a network, $Q(s, a; W)$, this loss function is $[r + \gamma \max_{a'} Q(s', a'; W) - Q(s, a; W)]^2$.
To determine $W$ through a reinforcement learning process, we introduce another variable $W'$ and hence define a target term $y = r + \gamma \max_{a'} Q(s', a'; W')$. For a fixed $W'$, we consider the loss function
$$L(W) = \left[y - Q(s, a; W)\right]^2. \tag{6}$$
Note that the $s'$ inside the target term is related to $s$ by the action $a$. At the end of the learning process, $W$ and $W'$ in Eq. (6) should converge. This can be achieved by performing learning in a sequence of stages. In each stage, the parameter $W'$ is kept unchanged, whereas the parameter $W$ is optimized towards minimizing the loss function. At the end of each stage, $W'$ is updated to the optimized parameter $W$.
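The target term and the squared loss of Eq. (6) reduce to a few lines of arithmetic for a single transition. The sketch below is plain Python with hypothetical function names; `next_q_values` stands for the outputs of the network with the frozen parameters evaluated at the next state.

```python
def q_learning_target(reward, next_q_values, gamma):
    """Target term: reward plus the discounted best next-state value."""
    return reward + gamma * max(next_q_values)

def squared_td_loss(q_value, target):
    """Squared deviation between the target and the current estimate."""
    return (target - q_value) ** 2

# Toy usage with made-up numbers.
y = q_learning_target(reward=1.0, next_q_values=[0.5, 2.0, -1.0], gamma=0.9)
print(y, squared_td_loss(q_value=2.0, target=y))
```

Here the target is 1.0 + 0.9 × 2.0 = 2.8, and the loss for a current estimate of 2.0 is 0.8² = 0.64. Keeping the parameters inside the target fixed during a stage is what makes this an ordinary regression loss in the trainable parameters.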
Within a stage, since $W'$ is kept unchanged, the gradient of the loss function with respect to $W$ is simply
$$\frac{\partial L}{\partial W} = -2\left[y - Q(s, a; W)\right]\frac{\partial Q(s, a; W)}{\partial W}. \tag{7}$$
The last term can be computed via the standard back-propagation approach in a typical network training process. As in many other studies, we use a stochastic gradient descent approach that computes the gradient, and therefore updates the network parameter, using a subset of training data randomly selected from the full training data set. $W$ is then updated as $W_{l+1} = W_l - \delta\,\partial L/\partial W$, where $\delta$ is the learning rate and $l$ is the index of iteration.
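The gradient step can be made explicit with a deliberately simplified model. In the sketch below, a linear function $Q(s, a; W) = W_a \cdot s$ replaces the convolutional network so that the derivative of $Q$ with respect to $W$ is available in closed form; this substitution, the function names, and the toy numbers are assumptions for illustration only.

```python
import numpy as np

def q_values(W, s):
    """Linear stand-in for the network: one value per action, Q(s, a; W) = W[a] . s."""
    return W @ s

def sgd_step(W, W_target, batch, gamma, lr):
    """One stochastic gradient step on the squared loss with W_target held fixed."""
    grad = np.zeros_like(W)
    for s, a, r, s_next in batch:
        y = r + gamma * np.max(q_values(W_target, s_next))   # target term
        td_error = y - q_values(W, s)[a]
        grad[a] += -2.0 * td_error * s   # gradient of the loss; dQ/dW[a] = s here
    return W - lr * grad / len(batch)

# Toy usage: two actions, two features, a single transition.
W = np.zeros((2, 2))
W_target = W.copy()
batch = [(np.array([1.0, 0.0]), 0, 1.0, np.array([0.0, 1.0]))]
W_new = sgd_step(W, W_target, batch, gamma=0.9, lr=0.1)
print(W_new)
```

With all parameters starting at zero, the target equals the reward of 1.0, and one step moves the estimate for the taken action from 0 to 0.2, which lowers the squared loss from 1.0 to 0.64. With a real network, the explicit `s` factor would be replaced by back-propagation.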
II-C2 Training PTPN
We train the PTPN following the general idea outlined in the previous section. To do so, we repeatedly perform image reconstruction using the ADMM algorithm in Fig. 1. At step $k$, the solution image $f^{(k)}$ is observed. For each pixel $i$, we use an $\epsilon$-greedy strategy to select an action $a$ to adjust the parameter value $\lambda_i^{(k)}$. Specifically, with probability $\epsilon$, we randomly select an action among all the possible choices with equal chances. Otherwise, we select the action that attains the highest output value of $Q(s, a; W)$ with the current image patch as input (the value of $\epsilon$ is listed among the training parameters in Table I). With the selected action, we update the parameter accordingly. After the parameters of all the pixels are updated, we perform image reconstruction one more time with $f^{(k)}$ as the initial guess, yielding an updated solution image $f^{(k+1)}$.
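The exploration rule described above can be sketched in a few lines of plain Python. The function name and the optional `rng` argument are hypothetical conveniences for this example.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Toy usage: five action values for one patch.
q = [0.1, 0.9, 0.3, -0.2, 0.0]
print(epsilon_greedy(q, epsilon=0.0))                      # always the greedy action
print(epsilon_greedy(q, epsilon=1.0, rng=random.Random(0)))  # always a random action
```

Setting `epsilon` to zero recovers the purely greedy policy used once training is finished, while larger values trade exploitation for exploration during training.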
At this point, we randomly sample a number of patches from the image to generate training data for the PTPN. For each selected patch at location $i$, we gather the solution image patches $S_i(f^{(k)})$ and $S_i(f^{(k+1)})$, the reward $r$, as well as the action $a$. The reward function at this patch is defined as
$$r = \frac{1}{\|S_i(f^{(k+1)}) - S_i(f_{\mathrm{true}})\|_2} - \frac{1}{\|S_i(f^{(k)}) - S_i(f_{\mathrm{true}})\|_2}, \tag{8}$$
where $f_{\mathrm{true}}$ is the ground truth image and $\|\cdot\|_2$ stands for the standard $\ell_2$ norm of a vector. We define this reward function to encourage image patch updates that move towards the ground truth image patch. The inverse function is utilized to amplify the change between $f^{(k)}$ and $f^{(k+1)}$: as the parameter is tuned through a sequence of steps, an additional step typically improves the image quality only slightly, and hence reduces the distance to the ground truth by a small amount.
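A reward of this kind can be sketched as the change in inverse distance to the ground truth patch. The use of the unsquared $\ell_2$ norm and the small constant `eps` guarding against division by zero are assumptions of this example, as is the function name.

```python
import numpy as np

def patch_reward(patch_old, patch_new, patch_true, eps=1e-12):
    """Inverse distance to the ground truth after the step minus before the step."""
    dist_old = np.linalg.norm(patch_old - patch_true)
    dist_new = np.linalg.norm(patch_new - patch_true)
    return 1.0 / (dist_new + eps) - 1.0 / (dist_old + eps)

# Toy usage: the new patch is twice as close to the ground truth as the old one.
truth = np.zeros((2, 2))
old = np.ones((2, 2))          # distance 2
new = 0.5 * np.ones((2, 2))    # distance 1
print(patch_reward(old, new, truth))
```

Here the reward is positive because the distance shrinks from 2 to 1; swapping `old` and `new` gives a negative reward of the same size. As patches approach the ground truth, a fixed reduction in distance yields a larger reward, which is the amplifying effect of the inverse.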
The collected information at different locations forms a set of data $\{(s, a, r, s')\}$, which is then added to a pool of training data. Finally, to train the PTPN, a subset of the training data randomly selected from the pool is used to update the parameter $W$ to minimize the loss function in Eq. (6), with the gradient computed using Eq. (7). This strategy is known as experience replay in the deep Q-learning regime, and is designed to overcome the problem that the training data generated in sequential steps of actions are highly correlated[23, 24]. This process continues for a preset number of steps. Within this process, we update $W'$ to the current $W$ after every fixed number of steps (Table I).
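The pool of training data and the random mini-batch selection can be sketched with a bounded buffer. The class name, the finite capacity, and the policy of discarding the oldest entries are assumptions of this example; the text does not state how the pool is bounded.

```python
import random
from collections import deque

class ReplayPool:
    """Bounded pool of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity, seed=None):
        self.data = deque(maxlen=capacity)   # oldest entries are dropped first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self.data.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a random mini-batch without replacement."""
        k = min(batch_size, len(self.data))
        return self.rng.sample(list(self.data), k)

# Toy usage with scalar stand-ins for image patches.
pool = ReplayPool(capacity=3, seed=0)
for i in range(5):
    pool.add(i, 0, 0.0, i + 1)
print(len(pool.data), pool.sample(2))
```

Sampling at random from the pool, rather than training on transitions in the order they were generated, is what breaks the correlation between consecutive tuning steps.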
The training process described above is executed in multiple epochs. Each epoch contains the same training process on multiple data sets of different CT cases. The overall algorithm structure is summarized in Fig. 3.
II-D Implementation details
We implemented this algorithm using Python with TensorFlow. The computational platform is a desktop workstation with an Intel Xeon 3.5 GHz CPU, 8 GB memory and an Nvidia Quadro M4000 GPU card.
For the CT reconstruction part, we consider a fan-beam projection geometry with 180 projections equally spaced over a angular range. The image has a resolution of pixels. A relatively low resolution is chosen due to computational concerns. The x-ray detector is of a line shape with elements covering a cm range. The source-to-isocenter distance is cm and the isocenter-to-detector distance is cm. The projection matrix $P$ is computed using the standard Siddon’s algorithm [31]. We select six patient CT images at different anatomical sites including brain, lung, and abdomen as training images. Projection data is simply calculated as $g = P f_{\mathrm{true}} + n$, where $f_{\mathrm{true}}$ is the ground truth image and $n$ is a Gaussian noise signal with zero mean and variance determined by as in a previous study[32]. The averaged relative noise level is . Values of relevant parameters used in training are summarized in Table I.
Parameter  Value  Comments  
Stopping criterion in ADMM  
0.  Discount rate  
Parameter of the $\epsilon$-greedy approach  
Number of epochs  
Learning rate when updating $W$  
128  Number of data for training each time  
300  Number of steps to update $W'$
III Validation studies and results
III-A Training process and trained PTPN
During the training process, we monitor the quality of the trained PTPN, as shown in Fig. 4. Both the average output of the PTPN and the reward follow an increasing trend, although with some oscillations. This indicates that the PTPN is adjusted gradually in this reinforcement learning process towards predicting actions with high reward values.
III-B Parameter tuning in CT reconstruction
III-B1 CT reconstruction under PTPN guidance
With the PTPN trained, we use it to guide parameter tuning in a CT reconstruction problem. To do so, we select a ground truth CT image and generate the projection data with noise added. We first set the parameter arbitrarily to a constant value that is likely not optimal. After that, we apply PTPN to guide parameter tuning as outlined in the first paragraph of Sec. II-B. The tuning process stops when the relative difference between CT images in two successive reconstructions falls below a preset threshold.
To observe this process in detail, we select a test case that is not used in training. Fig. 5(a)-(c) present reconstructed CT images at steps 1, 4, and 7. It is clear that the image quality improves as the parameter is tuned. Quantitatively, we compute the relative error at different steps and plot it in Fig. 5(d). A monotonically decaying trend is observed, indicating the effectiveness of parameter tuning.
III-B2 Reconstruction results
Fig. 6 is a case that is used in training, whereas Fig. 7 is the same case as in Fig. 5, which is not included in training. Since we arbitrarily set the initial values of $\lambda$, which are too small in these two cases, the resulting images contain a lot of noise (Fig. 6(b) and 7(b)). After the parameter is tuned by PTPN, the image quality in both cases is substantially improved (Fig. 6(c) and 7(c)).
We compare the results with those under manually tuned parameters. Since it is impractical for one to adjust the parameter for each individual pixel, we consider the special case in which the parameter is a constant throughout the image, and we manually adjust this parameter value for the best image quality. The appropriate parameter values are for Fig. 6 and for Fig. 7. Fig. 6(d) and 7(d) depict images reconstructed under these parameters in the two cases, respectively. It is found that the images still contain a certain amount of noise and the quality is inferior to that of images with parameters tuned by PTPN.
As for the parameter maps tuned by the PTPN shown in Fig. 6(e) and 7(e), it is observed that PTPN deliberately reduces parameter values most around image edges. This is understandable. Reducing parameters at those pixels decreases the amount of regularization in those areas, which is beneficial in terms of preserving image edges.
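Interestingly, for the simple problem in Eq. (2), it is possible to derive the optimal parameter map $\lambda^*$. To do so, let us take the gradient of the objective function and set it to zero at the ground truth image $f_{\mathrm{true}}$: $P^T(Pf_{\mathrm{true}} - g) - \lambda\,\nabla\cdot\left(\nabla f_{\mathrm{true}}/|\nabla f_{\mathrm{true}}|\right) = 0$, where $\lambda$ is treated as locally constant so that it can be taken out of the divergence. This implies that the optimal parameter map is
$$\lambda^* = \frac{P^T(Pf_{\mathrm{true}} - g)}{\nabla\cdot\left(\nabla f_{\mathrm{true}}/|\nabla f_{\mathrm{true}}|\right)}, \tag{9}$$
with the division understood pixel-wise. The numerator in this expression is more or less an image of noise that is obtained by backprojecting the residual error in the projection domain to the image domain. Here, we neglect the image structure of the noise and plot the image $1/\left|\nabla\cdot\left(\nabla f_{\mathrm{true}}/|\nabla f_{\mathrm{true}}|\right)\right|$ in Fig. 6(f) and 7(f) for the two cases, respectively. The images show that $\lambda^*$ is small along the image edges. Comparing subfigures (e) and (f) in Fig. 6 and Fig. 7, the similarity between each corresponding pair of images implies that PTPN can intelligently adjust $\lambda$ towards the optimal parameter maps. Note that this intelligence is developed purely by the PTPN itself through the reinforcement learning process. Except for providing rewards for an action, we do not explicitly give any information regarding how to tune the parameters.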
Quantitatively, we evaluate the image quality using relative error and Peak Signal-to-Noise Ratio (PSNR). Table II summarizes the results in six training and six testing cases. These cases are CT images at different anatomical sites. In each case, we present the metrics for the image under manually tuned parameters, under an arbitrarily set initial parameter, and under parameters tuned by PTPN. For all the training cases, the images under PTPN-tuned parameters achieve the smallest error and the highest PSNR, indicating the satisfactory quality of the trained PTPN. Among the six testing cases, the PTPN-tuned parameters yield the smallest errors and the highest PSNRs in five cases (cases 1-4 and 6). For case 5, the difference between manually tuned and PTPN-tuned results is small.
Group  Case  Error (manual)  Error (initial)  Error (PTPN)  PSNR (manual, dB)  PSNR (initial, dB)  PSNR (PTPN, dB)  
Training  1  4.47  7.50  4.21  39.13  34.65  39.67  
Training  2  4.69  7.72  4.56  38.67  34.33  38.90  
Training  3  10.77  11.89  10.38  31.57  30.70  31.89  
Training  4  12.92  13.61  12.54  29.56  29.11  29.82  
Training  5  3.68  6.83  3.62  40.04  35.07  40.59  
Training  6  3.82  6.78  3.55  41.09  36.11  41.74  
Testing  1  4.35  6.93  4.24  44.28  40.22  44.50  
Testing  2  12.17  12.35  12.13  29.45  29.32  29.48  
Testing  3  10.30  11.49  8.48  32.14  31.19  33.83  
Testing  4  5.42  8.17  5.32  36.67  33.10  36.89  
Testing  5  4.62  7.19  4.95  36.91  33.07  36.31  
Testing  6  8.56  10.29  7.56  31.69  30.09  32.78
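For readers who wish to compute comparable metrics, the following is a minimal sketch of a relative error and a PSNR. The specific definitions used here (the $\ell_2$ norm ratio for the relative error and the maximum of the reference image as the peak value in PSNR) are common conventions assumed for this example; the paper's exact definitions are not reproduced in the text.

```python
import numpy as np

def relative_error(image, reference):
    """l2 norm of the difference divided by the l2 norm of the reference."""
    return np.linalg.norm(image - reference) / np.linalg.norm(reference)

def psnr(image, reference):
    """Peak signal-to-noise ratio in dB, with the reference maximum as the peak."""
    mse = np.mean((image - reference) ** 2)
    peak = np.max(reference)
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy usage: a constant reference image and a uniformly offset estimate.
reference = np.ones((2, 2))
estimate = reference + 0.1
print(relative_error(estimate, reference), psnr(estimate, reference))
```

In this toy case the relative error is 0.1 and the PSNR is 20 dB, since the mean squared error is 0.01 against a peak value of 1.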
III-B3 Application to other cases
The PTPN determines the way of parameter tuning based on the observed image patch. It is expected that the trained PTPN is also applicable to image reconstruction under settings that are different from those in training. To demonstrate this, we also applied PTPN to image reconstruction in cases with different numbers of projections, noise levels, and projection geometry. Fig. 8(a) and (b) are the same case as in Fig. 7 but with and noise in the projection data, different from the noise level of in training. Fig. 8(c) is the case with only projections. In Fig. 8(d) we change the isocenter-to-detector distance to cm. In all the cases, PTPN is able to adjust parameters to yield images with satisfactory quality. The resulting parameter maps in Fig. 8(e)-(h) are all similar to the ground truth shown in Fig. 7(f).
IV Discussions
Relation to other works. The power of deep learning in medical image processing has been clearly demonstrated in a spectrum of problems. Among these studies, most used supervised training to determine parameters inside a network in order to establish a map between input and output images. In [20, 21], a network was set up to map a noise-contaminated CT image acquired at a low-dose level to the clean image. In [17], deep residual learning was employed to map a CT image with streak artifacts caused by undersampling to the artifact image, which was then subtracted from the original image to eliminate the artifact. In a study [33] that viewed the iterative image reconstruction process as a data flow in a network under the ADMM algorithm, the supervised learning process enabled discovery of the algorithm parameters, such as image filters and threshold values. Compared with these works, our study differs in two respects. First, the purpose of using deep learning is different. Instead of trying to predict the underlying true solution or image artifacts, the purpose of setting up a PTPN is to predict a dynamic policy applicable to the image reconstruction problem in Eq. (2). Under the guidance of this policy, the output image of the reconstruction algorithm is directed towards a satisfactory quality. Second, the method to train our network is also different from the supervised training in previous works. Instead of using labeled training pairs in a supervised training fashion, we employed the reinforcement learning strategy. This strategy lets the algorithm play by itself and get rewards based on the image and the selected action. Through the training process, the PTPN spontaneously discovered the appropriate strategy for an input system state. This was the process in which intelligence was generated.
The CT reconstruction problem with pixel-wise regularization has been investigated in previous studies[9, 10]. It was proposed to perform a sequence of reconstructions with parameters adjusted based on the reconstructed images. The motivation was to detect image edges and to tune down the regularization weights for the purpose of edge preservation. As opposed to designing this explicit rule of parameter tuning, this study discovered the rule via the reinforcement learning process. It is interesting to observe that the intelligence generated in this way coincides with previous human knowledge.
Necessity of deep reinforcement learning. What is ultimately learned by the PTPN is evaluation of the image quality and the link to parameter tuning. With this in mind, one may argue that the complex reinforcement learning technique is probably unnecessary, as one can simply perform supervised learning by using a sizable data set containing paired data of image patches and corresponding ways of parameter tuning. We agree with this statement to a certain extent, but still think our study is meaningful. For this CT reconstruction problem, it is straightforward to generate labeled training pairs (image patch and direction of parameter tuning) to allow supervised training. Yet if we would like to label an image patch with not only the direction of parameter tuning, i.e. increase or decrease, but also with the amount of parameter change, i.e. one of the two amounts considered in our example, it becomes quite difficult to generate training data. Hence, the advantage of reinforcement learning is that it automatically learns a more comprehensive policy. Beyond the problem of CT reconstruction, it may not be easy to generate labeled training pairs in many optimization-based inverse problems. However, since very often one has a good sense of judging the output results, it is still relatively easy to quantify the result quality via a reward function. This allows the use of reinforcement learning to establish the policy in those problems for which labeled training pairs are hard to get.
Relevance to other problems. This study uses an optimization-based iterative CT reconstruction problem as an example to show that it is possible to achieve intelligent parameter tuning via a deep learning approach. With the rapid growth of deep learning techniques in the CT reconstruction area, the impact of this study may diminish. However, we think studying the general task of parameter tuning is still of significance, and deep learning opens a new window to tackle this problem. First, parameter tuning is not a problem unique to the CT reconstruction regime, but exists in many areas. Even beyond the scope of image processing, many other decision-making problems in medicine can be solved with an optimization approach, for which parameter tuning is an indispensable task. One notable example is treatment planning in cancer radiation therapy[34]. Even with a modern treatment planning system to solve the underlying optimization problem, a hospital still needs to hire a number of dosimetrists to manually tune the parameters in order to generate plans meeting clinicians’ requirements. This fact clearly highlights the need for and potential benefits of an intelligent parameter tuning system. Second, even for the deep learning technique itself, the training stage has a number of parameters to be tuned by the researcher to achieve the best performance. These parameters include, but are not limited to, learning rate, number of epochs, and size and number of filters. It would be an interesting and important step to develop a parameter tuning system to handle the adjustment of these parameters. Meanwhile, we have to admit that solving the parameter tuning problem in areas beyond the simple example of CT reconstruction is much more challenging. We hope our study here can shed some light in this direction and trigger deeper investigations in the future.
Limitations and future directions. This study has the following limitations. First, due to limited computational power, we only considered images with a relatively low resolution in a small number of cases. It is our plan to extend the studies to high-resolution images that are of more clinical relevance. We will also use more cases for training and testing to yield a more robust PTPN. The second limitation of this study is that PTPN has to wait for the ADMM iterative process to finish before it can adjust parameters. Although the image quality resulting from this approach is acceptable, waiting for the ADMM to finish reduces the overall workflow efficiency. This can potentially be improved by using another reconstruction algorithm with a higher convergence rate. Another possible way of acceleration is to predict the converged CT image at an early step of the iterative ADMM reconstruction, for instance using a deep learning approach[22]. Third, PTPN lays a general framework for developing strategies to improve image quality. The current setup in Eq. (2) limits the possible actions to the five options of modifying the regularization parameters. However, in general, it is possible to include actions that act more directly on the images, such as reducing noise and artifacts. It is noted that deep learning has achieved tremendous success in each of these CT image enhancement problems[17, 18, 19]. Using them under the guidance of a policy network is expected to yield a complete and comprehensive image reconstruction system that can automatically handle various types of data contamination.
V Conclusion
In this paper, we have aimed to shed some light on the task of automatic parameter tuning in an optimization problem, which is a typical task in a number of image-processing and non-image-processing problems. The significance of this study is underscored by the fact that the solution quality is critically determined by the parameter values, and yet there is no satisfactory way of automatically adjusting parameters. We proposed to solve this problem by constructing a policy network, which can be trained to guide parameter tuning. We demonstrated our idea in an example problem of optimization-based iterative CT reconstruction with a pixel-wise TV regularization term. We configured a PTPN to map a CT image patch to the direction and magnitude of tuning the parameter at the patch center. PTPN was trained via an end-to-end reinforcement learning procedure. A series of tests demonstrated that the trained PTPN is able to intelligently determine the way of parameter adjustment. Under the guidance of PTPN, the reconstructed CT images achieved image quality similar to or better than that under manually tuned parameters.
Acknowledgment
The authors would like to thank funding support from NIH grant 1R21EB021545.
References
 [1] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, no. 1, pp. 259 – 268, 1992. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016727899290242F
 [2] E. Y. Sidky and X. Pan, “Image reconstruction in circular conebeam computed tomography by constrained, totalvariation minimization,” Physics in Medicine and Biology, vol. 53, no. 17, p. 4777, 2008. [Online]. Available: http://stacks.iop.org/00319155/53/i=17/a=021
 [3] X. Jia, Y. Lou, R. Li, W. Y. Song, and S. B. Jiang, “Gpubased fast cone beam ct reconstruction from undersampled and noisy projection data via total variation,” Medical Physics, vol. 37, no. 4, pp. 1757–1760, 2010. [Online]. Available: http://dx.doi.org/10.1118/1.3371691
 [4] B. Dong, Z. Shen, M. Series, B. Dong, Z. Shen, B. Dong, and Z. Shen, “Mrabased wavelet frames and applications,” in IAS Lecture Notes Series, Summer Program on ÒThe Mathematics of Image ProcessingÓ, Park City Mathematics Institute, 2010.
 [5] X. Jia, B. Dong, Y. Lou, and S. B. Jiang, “Gpubased iterative conebeam ct reconstruction using tight frame regularization,” Physics in Medicine and Biology, vol. 56, no. 13, p. 3787, 2011. [Online]. Available: http://stacks.iop.org/00319155/56/i=13/a=004
 [6] Y. Lou, X. Zhang, S. Osher, and A. Bertozzi, “Image recovery via nonlocal operators,” Journal of Scientific Computing, vol. 42, no. 2, pp. 185–197, Feb 2010. [Online]. Available: https://doi.org/10.1007/s1091500993202
 [7] Z. Chen, H. Qi, S. Wu, Y. Xu, and L. Zhou, “Fewview ct reconstruction via a novel nonlocal means algorithm,” Physica Medica, vol. 32, no. 10, pp. 1276 – 1283, 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1120179716301077
 [8] X. Jia, Z. Tian, Y. Lou, J.J. Sonke, and S. B. Jiang, “Fourdimensional cone beam ct reconstruction and enhancement using a temporal nonlocal means method,” Medical Physics, vol. 39, no. 9, pp. 5592–5602, 2012. [Online]. Available: http://dx.doi.org/10.1118/1.4745559
 [9] W. Guo and W. Yin, “EdgeCS: edge guided compressive sensing reconstruction,” Proc. SPIE, vol. 7744, pp. 7744 – 7744 – 10, 2010. [Online]. Available: http://dx.doi.org/10.1117/12.863354
 [10] Z. Tian, X. Jia, K. Yuan, T. Pan, and S. B. Jiang, “Low-dose CT reconstruction via edge-preserving total variation regularization,” Physics in Medicine and Biology, vol. 56, no. 18, p. 5949, 2011. [Online]. Available: http://stacks.iop.org/00319155/56/i=18/a=011
 [11] G. H. Golub, M. Heath, and G. Wahba, “Generalized cross-validation as a method for choosing a good ridge parameter,” Technometrics, vol. 21, no. 2, pp. 215–223, 1979. [Online]. Available: http://www.jstor.org/stable/1268518
 [12] P. C. Hansen, “Analysis of discrete ill-posed problems by means of the L-curve,” SIAM Review, vol. 34, no. 4, pp. 561–580, 1992. [Online]. Available: https://doi.org/10.1137/1034115
 [13] S. Ramani, Z. Liu, J. Rosen, J. F. Nielsen, and J. A. Fessler, “Regularization parameter selection for nonlinear iterative image restoration and MRI reconstruction using GCV and SURE-based methods,” IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3659–3672, Aug 2012.
 [14] X. Zhu and P. Milanfar, “Automatic parameter selection for denoising algorithms using a no-reference measure of image content,” IEEE Transactions on Image Processing, vol. 19, no. 12, pp. 3116–3132, Dec 2010.
 [15] H. Liang and D. S. Weller, “Comparison-based image quality assessment for selecting image restoration parameters,” IEEE Transactions on Image Processing, vol. 25, no. 11, pp. 5118–5130, Nov 2016.
 [16] T. Bai, H. Yan, L. Ouyang, D. Staub, J. Wang, X. Jia, S. B. Jiang, and X. Mou, “Data correlation based noise level estimation for cone beam projection data,” Journal of X-Ray Science and Technology, no. Preprint, pp. 1–20, 2017.
 [17] Y. Han, J. J. Yoo, and J. C. Ye, “Deep residual learning for compressed sensing CT reconstruction via persistent homology analysis,” CoRR, vol. abs/1611.06391, 2016. [Online]. Available: http://arxiv.org/abs/1611.06391
 [18] E. Kang, J. Min, and J. C. Ye, “Wavenet: a deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction,” CoRR, vol. abs/1610.09736, 2016. [Online]. Available: http://arxiv.org/abs/1610.09736
 [19] H. Li and K. Mueller, “Low-dose CT streak artifacts removal using deep residual neural network,” Proceedings of Fully 3D conference 2017, pp. 191–194, 2017.
 [20] H. Chen, Y. Zhang, W. Zhang, P. Liao, K. Li, J. Zhou, and G. Wang, “Low-dose CT via convolutional neural network,” Biomed. Opt. Express, vol. 8, no. 2, pp. 679–694, Feb 2017. [Online]. Available: http://www.osapublishing.org/boe/abstract.cfm?URI=boe82679
 [21] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, “Low-dose CT with a residual encoder-decoder convolutional neural network (RED-CNN),” IEEE Transactions on Medical Imaging, vol. PP, no. 99, pp. 1–1, 2017.
 [22] L. Cheng, S. Ahn, S. Ross, H. Qian, and B. D. Man, “Accelerated iterative image reconstruction using a deep learning based leapfrogging strategy,” Proceedings of Fully 3D conference 2017, pp. 715–720, 2017.
 [23] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. A. Riedmiller, “Playing Atari with deep reinforcement learning,” CoRR, vol. abs/1312.5602, 2013. [Online]. Available: http://arxiv.org/abs/1312.5602
 [24] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
 [25] T. Goldstein and S. Osher, “The split Bregman method for L1-regularized problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 2, pp. 323–343, 2009. [Online]. Available: https://doi.org/10.1137/080725891
 [26] A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120–145, May 2011. [Online]. Available: https://doi.org/10.1007/s1085101002511
 [27] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, Jan. 2011. [Online]. Available: http://dx.doi.org/10.1561/2200000016
 [28] G. Golub and C. Van Loan, Matrix Computations, ser. Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press, 2013. [Online]. Available: https://books.google.com/books?id=X5YfsuCWpxMC
 [29] C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3, pp. 279–292, May 1992. [Online]. Available: https://doi.org/10.1007/BF00992698
 [30] R. Bellman and R. Karush, Dynamic Programming: A Bibliography of Theory and Application, ser. Memorandum (Rand Corporation). Rand Corporation, 1964. [Online]. Available: https://books.google.com/books?id=zHG1AQAACAAJ
 [31] R. L. Siddon, “Fast calculation of the exact radiological path for a three-dimensional CT array,” Medical Physics, vol. 12, no. 2, pp. 252–255, 1985. [Online]. Available: http://dx.doi.org/10.1118/1.595715
 [32] J. Wang, H. Lu, Z. Liang, D. Eremina, G. Zhang, S. Wang, J. Chen, and J. Manzione, “An experimental study on the noise properties of X-ray CT sinogram data in Radon space,” Physics in Medicine and Biology, vol. 53, no. 12, p. 3327, 2008. [Online]. Available: http://stacks.iop.org/00319155/53/i=12/a=018
 [33] Y. Yang, J. Sun, H. Li, and Z. Xu, “Deep ADMM-Net for compressive sensing MRI,” in Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, Eds. Curran Associates, Inc., 2016, pp. 10–18. [Online]. Available: http://papers.nips.cc/paper/6406deepadmmnetforcompressivesensingmri.pdf
 [34] T. Bortfeld, “IMRT: a review and preview,” Physics in Medicine and Biology, vol. 51, no. 13, p. R363, 2006.