Learned Primaldual Reconstruction
Abstract
We propose the Learned PrimalDual algorithm for tomographic reconstruction. The algorithm accounts for a (possibly nonlinear) forward operator in a deep neural network by unrolling a proximal primaldual optimization method, but where the proximal operators have been replaced with convolutional neural networks. The algorithm is trained endtoend, working directly from raw measured data and it does not depend on any initial reconstruction such as \acFBP.
We compare performance of the proposed method on low dose \aclCT reconstruction against \acFBP, \acTV, and deep learning based postprocessing of \acFBP. For the SheppLogan phantom we obtain \acsPSNR improvement against all compared methods. For human phantoms the corresponding improvement is \unit6.6\decibel over \acTV and \unit2.2\decibel over learned postprocessing along with a substantial improvement in the \aclSSIM. Finally, our algorithm involves only ten forwardbackprojection computations, making the method feasible for time critical clinical applications.
MRI,ADMM,GPU,CPU \addunit\pixelpixel \addunit\pixelspixels \addunit\voxelvoxel \addunit\decibeldB \addunit\byteB \addunit\hounsfieldHU
ADMM,GPU,CPU
I Introduction
In an inverse problem, the goal is to reconstruct parameters characterizing the system under investigation from indirect observations. Such problems arise in several areas of science and engineering, like tomographic imaging where one seeks to visualize the interior structure of an object (2D/3D image) from indirect observations. Imaging technologies of this type, such as \acCT and \acMRI imaging, are indispensable for contemporary medical diagnostics, intervention and monitoring.
There is by now a rich theory for model driven tomographic image reconstruction. A key component is knowledge about the forward model, which describes the data formation process in absence of noise. In many applications, an explicit forward model can be derived starting out from the underlying physical principles that are utilized by the imaging modality. Another component accounts for the knowledge about the statistical properties of data and a priori information about the image to be recovered.
A parallel line of development in signal processing has been the usage of deep learning for solving a wide range of tasks that can be cast as supervised learning. The success of these data driven approaches are this far confined to tasks where knowledge about the forward model is not needed, or has little importance. As an example, an entirely data driven approach for tomographic image reconstruction applicable to clinical sized problems has yet to be demonstrated.
A central question is whether one can combine elements of model and data driven approaches for solving illposed inverse problems. In particular, is there a framework for incorporating the knowledge of a forward model when designing a neural network for reconstruction? The Learned PrimalDual reconstruction method developed in this paper is such a framework. It applies to general inverse problems and it is best described in an abstract setting, so our starting point is to formalize the notion of an inverse problem.
Mathematically, an inverse problem can be formulated as reconstructing (estimating) a signal from data where
(1) 
Here, the reconstruction space and the data space are typically Hilbert Spaces, is the forward operator that models how a signal gives rise to data in absence of noise, and is a single sample of a valued random variable that represents the noise component of data.
Ia Variational regularization
A common model driven approach for solving creftype 1 is to maximize the likelihood of the signal, or equivalently minimizing the negative data loglikelihood [1]:
(2) 
This minimization is for typical choices of illposed, that is, a solution (if it exists) is unstable with respect to the data in the sense that small changes to data results in large changes to a reconstruction. Hence, a maximum likelihood solution typically leads to overfitting against data.
Variational regularization, also referred to as model based iterative reconstruction in medical image processing, avoids overfitting by introducing a functional (regularization functional) that encodes a priori information about the true (unknown) and penalizes unlikely solutions [2, 3]. Hence, instead of minimizing only the negative data loglikelihood as in creftype 2, one now seeks to minimize a regularized objective functional by solving
(3) 
In the above, (regularization parameter) governs the influence of the a priori knowledge encoded by the regularization functional against the need to fit data.
IB Machine learning in inverse problems
We here review results on learned iterative schemes, see [4] for a wider review on machine learning for medical imaging and [5] for usage of machine learning for solving inverse problems in general.
Machine learning is widely used for nonlinear function approximation under weak assumptions and has recently emerged as the state of the art for several image processing tasks such as classification and segmentation. Applied to the inverse problem in creftype 1, it can be phrased as the problem of finding a (nonlinear) mapping satisfying the following pseudoinverse property:
A key element in machine learning approaches is to parametrize the set of such pseudoinverse operators by a parameter where is some parameter space and the main algorithmic complication is to select an appropriate structure of such that, given appropriate training, the pseudoinverse property is satisfied as well as possible.
In the context of tomographic reconstruction, three main research directions have been proposed. The first is so called learned postprocessing or learned denoisers. Here, the learned reconstruction operator is of the form
where is a learned postprocessing operator and is some approximate pseudoinverse, e.g. given by \acFBP in \acCT reconstruction. This type of method is relatively easy to implement, given that the pseudoinverse can be applied offline, before the learning is performed, which reduces the learning to inferring an transformation. This has been investigated by several authors [6, 7, 8].
Another method is to learn a regularizer and use this regularizer in a classical variational reconstruction scheme according to creftype 3. Examples of this include dictionary learning [9], but several alternative methods have been investigated, such as learning a variational autoencoder [10] or using a cascade of wavelet transforms (scattering transform) [11].
Finally, some authors investigate learning the full reconstruction operator, going all the way from data to reconstruction. Doing this in one step is typically very computationally expensive and does not scale to the data sizes encountered in tomographic reconstruction. Instead, learned iterative schemes have been studied. These schemes resemble classical optimization methods used for tomographic reconstruction but use machine learning to find the best update in each iteration given the last iterate and results of applying the forward operator and its adjoint as input.
One of the first works on learned iterative schemes is [12], which learns an \acADMMlike scheme for \acMRI reconstruction. A further development along this lines is given in [13], which learns over a broader class of schemes instead of \acADMMtype of schemes. The application is to finite dimensional inverse problems typically arising in image restoration. This approach was in [5] further extended to nonlinear forward operators in to the infinite dimensional setting, which also applies learned iterative schemes to (nonlinear, prelog) \acCT. Similar approaches for \acMRI reconstruction have also been considered [14, 15]. Here, the situation is simpler than \acCT reconstruction since the forward operator is approximated by a Fourier transform, i.e. \acMRI reconstruction amounts to inverting the Fourier transform.
Given a structure of , the ”learning” part refers to choosing an ”optimal” set of parameters given some training data, where the concept of optimality is typically quantified through a loss functional that measures the quality of a learned pseudoinverse .
To define this loss functional, consider a –valued random variable with joint probability distribution . This could be e.g. the probability distribution of human bodies and corresponding noisy tomographic data. We define the optimal reconstruction operator as the one whose reconstructions have the lowest average mean squared distance to the true reconstructions, where the average is taken w.r.t. . Finding this reconstruction operator is then given by selecting the parameters so that the loss functional is minimized:
(4) 
However, in practice we often do not have access to the probability distribution of the random variable . Instead, we know a finite set of samples . In this setting, we replace in creftype 4 with its empirical counterpart, so the loss function is replaced with the empirical loss
(5) 
Since our main goal is to minimize the loss functional, our practical goal is thus twofold: we want to find a learned reconstruction scheme that minimizes the empirical loss, and we also want it to generalize to new, unseen data.
Ii Contribution and overview of paper
This paper proposes the Learned PrimalDual algorithm, a general framework for solving inverse problems that combines deep learning with model based reconstruction. The proposed learned iterative reconstruction scheme involves \acpCNN in both the reconstruction and data space, and these are connected by the forward operator and its adjoint. We train the networks to minimize the mean squared error of the reconstruction and demonstrate that this achieves very high performance in \acCT reconstruction, surpassing recent learning based methods on both analytical and human data.
We emphasize that we learn the whole reconstruction operator, mapping data to reconstruction, and not just a postprocessing nor only the prior in isolation.
In addition, we make all of our code and learned parameters open source so that the community can reproduce the results and apply the methods to other inverse problems [16].
Iii The Learned PrimalDual algorithm
We here introduce how primaldual algorithms can be learned from data and how this can be used to solve inverse problems.
Iiia PrimalDual optimization schemes
In imaging, the minimization in creftype 3 is a large scale optimization problem, which traditionally has been addressed using gradient based methods such as gradient descent or its extensions to higher order derivatives, e.g. quasiNewton or Newton methods. However, many regularizers of interest result in a nondifferentiable objective functional, so gradient based methods are not applicable. A common approach to handle this difficulty is to consider a smooth approximation, which however introduces additional parameters and gives nonexact solutions.
An alternative approach is to use methods from nonsmooth convex optimization. Proximal methods have been developed in order to directly work with nonsmooth objective functionals. Here, a proximal step replaces the gradient step. The simplest example of such an algorithm is the proximal point algorithm for minimizing an objective functional . It can be seen as the proximal equivalent of the gradient descent scheme and is given by
(6) 
where is a step size and the proximal operator is defined by
(7) 
While this algorithm could, in theory, be applied to solve creftype 3 it is rarely used directly since creftype 7 does not have a closed form solution. Proximal primaldual schemes offer a work around. In these schemes, an auxiliary dual variable in the range of the operator is introduced and the primal () and dual variables are updated in an alternating manner.
One well known primaldual scheme is the \acPDHG algorithm [17], also known as the ChambollePock algorithm, with a recent extension to nonlinear operators [18]. The scheme (algorithm 1) is adapted for minimization problems with the following structure:
(8) 
where is a (possibly nonlinear) operator, is a Hilbert space and and are functionals on the dual/primal spaces. Note that creftype 3 is a special case of creftype 8 if we set , and .
In algorithm 1, denotes the Fenchel conjugate of , is the dual variable and is the adjoint of the (Fréchet) derivative of in point .
Example: \AcTV regularized \acCt
The \acPDHG method has been widely applied to \acCT [19]. In \acCT, the forward operator is given by the raytransform , which integrates the signal over a set of lines given by the acquisition geometry. Hence, elements in are functions on lines
and the adjoint of the derivative is the backprojection [20].
A typical example of variational regularization in imaging is \acTV regularization, which applies to signals that are represented by scalar functions of bounded variation. The corresponding regularization functional is then given as the 1norm of the gradient magnitude, i.e. , , is the dimension of the space.
The \acPDHG method can be used to solve the \acTV regularized \acCT optimization problem
Since the proximal of is hard to compute, the following identification is better suited for recasting the above into (8):
IiiB Learned \acPdhg
The aim is to derive a learned reconstruction scheme inspired by \acPDHG, algorithm 1. We follow the observation in [21, 22], that proximal operators can be replaced by other operators that are not necessarily proximal operators. The aforementioned publications replace a proximal operator with a denoising operator such as Block Matching 3D (BM3D). Our idea is to replace the proximal operators by parametrized operators where the parameters are learned from training data, resulting in a learned reconstruction operator.
In order to make the learned reconstruction operator well defined and implementable on a computer we also need to select a stopping criterion. Choosing a proper stopping criterion is an active research area, but for simplicity and usability we use a fixed number of iterates. By selecting a fixed number of iterations, the computation budget is also fixed prior to training, which is a highly desirable property in time critical applications.
Algorithm 2 below outlines the resulting variant of the \acPDHG algorithm with iterations in which the primal proximal has been replaced by a learned proximal, and the dual proximal by a learned proximal . Note that in this article we consider only a single forward model and no regularizing operator, so we have , , but we give the algorithm in full generality for completeness.
In algorithm 2, there are several parameters that need to be selected. These are the parameters of the dual proximal, , the primal proximal, , the step lengths, , and the overrelaxation parameter, . In a learned \acPDHG algorithm these would all be infered, learned, from training data.
We implemented this algorithm and show its performance in the results section. While the performance was comparable to traditional methods, it did not improve upon the state of the art in deep learning based image reconstruction.
IiiC Learned PrimalDual
To gain substantial improvements, guided by recent advances in machine learning, the following modifications to the learned \acPDHG algorithm were done.

Instead of explicitly enforcing updates of the form , allow the network to learn how to combine the previous update with the result of the operator evaluation.

Instead of hardcoding the overrelaxation , let the network freely learn in what point the forward operator should be evaluated.

Instead of using the same learned proximal operators in each iteration allow them to differ. This increases the size of the parameter space but it also notably improves reconstruction quality.
The above modifications result in a new algorithm, henceforth called the Learned PrimalDual algorithm, that is outlined in algorithm 3.
IiiC1 Choice of starting point
In theory, the Learned PrimalDual algorithm can be used with any choice of starting points and . The most simple starting point, both from a conceptual and computational perspective, is zeroinitialization
(9) 
where is the zero element in the primal or dual space.
In cases where a good starting guess is available, it would make sense to use it. One such option is to assume that there exists a pseudoinverse , e.g. \acFBP for \acCT. For the dual variable, the data enters into each iterate so there is no need for a good initial guess. This gives the starting point
(10) 
In our tests we found that providing the Learned PrimalDual algorithm with such an initial guess marginally decreased training time, but did not give better final results. Given that using the pseudoinverse adds more complexity by making the learned reconstruction operator depend on an earlier reconstruction, we report values only from zeroinitialization.
IiiD Connection to variational regularization
We note that by selecting and the Learned PrimalDual algorithm naturally reduces to the classical \acPDHG algorithm by making the following choices:
Even if the learned proximal operators do not have explicit access to the proximals, the universal approximation property of neural networks [23] guarantees that given sufficient training data these equalities can be approximated arbitrarily well.
A wide range of other optimization schemes can also be seen as special cases of the Learned PrimalDual algorithm. For example, the gradient descent algorithm with steplength for solving creftype 8 is given by
and can be obtained by selecting
More advanced gradient based methods such as Limited memory BFGS are likewise subcases obtained by appropriate choices of learned proximal operators.
In summary, the Learned PrimalDual algorithm contains a wide range of optimization schemes as special cases. If the parameters are appropriately selected, then the proposed algorithm should always perform at least as well as current variational regularization schemes given the same stopping criteria, which here is a fixed number of iterates.
Iv Implementation and evaluation
We evaluate the algorithm on two low dose \acCT problems. One simplified using analytical phantoms based on ellipses and one with a more realistic forward model and human phantoms. We briefly describe these test cases and how we implemented the Learned PrimalDual algorithm. We also describe the methods we compare against.
Iva Test cases
Ellipse phantoms
This problem is identical to [5] and we restate it briefly. Training data is randomly generated ellipses on a \unit128 ×128\pixel domain. The forward operator is the ray transform and hence .
The projection geometry was a sparse 30 view parallel beam geometry with 182 detector pixels. 5% additive Gaussian noise was added to the projections. Since the forward operator is linear, the adjoint of the derivative is simply the adjoint, which for the ray transform is the backprojection
Human phantoms
In order to evaluate the algorithm on a clinically realistic usecase we consider reconstruction of simulated data from human abdomen \acCT scans as provided by Mayo Clinic for the AAPM Low Dose CT Grand Challengfe [24]. The data includes full dose \acCT scans from 10 patients, of which we used 9 for training and 1 for evaluation. We used the \unit3\milli\meter slice thickness reconstructions, resulting in 2168 training images, each in size. Thus, given that we minimize the pointwise error, the total number of datapoints is .
We used a twodimensional fanbeam geometry with 1000 angles, \unit1000\pixels, source to axis distance \unit500\milli\meter and axis to detector distance \unit500\milli\meter. In this setting, we consider the more physically correct nonlinear forward model given by BeerLamberts law
where the unit of is \unit\gram/\centi\cubic\meter and is the mass attenuation coefficient, in this work selected to \unit0.2\centi\squaren\meter/\gram which is approximately the value in water at xray energies. We used Poisson noise corresponding to incident photons per pixel before attenuation, which would correspond to a low dose \acCT scan. We find the action of the adjoint of the derivative by straightforward computation
The forward model can also be linearised by applying to both the data and forward operator, which then simply becomes the raytransform as for the ellipse data. We implemented both the prelog (nonlinear) and postlog (linear) forward models and compare their results.
For validation of the ellipse data case, we simply use the (modified) SheppLogan phantom and for the human phantom data we use one held out set of patient data. See fig. 1 for examples.
IvB Implementation
The methods described above were implemented in Python using \acODL [25] and TensorFlow [26]. All operatorrelated components, such as the forward operator , were implemented in \acODL, and these were then converted into TensorFlow layers using the as_tensorflow_layer functionality of \acODL. The neural network layers and training were implemented using TensorFlow. The implementation utilizes abstract \acODL structures for representing functional analytic notions and is therefore generic and easily adaptable to other inverse problems. In particular, the code can be easily adapted to other imaging modalities.
We used the \acODL operator RayTransform in order to evaluate the ray transform and its adjoint using the \acGPU accelerated ’astra_gpu’ backend [27].
IvB1 Deep neural network and training details
Given the general Learned PrimalDual scheme in algorithm 3, a parametrization of the learned proximal operators is needed in order to proceed. In many inverse problems, and particularly in \acCT and \acMRI reconstruction, most of the useful properties for both the forward operator and prior are approximately translation invariant. For this reason the resulting reconstruction operator should be approximately translation invariant, which indicates that \acpCNN are suitable for parametrizing the aforementioned reconstruction operator.
We used learned proximal operators of the form
where is the identity operator that makes the network a residual network. There are two main reasons for choosing such a structure. First, proximal operators (as the name implies) are typically close to the identity and second, there is rich evidence in the machine learning literature [28] that networks of this type are easier to train. Heuristically this is because each update does not need to learn the whole update, but only a small offset from the identity.
Additionally, we used affine operators parametrized by weights and biases . The affine operators are defined in terms of so called convolution operators (here given on the primal space, but equivalently on the dual space). These are given as affine combinations of regular convolutions, more specifically:
where the :th component is given by
where and .
The nonlinearities were chosen to be Parametric Rectified Linear Units (PReLU) functions
This type of nonlinearity has proven successful in other applications such as classification [29].
We let the number of data that persists between the iterates be . The convolutions were all pixel size, and the number of channels was, for each primal learned proximal, , and for the duals where the higher number of inputs is due to the data being supplied to the dual proximals.
We let the number of unrolled iterations be , that is the operator and the adjoint of its derivative are both evaluated 10 times by the network. Since each iterate involves two 3layer networks, one for each proximal, the total depth of the network is 60 convolutional layers and the total number of parameters approximately . In the context of deep learning, this is a deep network but with a small number of parameters. The network is visualized in fig. 2.
We used the Xavier initialization scheme [30] for the convolution parameters, and initialized all biases to zero.
We trained the network by minimizing the empirical loss creftype 5 using training data as explained above using the ADAM optimizer in TensorFlow [31]. We used batches on each problem and used a learning rate schedule according to cosine annealing [32], i.e. the learning rate in step was
where the initial learning rate was set to . We also let the parameter of the ADAM optimizer to and let all other parameters use the default choices. We performed global gradient norm clipping [33], limiting the gradient norms to 1 in order to improve training stability and used a batch size of 5 for the ellipse data and 1 for the human phantoms.
We did not use any regularization of the learned parameters, nor did we utilize any tricks such as dropout or batch normalization. Neither did we perform any data augmentation.
The training was done using a single GTX 1080 TI \acGPU and took about 11 hours for the ellipse data and 40 hours for the human phantoms.
IvB2 Incorporating the forward operator in neural networks
In order to minimize the loss function creftype 4, \acSGD type methods are typically used and these require (an estimate of) the gradient of the loss function
where is the adjoint of the derivative (with respect to ) of the learned reconstruction operator applied in , often called gradient in the machine learning literature. This introduces a challenge since it will depends on each component of the neural network, including the learned proximal operators but also the forward operator and the backward operator , propagated through all iterations.
To solve this, we used the built in automatic differentiation functionality of TensorFlow which uses the chain rule (backpropagation). This in turn requires the adjoints of the derivatives of each individual component which for the proximals were computed by TensorFlow and for the operators by ODL.
IvC Comparison
We compare the algorithm to several widely used algorithms, including standard \acFBP and (isotropic) \acTV regularized reconstruction. We also compare against several learned schemes. These are briefly summarize here, see the references for full descriptions.
The \acFBP reconstructions were done with a Hann filter and used the method fbp_op in ODL. The \acTV reconstruction was performed using 1000 iterations of the classical \acPDHG algorithm, implemented in ODL as pdhg. The filter bandwith in the \acFBP reconstruction and the regularization parameter in the \acTV reconstruction were selected in order to maximize the \acPSNR.
The partially Learned Gradient method in [5] is similar to the algorithm proposed in this article, but differs in that instead of learning proximal operators it learns a gradient operator and the forward operator enters into the neural network through the gradient of the data likelihood. Publicly available code and parameters [34] were used.
The next comparison is against a deep learning based approach for postprocessing based on a so called UNet [35]. The UNet was first proposed for image segmentation, but by changing the number of output channels to one, it can also be used for postprocessing as was done in [7]. Here an initial reconstruction is first performed using \acFBP and a neural network is trained on pairs of noisy \acFBP images and noiseless/low noise ground truth images, learning a mapping between them. We reimplemented the algorithm from [7] but found that using the training procedure as stated in the paper gave suboptimal results. We hence report values from using the same training scheme as for our other algorithms in order to give a more fair comparison.
Additionally, our comparison includes learned \acPDHG, algorithm 2, as well as the following two simplified versions of the Learned PrimalDual algorithm. The first is a Learned Primal algorithm, which does not learn any parameters for the dual proximal, instead it returns the residual
The second, FBP + residual denoising algorithm, further simplifies the problem by discarding the forward operator completely, and can be seen as selecting
Since this method does not have access to the data , we select the initial guess according to a \acFBP, see section IIIC1. This makes the algorithm a learned denoiser.
For the human phantoms we compare both nonlinear and linearized versions of the forward operator, but given that training times are noticeably longer, we only compare to the previously established methods of \acFBP, \acTV and UNet denoising.
All learned algorithms were trained using the same training scheme as outlined in section IVB1, and measure the runtime, \acPSNR and the \acSSIM [36].
All methods that we compare against are available in the accompanying source code.
V Results
The quantitative results for the ellipse data is given in table I, where we can see that the proposed Learned PrimalDual algorithm outperforms the classical schemes (\acFBP and \acTV) significantly w.r.t. the reconstruction error as measured by both \acPSNR and \acSSIM. We also note that the Learned PrimalDual algorithm gives a large improvement over the previous deep learning based methods such as the learned gradient scheme and UNet based postprocessing, giving an improvement exceeding \unit6\decibel. The Learned PrimalDual algorithm also outperforms the Learned \acPDHG and the FBP + residual denoising algorithms by wide margins.
The only method that achieves results close to the Learned PrimalDual algorithm is the Learned Primal method, but the Learned PrimalDual algorithm gives a noticeable improvement of \unit1.3\decibel.
The results are visualized in fig. 3. We note that small structures, such as the small inserts, are much more clearly visible in the Learned Primal and Learned PrimalDual reconstructions than in the other reconstructions. We also note that both the Learned \acPDHG and Learned Primal reconstruction seem to have a halo artefact close to the outer bone which is absent in the Learned PrimalDual reconstruction.
With respect to runtime the learned methods that involve calls to the forward operator (Learned Gradient, \acPDHG, Primal, PrimalDual) are slower than the methods that do not (FBP + UNet denoising, Residual) by a factor . When compared to \acTV regularized reconstruction all learned methods are at least 2 orders of magnitude faster.
Method  \acsPSNR  SSIM  Runtime  Parameters 

\acFBP  19.75  4  1  
\acTV  28.06  5 166  1  
FBP + UNet denoising  29.20  9  
FBP + residual denoising  32.38  9  
Learned Gradient  32.29  56  
Learned \acPDHG  28.32  48  
Learned Primal  36.97  43  
Learned PrimalDual  38.28  0.988749956718  49 
Method  \acsPSNR  SSIM  Runtime  Parameters 

\acFBP  33.65  423  1  
\acTV  37.48  64 371  1  
FBP + UNet denoising  41.92  463  
Learned PrimalDual, linear  44.11  0.9694  620  
Learned PrimalDual, nonlinear  43.91  0.96873  670 
Quantitative results for the human phantoms data are presented in table II. We note that the \acFBP reconstruction has a much more competitive image quality than it had for the ellipse data, both quantitatively and visually. It is likely for this reason that the FBP + UNet denoising performs better than it did on the ellipses, outperforming \acTV by \unit4.4\decibel. However, if we look at the \acSSIM we note that this improvement does not translate as well to the structural similarity, where the method is comparable to \acTV regularization.
Both quantitatively and visually, the linear and nonlinear versions of the Learned PrimalDual algorithm give very similar results. We will focus on the linear version which gave slightly better results.
The Learned PrimalDual algorithm gives a \unit10.5\decibel improvement over the \acFBP reconstruction, a \unit6.6\decibel improvement over \acTV and \unit2.2\decibel over the UNet denoiser. This is less than for the ellipse data, but still represents a large improvement. On the other hand, while the UNet denoiser did not improve the \acSSIM as compared to \acTV regularization, the Learned PrimalDual algorithm gives a large improvement.
This improvement is also present in the images when inspected visually in fig. 4. In particular, we see that some artifacts visible in the \acFBP reconstruction are still discernible in the UNet denoiser and \acTV reconstructions. Examples include streak artifacts, especially around the edges of the phantom and structures spuriously created from noise, such as a line in the muscle above the right bone. These are mostly absent in the Learned PrimalDual reconstruction. However, we do note that the images do look slightly oversmoothed. Both of these observations become especially apparent if we look at the zoomed in regions, where we note that the Learned PrimalDual algorithm is able to reconstruct finer detail than the other algorithms, but gives a very smooth texture.
With respect to the run time, the Learned PrimalDual is more competitive with the \acFBP and UNet denoiser algorithms for full size data than for the ellipse data. This is because the size of the data is much larger, which increases the runtime of the \acFBP reconstruction, which is also needed to compute the initial guess for the UNet denoiser. As for the ellipse data, both learned methods outperform \acTV regularized reconstruction by two orders of magnitude with respect to runtime.
Vi Discussion
The results show that the Learned PrimalDual algorithm outperforms classical reconstruction algorithm by large margins as measured in both \acPSNR and \acSSIM and also improves upon learned postprocessing methods for both simplified ellipse data and for human phantoms. In addition, especially for the human phantoms, the reconstruction time is comparable with even filtered backprojection and learned postprocessing.
One interesting, and to the best of our knowledge, unique feature of the Learned PrimalDual algorithm in the field of deep learning based \acCT reconstruction, is that it gives reconstructions working directly from data, without any initial reconstruction as input.
Since the algorithm is iterative, we can visualize the iterates to gain insight into how it works. In fig. 5 we show some iterates with the nonlinear forward operator. We note that the reconstruction stays very bad until the 8:th iterate when most structures seem to come in place, but the image is still noisy. Between the 8:th and 10:th iterate, we see that the algorithm seems to perform an edgeenhancing step. It thus seems like the learned iterative scheme works in two steps, first finding the large scale structures and then finetuning the details.
Similarly to the edgeenhancement that seems to be performed in the primal space, we note that in the dual space the sinogram that is backprojected seems to be bandpass filtered to exclude both very low and very high frequencies.
We note that in the very noisy and undersampled data used for the ellipse phantoms, the learned algorithms that make use of the forward operator, such as the Learned Gradient, Primal and PrimalDual algorithms outperform even state of the art postprocessing methods by large margins and that in this regimen, \acTV regularization performs relatively well when compared to postprocessing methods. This improvement in reconstruction quality when incorporating the forward operator, while still substantial, is not as large for the human phantom in which the data was less noisy.
To explain this, we conjecture that in the limit of highly noisy data where the initial reconstruction as given by e.g. \acFBP becomes very bad, learned schemes that incorporate the forward model and work directly from data, such as the Learned PrimalDual algorithm, has a considerable advantage over postprocessing methods and that this advantage increases with decreasing data quality.
Further along these lines, note that for the human data the postprocessing gives a large improvement in \acPSNR when compared to \acTV regularization, which is not necessarily reflected in the \acSSIM. On the other hand, the Learned PrimalDual algorithm improves upon both \acPSNR and \acSSIM. This can be by explained by the learned postprocessing being limited by the information content of the \acFBP while the Learned PrimalDual algorithm works directly with data and is thus limited by the information content of the data, which is greater or equal to that of the \acFBP. In theory, the Learned PrimalDual algorithm can thus find structures that are not present in the \acFBP, something postprocessing methods cannot.
In these experiments we found that while the algorithm seems to handle nonlinear forward models well, we did not observe any notable performance improvement by doing so. This may indicate that performing reconstructions on postlog data is preferable.
The structure of the neural network was not extensively finetuned and we suspect that better results could be obtained by a better choice of network for the learned proximal operators. We also observed that the choice of optimizer and learning rate decay had a large impact on the results, and we suspect that further research into how to correctly train learned reconstruction operators will prove fruitful.
Finally, we observe that the reconstructions, while outperforming all of the compared methods with respect to \acPSNR and \acSSIM, suffers from a perceived oversmoothing when inspected visually. We suspect that the particular choice of objective function used in this article, the squared norm creftype 4, is a main cause of this and invite future researchers to implement learned reconstruction operators that use more advanced loss functions such as perceptual losses [37].
Vii Conclusions
We have proposed a new algorithm in the family of deep learning based iterative reconstruction schemes. The algorithm is inspired by the \acPDHG algorithm, where we replace the proximal operators by learned operators. In contrast to several recently proposed algorithms, the new algorithm works directly from tomographic data and does not depend on any initial reconstruction.
We showed that the algorithm gives state of the art results on a computed tomography problem for both analytical and human phantoms. For analytical phantoms, it improves upon both classical algorithms such as \acFBP and \acTV, and postprocessing based algorithms by at least \unit6\decibel while also improving the \acSSIM. The improvements for the human phantom were more modest, but the algorithm still improves upon a \acTV regularized reconstruction by \unit6.6\decibel and gives an improvement of \unit2.2\decibel when compared to a learned postprocessing.
We hope that this algorithm will inspire further research in Learned PrimalDual schemes and that the method will be applied to other imaging modalities.
Viii Acknowledgements
The work was supported by the Swedish Foundation of Strategic Research grant AM130049 and Industrial PhD grant ID140055. The work was also supported by Elekta.
The authors also thank Dr. Cynthia McCollough, the Mayo Clinic, and the American Association of Physicists in Medicine, and acknowledge funding from grants EB017095 and EB017185 from the National Institute of Biomedical Imaging and Bioengineering, for providing the data necessary for performing comparison using a human phantom.
References
 [1] M. Bertero, H. Lantéri, and L. Zanni, “Iterative image reconstruction: a point of view,” in Proceedings of the Interdisciplinary Workshop on Mathematical Methods in Biomedical Imaging and IntensityModulated Radiation (IMRT), Pisa, Italy, Y. Censor, M. Jiang, and A. K. Louis, Eds., 2008, pp. 37–63.
 [2] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of inverse problems, ser. Mathematics and its Applications. Kluwer Academic Publishers, 2000, no. 375.
 [3] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen, Variational Methods in Imaging, ser. Applied Mathematical Sciences. New York: SpringerVerlag, 2009, vol. 167.
 [4] G. Wang, “A perspective on deep imaging,” IEEE Access, vol. 4, pp. 8914–8924, 2016.
 [5] J. Adler and O. Öktem, “Solving illposed inverse problems using iterative deep neural networks,” Inverse Problems, vol. 33, no. 12, p. 124007, 2017.
 [6] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, “LowDose CT with a Residual EncoderDecoder Convolutional Neural Network (REDCNN),” IEEE Transactions on Image Processing, 2017.
 [7] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep Convolutional Neural Network for Inverse Problems in Imaging,” IEEE Transactions on Image Processing, vol. 26, no. 9, pp. 4509–4522, 2016.
 [8] E. Kang, J. Min, and J. C. Ye, “WaveNet: a deep convolutional neural network using directional wavelets for lowdose Xray CT reconstruction,” Medical physics, vol. 44 10, pp. e360–e375, 2017.
 [9] Q. Xu, H. Yu, X. Mou, L. Zhang, J. Hsieh, and G. Wang, “Lowdose xray ct reconstruction via dictionary learning,” IEEE Transactions on Medical Imaging, vol. 31, no. 9, pp. 1682–1697, Sept 2012.
 [10] T. Meinhardt, M. Möller, C. Hazirbas, and D. Cremers, “Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems,” in ICCV, October 2017.
 [11] I. Dokmanic, J. Bruna, S. Mallat, and M. de Hoop, “Inverse problems with invariant multiscale statistics,” ArXiv, vol. abs/1609.05502, 2016.
 [12] Y. Yang, J. Sun, H. Li, and Z. Xu, “Deep ADMMNet for compressive sensing MRI,” in Advances in Neural Information Processing Systems, 2016, vol. 29, pp. 10–18.
 [13] P. Putzky and M. Welling, “Recurrent inference machines for solving inverse problems,” ArXiv, vol. abs/1706.04008, 2017.
 [14] K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock, and F. Knoll, “Learning a variational network for reconstruction of accelerated mri data,” Magnetic Resonance in Medicine, 2017.
 [15] M. Mardani, E. Gong, J. Y. Cheng, S. Vasanawala, G. Zaharchuk, M. Alley, N. Thakur, S. Han, W. Dally, J. M. Pauly, and L. Xing, “Deep Generative Adversarial Networks for Compressed Sensing Automates MRI,” ArXiv, May 2017.
 [16] J. Adler, “Learned primaldual reconstruction,” Software available from https://github.com/adlerj/learned_primal_dual, 2017.
 [17] A. Chambolle and T. Pock, “A FirstOrder PrimalDual Algorithm for Convex Problems with Applications to Imaging,” Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120–145, 2010.
 [18] T. Valkonen, “A primaldual hybrid gradient method for nonlinear operators with applications to MRI,” Inverse Problems, vol. 30, no. 5, p. 055012, 2014.
 [19] E. Y. Sidky, J. H. Jørgensen, and X. Pan, “Convex optimization problem prototyping for image reconstruction in computed tomography with the ChambollePock algorithm,” Physics in Medicine and Biology, vol. 57, pp. 3065–3091, May 2012.
 [20] A. Markoe, Analytic Tomography, ser. Encyclopedia of mathematics and its applications. Cambridge University Press, 2006.
 [21] F. Heide, M. Steinberger, Y.T. Tsai, M. Rouf, D. Pajak, D. Reddy, O. Gallo, J. Liu, W. Heidrich, K. Egiazarian, J. Kautz, and K. Pulli, “Flexisp: A flexible camera image processing framework,” ACM Trans. Graph., vol. 33, no. 6, pp. 231:1–231:13, 2014.
 [22] Y. Romano, M. Elad, and P. Milanfar, “The little Engine that Could: Regularization by Denoising (RED),” ArXiv, 2016.
 [23] K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural Networks, vol. 4, no. 2, pp. 251 – 257, 1991.
 [24] C. McCollough, “Tfg207a04: Overview of the low dose ct grand challenge,” Medical Physics, vol. 43, no. 6, pp. 3759–3760, 2016.
 [25] J. Adler, H. Kohr, and O. Öktem, “Operator discretization library (ODL),” Software available from https://github.com/odlgroup/odl, 2017.
 [26] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, J. Yangqing, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Largescale machine learning on heterogeneous systems,” ArXiv, no. 1603.04467, 2015.
 [27] W. van Aarle, W. J. Palenstijn, J. Cant, E. Janssens, F. Bleichrodt, A. Dabravolski, J. Beenhouwer, K. J. Batenburg, and J. Sijbers, “Fast and flexible Xray tomography using the ASTRA toolbox,” Optics Express, vol. 24, no. 22, pp. 25 129–25 147, 2016.
 [28] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
 [29] K. He, X. Zhang, S. Ren and J. Sun, “Delving deep into rectifiers: Surpassing humanlevel performance on imagenet classification,” in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034.
 [30] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS’10), 2010.
 [31] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” ArXiv, vol. abs/1412.6980, 2014.
 [32] I. Loshchilov and F. Hutter, “SGDR: stochastic gradient descent with restarts,” ArXiv, vol. abs/1608.03983, 2016.
 [33] R. Pascanu, T. Mikolov, and Y. Bengio, “Understanding the exploding gradient problem,” ArXiv, vol. abs/1211.5063, 2012.
 [34] J. Adler, “Solving illposed inverse problems using iterative deep neural networks,” Software available from https://github.com/adlerj/learned_gradient_tomography, 2017.
 [35] O. Ronneberger, P. Fischer, and T. Brox, UNet: Convolutional Networks for Biomedical Image Segmentation. Springer International Publishing, 2015, pp. 234–241.
 [36] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, April 2004.
 [37] J. Johnson, A. Alahi, and L. FeiFei, “Perceptual losses for realtime style transfer and superresolution,” in Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 1114, 2016, Proceedings, Part II, 2016, pp. 694–711.