Learned Primal-Dual Reconstruction

Jonas Adler¹,² and Ozan Öktem¹
¹Department of Mathematics, KTH - Royal Institute of Technology
²Elekta AB, Box 7593, SE-103 93 Stockholm, Sweden
Email: {jonasadl, ozan}@kth.se
Abstract

We propose the Learned Primal-Dual algorithm for tomographic reconstruction. The algorithm accounts for a (possibly non-linear) forward operator in a deep neural network by unrolling a proximal primal-dual optimization method, but where the proximal operators have been replaced with convolutional neural networks. The algorithm is trained end-to-end, working directly from raw measured data, and it does not depend on any initial reconstruction such as filtered back-projection (FBP).

We compare performance of the proposed method on low dose computed tomography (CT) reconstruction against FBP, total variation (TV) regularization, and deep learning based post-processing of FBP. For the Shepp-Logan phantom we obtain a PSNR improvement over all compared methods. For human phantoms the corresponding improvement is 6.6 dB over TV and 2.2 dB over learned post-processing, along with a substantial improvement in the structural similarity index (SSIM). Finally, our algorithm involves only ten forward-back-projection computations, making the method feasible for time critical clinical applications.

Index Terms: Inverse problems, tomography, deep learning, primal-dual, optimization

I Introduction

In an inverse problem, the goal is to reconstruct parameters characterizing the system under investigation from indirect observations. Such problems arise in several areas of science and engineering, like tomographic imaging where one seeks to visualize the interior structure of an object (2D/3D image) from indirect observations. Imaging technologies of this type, such as computed tomography (CT) and magnetic resonance imaging (MRI), are indispensable for contemporary medical diagnostics, intervention and monitoring.

There is by now a rich theory for model driven tomographic image reconstruction. A key component is knowledge about the forward model, which describes the data formation process in absence of noise. In many applications, an explicit forward model can be derived starting out from the underlying physical principles that are utilized by the imaging modality. Another component accounts for the knowledge about the statistical properties of data and a priori information about the image to be recovered.

A parallel line of development in signal processing has been the usage of deep learning for solving a wide range of tasks that can be cast as supervised learning. The success of these data driven approaches is thus far confined to tasks where knowledge about the forward model is not needed, or has little importance. As an example, an entirely data driven approach for tomographic image reconstruction applicable to clinical sized problems has yet to be demonstrated.

A central question is whether one can combine elements of model and data driven approaches for solving ill-posed inverse problems. In particular, is there a framework for incorporating the knowledge of a forward model when designing a neural network for reconstruction? The Learned Primal-Dual reconstruction method developed in this paper is such a framework. It applies to general inverse problems and it is best described in an abstract setting, so our starting point is to formalize the notion of an inverse problem.

Mathematically, an inverse problem can be formulated as reconstructing (estimating) a signal f_true ∈ X from data g ∈ Y, where

    g = T(f_true) + δg.    (1)

Here, the reconstruction space X and the data space Y are typically Hilbert spaces, T : X → Y is the forward operator that models how a signal gives rise to data in absence of noise, and δg ∈ Y is a single sample of a Y-valued random variable that represents the noise component of data.

I-A Variational regularization

A common model driven approach for solving eq. (1) is to maximize the likelihood of the signal or, equivalently, to minimize the negative data log-likelihood [1]:

    min_{f ∈ X} L(T(f), g),    (2)

where L : Y × Y → ℝ is the negative data log-likelihood, which quantifies the misfit between the measured data g and the data T(f) predicted by the forward model.

This minimization is ill-posed for typical choices of T, that is, a solution (if it exists) is unstable with respect to the data in the sense that small changes to the data result in large changes to the reconstruction. Hence, a maximum likelihood solution typically leads to over-fitting against data.

Variational regularization, also referred to as model based iterative reconstruction in medical image processing, avoids over-fitting by introducing a functional S : X → ℝ (regularization functional) that encodes a priori information about the true (unknown) signal f_true and penalizes unlikely solutions [2, 3]. Hence, instead of minimizing only the negative data log-likelihood as in eq. (2), one now seeks to minimize a regularized objective functional by solving

    min_{f ∈ X} [ L(T(f), g) + λ S(f) ].    (3)

In the above, λ > 0 (regularization parameter) governs the influence of the a priori knowledge encoded by the regularization functional against the need to fit data.

I-B Machine learning in inverse problems

We here review results on learned iterative schemes, see [4] for a wider review on machine learning for medical imaging and [5] for usage of machine learning for solving inverse problems in general.

Machine learning is widely used for non-linear function approximation under weak assumptions and has recently emerged as the state of the art for several image processing tasks such as classification and segmentation. Applied to the inverse problem in eq. (1), it can be phrased as the problem of finding a (non-linear) mapping T†_θ : Y → X satisfying the following pseudo-inverse property:

    T†_θ(T(f) + δg) ≈ f.

A key element in machine learning approaches is to parametrize the set of such pseudo-inverse operators by a parameter θ ∈ Θ, where Θ is some parameter space, and the main algorithmic complication is to select an appropriate structure of T†_θ such that, given appropriate training, the pseudo-inverse property is satisfied as well as possible.

In the context of tomographic reconstruction, three main research directions have been proposed. The first is so called learned post-processing or learned denoisers. Here, the learned reconstruction operator is of the form

    T†_θ = P_θ ∘ T†,

where P_θ : X → X is a learned post-processing operator and T† : Y → X is some approximate pseudo-inverse, e.g. given by FBP in CT reconstruction. This type of method is relatively easy to implement, given that the pseudo-inverse can be applied off-line, before the learning is performed, which reduces the learning to inferring an image-to-image transformation. This has been investigated by several authors [6, 7, 8].
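To make the structure of such learned post-processing concrete, the following is a minimal TensorFlow 1.x-style sketch of a residual image-to-image network trained on precomputed FBP images; the layer sizes, shapes and names are illustrative assumptions and not those of the methods cited above.

```python
import tensorflow as tf

# Precomputed FBP reconstructions (the approximate pseudo-inverse applied
# offline) and the corresponding ground truth images; 128 x 128 is illustrative.
fbp_images = tf.placeholder(tf.float32, [None, 128, 128, 1])
true_images = tf.placeholder(tf.float32, [None, 128, 128, 1])

# A small CNN that learns a residual correction of the FBP image.
x = tf.layers.conv2d(fbp_images, 32, 3, padding='same', activation=tf.nn.relu)
x = tf.layers.conv2d(x, 32, 3, padding='same', activation=tf.nn.relu)
correction = tf.layers.conv2d(x, 1, 3, padding='same')
post_processed = fbp_images + correction

# Trained by minimizing the mean squared error against the ground truth.
loss = tf.reduce_mean((post_processed - true_images) ** 2)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```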

Another method is to learn a regularizer and use this regularizer in a classical variational reconstruction scheme according to eq. (3). Examples of this include dictionary learning [9], but several alternative methods have been investigated, such as learning a variational auto-encoder [10] or using a cascade of wavelet transforms (scattering transform) [11].

Finally, some authors investigate learning the full reconstruction operator, going all the way from data to reconstruction. Doing this in one step is typically very computationally expensive and does not scale to the data sizes encountered in tomographic reconstruction. Instead, learned iterative schemes have been studied. These schemes resemble classical optimization methods used for tomographic reconstruction but use machine learning to find the best update in each iteration given the last iterate and results of applying the forward operator and its adjoint as input.

One of the first works on learned iterative schemes is [12], which learns an ADMM-like scheme for MRI reconstruction. A further development along these lines is given in [13], which learns over a broader class of schemes instead of ADMM-type schemes. The application is to finite dimensional inverse problems typically arising in image restoration. This approach was further extended in [5] to non-linear forward operators in the infinite dimensional setting, which also applies learned iterative schemes to (non-linear, pre-log) CT. Similar approaches for MRI reconstruction have also been considered [14, 15]. Here, the situation is simpler than for CT reconstruction since the forward operator is approximated by a Fourier transform, i.e. MRI reconstruction amounts to inverting the Fourier transform.

Given a structure of T†_θ, the "learning" part refers to choosing an "optimal" set of parameters θ given some training data, where the concept of optimality is typically quantified through a loss functional that measures the quality of a learned pseudo-inverse T†_θ.

To define this loss functional, consider an (X × Y)-valued random variable (f, g) with joint probability distribution μ. This could be, e.g., the probability distribution of human bodies and corresponding noisy tomographic data. We define the optimal reconstruction operator as the one whose reconstructions have the lowest average mean squared distance to the true reconstructions, where the average is taken w.r.t. μ. Finding this reconstruction operator then amounts to selecting the parameters θ so that the loss functional is minimized:

    θ* ∈ arg min_θ 𝔼_{(f,g)∼μ} [ ‖T†_θ(g) − f‖²_X ].    (4)

However, in practice we often do not have access to the probability distribution μ of the random variable (f, g). Instead, we know a finite set of samples (f_1, g_1), …, (f_N, g_N). In this setting, we replace μ in eq. (4) with its empirical counterpart, so the loss functional is replaced with the empirical loss

    L(θ) = (1/N) Σ_{i=1}^{N} ‖T†_θ(g_i) − f_i‖²_X.    (5)

Since our main goal is to minimize the loss functional, our practical goal is thus two-fold: we want to find a learned reconstruction scheme that minimizes the empirical loss, and we also want it to generalize to new, unseen data.
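As a minimal illustration, the empirical loss in eq. (5) can be computed as follows for a given candidate reconstruction operator; reconstruct_op and the sample arrays are illustrative placeholders, not part of the actual implementation.

```python
import numpy as np

def empirical_loss(reconstruct_op, data_samples, true_signals):
    """Mean squared reconstruction error over a finite set of training pairs,
    i.e. a discretized version of the empirical loss in eq. (5)."""
    errors = [np.mean((reconstruct_op(g) - f) ** 2)
              for g, f in zip(data_samples, true_signals)]
    return float(np.mean(errors))
```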

II Contribution and overview of paper

This paper proposes the Learned Primal-Dual algorithm, a general framework for solving inverse problems that combines deep learning with model based reconstruction. The proposed learned iterative reconstruction scheme involves convolutional neural networks (CNNs) in both the reconstruction and data space, and these are connected by the forward operator and its adjoint. We train the networks to minimize the mean squared error of the reconstruction and demonstrate that this achieves very high performance in CT reconstruction, surpassing recent learning based methods on both analytical and human data.

We emphasize that we learn the whole reconstruction operator, mapping data to reconstruction, and not merely a post-processing operator, nor only the prior in isolation.

In addition, we make all of our code and learned parameters open source so that the community can reproduce the results and apply the methods to other inverse problems [16].

III The Learned Primal-Dual algorithm

We here introduce how primal-dual algorithms can be learned from data and how this can be used to solve inverse problems.

III-A Primal-dual optimization schemes

In imaging, the minimization in eq. (3) is a large scale optimization problem, which has traditionally been addressed using gradient based methods such as gradient descent or its extensions to higher order derivatives, e.g. quasi-Newton or Newton methods. However, many regularizers of interest result in a non-differentiable objective functional, so gradient based methods are not applicable. A common approach to handle this difficulty is to consider a smooth approximation, which however introduces additional parameters and gives non-exact solutions.

An alternative approach is to use methods from non-smooth convex optimization. Proximal methods have been developed in order to work directly with non-smooth objective functionals. Here, a proximal step replaces the gradient step. The simplest example of such an algorithm is the proximal point algorithm for minimizing an objective functional F : X → ℝ. It can be seen as the proximal equivalent of the gradient descent scheme and is given by

    f_{i+1} = prox_{τF}(f_i),    (6)

where τ is a step size and the proximal operator prox_{τF} : X → X is defined by

    prox_{τF}(f) = arg min_{f' ∈ X} [ F(f') + (1/(2τ)) ‖f' − f‖²_X ].    (7)

While this algorithm could, in theory, be applied to solve eq. (3), it is rarely used directly since eq. (7) does not have a closed form solution in general. Proximal primal-dual schemes offer a workaround. In these schemes, an auxiliary dual variable h in the range of the operator T is introduced and the primal (f) and dual (h) variables are updated in an alternating manner.
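As a concrete illustration of eq. (7), two functionals whose proximal operators do have closed forms are the squared L2 data discrepancy and the 1-norm; the NumPy sketch below is only illustrative and not tied to any specific algorithm in this paper.

```python
import numpy as np

def prox_l2_squared(f, g, tau):
    """Proximal of tau * 0.5 * ||. - g||^2: a weighted average of f and g."""
    return (f + tau * g) / (1.0 + tau)

def prox_l1(f, tau):
    """Proximal of tau * ||.||_1: elementwise soft-thresholding."""
    return np.sign(f) * np.maximum(np.abs(f) - tau, 0.0)
```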

One well known primal-dual scheme is the primal-dual hybrid gradient (PDHG) algorithm [17], also known as the Chambolle-Pock algorithm, with a recent extension to non-linear operators [18]. The scheme (algorithm 1) is adapted for minimization problems with the following structure:

    min_{f ∈ X} [ F(K(f)) + G(f) ],    (8)

where K : X → U is a (possibly non-linear) operator, U is a Hilbert space, and F : U → ℝ and G : X → ℝ are functionals on the dual and primal spaces, respectively. Note that eq. (3) is a special case of eq. (8) if we set K = T, F(·) = L(·, g) and G = λS.

1: Given: f_0 ∈ X, h_0 ∈ U, f̄_0 = f_0, step sizes σ, τ > 0 s.t. στ‖K‖² < 1, and γ ∈ [0, 1].
2: for i = 1, …, I do
3:     h_{i+1} ← prox_{σF*}(h_i + σ K(f̄_i))
4:     f_{i+1} ← prox_{τG}(f_i − τ [∂K(f_i)]*(h_{i+1}))
5:     f̄_{i+1} ← f_{i+1} + γ(f_{i+1} − f_i)
Algorithm 1 Non-linear primal-dual hybrid gradient (PDHG)

In algorithm 1, F* denotes the Fenchel conjugate of F, h is the dual variable, and [∂K(f)]* is the adjoint of the (Fréchet) derivative of K at the point f.

Example: TV regularized CT

The PDHG method has been widely applied to CT [19]. In CT, the forward operator is given by the ray transform P, which integrates the signal over a set of lines given by the acquisition geometry. Hence, elements in the data space are functions on lines,

    P(f)(ℓ) = ∫_ℓ f(x) dx,

and the adjoint of the derivative is the back-projection [20].

A typical example of variational regularization in imaging is TV regularization, which applies to signals that are represented by scalar functions of bounded variation. The corresponding regularization functional is then given as the 1-norm of the gradient magnitude, i.e. S(f) = ‖ |∇f| ‖_1, where ∇f(x) ∈ ℝ^n and n is the dimension of the space.

The PDHG method can be used to solve the TV regularized CT optimization problem

    min_{f ∈ X} [ (1/2) ‖P(f) − g‖²_2 + λ ‖ |∇f| ‖_1 ].

Since the proximal of f ↦ λ‖ |∇f| ‖_1 is hard to compute, the following identification is better suited for recasting the above into eq. (8):

    K(f) = [P(f), ∇f],    F(y, z) = (1/2) ‖y − g‖²_2 + λ ‖ |z| ‖_1,    G(f) = 0.
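As an illustration, this identification can be set up and solved with a plain PDHG loop using ODL's operator and functional calculus. The geometry, the noise-free data and the regularization parameter lam below are illustrative assumptions, not the settings used in the experiments of this paper.

```python
import odl

# Image space and a simple parallel-beam CT setup (sizes are illustrative).
space = odl.uniform_discr([-64, -64], [64, 64], [128, 128])
geometry = odl.tomo.parallel_beam_geometry(space, num_angles=30)
ray_trafo = odl.tomo.RayTransform(space, geometry)
grad = odl.Gradient(space)

# Identification: K(f) = [P(f), grad f],
# F(y, z) = 0.5 ||y - g||^2 + lam || |z| ||_1,  G = 0.
K = odl.BroadcastOperator(ray_trafo, grad)
g = ray_trafo(odl.phantom.shepp_logan(space, modified=True))  # noise-free data
lam = 0.01
F = odl.solvers.SeparableSum(
    odl.solvers.L2NormSquared(ray_trafo.range).translated(g),
    lam * odl.solvers.GroupL1Norm(grad.range))
G = odl.solvers.ZeroFunctional(space)

# Step sizes from an estimate of ||K|| so that sigma * tau * ||K||^2 < 1.
op_norm = 1.1 * odl.power_method_opnorm(K)
tau = sigma = 1.0 / op_norm

# Plain PDHG loop following algorithm 1 (gamma = 1; K is linear so dK = K).
prox_dual = F.convex_conj.proximal(sigma)
prox_primal = G.proximal(tau)
f = space.zero()
f_bar = f.copy()
h = K.range.zero()
for _ in range(200):
    h = prox_dual(h + sigma * K(f_bar))
    f_new = prox_primal(f - tau * K.adjoint(h))
    f_bar = f_new + (f_new - f)
    f = f_new
```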

III-B Learned PDHG

The aim is to derive a learned reconstruction scheme inspired by \acPDHG, algorithm 1. We follow the observation in [21, 22], that proximal operators can be replaced by other operators that are not necessarily proximal operators. The aforementioned publications replace a proximal operator with a denoising operator such as Block Matching 3D (BM3D). Our idea is to replace the proximal operators by parametrized operators where the parameters are learned from training data, resulting in a learned reconstruction operator.

In order to make the learned reconstruction operator well defined and implementable on a computer we also need to select a stopping criterion. Choosing a proper stopping criterion is an active research area, but for simplicity and usability we use a fixed number of iterates. By selecting a fixed number of iterations, the computation budget is also fixed prior to training, which is a highly desirable property in time critical applications.

Algorithm 2 below outlines the resulting variant of the PDHG algorithm with a fixed number of iterations I, in which the primal proximal has been replaced by a learned proximal Λ_{θ^p} and the dual proximal by a learned proximal Γ_{θ^d}. Note that in this article we consider only a single forward model and no regularizing operator, so we have K = T and U = Y, but we give the algorithm in full generality for completeness.

1: Initialize f_0 ∈ X, h_0 ∈ Y, f̄_0 = f_0
2: for i = 1, …, I do
3:     h_i ← Γ_{θ^d}(h_{i−1} + σ T(f̄_{i−1}))
4:     f_i ← Λ_{θ^p}(f_{i−1} − τ [∂T(f_{i−1})]*(h_i))
5:     f̄_i ← f_i + γ(f_i − f_{i−1})
6: return f_I
Algorithm 2 Learned PDHG

In algorithm 2, there are several parameters that need to be selected. These are the parameters of the dual proximal, θ^d, the primal proximal, θ^p, the step lengths, σ and τ, and the over-relaxation parameter, γ. In a learned PDHG algorithm these would all be inferred (learned) from training data.

We implemented this algorithm and show its performance in the results section. While the performance was comparable to traditional methods, it did not improve upon the state of the art in deep learning based image reconstruction.

III-C Learned Primal-Dual

To gain substantial improvements, guided by recent advances in machine learning, the following modifications to the learned PDHG algorithm were made.

  • Following [13, 5], extend the primal space from X to X^{N_primal} to allow the algorithm some "memory" between the iterations.

    Similarly, extend the dual space from Y to Y^{N_dual}.

  • Instead of explicitly enforcing updates of the form f_{i−1} − τ [∂T(f_{i−1})]*(h_i), allow the network to learn how to combine the previous iterate with the result of the operator evaluation.

  • Instead of hard-coding the over-relaxation f̄_i = f_i + γ(f_i − f_{i−1}), let the network freely learn at what point the forward operator should be evaluated.

  • Instead of using the same learned proximal operators in each iteration, allow them to differ. This increases the size of the parameter space but it also notably improves reconstruction quality.

The above modifications result in a new algorithm, henceforth called the Learned Primal-Dual algorithm, that is outlined in algorithm 3.

1: Initialize f_0 ∈ X^{N_primal}, h_0 ∈ Y^{N_dual}
2: for i = 1, …, I do
3:     h_i ← Γ_{θ_i^d}(h_{i−1}, T(f_{i−1}^{(2)}), g)
4:     f_i ← Λ_{θ_i^p}(f_{i−1}, [∂T(f_{i−1}^{(1)})]*(h_i^{(1)}))
5: return f_I^{(1)}
Algorithm 3 Learned Primal-Dual
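The unrolled structure of algorithm 3 translates directly into a computational graph. Below is a minimal TensorFlow 1.x-style sketch for the ellipse test case described in section IV, with the ODL ray transform and its adjoint wrapped as TensorFlow layers; sizes, channel counts and names are illustrative assumptions, and the full architecture details are given in section IV-B1.

```python
import tensorflow as tf
import odl
import odl.contrib.tensorflow

N_PRIMAL, N_DUAL, N_ITER, BATCH = 5, 5, 10, 1

# Forward operator T (ray transform) and its adjoint, wrapped as TF layers.
space = odl.uniform_discr([-64, -64], [64, 64], [128, 128])
geometry = odl.tomo.parallel_beam_geometry(space, num_angles=30, det_shape=182)
ray_trafo = odl.tomo.RayTransform(space, geometry)
fwd_layer = odl.contrib.tensorflow.as_tensorflow_layer(ray_trafo, 'RayTransform')
adj_layer = odl.contrib.tensorflow.as_tensorflow_layer(ray_trafo.adjoint, 'BackProjection')

def learned_proximal(inputs, n_out, name):
    """Small CNN playing the role of a learned proximal (returns an update)."""
    with tf.variable_scope(name):
        x = tf.layers.conv2d(inputs, 32, 3, padding='same', activation=tf.nn.relu)
        x = tf.layers.conv2d(x, 32, 3, padding='same', activation=tf.nn.relu)
        return tf.layers.conv2d(x, n_out, 3, padding='same')

g = tf.placeholder(tf.float32, [BATCH, 30, 182, 1])   # measured data
primal = tf.zeros([BATCH, 128, 128, N_PRIMAL])        # f_0 = 0
dual = tf.zeros([BATCH, 30, 182, N_DUAL])             # h_0 = 0

for i in range(N_ITER):
    # Dual update: h_i depends on h_{i-1}, T(f_{i-1}^(2)) and the data g.
    dual_inp = tf.concat([dual, fwd_layer(primal[..., 1:2]), g], axis=-1)
    dual = dual + learned_proximal(dual_inp, N_DUAL, 'dual_%d' % i)

    # Primal update: f_i depends on f_{i-1} and the back-projected h_i^(1).
    primal_inp = tf.concat([primal, adj_layer(dual[..., 0:1])], axis=-1)
    primal = primal + learned_proximal(primal_inp, N_PRIMAL, 'primal_%d' % i)

reconstruction = primal[..., 0:1]                     # the output f_I^(1)
```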

III-C1 Choice of starting point

In theory, the Learned Primal-Dual algorithm can be used with any choice of starting points f_0 and h_0. The simplest starting point, both from a conceptual and computational perspective, is zero-initialization,

    f_0 = 0 ∈ X^{N_primal},    h_0 = 0 ∈ Y^{N_dual},    (9)

where 0 denotes the zero element in the primal or dual space.

In cases where a good starting guess is available, it would make sense to use it. One such option is to assume that there exists a pseudo-inverse T† : Y → X, e.g. FBP for CT. For the dual variable, the data enters into each iterate so there is no need for a good initial guess. This gives the starting point

    f_0 = [T†(g), …, T†(g)] ∈ X^{N_primal},    h_0 = 0 ∈ Y^{N_dual}.    (10)

In our tests we found that providing the Learned Primal-Dual algorithm with such an initial guess marginally decreased training time, but did not give better final results. Given that using the pseudo-inverse adds more complexity by making the learned reconstruction operator depend on an earlier reconstruction, we report values only from zero-initialization.

III-D Connection to variational regularization

We note that the Learned Primal-Dual algorithm naturally reduces to the classical PDHG algorithm (algorithm 1) by choosing N_primal and N_dual appropriately and letting the learned proximal operators mimic the corresponding proximal updates in algorithm 1. Even if the learned proximal operators do not have explicit access to the proximals, the universal approximation property of neural networks [23] guarantees that, given sufficient training data, these choices can be approximated arbitrarily well.

A wide range of other optimization schemes can also be seen as special cases of the Learned Primal-Dual algorithm. For example, the gradient descent algorithm with step-length ω for solving eq. (8) is given by

    f_{i+1} = f_i − ω ( [∂K(f_i)]*( ∇F(K(f_i)) ) + ∇G(f_i) ),

and can be obtained by selecting

    Γ_{θ_i^d}(h, K(f), g) = ∇F(K(f)),    Λ_{θ_i^p}(f, h) = f − ω(h + ∇G(f)).

More advanced gradient based methods, such as limited-memory BFGS, are likewise sub-cases obtained by appropriate choices of learned proximal operators.

In summary, the Learned Primal-Dual algorithm contains a wide range of optimization schemes as special cases. If the parameters are appropriately selected, then the proposed algorithm should always perform at least as well as current variational regularization schemes given the same stopping criteria, which here is a fixed number of iterates.

IV Implementation and evaluation

We evaluate the algorithm on two low dose CT problems: a simplified one using analytical phantoms based on ellipses, and one with a more realistic forward model and human phantoms. We briefly describe these test cases and how we implemented the Learned Primal-Dual algorithm. We also describe the methods we compare against.

IV-A Test cases

Ellipse phantoms

This problem is identical to [5] and we restate it briefly. Training data consists of randomly generated ellipses on a 128 × 128 pixel domain. The forward operator is the ray transform, i.e. T = P.

The projection geometry was a sparse 30-view parallel beam geometry with 182 detector pixels. 5% additive Gaussian noise was added to the projections. Since the forward operator is linear, the adjoint of the derivative is simply the adjoint, which for the ray transform is the back-projection, [∂T(f)]* = P*.

Human phantoms

In order to evaluate the algorithm on a clinically realistic use-case we consider reconstruction of simulated data from human abdomen CT scans as provided by Mayo Clinic for the AAPM Low Dose CT Grand Challenge [24]. The data includes full dose CT scans from 10 patients, of which we used 9 for training and 1 for evaluation. We used the 3 mm slice thickness reconstructions, resulting in 2168 training images, each 512 × 512 pixels in size. Thus, given that we minimize the pointwise error, the total number of data-points is approximately 5.7 · 10⁸.

We used a two-dimensional fan-beam geometry with 1000 angles, 1000 detector pixels, source to axis distance 500 mm and axis to detector distance 500 mm. In this setting, we consider the more physically correct non-linear forward model given by the Beer-Lambert law,

    T(f) = exp(−μ P(f)),

where the unit of f is g/cm³ and μ is the mass attenuation coefficient, in this work selected to 0.2 cm²/g, which is approximately the value in water at x-ray energies. We used Poisson noise corresponding to 10⁴ incident photons per pixel before attenuation, which would correspond to a low dose CT scan. We find the action of the adjoint of the derivative by straightforward computation:

    [∂T(f)]*(g) = −μ P*( exp(−μ P(f)) · g ).

The forward model can also be linearised by applying the transformation −μ⁻¹ log(·) to the data, after which the forward operator simply becomes the ray transform as for the ellipse data. We implemented both the pre-log (non-linear) and post-log (linear) forward models and compare their results.
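For illustration, the non-linear (pre-log) forward operator and the adjoint of its derivative can be obtained directly from ODL's operator calculus; the geometry, discretization and units below are illustrative assumptions rather than the exact experimental settings.

```python
import odl

# Image space and a 2D fan-beam geometry (sizes and units are illustrative).
space = odl.uniform_discr([-20, -20], [20, 20], [512, 512])
geometry = odl.tomo.cone_beam_geometry(space, src_radius=50, det_radius=50,
                                       num_angles=1000)
ray_trafo = odl.tomo.RayTransform(space, geometry)

mu = 0.2  # mass attenuation coefficient (approximately water), cm^2/g
exp_op = odl.ufunc_ops.exp(ray_trafo.range)

# Non-linear forward model T(f) = exp(-mu * P(f)) via operator composition.
forward_op = exp_op * (-mu * ray_trafo)

# The derivative and its adjoint, needed in algorithm 3, follow automatically.
f = odl.phantom.shepp_logan(space, modified=True)
data = forward_op(f)
backprojected = forward_op.derivative(f).adjoint(data)
```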

For validation of the ellipse data case, we simply use the (modified) Shepp-Logan phantom and for the human phantom data we use one held out set of patient data. See fig. 1 for examples.

Fig. 1: Examples of data used for training and validation. Top: randomly generated ellipse training phantom and its sinogram (a, b). Middle: validation data given by the modified Shepp-Logan phantom and its sinogram (c, d). Bottom: validation data generated from the human phantoms, phantom and pre-log sinogram (e, f).

IV-B Implementation

The methods described above were implemented in Python using ODL [25] and TensorFlow [26]. All operator-related components, such as the forward operator T, were implemented in ODL, and these were then converted into TensorFlow layers using the as_tensorflow_layer functionality of ODL. The neural network layers and training were implemented using TensorFlow. The implementation uses abstract ODL structures for representing functional analytic notions and is therefore generic and easily adaptable to other inverse problems. In particular, the code can be easily adapted to other imaging modalities.

We used the ODL operator RayTransform in order to evaluate the ray transform and its adjoint using the GPU accelerated 'astra_gpu' backend [27].
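Since all operators live in ODL, they can be sanity-checked directly before being wrapped as TensorFlow layers, for instance by verifying that the back-projection behaves as the adjoint, i.e. ⟨P f, g⟩ ≈ ⟨f, P* g⟩. The snippet below is illustrative; the default backend is used, and a GPU ASTRA backend can be selected through the impl argument of RayTransform if ASTRA is installed.

```python
import odl

space = odl.uniform_discr([-64, -64], [64, 64], [128, 128])
geometry = odl.tomo.parallel_beam_geometry(space, num_angles=30, det_shape=182)
ray_trafo = odl.tomo.RayTransform(space, geometry)

f = odl.phantom.shepp_logan(space, modified=True)
g = ray_trafo.range.one()

# The two inner products should agree up to discretization effects.
print(ray_trafo(f).inner(g), f.inner(ray_trafo.adjoint(g)))
```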

IV-B1 Deep neural network and training details

Given the general Learned Primal-Dual scheme in algorithm 3, a parametrization of the learned proximal operators is needed in order to proceed. In many inverse problems, and particularly in CT and MRI reconstruction, most of the useful properties of both the forward operator and the prior are approximately translation invariant. For this reason the resulting reconstruction operator should be approximately translation invariant, which indicates that CNNs are suitable for parametrizing the aforementioned reconstruction operator.

We used learned proximal operators of the form

    Λ_θ = Id + (W_{w_3,b_3} ∘ σ ∘ W_{w_2,b_2} ∘ σ ∘ W_{w_1,b_1}),

where Id is the identity operator, which makes the network a residual network, and σ denotes the non-linearity described below. There are two main reasons for choosing such a structure. First, proximal operators (as the name implies) are typically close to the identity and second, there is rich evidence in the machine learning literature [28] that networks of this type are easier to train. Heuristically this is because each layer does not need to learn the whole update, but only a small offset from the identity.

Additionally, we used affine operators W_{w,b} parametrized by weights w and biases b. The affine operators are defined in terms of so called convolution operators (here given on the primal space, but equivalently on the dual space). These are given as affine combinations of regular convolutions, more specifically

    W_{w,b}(f) = b + w ∗ f,

where the j:th component is given by

    (W_{w,b}(f))_j = b_j + Σ_k w_{j,k} ∗ f_k,

where w_{j,k} are small convolution kernels and b_j ∈ ℝ.

The non-linearities σ were chosen to be Parametric Rectified Linear Unit (PReLU) functions,

    PReLU(x) = x if x ≥ 0,    a · x if x < 0,

where the slope a is a learned parameter. This type of non-linearity has proven successful in other applications such as classification [29].

We let the number of data elements that persist between the iterates be N_primal = N_dual = 5. The convolutions were all of 3 × 3 pixel size, and the number of channels was, for each primal learned proximal, 6 → 32 → 32 → 5, and for the duals 7 → 32 → 32 → 5, where the higher number of inputs is due to the data being supplied to the dual proximals.

We let the number of unrolled iterations be I = 10, that is, the operator T and the adjoint of its derivative are both evaluated 10 times by the network. Since each iterate involves two 3-layer networks, one for each proximal, the total depth of the network is 60 convolutional layers and the total number of parameters is approximately 2.4 · 10⁵. In the context of deep learning, this is a deep network but with a small number of parameters. The network is visualized in fig. 2.

We used the Xavier initialization scheme [30] for the convolution parameters, and initialized all biases to zero.
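Putting these pieces together, one learned proximal block can be written as in the sketch below: three 3 × 3 convolutions with 32 channels, PReLU non-linearities with learned slope, Xavier-initialized weights, zero-initialized biases and a residual connection. The names and the exact residual wiring (adding the update to the first n_out input channels) are illustrative assumptions.

```python
import tensorflow as tf

def prelu(x, name):
    """PReLU non-linearity with a learned slope for the negative part."""
    with tf.variable_scope(name):
        a = tf.get_variable('slope', shape=[int(x.shape[-1])],
                            initializer=tf.zeros_initializer())
        return tf.maximum(x, 0.0) + a * tf.minimum(x, 0.0)

def learned_proximal_block(inputs, n_out, name):
    """3-layer residual CNN block; the first n_out input channels are assumed
    to hold the current primal/dual iterate, to which the update is added."""
    init = tf.glorot_uniform_initializer()  # Xavier initialization
    with tf.variable_scope(name):
        x = tf.layers.conv2d(inputs, 32, 3, padding='same', kernel_initializer=init)
        x = prelu(x, 'prelu_1')
        x = tf.layers.conv2d(x, 32, 3, padding='same', kernel_initializer=init)
        x = prelu(x, 'prelu_2')
        update = tf.layers.conv2d(x, n_out, 3, padding='same', kernel_initializer=init)
    return inputs[..., :n_out] + update
```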

We trained the network by minimizing the empirical loss in eq. (5), using training data as explained above and the ADAM optimizer in TensorFlow [31]. We used mini-batch training on each problem with a learning rate schedule according to cosine annealing [32], i.e. the learning rate in step t was

    η_t = (η_0 / 2) (1 + cos(π t / t_max)),

where the initial learning rate η_0 was set to 10⁻³ and t_max is the total number of training steps. We also set the β₂ parameter of the ADAM optimizer to 0.99 and let all other parameters use the default choices. We performed global gradient norm clipping [33], limiting the gradient norms to 1 in order to improve training stability, and used a batch size of 5 for the ellipse data and 1 for the human phantoms.
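A minimal TensorFlow 1.x-style sketch of these training settings (cosine annealing of the learning rate, ADAM with β₂ = 0.99, and global gradient norm clipping) is shown below; the dummy loss and the total number of steps are placeholders standing in for the actual empirical loss and training length.

```python
import numpy as np
import tensorflow as tf

t_max = 100000        # total number of training steps (placeholder value)
eta_0 = 1e-3          # initial learning rate

global_step = tf.train.get_or_create_global_step()
t = tf.cast(global_step, tf.float32)
learning_rate = eta_0 / 2.0 * (1.0 + tf.cos(np.pi * t / t_max))

# Dummy quadratic loss standing in for the empirical loss of eq. (5).
w = tf.get_variable('w', shape=[128, 128])
target = tf.placeholder(tf.float32, [128, 128])
loss = tf.reduce_mean((w - target) ** 2)

optimizer = tf.train.AdamOptimizer(learning_rate, beta2=0.99)
grads, variables = zip(*optimizer.compute_gradients(loss))
grads, _ = tf.clip_by_global_norm(grads, 1.0)  # limit the global gradient norm to 1
train_op = optimizer.apply_gradients(zip(grads, variables),
                                     global_step=global_step)
```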

We did not use any regularization of the learned parameters, nor did we utilize any tricks such as dropout or batch normalization. Neither did we perform any data augmentation.

The training was done using a single GTX 1080 Ti GPU and took about 11 hours for the ellipse data and 40 hours for the human phantoms.

Fig. 2: Network architecture used to solve the tomography problem. The dual iterates are given in blue boxes, while the primal iterates are in the red boxes. The blue/red boxes all have the same architecture, which is illustrated in the corresponding large boxes. Several arrows pointing to one box indicate concatenation. The initial guesses enter from the left, while the data is supplied to the dual iterates. In the classical PDHG algorithm, the primal iterates would, instead of a CNN, be given by the proximal of τG with over-relaxation, and the dual iterates would be given by the proximal of σF*.

IV-B2 Incorporating the forward operator in neural networks

In order to minimize the loss function in eq. (4), SGD type methods are typically used, and these require (an estimate of) the gradient of the loss function,

    ∇_θ L(θ) = 𝔼 [ 2 [∂_θ T†_θ(g)]* ( T†_θ(g) − f ) ],

where [∂_θ T†_θ(g)]* is the adjoint of the derivative (with respect to θ) of the learned reconstruction operator applied at g, often simply called the gradient in the machine learning literature. This introduces a challenge since it depends on each component of the neural network, including the learned proximal operators but also the forward operator T and the backward operator [∂T(·)]*, propagated through all iterations.

To solve this, we used the built in automatic differentiation functionality of TensorFlow which uses the chain rule (back-propagation). This in turn requires the adjoints of the derivatives of each individual component which for the proximals were computed by TensorFlow and for the operators by ODL.

IV-C Comparison

We compare the algorithm to several widely used algorithms, including standard FBP and (isotropic) TV regularized reconstruction. We also compare against several learned schemes. These are briefly summarized here; see the references for full descriptions.

The FBP reconstructions were done with a Hann filter using the method fbp_op in ODL. The TV reconstruction was performed using 1000 iterations of the classical PDHG algorithm, implemented in ODL as pdhg. The filter bandwidth in the FBP reconstruction and the regularization parameter in the TV reconstruction were selected in order to maximize the PSNR.
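For reference, the FBP baseline can be set up in ODL as in the following sketch; the geometry and the frequency_scaling value (the filter bandwidth that was tuned to maximize PSNR) are illustrative.

```python
import odl

space = odl.uniform_discr([-64, -64], [64, 64], [128, 128])
geometry = odl.tomo.parallel_beam_geometry(space, num_angles=30, det_shape=182)
ray_trafo = odl.tomo.RayTransform(space, geometry)

# FBP operator with a Hann filter; frequency_scaling controls the bandwidth.
fbp = odl.tomo.fbp_op(ray_trafo, filter_type='Hann', frequency_scaling=0.6)

phantom = odl.phantom.shepp_logan(space, modified=True)
reconstruction = fbp(ray_trafo(phantom))
```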

The partially Learned Gradient method in [5] is similar to the algorithm proposed in this article, but differs in that instead of learning proximal operators it learns a gradient operator and the forward operator enters into the neural network through the gradient of the data likelihood. Publicly available code and parameters [34] were used.

The next comparison is against a deep learning based approach for post-processing based on a so called U-Net [35]. The U-Net was first proposed for image segmentation, but by changing the number of output channels to one, it can also be used for post-processing, as was done in [7]. Here an initial reconstruction is first performed using FBP and a neural network is trained on pairs of noisy FBP images and noiseless/low noise ground truth images, learning a mapping between them. We re-implemented the algorithm from [7] but found that using the training procedure as stated in the paper gave sub-optimal results. We hence report values from using the same training scheme as for our other algorithms in order to give a fairer comparison.

Additionally, our comparison includes learned PDHG, algorithm 2, as well as the following two simplified versions of the Learned Primal-Dual algorithm. The first is a Learned Primal algorithm, which does not learn any parameters for the dual proximal; instead it returns the residual

    Γ(h, T(f), g) = T(f) − g.

The second, the FBP + residual denoising algorithm, further simplifies the problem by discarding the forward operator completely, which can be seen as setting the operator-dependent input of the primal proximal to zero. Since this method does not have access to the data g, we select the initial guess according to a FBP, see section III-C1. This makes the algorithm a learned denoiser.

For the human phantoms we compare both non-linear and linearized versions of the forward operator, but given that training times are noticeably longer, we only compare to the previously established methods of FBP, TV and U-Net denoising.

All learned algorithms were trained using the same training scheme as outlined in section IV-B1, and we measure the run-time, PSNR and SSIM [36].

All methods that we compare against are available in the accompanying source code.

V Results

The quantitative results for the ellipse data are given in table I, where we can see that the proposed Learned Primal-Dual algorithm significantly out-performs the classical schemes (FBP and TV) w.r.t. the reconstruction error as measured by both PSNR and SSIM. We also note that the Learned Primal-Dual algorithm gives a large improvement over the previous deep learning based methods, such as the learned gradient scheme and U-Net based post-processing, giving an improvement exceeding 6 dB. The Learned Primal-Dual algorithm also outperforms the Learned PDHG and FBP + residual denoising algorithms by wide margins.

The only method that achieves results close to the Learned Primal-Dual algorithm is the Learned Primal method, but the Learned Primal-Dual algorithm gives a noticeable improvement of 1.3 dB.

The results are visualized in fig. 3. We note that small structures, such as the small inserts, are much more clearly visible in the Learned Primal and Learned Primal-Dual reconstructions than in the other reconstructions. We also note that both the Learned PDHG and Learned Primal reconstructions seem to have a halo artefact close to the outer bone which is absent in the Learned Primal-Dual reconstruction.

With respect to run-time, the learned methods that involve calls to the forward operator (Learned Gradient, Learned PDHG, Learned Primal, Learned Primal-Dual) are slower than the methods that do not (FBP + U-Net denoising, FBP + residual denoising) by roughly a factor of 5 (cf. table I). When compared to TV regularized reconstruction, all learned methods are at least 2 orders of magnitude faster.

Method | PSNR | SSIM | Runtime | Parameters
FBP | 19.75 | | 4 | 1
TV | 28.06 | | 5 166 | 1
FBP + U-Net denoising | 29.20 | | 9 |
FBP + residual denoising | 32.38 | | 9 |
Learned Gradient | 32.29 | | 56 |
Learned PDHG | 28.32 | | 48 |
Learned Primal | 36.97 | | 43 |
Learned Primal-Dual | 38.28 | 0.989 | 49 |
TABLE I: Comparison of reconstruction methods for the ellipse data. PSNR measured in dB and runtime in ms.
Fig. 3: Reconstructions for the ellipse data using the compared methods: (a) FBP, (b) TV, (c) FBP + U-Net denoising, (d) FBP + residual denoising, (e) Learned Gradient, (f) Learned PDHG, (g) Learned Primal, (h) Learned Primal-Dual. The display window corresponds to the soft tissue of the modified Shepp-Logan phantom.
Method | PSNR | SSIM | Runtime | Parameters
FBP | 33.65 | 0.830 | 423 | 1
TV | 37.48 | 0.946 | 64 371 | 1
FBP + U-Net denoising | 41.92 | 0.941 | 463 |
Learned Primal-Dual, linear | 44.11 | 0.969 | 620 |
Learned Primal-Dual, non-linear | 43.91 | 0.969 | 670 |
TABLE II: Comparison of the Learned Primal-Dual algorithm with other methods for the human phantom data. Units for entries are the same as in table I.

Fig. 4: Reconstructions of a human phantom along with two zoomed-in regions indicated by small circles. The left zoom-in contains a true feature, whereas the texture in the right zoom-in is uniform.
(a) Human phantom
(b) Filtered back-projection (FBP): PSNR 33.65 dB, SSIM 0.830, 423 ms
(c) Total variation (TV): PSNR 37.48 dB, SSIM 0.946, 64 371 ms
(d) FBP + U-Net denoising: PSNR 41.92 dB, SSIM 0.941, 463 ms
(e) Learned Primal-Dual, linear: PSNR 44.10 dB, SSIM 0.969, 620 ms
(f) Learned Primal-Dual, non-linear: PSNR 43.91 dB, SSIM 0.969, 670 ms
Among the methods tested, only the Learned Primal-Dual algorithm correctly recovers these regions. In the others, the true feature in the left zoom-in is indistinguishable from other false features of the same size and contrast, and the right zoom-in has a streak artifact. Note also the clinically feasible runtime of the Learned Primal-Dual algorithm. To summarize, the Learned Primal-Dual algorithm offers performance advantages over the other methods that translate into true clinical usefulness.

Quantitative results for the human phantom data are presented in table II. We note that the FBP reconstruction has a much more competitive image quality than it had for the ellipse data, both quantitatively and visually. It is likely for this reason that the FBP + U-Net denoising performs better than it did on the ellipses, outperforming TV by 4.4 dB. However, if we look at the SSIM we note that this improvement does not translate as well to the structural similarity, where the method is comparable to TV regularization.

Both quantitatively and visually, the linear and non-linear versions of the Learned Primal-Dual algorithm give very similar results. We will focus on the linear version which gave slightly better results.

The Learned Primal-Dual algorithm gives a 10.5 dB improvement over the FBP reconstruction, a 6.6 dB improvement over TV and a 2.2 dB improvement over the U-Net denoiser. This is less than for the ellipse data, but still represents a large improvement. On the other hand, while the U-Net denoiser did not improve the SSIM as compared to TV regularization, the Learned Primal-Dual algorithm gives a large improvement.

This improvement is also present in the images when inspected visually in fig. 4. In particular, we see that some artifacts visible in the FBP reconstruction are still discernible in the U-Net denoiser and TV reconstructions. Examples include streak artifacts, especially around the edges of the phantom, and structures spuriously created from noise, such as a line in the muscle above the right bone. These are mostly absent in the Learned Primal-Dual reconstruction. However, we do note that the images look slightly over-smoothed. Both of these observations become especially apparent if we look at the zoomed in regions, where we note that the Learned Primal-Dual algorithm is able to reconstruct finer detail than the other algorithms, but gives a very smooth texture.

With respect to the run time, the Learned Primal-Dual algorithm is more competitive with the FBP and U-Net denoiser algorithms for full size data than for the ellipse data. This is because the size of the data is much larger, which increases the runtime of the FBP reconstruction, which is also needed to compute the initial guess for the U-Net denoiser. As for the ellipse data, both learned methods outperform TV regularized reconstruction by two orders of magnitude with respect to runtime.

VI Discussion

The results show that the Learned Primal-Dual algorithm outperforms classical reconstruction algorithms by large margins as measured in both PSNR and SSIM, and also improves upon learned post-processing methods for both the simplified ellipse data and the human phantoms. In addition, especially for the human phantoms, the reconstruction time is comparable even with filtered back-projection and learned post-processing.

One interesting, and to the best of our knowledge unique, feature of the Learned Primal-Dual algorithm in the field of deep learning based CT reconstruction is that it computes reconstructions directly from data, without any initial reconstruction as input.

Since the algorithm is iterative, we can visualize the iterates to gain insight into how it works. In fig. 5 we show some iterates with the non-linear forward operator. We note that the reconstruction remains very poor until the 8th iterate, when most structures seem to fall into place, although the image is still noisy. Between the 8th and 10th iterates, the algorithm seems to perform an edge-enhancing step. It thus seems like the learned iterative scheme works in two stages, first finding the large scale structures and then fine-tuning the details.

Similarly to the edge-enhancement that seems to be performed in the primal space, we note that in the dual space the sinogram that is back-projected seems to be band-pass filtered to exclude both very low and very high frequencies.

We note that for the very noisy and under-sampled data used for the ellipse phantoms, the learned algorithms that make use of the forward operator, such as the Learned Gradient, Learned Primal and Learned Primal-Dual algorithms, outperform even state of the art post-processing methods by large margins, and that in this regime TV regularization performs relatively well when compared to post-processing methods. This improvement in reconstruction quality when incorporating the forward operator, while still substantial, is not as large for the human phantom, for which the data was less noisy.

To explain this, we conjecture that in the limit of highly noisy data, where the initial reconstruction as given by e.g. FBP becomes very poor, learned schemes that incorporate the forward model and work directly from data, such as the Learned Primal-Dual algorithm, have a considerable advantage over post-processing methods, and that this advantage increases with decreasing data quality.

Further along these lines, note that for the human data the post-processing gives a large improvement in PSNR when compared to TV regularization, which is not necessarily reflected in the SSIM. On the other hand, the Learned Primal-Dual algorithm improves upon both PSNR and SSIM. This can be explained by the learned post-processing being limited by the information content of the FBP, while the Learned Primal-Dual algorithm works directly with data and is thus limited by the information content of the data, which is greater than or equal to that of the FBP. In theory, the Learned Primal-Dual algorithm can thus find structures that are not present in the FBP, something post-processing methods cannot.

In these experiments we found that while the algorithm seems to handle non-linear forward models well, we did not observe any notable performance improvement from using them. This may indicate that performing reconstructions on post-log data is preferable.

The structure of the neural network was not extensively fine-tuned and we suspect that better results could be obtained by a better choice of network for the learned proximal operators. We also observed that the choice of optimizer and learning rate decay had a large impact on the results, and we suspect that further research into how to correctly train learned reconstruction operators will prove fruitful.

Finally, we observe that the reconstructions, while outperforming all of the compared methods with respect to PSNR and SSIM, suffer from a perceived over-smoothing when inspected visually. We suspect that the particular choice of objective function used in this article, the squared norm in eq. (4), is a main cause of this, and invite future researchers to implement learned reconstruction operators that use more advanced loss functions such as perceptual losses [37].

Fig. 5: Iterates 2, 4, 6, 8 and 10 of the Learned Primal-Dual algorithm when reconstructing the human phantoms using a non-linear forward model. Left: reconstruction (f_i^{(1)}). Middle: point of evaluation for the forward operator (f_i^{(2)}). Right: point of evaluation for the adjoint of the derivative (h_i^{(1)}). Windows selected to cover most of the range of the values.

VII Conclusions

We have proposed a new algorithm in the family of deep learning based iterative reconstruction schemes. The algorithm is inspired by the PDHG algorithm, where we replace the proximal operators by learned operators. In contrast to several recently proposed algorithms, the new algorithm works directly from tomographic data and does not depend on any initial reconstruction.

We showed that the algorithm gives state of the art results on a computed tomography problem for both analytical and human phantoms. For analytical phantoms, it improves upon both classical algorithms, such as FBP and TV, and post-processing based algorithms by at least 6 dB while also improving the SSIM. The improvements for the human phantom were more modest, but the algorithm still improves upon a TV regularized reconstruction by 6.6 dB and gives an improvement of 2.2 dB when compared to learned post-processing.

We hope that this algorithm will inspire further research in Learned Primal-Dual schemes and that the method will be applied to other imaging modalities.

VIII Acknowledgements

The work was supported by the Swedish Foundation for Strategic Research grant AM13-0049 and Industrial PhD grant ID14-0055. The work was also supported by Elekta.

The authors also thank Dr. Cynthia McCollough, the Mayo Clinic, and the American Association of Physicists in Medicine, and acknowledge funding from grants EB017095 and EB017185 from the National Institute of Biomedical Imaging and Bioengineering, for providing the data necessary for performing comparison using a human phantom.

References

  • [1] M. Bertero, H. Lantéri, and L. Zanni, “Iterative image reconstruction: a point of view,” in Proceedings of the Interdisciplinary Workshop on Mathematical Methods in Biomedical Imaging and Intensity-Modulated Radiation (IMRT), Pisa, Italy, Y. Censor, M. Jiang, and A. K. Louis, Eds., 2008, pp. 37–63.
  • [2] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of inverse problems, ser. Mathematics and its Applications.   Kluwer Academic Publishers, 2000, no. 375.
  • [3] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen, Variational Methods in Imaging, ser. Applied Mathematical Sciences.   New York: Springer-Verlag, 2009, vol. 167.
  • [4] G. Wang, “A perspective on deep imaging,” IEEE Access, vol. 4, pp. 8914–8924, 2016.
  • [5] J. Adler and O. Öktem, “Solving ill-posed inverse problems using iterative deep neural networks,” Inverse Problems, vol. 33, no. 12, p. 124007, 2017.
  • [6] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, “Low-Dose CT with a Residual Encoder-Decoder Convolutional Neural Network (RED-CNN),” IEEE Transactions on Image Processing, 2017.
  • [7] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep Convolutional Neural Network for Inverse Problems in Imaging,” IEEE Transactions on Image Processing, vol. 26, no. 9, pp. 4509–4522, 2016.
  • [8] E. Kang, J. Min, and J. C. Ye, “WaveNet: a deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction,” Medical physics, vol. 44 10, pp. e360–e375, 2017.
  • [9] Q. Xu, H. Yu, X. Mou, L. Zhang, J. Hsieh, and G. Wang, “Low-dose x-ray ct reconstruction via dictionary learning,” IEEE Transactions on Medical Imaging, vol. 31, no. 9, pp. 1682–1697, Sept 2012.
  • [10] T. Meinhardt, M. Möller, C. Hazirbas, and D. Cremers, “Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems,” in ICCV, October 2017.
  • [11] I. Dokmanic, J. Bruna, S. Mallat, and M. de Hoop, “Inverse problems with invariant multiscale statistics,” ArXiv, vol. abs/1609.05502, 2016.
  • [12] Y. Yang, J. Sun, H. Li, and Z. Xu, “Deep ADMM-Net for compressive sensing MRI,” in Advances in Neural Information Processing Systems, 2016, vol. 29, pp. 10–18.
  • [13] P. Putzky and M. Welling, “Recurrent inference machines for solving inverse problems,” ArXiv, vol. abs/1706.04008, 2017.
  • [14] K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock, and F. Knoll, “Learning a variational network for reconstruction of accelerated mri data,” Magnetic Resonance in Medicine, 2017.
  • [15] M. Mardani, E. Gong, J. Y. Cheng, S. Vasanawala, G. Zaharchuk, M. Alley, N. Thakur, S. Han, W. Dally, J. M. Pauly, and L. Xing, “Deep Generative Adversarial Networks for Compressed Sensing Automates MRI,” ArXiv, May 2017.
  • [16] J. Adler, “Learned primal-dual reconstruction,” Software available from https://github.com/adler-j/learned_primal_dual, 2017.
  • [17] A. Chambolle and T. Pock, “A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging,” Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120–145, 2010.
  • [18] T. Valkonen, “A primal-dual hybrid gradient method for nonlinear operators with applications to MRI,” Inverse Problems, vol. 30, no. 5, p. 055012, 2014.
  • [19] E. Y. Sidky, J. H. Jørgensen, and X. Pan, “Convex optimization problem prototyping for image reconstruction in computed tomography with the Chambolle-Pock algorithm,” Physics in Medicine and Biology, vol. 57, pp. 3065–3091, May 2012.
  • [20] A. Markoe, Analytic Tomography, ser. Encyclopedia of mathematics and its applications.   Cambridge University Press, 2006.
  • [21] F. Heide, M. Steinberger, Y.-T. Tsai, M. Rouf, D. Pajak, D. Reddy, O. Gallo, J. Liu, W. Heidrich, K. Egiazarian, J. Kautz, and K. Pulli, “Flexisp: A flexible camera image processing framework,” ACM Trans. Graph., vol. 33, no. 6, pp. 231:1–231:13, 2014.
  • [22] Y. Romano, M. Elad, and P. Milanfar, “The little Engine that Could: Regularization by Denoising (RED),” ArXiv, 2016.
  • [23] K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural Networks, vol. 4, no. 2, pp. 251 – 257, 1991.
  • [24] C. McCollough, “Tfg-207a-04: Overview of the low dose ct grand challenge,” Medical Physics, vol. 43, no. 6, pp. 3759–3760, 2016.
  • [25] J. Adler, H. Kohr, and O. Öktem, “Operator discretization library (ODL),” Software available from https://github.com/odlgroup/odl, 2017.
  • [26] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, J. Yangqing, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” ArXiv, no. 1603.04467, 2015.
  • [27] W. van Aarle, W. J. Palenstijn, J. Cant, E. Janssens, F. Bleichrodt, A. Dabravolski, J. Beenhouwer, K. J. Batenburg, and J. Sijbers, “Fast and flexible X-ray tomography using the ASTRA toolbox,” Optics Express, vol. 24, no. 22, pp. 25 129–25 147, 2016.
  • [28] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
  • [29] K. He, X. Zhang, S. Ren and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034.
  • [30] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS’10), 2010.
  • [31] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” ArXiv, vol. abs/1412.6980, 2014.
  • [32] I. Loshchilov and F. Hutter, “SGDR: stochastic gradient descent with restarts,” ArXiv, vol. abs/1608.03983, 2016.
  • [33] R. Pascanu, T. Mikolov, and Y. Bengio, “Understanding the exploding gradient problem,” ArXiv, vol. abs/1211.5063, 2012.
  • [34] J. Adler, “Solving ill-posed inverse problems using iterative deep neural networks,” Software available from https://github.com/adler-j/learned_gradient_tomography, 2017.
  • [35] O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation.   Springer International Publishing, 2015, pp. 234–241.
  • [36] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, April 2004.
  • [37] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II, 2016, pp. 694–711.