A SplittingBased Iterative Algorithm for GPUAccelerated Statistical DualEnergy XRay CT Reconstruction
Abstract
For material classification in checked baggage, DualEnergy CT represents any given material with coefficients based on two attenuative effects: Compton scattering and photoelectric absorption. However, straightforward projectiondomain decomposition methods would often yield poor reconstruction due to the high dynamic range of material properties within a realworld baggage. Therefore, for better reconstruction quality under a timing constraint, we propose a splittingbased GPUaccelerated statistical DECT reconstruction algorithm. Compared to prior art, our main contribution lies in the significant acceleration made possible by separating reconstruction and decomposition within an ADMM framework. Experimental results, on both synthetic and realworld baggage phantoms, demonstrate significant improvement in terms of time duration needed for convergence.
A SplittingBased Iterative Algorithm for GPUAccelerated Statistical DualEnergy XRay CT Reconstruction
Author(s) Name(s) 
Author Affiliation(s) 
Index Terms— DECT, ADMM, GPU
1 Introduction
Xray Computed Tomography (CT) is a widely deployed technique for threat detection in baggage at airport security checkpoints. However, using a single Xray spectrum limits its reconstruction to only linear attenuation coefficient (LAC) or Hounsfield Unit (HU), which at best is an approximation of the underlying energydependent characteristics of materials. While LAC might suffice for material discrimination tasks in medical imaging applications, in the adversarial case of airport baggage scanning, frequent types of explosives can possess very similar LACs as common benign materials. As a result, DualEnergy Computed Tomography (DECT), where projections from two Xray spectra are collected simultaneously at each angle, has been proposed as a promising alternative for explosive detection. Essentially, DECT lends itself well to material discrimination because it allows material property to be recovered in an additional dimension by using an energydependent attenuation model.
Commonly DECT models Xray attenuation as a combined effect of Compton scattering and photoelectric absorption, which can be written as:
(1) 
where and denote the energydependent multipliers^{1}^{1}1See [ying2006dual] for detailed formulation of and . to Compton coefficient and photoelectric coefficient , respectively. Therefore, the task of DECT is to recover both Compton and photoelectric coefficients of the object, using projections and measured at two Xray spectra. More specifically, DECT reconstruction amounts to simultaneously solving for and from:
(2) 
where denotes the projection matrix, and denote the number of photons emitted by the Xray source at energy level in the high and lowenergy spectrum, respectively.
Despite the direct physical interpretation of Compton/PE bases, its reconstruction quality is hampered by the severe imbalance of sensitivity. At most energy levels, Compton scattering is the dominant contribution for attenuation, since has a cubicle decay. As a result, stable recovery of photoelectric coefficients would require more sophisticated inversion algorithms, such as the statistical methods discussed in Section 2. In this paper, since our targeted setting is highvolume baggage screening, the focus is on improving the computational efficiency of the statistical inversion algorithms.
2 Related Work
In essence, DECT reconstruction entails two steps of different nature: dualenergy decomposition and tomographic reconstruction. The two tasks can be done either sequentially, as in the projectionwise decomposition methods, or in one unified step as by the iterative statistical approaches.
Projectionwise Decomposition: One of the earliest approaches for DECT decomposition is Constrained Decomposition Method (CDM) [ying2006dual], where high and lowenergy projections are directly converted to Compton/PE lineintegrals, followed by FBP reconstruction. For each pair of measured projections, the decomposed Compton/PE line integrals minimize the quadratic cost of measurement mismatch. Additionally, [yuan2016robust] generalizes CDM to MultiEnergy CT and proposed to weigh the quadratic errors with weights proportional to photon counts. A major disadvantage of CDM is that it guards itself poorly against artifacts, especially within photoelectric coefficients. However, projectionwise decomposition enables great parallelism and is integral for the acceleration achieved by our proposed method.
Iterative Statistical Approaches: For DECT, statistical approaches solve for the MAP estimate of Compton and photoelectric coefficients that best suits both the measurement and prior knowledge. The literature on MultiEnergy CT has been largely focused on designing models and priors that best leverage the structual similarity across bases. In [semerci2012parametric], Compton/PE images are reconstructed on a set of exponential basis functions, whose parameters are directly solved from projections at two energy levels using LevenburgMarquardt (LM). An edgecorrelation penalty term is introduced to stabilize the recovery of photoelectric image. This idea of encouraging geometric similarity between the more stable Compton image and the photoelectric image is further explored in [tracey2015stabilizing]. For this purpose, the authors proposed a new NonLocal Mean (NLM) regularizer on photoelectric image and laid out an ADMM formation that scales up to a problem size of practical interest. By treating energy level as the third dimension, [zhang2017tensor, semerci2014tensor] adopted a tensorbased model. More specifically, [zhang2017tensor] first learns a patchbased dictionary of tensors from FBP reconstructions in the training phase, then during testing seeks the reconstruction as a sparse code of the atoms in the dictionary. On the other hand, [semerci2014tensor] treats stacked images across energylevels as a tensor and performs inversion with the added penalty on the tensor nuclear norm.
Despite being able to produce highquality reconstructions, the common downfall of statistical algorithms is that they fail to address the timing constraint in practical applications. Compared to LAC reconstruction, DECT has the added computational complexity for decomposition. Moreover, in existing approaches, solving decomposition and reconstruction in one step is inefficient, due to the combined complexity and highdimensionality of the problem. Therefore, we propose to a statistical DECT approach that separates decomposition and reconstruction. More specifically, we employ a splittingbased MAP estimation method embedded in an ADMM framework. As shown in Section 4, our new splitting scheme not only provides better convergence rate, but also allows powerful hardwareenabled acceleration.
3 Proposed Method
3.1 Problem Formation
First, we define the forward model — the nonlinear transform from Compton/PE coefficients to logarithmic projection :
(3)  
where are constants. Now given a pair of measurements , which is corrupted by Poisson photon noise and possibly other sources of artifact (e.g. photon starvation and white detector noise), the most basic statistical estimate, the ML estimate, is the minimizer of the following quadratic cost:
(4) 
where is a diagonal matrix with photon counts on the diagonal. In practice, it is common to incorporate prior knowledge into the estimate to find a solution that balances fitting the measurement and being feasible. Therefore, we employ a MAP estimate by expanding Equation (4) with Total Variation (TV) and nonnegativity prior:
(5) 
where and is the regularization parameter.
The unconstrained optimization posed by Equation (5) is both intractable and inefficient to solve directly. Fortunately, Alternating Direction Method of Multipliers (ADMM) provides a flexible splittingbased framework that ensures both optimality and convergence [tracey2015stabilizing].
3.2 Formation for ADMM
The derivation for solving the MAP estimate using ADMM begins with converting Equation (5) to its constrained equivalent. By introducing two new auxiliary variables and for the TV term and nonnegativity term respectively, and posing as the primal variable, the MAP minimization can be transformed into:
(6)  
where denotes the finite difference operator along the horizontal and vertical directions. Subsequently, the corresponding Augmented Lagrangian (AL) can be written as
(7)  
where denotes the dual variable, or the scaled Lagrangian multipliers, and denotes the penalty parameter. ADMM splits the minimization of AL into subproblems to be solved separately. One intuitive splitting scheme of ADMM proposed by [tracey2015stabilizing] is to perform the updates according to Equation (8) through (13) in every iteration. First, reestimate the primal variables using
(8)  
(9)  
Note that the update to is made subsequent to for that would help stabilize the recovery of the more noiseprone . Subsequently, update the auxiliary variables and corresponding to the TV and nonnegativity terms, respectively.
(10)  
(11)  
The dual variable , or essentially the scaled Lagrangian multiplier, is updated with
(12) 
And finally the penalty parameter :
(13) 
where and are the primal and dual residuals as defined in [boyd2011distributed]. Intuitively, adaptively balances the primal variable update between minimizing the ML objective and reinforcing the equality constraint.
Efficiencywise, while Equation (10)(12) can be realized by straightforward elementwise operations, for solving Equation (8) and (9) one must use a nonlinear least squares algorithm such as LevenbergMarquardt. Despite its robustness, LM performs expensive backtracking and has little room for parallelization, resulting in poor scalability. On a higher level, the inefficiency is essentially attributed to addressing two problems of very different nature at once: nonlinear dualenergy basis decomposition and regular linear tomographic reconstruction.
3.3 Proposed Splitting Scheme
As a result, for the purpose of speeding up, we propose to further augment Equation (8)(9) into two simpler subproblems: tomographic reconstruction followed by dualenergy decomposition. More specifically, this is achieved by viewing measurement fitting as an additional constraint and introducing a new auxiliary variable designated for meeting that. On one hand, tomographic reconstruction is posed as an unweighted unconstrained optimization problem:
(14) 
In the literature, such inverse problems have been extensively studied and we choose to use a CG solver, because our system is huge and sparse. Additionally, since the projectionwise weights are now absorbed by the later decomposition step, the underlying linear system behind Equation (14) is shiftinvariant. Therefore, a preconditioner can be applied prior each CG iteration to provide improved convergence rate, as discussed in Subsection 3.4.
On the other hand, the second subproblem, dualenergy decomposition in the projection domain, can be solved by finding the pair of Compton/PE coefficients that minimizes the cost function at each projection location, independently. Therefore, decomposition can be achieved using a CDMlike algorithm except unconstrained:
(15)  
With Equation (14) and (15), the overall ADMM procedure with our proposed splitting scheme is summarized by Algorithm 1. Compared to Equation (8) and (9), the proposed updates have two advantages. First, treating decomposition and reconstruction separately allows two entirely different algorithms tailored for each problem to be used, which in our case are PCG and CDM. Secondly, it opens up possibilities for parallelization. More specifically, decomposition into the dualenergy basis can now leverage powerful hardware specialized for mass parallelization such as GPU.
3.4 Preconditioned Conjugate Gradient
Applying a proper preconditioner prior CG iterations can result in improved convergence rate. In our case, for Equation (14), the solution is essentially the same as the solution of the following linear shiftinvariant system in the least squares sense:
(16) 
In practice, a preconditioning filter is precomputed based on the actual system . One benefit in doing so is that subsequently the preconditioning can be conveniently carried out by filtering in frequency domain. Intuitively, since backprojection inherently blurs the output, a desirable preconditioning filter should behave like a highpass filter. We use a method to compute the filter based on the point spread function (PSF) of the system , as provided by [Fessler1999Conjugate]. More specifically, its steps are given as:

Compute the , where is an onehot image at the center pixel;

Transform the PSF into frequency domain using 2DFFT;

Compute and store the inverse filter in frequency domain as the preconditioning filter.
4 Experimental Results
We present qualitative and numerical results on two phantoms: simulated phantom sim18, and realworld baggage phantom Clutter. The following algorithms are implemented and evaluated:

CDM+FBP: CDM [ying2006dual] followed by FBP;

ADMMLM(): ADMM in [tracey2015stabilizing] with LM iterations;

ADMM(P)CG(): proposed ADMM method with (P)CG iterations.
Our Python implementations use the Astra toolbox [van2016fast] for GPUaccelerated forward/backward projection. Additionally, as a key factor for acceleration, we use Gpufit [przybylski2017gpufit] to parallelize the decomposition of projection pairs in Equation (15). All experiments are carried out on a single computing cluster node with 16 cores, 20GB RAM and a Nvidia 1080Ti GPU.
Quality of statistical reconstruction is predicated on the quality initialization. In our experiment, we have chosen the CDM output as the initial Compton coefficients, since it is generally stable, while using a scaled version of the Compton coefficients as the initial PE coefficients. For both Compton and PE, we used as the TV regularization parameter and as the initial penalty parameter. Quantitative evaluation of reconstruction quality is accessed with the normalized distance between and :
(17) 
Note that is the result obtained with 200 iterations of the ADMMLM algorithm, or the ground truth image if available.
The first phantom, sim18, contains seven uniform regions of different materials including aluminum in the center and it is designed for testing the accuracy of the recovered coefficients. The reconstruction has size , and the photon noiseaffected projection is produced with 720 angles and 725 detectors. Figure 1 shows the reconstructions by CDM and ADMMPCG5. In Figure 2, we enumerate the average duration per iteration and plot v.s. time to compare the convergence rate for each algorithm. In general, the proposed ADMM(P)CG algorithms require significantly shorter total duration than ADMMLM to satisfy the same fixedpoint stopping criteria. Among configurations of ADMM(P)CG, faster convergence rate can be observed at configurations with less CG steps per ADMM iteration. The use of a CG preconditioner does not seem to provide any significant benefit.
The realworld baggage phantom, Clutter, is a particularly challenging phantom for DECT decomposition because of the presence of metallic objects. Both high and lowenergy projections are subsampled by half to 360 angles and 512 bins. The measurements were taken with about photons per ray, and two energy spectra at 95keV and 130keV. Reconstruction results by CDM and 100 iterations of ADMMPCG5 are shown in Figure 3. ADMMPCG5 achieves a speedup with two orders of magnitude, taking on average 0.61 seconds per iteration, while ADMMLM1 takes 46.78s. While objects in the CDM PE reconstruction are overshadowed by streaking artifacts, the proposed ADMMPCG algorithm is able to recover object shapes in the PE reconstruction.
5 Conclusions and Future Work
In this paper, we proposed a new splitting scheme for using ADMM to reconstruct Compton and photoelectric coefficients based on dualenergy measurements. By separating reconstruction and decomposition, we achieved up to two magnitudes of speedup in time. Future work shall be directed toward relaxation heuristics such as coarsetofine initialization.
6 Appendix
6.1 Primal and Dual Residuals
The same update rule applies for both and . Intuitively, adaptively balances the variable updates between minimizing the objective and reinforcing the equality constraint. The primal and dual residuals used in updating and are defined as:
(18) 
(19) 
6.2 Preconditioned Conjugate Gradient
For a sparse linear system , Conjugate Gradient is a versatile solver when the system is symmetric positivedefinite. Moreover, its convergence rate has the following relationship with the condition number of the linear system :
(20) 
More specifically, this relationship implies that CG is more efficient when is lower. One way of reducing is to introduce a preconditioner and solve the new linear system instead:
(21) 
Theoretically, the ideal preconditioner should be the exact inverse of : , which is often infeasible to compute in practice. Therefore, in order for effective preconditioning, a practical preconditioner should have several desirable properties: first, should be a reasonable approximate of ; secondly, must also be symmetric positivedefinite; and lastly, the matrixvector product should be inexpensive to compute, as is required once in every CG iteration.