A Splitting-Based Iterative Algorithm for GPU-Accelerated Statistical Dual-Energy X-Ray CT Reconstruction
For material classification in checked baggage, Dual-Energy CT represents any given material with coefficients based on two attenuative effects: Compton scattering and photoelectric absorption. However, straightforward projection-domain decomposition methods would often yield poor reconstruction due to the high dynamic range of material properties within a real-world baggage. Therefore, for better reconstruction quality under a timing constraint, we propose a splitting-based GPU-accelerated statistical DECT reconstruction algorithm. Compared to prior art, our main contribution lies in the significant acceleration made possible by separating reconstruction and decomposition within an ADMM framework. Experimental results, on both synthetic and real-world baggage phantoms, demonstrate significant improvement in terms of time duration needed for convergence.
A Splitting-Based Iterative Algorithm for GPU-Accelerated Statistical Dual-Energy X-Ray CT Reconstruction
Index Terms— DECT, ADMM, GPU
X-ray Computed Tomography (CT) is a widely deployed technique for threat detection in baggage at airport security checkpoints. However, using a single X-ray spectrum limits its reconstruction to only linear attenuation coefficient (LAC) or Hounsfield Unit (HU), which at best is an approximation of the underlying energy-dependent characteristics of materials. While LAC might suffice for material discrimination tasks in medical imaging applications, in the adversarial case of airport baggage scanning, frequent types of explosives can possess very similar LACs as common benign materials. As a result, Dual-Energy Computed Tomography (DECT), where projections from two X-ray spectra are collected simultaneously at each angle, has been proposed as a promising alternative for explosive detection. Essentially, DECT lends itself well to material discrimination because it allows material property to be recovered in an additional dimension by using an energy-dependent attenuation model.
Commonly DECT models X-ray attenuation as a combined effect of Compton scattering and photoelectric absorption, which can be written as:
where and denote the energy-dependent multipliers111See [ying2006dual] for detailed formulation of and . to Compton coefficient and photoelectric coefficient , respectively. Therefore, the task of DECT is to recover both Compton and photoelectric coefficients of the object, using projections and measured at two X-ray spectra. More specifically, DECT reconstruction amounts to simultaneously solving for and from:
where denotes the projection matrix, and denote the number of photons emitted by the X-ray source at energy level in the high- and low-energy spectrum, respectively.
Despite the direct physical interpretation of Compton/PE bases, its reconstruction quality is hampered by the severe imbalance of sensitivity. At most energy levels, Compton scattering is the dominant contribution for attenuation, since has a cubicle decay. As a result, stable recovery of photoelectric coefficients would require more sophisticated inversion algorithms, such as the statistical methods discussed in Section 2. In this paper, since our targeted setting is high-volume baggage screening, the focus is on improving the computational efficiency of the statistical inversion algorithms.
2 Related Work
In essence, DECT reconstruction entails two steps of different nature: dual-energy decomposition and tomographic reconstruction. The two tasks can be done either sequentially, as in the projection-wise decomposition methods, or in one unified step as by the iterative statistical approaches.
Projection-wise Decomposition: One of the earliest approaches for DECT decomposition is Constrained Decomposition Method (CDM) [ying2006dual], where high- and low-energy projections are directly converted to Compton/PE line-integrals, followed by FBP reconstruction. For each pair of measured projections, the decomposed Compton/PE line integrals minimize the quadratic cost of measurement mismatch. Additionally, [yuan2016robust] generalizes CDM to Multi-Energy CT and proposed to weigh the quadratic errors with weights proportional to photon counts. A major disadvantage of CDM is that it guards itself poorly against artifacts, especially within photoelectric coefficients. However, projection-wise decomposition enables great parallelism and is integral for the acceleration achieved by our proposed method.
Iterative Statistical Approaches: For DECT, statistical approaches solve for the MAP estimate of Compton and photoelectric coefficients that best suits both the measurement and prior knowledge. The literature on Multi-Energy CT has been largely focused on designing models and priors that best leverage the structual similarity across bases. In [semerci2012parametric], Compton/PE images are reconstructed on a set of exponential basis functions, whose parameters are directly solved from projections at two energy levels using Levenburg-Marquardt (LM). An edge-correlation penalty term is introduced to stabilize the recovery of photoelectric image. This idea of encouraging geometric similarity between the more stable Compton image and the photoelectric image is further explored in [tracey2015stabilizing]. For this purpose, the authors proposed a new Non-Local Mean (NLM) regularizer on photoelectric image and laid out an ADMM formation that scales up to a problem size of practical interest. By treating energy level as the third dimension, [zhang2017tensor, semerci2014tensor] adopted a tensor-based model. More specifically, [zhang2017tensor] first learns a patch-based dictionary of tensors from FBP reconstructions in the training phase, then during testing seeks the reconstruction as a sparse code of the atoms in the dictionary. On the other hand, [semerci2014tensor] treats stacked images across energy-levels as a tensor and performs inversion with the added penalty on the tensor nuclear norm.
Despite being able to produce high-quality reconstructions, the common downfall of statistical algorithms is that they fail to address the timing constraint in practical applications. Compared to LAC reconstruction, DECT has the added computational complexity for decomposition. Moreover, in existing approaches, solving decomposition and reconstruction in one step is inefficient, due to the combined complexity and high-dimensionality of the problem. Therefore, we propose to a statistical DECT approach that separates decomposition and reconstruction. More specifically, we employ a splitting-based MAP estimation method embedded in an ADMM framework. As shown in Section 4, our new splitting scheme not only provides better convergence rate, but also allows powerful hardware-enabled acceleration.
3 Proposed Method
3.1 Problem Formation
First, we define the forward model — the nonlinear transform from Compton/PE coefficients to logarithmic projection :
where are constants. Now given a pair of measurements , which is corrupted by Poisson photon noise and possibly other sources of artifact (e.g. photon starvation and white detector noise), the most basic statistical estimate, the ML estimate, is the minimizer of the following quadratic cost:
where is a diagonal matrix with photon counts on the diagonal. In practice, it is common to incorporate prior knowledge into the estimate to find a solution that balances fitting the measurement and being feasible. Therefore, we employ a MAP estimate by expanding Equation (4) with Total Variation (TV) and non-negativity prior:
where and is the regularization parameter.
The unconstrained optimization posed by Equation (5) is both intractable and inefficient to solve directly. Fortunately, Alternating Direction Method of Multipliers (ADMM) provides a flexible splitting-based framework that ensures both optimality and convergence [tracey2015stabilizing].
3.2 Formation for ADMM
The derivation for solving the MAP estimate using ADMM begins with converting Equation (5) to its constrained equivalent. By introducing two new auxiliary variables and for the TV term and non-negativity term respectively, and posing as the primal variable, the MAP minimization can be transformed into:
where denotes the finite difference operator along the horizontal and vertical directions. Subsequently, the corresponding Augmented Lagrangian (AL) can be written as
where denotes the dual variable, or the scaled Lagrangian multipliers, and denotes the penalty parameter. ADMM splits the minimization of AL into subproblems to be solved separately. One intuitive splitting scheme of ADMM proposed by [tracey2015stabilizing] is to perform the updates according to Equation (8) through (13) in every iteration. First, reestimate the primal variables using
Note that the update to is made subsequent to for that would help stabilize the recovery of the more noise-prone . Subsequently, update the auxiliary variables and corresponding to the TV and non-negativity terms, respectively.
The dual variable , or essentially the scaled Lagrangian multiplier, is updated with
And finally the penalty parameter :
where and are the primal and dual residuals as defined in [boyd2011distributed]. Intuitively, adaptively balances the primal variable update between minimizing the ML objective and reinforcing the equality constraint.
Efficiency-wise, while Equation (10)-(12) can be realized by straightforward element-wise operations, for solving Equation (8) and (9) one must use a non-linear least squares algorithm such as Levenberg-Marquardt. Despite its robustness, LM performs expensive backtracking and has little room for parallelization, resulting in poor scalability. On a higher level, the inefficiency is essentially attributed to addressing two problems of very different nature at once: non-linear dual-energy basis decomposition and regular linear tomographic reconstruction.
3.3 Proposed Splitting Scheme
As a result, for the purpose of speeding up, we propose to further augment Equation (8)-(9) into two simpler subproblems: tomographic reconstruction followed by dual-energy decomposition. More specifically, this is achieved by viewing measurement fitting as an additional constraint and introducing a new auxiliary variable designated for meeting that. On one hand, tomographic reconstruction is posed as an unweighted unconstrained optimization problem:
In the literature, such inverse problems have been extensively studied and we choose to use a CG solver, because our system is huge and sparse. Additionally, since the projection-wise weights are now absorbed by the later decomposition step, the underlying linear system behind Equation (14) is shift-invariant. Therefore, a preconditioner can be applied prior each CG iteration to provide improved convergence rate, as discussed in Subsection 3.4.
On the other hand, the second subproblem, dual-energy decomposition in the projection domain, can be solved by finding the pair of Compton/PE coefficients that minimizes the cost function at each projection location, independently. Therefore, decomposition can be achieved using a CDM-like algorithm except unconstrained:
With Equation (14) and (15), the overall ADMM procedure with our proposed splitting scheme is summarized by Algorithm 1. Compared to Equation (8) and (9), the proposed updates have two advantages. First, treating decomposition and reconstruction separately allows two entirely different algorithms tailored for each problem to be used, which in our case are PCG and CDM. Secondly, it opens up possibilities for parallelization. More specifically, decomposition into the dual-energy basis can now leverage powerful hardware specialized for mass parallelization such as GPU.
3.4 Preconditioned Conjugate Gradient
Applying a proper preconditioner prior CG iterations can result in improved convergence rate. In our case, for Equation (14), the solution is essentially the same as the solution of the following linear shift-invariant system in the least squares sense:
In practice, a preconditioning filter is precomputed based on the actual system . One benefit in doing so is that subsequently the preconditioning can be conveniently carried out by filtering in frequency domain. Intuitively, since back-projection inherently blurs the output, a desirable preconditioning filter should behave like a high-pass filter. We use a method to compute the filter based on the point spread function (PSF) of the system , as provided by [Fessler1999Conjugate]. More specifically, its steps are given as:
Compute the , where is an one-hot image at the center pixel;
Transform the PSF into frequency domain using 2DFFT;
Compute and store the inverse filter in frequency domain as the preconditioning filter.
4 Experimental Results
We present qualitative and numerical results on two phantoms: simulated phantom sim18, and real-world baggage phantom Clutter. The following algorithms are implemented and evaluated:
CDM+FBP: CDM [ying2006dual] followed by FBP;
ADMM-LM(): ADMM in [tracey2015stabilizing] with LM iterations;
ADMM-(P)CG(): proposed ADMM method with (P)CG iterations.
Our Python implementations use the Astra toolbox [van2016fast] for GPU-accelerated forward/backward projection. Additionally, as a key factor for acceleration, we use Gpufit [przybylski2017gpufit] to parallelize the decomposition of projection pairs in Equation (15). All experiments are carried out on a single computing cluster node with 16 cores, 20GB RAM and a Nvidia 1080Ti GPU.
Quality of statistical reconstruction is predicated on the quality initialization. In our experiment, we have chosen the CDM output as the initial Compton coefficients, since it is generally stable, while using a scaled version of the Compton coefficients as the initial PE coefficients. For both Compton and PE, we used as the TV regularization parameter and as the initial penalty parameter. Quantitative evaluation of reconstruction quality is accessed with the normalized -distance between and :
Note that is the result obtained with 200 iterations of the ADMM-LM algorithm, or the ground truth image if available.
The first phantom, sim18, contains seven uniform regions of different materials including aluminum in the center and it is designed for testing the accuracy of the recovered coefficients. The reconstruction has size , and the photon noise-affected projection is produced with 720 angles and 725 detectors. Figure 1 shows the reconstructions by CDM and ADMM-PCG5. In Figure 2, we enumerate the average duration per iteration and plot v.s. time to compare the convergence rate for each algorithm. In general, the proposed ADMM-(P)CG algorithms require significantly shorter total duration than ADMM-LM to satisfy the same fixed-point stopping criteria. Among configurations of ADMM-(P)CG, faster convergence rate can be observed at configurations with less CG steps per ADMM iteration. The use of a CG preconditioner does not seem to provide any significant benefit.
The real-world baggage phantom, Clutter, is a particularly challenging phantom for DECT decomposition because of the presence of metallic objects. Both high- and low-energy projections are subsampled by half to 360 angles and 512 bins. The measurements were taken with about photons per ray, and two energy spectra at 95keV and 130keV. Reconstruction results by CDM and 100 iterations of ADMM-PCG5 are shown in Figure 3. ADMM-PCG5 achieves a speedup with two orders of magnitude, taking on average 0.61 seconds per iteration, while ADMM-LM1 takes 46.78s. While objects in the CDM PE reconstruction are overshadowed by streaking artifacts, the proposed ADMM-PCG algorithm is able to recover object shapes in the PE reconstruction.
5 Conclusions and Future Work
In this paper, we proposed a new splitting scheme for using ADMM to reconstruct Compton and photoelectric coefficients based on dual-energy measurements. By separating reconstruction and decomposition, we achieved up to two magnitudes of speedup in time. Future work shall be directed toward relaxation heuristics such as coarse-to-fine initialization.
6.1 Primal and Dual Residuals
The same update rule applies for both and . Intuitively, adaptively balances the variable updates between minimizing the objective and reinforcing the equality constraint. The primal and dual residuals used in updating and are defined as:
6.2 Preconditioned Conjugate Gradient
For a sparse linear system , Conjugate Gradient is a versatile solver when the system is symmetric positive-definite. Moreover, its convergence rate has the following relationship with the condition number of the linear system :
More specifically, this relationship implies that CG is more efficient when is lower. One way of reducing is to introduce a preconditioner and solve the new linear system instead:
Theoretically, the ideal preconditioner should be the exact inverse of : , which is often infeasible to compute in practice. Therefore, in order for effective preconditioning, a practical preconditioner should have several desirable properties: first, should be a reasonable approximate of ; secondly, must also be symmetric positive-definite; and lastly, the matrix-vector product should be inexpensive to compute, as is required once in every CG iteration.