# Real-Time 2D-3D Deformable Registration with Deep Learning and Application to Lung Radiotherapy Targeting

###### Abstract

Radiation therapy presents a need for dynamic tracking of a target tumor volume. Fiducial markers such as implanted gold seeds have been used to gate radiation delivery, but the markers are invasive and gating significantly increases treatment time. Pretreatment acquisition of a respiratory-correlated 4DCT allows accurate motion estimation for treatment planning. We design a patient-specific motion subspace and a deep convolutional neural network to recover anatomical positions from a single fluoroscopic projection in real-time. We use this deep network to approximate the nonlinear inverse of a diffeomorphic deformation composed with radiographic projection. The network recovers subspace coordinates that define the patient-specific deformation of the lungs from a baseline anatomic position. The geometric accuracy of the subspace deformations on real patient data is similar to the accuracy attained by the original image registration between individual respiratory-phase image volumes.

###### Keywords:

Therapy Target Tracking · Lung Cancer · Real-Time · Computed Tomography · Fluoroscopy · Deep Learning · Convolutional Neural Network · Motion Analysis · Image Registration · Computational Anatomy

## 1 Introduction

According to the United States Centers for Disease Control and Prevention, lung cancer is the leading cause of cancer death, accounting for 27% of all cancer deaths in the United States [22]. Accurate estimation of organ movement and normal tissue deformations plays a crucial role in dose calculations and treatment decisions in radiation therapy of lung cancer [3]. State-of-the-art radiation treatment planning uses 4D (3D + time) respiratory correlated computed tomography (4D-RCCT) scans as a planning tool to deliver radiation dose to the tumor during treatment [6] and minimize dose to the healthy tissue and vital organs. Understanding respiratory motion allows for more accurate treatment planning and delivery, resulting in more targeted dose delivery to only the tumor.

While general motion patterns in radiotherapy patients are relatively well understood, cycle-to-cycle variations remain a significant challenge and may account for observed discrepancies between predictive treatment plan indicators and clinical patient outcomes, in particular because such variation forces an increased irradiated volume, which limits adequate dose to the target [18]. Gating methods have been developed, but they require implanted markers and lengthen treatment time. Real-time motion tracking using magnetic resonance imaging is applied only within 2D imaging slices because of imaging-time constraints [4]. Accurate and fast noninvasive target tracking that accounts for real-time cycle-to-cycle variations in respiratory motion is a recognized need in radiotherapy [9, 19].

In this paper, we propose to use a 4D-RCCT acquired at treatment planning time to develop a low-dimensional, patient-specific space of deformations, and to use single-angle fluoroscopic imaging at treatment time in conjunction with a deep convolutional neural network to recover the full deformation of the anatomy during treatment. The constant-time inference of convolutional neural networks allows real-time recovery of the anatomic position of the tumor target and surrounding anatomy from radiographic images of the treatment area acquired with fluoroscopy. We envision our framework being incorporated as the target position monitoring subsystem in a conformal radiotherapy system [19]. The proposed method eliminates the need for invasive fiducial markers while still producing targeted radiation delivery under variable respiration patterns. While this framework requires training the model on a per-patient basis, only inference is required during treatment. Inference of the deep convolutional network and the subsequent linear combination of patient-specific motion components can be calculated faster than real-time and can drive motion tracking for conformal radiotherapy treatment.

The proposed framework begins by building a motion subspace from CT registrations among the respiratory-phase images. This motion subspace is then used to generate many respiratory-phase fluoroscopic images with known subspace coordinates. These labeled fluoroscopic images serve as a training dataset for a deep convolutional neural network that recovers the motion subspace coordinates. Motion coordinates recovered by the network from unseen fluoroscopy define a linear combination of deformation components, forming the full 3D deformation that is represented in the 2D fluoroscopic image.

## 2 Related Work

### 2.1 Low-rank Respiratory Motion Models

The respiratory cycle exhibits low-rank behavior and hysteresis, which has led to the application of principal component analysis (PCA) to study the deformation of lungs and other organs through time [8]. Sabouri et al. used such PCA approximations of the respiratory cycle to correlate the lungs' principal deformations with respiratory markers directly observable during treatment, such as body surface tracking, during conformal radiation therapy [17]. PCA has also been applied to MR-based image guidance in radiotherapy [4].

### 2.2 Rank Constrained Density Motion Estimation

Our real-time tracking procedure is based on the rank-constrained diffeomorphic density matching framework, summarized here for completeness (see [1, 2, 14, 15] for details). In general, the rank-constrained diffeomorphic density matching problem produces a registration between multiple image volumes that is explained by only a few principal components of deformation – less than the total number of volumes. The deformation set matrix rank is non-smooth and therefore unfit for optimization. Instead, the nuclear norm is used as a smooth and convex surrogate. The deformation set matrix is constructed with each vectorized deformation field as a row,

$$X = \begin{bmatrix} \operatorname{vec}(\varphi_1^{-1})^{T} \\ \operatorname{vec}(\varphi_2^{-1})^{T} \\ \vdots \\ \operatorname{vec}(\varphi_{N-1}^{-1})^{T} \end{bmatrix} \qquad (1)$$

where $\varphi_i^{-1}$ is the inverse of the deformation from the $i$-th image in the image series to a selected reference image. We thus define the nuclear norm for the deformations between $N$ images as

$$\| X \|_* = \sum_{i=1}^{N-1} \sigma_i(X) \qquad (2)$$

where $\sigma_i(X)$ are the singular values of $X$. There are $N-1$ deformations between the $N$ images, giving only $N-1$ singular values $\sigma_1, \ldots, \sigma_{N-1}$.
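As a concrete illustration, the nuclear norm of a deformation set matrix can be computed from its singular values. The following sketch uses random stand-ins for the vectorized deformation fields (NumPy assumed):

```python
import numpy as np

# Sketch: nuclear norm of a deformation set matrix X whose rows are
# vectorized deformation fields (random stand-ins, not real fields).
rng = np.random.default_rng(0)
n_deformations, n_voxels = 9, 1000      # e.g. N-1 = 9 deformations
X = rng.standard_normal((n_deformations, n_voxels))

# The nuclear norm is the sum of singular values: a convex surrogate
# for the (non-smooth) matrix rank.
singular_values = np.linalg.svd(X, compute_uv=False)
nuclear_norm = singular_values.sum()

# At most min(N-1, n_voxels) = 9 singular values exist.
print(singular_values.shape, round(nuclear_norm, 3))
```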

As CT measures the linear attenuation coefficient, which is proportional to the true density, 4DCT volumes are interpreted as densities, whose values change with compression or expansion. A density $I\,dx$ is acted upon by a diffeomorphism $\varphi$ so as to compensate for the change of density caused by the deformation:

$$\varphi_* (I\,dx) = \left( |D\varphi^{-1}|\, I \circ \varphi^{-1} \right) dx \qquad (3)$$

where $|D\varphi|$ denotes the Jacobian determinant of $\varphi$. The Riemannian geometry of the group of diffeomorphisms with a suitable Sobolev metric is linked to the Riemannian geometry of densities with the Fisher-Rao metric [1, 7, 10]. The Fisher-Rao metric,

$$d_F^2(I_0\,dx, I_1\,dx) = \int_\Omega \left( \sqrt{I_0} - \sqrt{I_1} \right)^2 dx,$$

is used as the measure of similarity due to the property that it is invariant under the action of diffeomorphisms:

$$d_F\big( \varphi_*(I_0\,dx),\, \varphi_*(I_1\,dx) \big) = d_F(I_0\,dx,\, I_1\,dx). \qquad (4)$$

The linkage between the Sobolev metric on diffeomorphisms and the Fisher-Rao metric allows the distance in the space of diffeomorphisms to be evaluated in closed form. The Fisher-Rao metric, an incompressibility measure, and the surrogate rank measure can then be used to match a set of densities by minimizing the energy functional

$$E(\{\varphi_i\}) = \sum_{i=1}^{N-1} \left[ d_F^2\big( (\varphi_i)_*(I_i\,dx),\, I_0\,dx \big) + \int_\Omega p(I_0) \left( \sqrt{|D\varphi_i^{-1}|} - 1 \right)^2 dx \right] + \lambda \, \| X \|_* \qquad (5)$$

where $I_0\,dx$ is a chosen base or reference density and $I_i\,dx$ are the other densities in the series. The first term penalizes dissimilarity between the deformed density and the reference. The second term penalizes deviations from a volume-preserving deformation; the penalty function $p$ weights this volume-preserving measure. The final term penalizes the surrogate rank of the subspace in which the set of deformations lies. We use a gradient flow with iterative soft thresholding to optimize this energy functional. The optimization produces a related set of rank-optimized deformations with improved geometric accuracy over other registration methods, owing to the increased physiologic relevance of the low-rank deformations, which match the general forward-reverse process of the inhale-exhale cycle, with hysteresis captured in additional components [2].
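The iterative soft-thresholding step corresponds to the proximal operator of the nuclear-norm term, which shrinks the singular values of the deformation set matrix after each gradient update. A minimal sketch, using a random matrix as a stand-in for $X$:

```python
import numpy as np

# Sketch: singular value soft-thresholding, the proximal step for the
# nuclear-norm term, applied to a stand-in deformation set matrix X.
def singular_value_soft_threshold(X, lam):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)   # soft-threshold each singular value
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(1)
X = rng.standard_normal((9, 50))
X_low = singular_value_soft_threshold(X, lam=5.0)

# Singular values below the threshold vanish, typically reducing the rank.
print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(X_low))
```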

### 2.3 Digitally Reconstructed Radiographs

For a given 4D-RCCT dataset, it is sufficient for training to simulate the 2D projection images at different respiratory states from the 3D volumes acquired at those states. These digitally reconstructed radiographs (DRRs) are commonly used for radiation therapy planning. This allows simulation of the 2D projections that would be acquired at the different phases of breathing throughout a state-of-the-art radiotherapy treatment [16, 19]. A point in the DRR projection image is defined by the line integral of the linear attenuation coefficients (values in the CT volume) along the ray from the x-ray source to that point [9, 20, 21]. Each such ray can be parameterized by its distance along the path from the source, relative to the piercing point, the unique point in the projection image that is closest to the x-ray source. The DRR operation generates a 2D projection of the 3D CT volume that can be used for training in place of actual radiographic recordings. These projections closely approximate the fluoroscopic images acquired during a state-of-the-art radiotherapy treatment session.
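As a sketch of the DRR operation, the following illustrates cone-beam ray casting with nearest-neighbor sampling. The `drr` helper, the geometry values, and the uniform phantom are all illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Minimal DRR sketch: integrate CT attenuation values along rays from a
# point x-ray source to each detector pixel (cone-beam geometry).
# Geometry and sampling choices here are illustrative only.
def drr(volume, source, det_origin, det_u, det_v, n_pixels=(32, 32), n_samples=128):
    img = np.zeros(n_pixels)
    for i in range(n_pixels[0]):
        for j in range(n_pixels[1]):
            pixel = det_origin + i * det_u + j * det_v
            # Sample points p(t) = source + t * (pixel - source), t in [0, 1].
            t = np.linspace(0.0, 1.0, n_samples)
            pts = source[None, :] + t[:, None] * (pixel - source)[None, :]
            idx = np.round(pts).astype(int)   # nearest-neighbor lookup
            inside = np.all((idx >= 0) & (idx < np.array(volume.shape)), axis=1)
            step = np.linalg.norm(pixel - source) / n_samples
            img[i, j] = volume[tuple(idx[inside].T)].sum() * step
    return img

vol = np.ones((64, 64, 64))               # uniform "density" phantom
source = np.array([-100.0, 32.0, 32.0])   # source placed laterally
det_origin = np.array([128.0, 0.0, 0.0])  # detector on the far side
image = drr(vol, source, det_origin,
            det_u=np.array([0.0, 2.0, 0.0]), det_v=np.array([0.0, 0.0, 2.0]))
```

Each pixel value approximates the chord length of the ray through the unit-density cube, so rays missing the volume would integrate to zero.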

## 3 Methodology

### 3.1 Motion Subspace Learning

A 10-phase respiratory-correlated 4D-RCCT of a lung cancer patient, acquired for treatment planning at the University of Maryland, was provided in anonymized form. All 10 3D volumes in the 4D-RCCT dataset share a common voxel grid and spatial resolution. These 10 CT volumes, each representing a distinct respiratory phase, are used for rank-constrained density registration as described by Foote et al. [2] to produce 9 rank-optimized deformation fields that map to the full-exhale phase volume $I_0$ from each of the other phase volumes $I_i$. The low-rank subspace of the space of diffeomorphisms in which the rank-optimized deformation components lie is determined via principal component analysis (PCA).
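The PCA step can be sketched as follows; the deformation fields are random stand-ins for the 9 rank-optimized fields, vectorized one per row, and the grid size is illustrative:

```python
import numpy as np

# Sketch: PCA over 9 vectorized deformation fields to obtain the
# low-dimensional motion subspace (random stand-ins for the real fields).
rng = np.random.default_rng(2)
fields = rng.standard_normal((9, 3 * 20 * 20 * 10))  # 9 fields, vectorized

mean = fields.mean(axis=0)
centered = fields - mean
# SVD of the centered data matrix: principal components are rows of Vt.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
explained = np.cumsum(s**2) / np.sum(s**2)   # cumulative variance explained

components = Vt[:2]                  # keep the first two motion components
weights = centered @ components.T    # subspace coordinates of each phase
```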

### 3.2 Training Dataset Generation

A dataset for training is generated by calculating DRR projections through a deformed reference CT volume. The deformation applied to the reference volume is calculated as a linear combination of rank-optimized components. Only the first two rank-optimized components are used, as they explain 99.7% of the deformation data (Table 1). The weights for this linear combination are generated as a dense grid of 300 evenly spaced points in the first rank-optimized component direction and 200 evenly spaced points in the second (60,000 total samples for training). The maximum weight is set to 150% and 110% of the maximum magnitude of the corresponding rank-optimized component for the first and second directions, respectively. These weights serve as the targets for training and are used to calculate a linear combination of the rank-optimized component deformations. The resulting grid of deformations is calculated and applied to the full-exhale 4D-RCCT volume using PyCA [13] (Fig. 1). DRR images are calculated by ray casting and integrating along the cone-beam geometry of the fluoroscope through the deformed full-exhale 4DCT volume [16, 20, 21]. The geometry for DRR generation is similar to that of the imaging accessories of a current radiotherapy linear accelerator, such as the Varian TrueBeam: the x-ray source was positioned laterally from the center of the CT volume, and the center of the virtual detector for the projection image was positioned 150 cm horizontally from the source, through the CT volume. DRR images were preprocessed before training with variance equalization, intensity normalization, histogram equalization, and average downsampling.

Table 1: Cumulative fraction of the deformation data explained by the included rank-optimized components.

| Number of Included Components | Fraction of Deformation Data Explained |
|---|---|
| 1 | 0.90115488 |
| 2 | 0.99739416 |
| 3 | 0.99865254 |
| 4 | 0.99973184 |
| 5 | 0.99989146 |
| 6 | 0.99996631 |
| 7 | 0.99999949 |
| 8 | 1.0 |
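The weight-grid construction described above can be sketched as follows; `max1` and `max2` are illustrative stand-ins for the maximum component magnitudes, and the symmetric sign convention of the grid is our assumption:

```python
import numpy as np

# Sketch of the training-label grid: 300 x 200 evenly spaced weights for
# the first two motion components (60000 samples total).
max1, max2 = 10.0, 4.0                            # illustrative magnitudes
w1 = np.linspace(-1.5 * max1, 1.5 * max1, 300)    # 150% of component-1 range
w2 = np.linspace(-1.1 * max2, 1.1 * max2, 200)    # 110% of component-2 range
grid = np.stack(np.meshgrid(w1, w2, indexing="ij"), axis=-1).reshape(-1, 2)

# Each weight pair defines one training sample: a linear combination of
# the two rank-optimized component fields applied to the reference volume.
print(grid.shape)
```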

### 3.3 Network Architecture

The mapping from rank-optimized component weights to a projection image is highly nonlinear, as it includes both a diffeomorphic deformation and a radiographic projection. We aim to approximate the inverse of this mapping with a deep convolutional neural network, posed as a multi-output continuous regression problem. A promising network architecture for this application is DenseNet [5]. DenseNet has achieved state-of-the-art performance on several datasets while using fewer parameters and reducing over-fitting. The direct connections between layers of equal spatial size allow features learned by earlier layers to be reused and improve the flow of gradients through the network. Additionally, DenseNet is well suited for real-time applications due to its reduced model complexity and depth.

We use an efficient DenseNet model in PyTorch, tailored to our application so that single-channel images are used as input [12, 11]. The rank-optimized component weights used to produce each projection image are the regression targets (Fig. 2). Summarized here for completeness, the DenseNet architecture first convolves each input with a kernel size of 3, producing $2k$ feature maps that are input to the first dense block, where $k$ is the growth rate. Each dense block consists of 8 convolutional layers, all with filter size 3 and growth rate $k = 4$. The convolutional layers within a dense block are each preceded by a batch normalization layer and a rectified linear unit (ReLU) layer. The output of a dense block enters a transition block, in which batch normalization and a nonlinear activation are followed by a convolutional feature-map reduction layer (compression = 0.5) and a spatial reduction by a factor of 2 in each dimension using max pooling. A total of 4 dense and transition blocks reduce the spatial dimensions to the final feature maps. The final layer of the network is a linear layer that regresses the feature maps to the rank-optimized component weights.
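A minimal PyTorch sketch of such a DenseNet-style regressor follows. The layer arrangement (e.g. global average pooling before the final linear layer) is our assumption rather than the exact published architecture:

```python
import torch
import torch.nn as nn

# Sketch of the regression DenseNet described above (growth rate k=4,
# 8 layers per dense block, compression 0.5); details are indicative.
class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth=4, layers=8):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(),
                nn.Conv2d(ch, growth, kernel_size=3, padding=1)))
            ch += growth
        self.out_channels = ch

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)   # dense connectivity
        return x

def transition(in_ch):
    out_ch = in_ch // 2                           # compression = 0.5
    return nn.Sequential(nn.BatchNorm2d(in_ch), nn.ReLU(),
                         nn.Conv2d(in_ch, out_ch, kernel_size=1),
                         nn.MaxPool2d(2)), out_ch

class MotionDenseNet(nn.Module):
    def __init__(self, n_components=2, growth=4):
        super().__init__()
        blocks = [nn.Conv2d(1, 2 * growth, kernel_size=3, padding=1)]
        ch = 2 * growth
        for _ in range(4):                        # 4 dense + transition blocks
            dense = DenseBlock(ch, growth)
            trans, ch = transition(dense.out_channels)
            blocks += [dense, trans]
        self.features = nn.Sequential(*blocks)
        self.head = nn.Linear(ch, n_components)   # regress component weights

    def forward(self, x):
        f = self.features(x)
        return self.head(f.mean(dim=(2, 3)))      # global pool, then linear

net = MotionDenseNet()
out = net(torch.zeros(1, 1, 64, 64))              # one single-channel image
```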

## 4 Results

For our experiments, we use a 10-phase RCCT of a lung radiotherapy patient as the source data. Rank-optimized deformations were calculated as described by Foote et al. [2] and summarized in Section 2.2, using the full-exhale (phase 5) CT volume as the reference. Using PCA, rank-optimized components of these deformations were determined and used to generate a DRR training dataset as described in Section 3.2. In Section 4.1, we outline the specific training procedure for the neural network. In Section 4.2, we test the network against unseen data from a breathing model. Finally, Section 4.3 describes generalization of the trained model to the phases of the RCCT dataset that were not used in training dataset generation.

### 4.1 Training and Test Performance

The network is trained for 300 epochs with a learning rate of 0.1 and a batch size of 2048, using a loss that penalizes each deformation component independently, which avoids preferential fitting of the larger-weighted component. The learning rate is decreased by a factor of 5 when the training loss plateaus over the last 20 epochs. The dataset is randomly split 80-20 between training and on-line testing to monitor for over-fitting during training. Training for 300 epochs completes in under 2 hours on a single Nvidia Quadro GV100 GPU.
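A loss that penalizes each component independently can be sketched as follows; the per-component mean absolute error is an illustrative choice, not the exact loss used above:

```python
import numpy as np

# Sketch: a loss that penalizes each motion-component weight independently,
# so the larger-magnitude first component does not dominate the fit.
def componentwise_loss(pred, target):
    # Mean absolute error per component, then summed over components.
    per_component = np.abs(pred - target).mean(axis=0)
    return per_component.sum()

pred = np.array([[1.0, 0.5], [2.0, -0.5]])
target = np.array([[1.5, 0.5], [2.0, 0.0]])
print(componentwise_loss(pred, target))   # (0.25 + 0.25) = 0.5
```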

### 4.2 Spline Model Deformation Validation

A motion-subspace-based breathing model was created using a spline interpolation of the first two rank-optimized component weights from the 9 rank-optimized deformations (Fig. 3). At any point along the model curve, the coordinate weights define a linear combination of the rank-optimized component deformations that, applied to the reference full-exhale volume, yields a model volume for the corresponding breathing phase. DRR projections through these model volumes, computed in the same manner as during training dataset generation, produced fluoroscopic images for evaluation. As with the training images, these projection images are preprocessed with variance equalization, normalization, histogram equalization, and downsampling. Evaluating these images with the trained network recovers the weights of the rank-optimized components. Inference on a single Nvidia Quadro GV100 GPU has a throughput of 1113 images/second with the same PyTorch implementation, significantly faster than real-time. The relation of these recovered weights to the original spline model is shown in Figure 3.
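The breathing model can be sketched with a periodic cubic spline through the component weights; the weight values below are illustrative, and SciPy is assumed to be available:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Sketch of the breathing model: a periodic spline through phase coordinates
# in the 2D motion subspace (illustrative values forming a closed loop,
# mimicking inhale-exhale hysteresis).
phase = np.arange(10)                                   # parameter along the cycle
w1 = np.array([0, 2, 5, 8, 10, 8, 5, 2, 1, 0.0])        # component-1 weights
w2 = np.array([0, 1, 2, 1, 0, -1, -2, -1, -0.5, 0.0])   # component-2 weights
spline = CubicSpline(phase, np.stack([w1, w2], axis=1), bc_type="periodic")

# 40 model points along the cycle, as used for evaluation above.
t = np.linspace(0, 9, 40)
model_weights = spline(t)                               # shape (40, 2)
```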

Accuracy of the deformation recovered from the inferred weights is measured by the maximum deformation distance error relative to the deformation reconstructed directly from the model's two rank-optimized component weights. Across the 40 model points, the maximum error between the applied deformation and the recovered deformation was 1.22 mm.
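This error metric can be computed by applying the weight error to the component fields; the fields and weights below are random or illustrative stand-ins:

```python
import numpy as np

# Sketch: maximum deformation distance error between true and recovered
# weights, with stand-in component fields (2 fields of 3D displacement vectors).
rng = np.random.default_rng(3)
components = rng.standard_normal((2, 3, 16, 16, 8))

w_true = np.array([4.0, 1.0])           # weights applied to the model volume
w_recovered = np.array([3.9, 1.05])     # e.g. weights inferred by the network

# The error field is the linear combination of components with the weight error.
err_field = np.tensordot(w_recovered - w_true, components, axes=1)
distances = np.linalg.norm(err_field, axis=0)   # per-voxel error magnitude
print(distances.max())
```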

### 4.3 RCCT Phase Patient Data and Geometric Validation

Until this point, DRR images derived from 9 of the 10 original 4DCT volumes have not been used as network input for either training or testing. Rather than a synthetic spline model of respiratory phases, we now use the original CT volumes captured at 9 stages of the respiratory cycle. These CT volumes inherently contain deformations that have not been included in any training or evaluation so far, since the rank-constrained motion estimation only approximates the deformations to these CT images. No deformation is applied to these CT volumes, as each volume already represents its intrinsic respiratory deformation relative to the full-exhale anatomical state. Evaluating our framework on these intrinsic deformations effectively projects the true deformation into our 2-dimensional motion subspace.

Exactly as in training dataset generation, DRR images through these (undeformed) volumes are calculated and preprocessed with variance equalization, normalization, histogram equalization, and downsampling. Evaluation of the network gives rank-optimized deformation component weights describing the intrinsic deformations that can be compared in the subspace of motion component weights with the weights directly recovered from original rank-constrained density motion estimation (Fig. 4).

The weights produced by the network are used to reconstruct a deformation field as a linear combination of the rank-optimized deformation component fields. We report the error as the difference between the deformation distance recovered by our deep learning framework and that of the original rank-constrained motion estimation (Fig. 5). Errors for the deformation recovered at each CT phase are also calculated against the deformations from rank-constrained motion estimation (Table 2).

Table 2: Deformation errors at each CT phase relative to the deformations from rank-constrained motion estimation.

| Phase | Average Distance Error (mm) | Maximum Distance Error (mm) |
|---|---|---|
| 0 | 0.136 | 3.92 |
| 1 | 0.212 | 3.15 |
| 2 | 0.132 | 2.58 |
| 3 | 0.291 | 8.98 |
| 4 | 0.223 | 6.62 |
| 6 | 0.201 | 4.15 |
| 7 | 0.286 | 6.96 |
| 8 | 0.252 | 9.55 |
| 9 | 0.183 | 5.07 |

## 5 Discussion

In this paper, we have shown that estimation of anatomic position in the presence of breathing variability is possible with the combination of (1) rank-constrained density motion estimation for determination of motion components and (2) deep learning for subsequent identification of the weights of motion components in real-time. While this framework is extensible to dimensions higher than 2, the rank-constrained nature of the deformation fields produces accurate results using only two dimensions. This level of accuracy is a trade-off against the curse of dimensionality in both the dataset size and computational cost of training; however, using rank-constrained deformation components increases the accuracy of resulting deformations with the same number of dimensions.

The speed and accuracy attained by this framework are suitable for inclusion as the tumor position monitoring component of a conformal radiation therapy system. Determining 3D tumor location from noninvasive 2D radiographic projections via deep learning, rather than via variational optimization approaches to the 2D-3D deformation problem, provides real-time results for conformal radiation therapy of lung tumors.

## Acknowledgements

This work was partially supported by research funding from the National Institutes of Health (R01 CA169102 and R03 EB026132) and the Huntsman Cancer Institute. The authors are grateful to NVIDIA Corporation for providing the GPU used in this research.

## References

- [1] Bauer, M., Joshi, S., Modin, K.: Diffeomorphic Density Matching by Optimal Information Transport. SIAM Journal on Imaging Sciences 8(3), 1718–1751 (jan 2015). https://doi.org/10.1137/151006238, http://epubs.siam.org/doi/10.1137/151006238
- [2] Foote, M., Sabouri, P., Sawant, A., Joshi, S.: Rank Constrained Diffeomorphic Density Motion Estimation for Respiratory Correlated Computed Tomography. In: Cardoso, M., Arbel, T. (eds.) LNCS. vol. 10551, pp. 177–185. Springer Verlag (2017). https://doi.org/10.1007/978-3-319-67675-3_16, http://link.springer.com/10.1007/978-3-319-67675-3
- [3] Geneser, S., Hinkle, J., Kirby, R., Wang, B., Salter, B., Joshi, S.: Quantifying variability in radiation dose due to respiratory-induced tumor motion. Medical Image Analysis 15(4), 640–649 (aug 2011). https://doi.org/10.1016/j.media.2010.07.003
- [4] Ha, I.Y., Wilms, M., Handels, H., Heinrich, M.P.: Model-Based Sparse-to-Dense Image Registration for Realtime Respiratory Motion Estimation in Image-Guided Interventions. IEEE Transactions on Biomedical Engineering 66(2), 302–310 (feb 2019). https://doi.org/10.1109/TBME.2018.2837387
- [5] Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely Connected Convolutional Networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2261–2269. IEEE (jul 2017). https://doi.org/10.1109/CVPR.2017.243
- [6] Keall, P.J., Joshi, S., Vedam, S.S., Siebers, J.V., Kini, V.R., Mohan, R.: Four-dimensional radiotherapy planning for DMLC-based respiratory motion tracking. Medical Physics 32(4), 942–951 (mar 2005). https://doi.org/10.1118/1.1879152, http://doi.wiley.com/10.1118/1.1879152
- [7] Khesin, B., Lenells, J., Misiołek, G., Preston, S.C.: Geometry of Diffeomorphism Groups, Complete integrability and Geometric statistics. Geometric and Functional Analysis 23(1), 334–366 (feb 2013). https://doi.org/10.1007/s00039-013-0210-2
- [8] Li, R., Lewis, J.H., Jia, X., Zhao, T., Liu, W., Wuenschel, S., Lamb, J., Yang, D., Low, D.A., Jiang, S.B.: On a PCA-based lung motion model. Physics in Medicine and Biology 56(18), 6009–6030 (sep 2011). https://doi.org/10.1088/0031-9155/56/18/015
- [9] Markelj, P., Tomaževič, D., Likar, B., Pernuš, F.: A review of 3D/2D registration methods for image-guided interventions. Medical Image Analysis 16(3), 642–661 (apr 2012). https://doi.org/10.1016/j.media.2010.03.005, https://linkinghub.elsevier.com/retrieve/pii/S1361841510000368
- [10] Modin, K.: Generalized Hunter–Saxton Equations, Optimal Information Transport, and Factorization of Diffeomorphisms. Journal of Geometric Analysis 25(2), 1306–1334 (apr 2015). https://doi.org/10.1007/s12220-014-9469-2
- [11] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch. In: NIPS-W (2017)
- [12] Pleiss, G., Chen, D., Huang, G., Li, T., van der Maaten, L., Weinberger, K.Q.: Memory-Efficient Implementation of DenseNets. arXiv preprint (jul 2017), http://arxiv.org/abs/1707.06990
- [13] Preston, J., Hinkle, J., Singh, N., Rottman, C., Joshi, S.: PyCA: Python for Computational Anatomy, https://bitbucket.org/scicompanat/pyca
- [14] Rottman, C., Bauer, M., Modin, K., Joshi, S.C.: Weighted Diffeomorphic Density Matching with Applications to Thoracic Image Registration. 5th MICCAI Workshop on Mathematical Foundations of Computational Anatomy (MFCA 2015) pp. 1–12 (2015)
- [15] Rottman, C., Larson, B., Sabouri, P., Sawant, A., Joshi, S.: Diffeomorphic Density Registration in Thoracic Computed Tomography. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9902 LNCS, 46–53 (2016). https://doi.org/10.1007/978-3-319-46726-9_6, http://link.springer.com/10.1007/978-3-319-46726-9_6
- [16] Rottman, C., McBride, L., Cheryauka, A., Whitaker, R., Joshi, S.: Mobile C-arm 3D Reconstruction in the Presence of Uncertain Geometry. In: LNCS, vol. 9350, pp. 692–699. Springer Verlag (2015). https://doi.org/10.1007/978-3-319-24571-3_83
- [17] Sabouri, P., Foote, M., Ranjbar, M., Tajdini, M., Mossahebi, S., Joshi, S., Sawant, A.: A Novel Method Using Surface Monitoring to Capture Breathing-Induced Cycle-To-Cycle Variations with 4DCT. In: 59th Annual Meeting of The American Association of Physicists in Medicine. Denver, CO (2017), http://www.aapm.org/meetings/2017AM/PRAbs.asp?mid=127&aid=37742
- [18] Sawant, A., Keall, P., Pauly, K.B., Alley, M., Vasanawala, S., Loo, B.W., Hinkle, J., Joshi, S.: Investigating the feasibility of rapid MRI for image-guided motion management in lung cancer radiotherapy. BioMed Research International 2014 (2014). https://doi.org/10.1155/2014/485067
- [19] Sawant, A., Venkat, R., Srivastava, V., Carlson, D., Povzner, S., Cattell, H., Keall, P.: Management of three-dimensional intrafraction motion through real-time DMLC tracking. Medical Physics 35(5), 2050–2061 (2008). https://doi.org/10.1118/1.2905355
- [20] Sherouse, G.W., Novins, K., Chaney, E.L.: Computation of digitally reconstructed radiographs for use in radiotherapy treatment design. International Journal of Radiation Oncology, Biology, Physics 18(3), 651–658 (1990). https://doi.org/10.1016/0360-3016(90)90074-T
- [21] Staub, D., Murphy, M.J.: A digitally reconstructed radiograph algorithm calculated from first principles. Medical Physics 40(1) (2013). https://doi.org/10.1118/1.4769413
- [22] U.S. Cancer Statistics Working Group: United States Cancer Statistics: 1999-2014 Incidence and Mortality Web-based Report. Tech. rep., U.S. Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute, Atlanta (2017), https://nccd.cdc.gov/uscs/