# Metric-Driven Learning of Correspondence Weighting for 2-D/3-D Image Registration

## Abstract

Registration of pre-operative 3-D images to intra-operative 2-D fluoroscopy images is important in minimally invasive procedures. Registration can be intuitively performed by estimating the global rigid-body motion with constraints of minimizing local misalignments. However, inaccurate local correspondences challenge the registration performance. We use PointNet to estimate the optimal weights of local correspondences. We train the network directly with the criterion to minimize the registration error. For that, we propose an objective function which incorporates point-to-plane motion estimation and projection error computation. Thereby, we enable the learning of a correspondence weighting strategy which optimally fits the underlying formulation of the registration problem in an end-to-end fashion. In the evaluation of single-vertebra registration, we demonstrate an accuracy of 0.74 ± 0.26 mm for our method and a highly improved robustness, increasing the success rate from 79.3 % to 94.3 % and the capture range from 3 mm to 13 mm.

###### Keywords:

2-D/3-D Registration, Deep Learning, PointNet, Minimally Invasive Interventions, Spine Surgery

## 1 Introduction

Image fusion is frequently involved in modern image-guided interventions, typically augmenting intra-operative 2-D X-ray images with overlaid pre-operative 3-D CT or MRI images. Accurate alignment between the fused images is essential for clinical applications and can be achieved using 2-D/3-D rigid registration, which aims at finding the pose of a 3-D volume such that its projections align with the 2-D X-ray images. Most commonly, intensity-based methods are employed, where a similarity measure between the 2-D image and the projection of the 3-D image is defined and optimized [1]. Despite decades of investigation, 2-D/3-D registration remains challenging. The difference in dimensionality of the input images results in an ill-posed problem. In addition, factors such as content mismatch between the pre-operative and intra-operative images or poor image quality challenge the robustness of registration algorithms. Miao et al. [2] proposed a learning-based registration method that is built upon the similarity-based approach. While they achieve a high robustness, achieving high registration accuracy remains challenging.

The intuition of 2-D/3-D rigid registration is to globally minimize the visual misalignment between the 2-D image and the projection of the 3-D image. Based on this intuition, Schmid and Chênes [3] decompose the target structure into local shape patches and model image forces using Hooke’s law of a spring from image block matching. Wang et al. [4] propose a point-to-plane correspondence (PPC) model for 2-D/3-D registration, which linearly constrains the global differential motion update using local correspondences. During the intervention, devices and implants, as well as locally similar anatomies, can introduce outliers for the local correspondence search (see Fig. 1(a) and (b)). Weighting the local correspondences in order to emphasize the correct ones directly influences the accuracy and robustness of the registration. An iterative reweighting scheme is suggested in [4] to enhance the robustness against outliers. However, this scheme only works when outliers are a minority of the measurements.

Recently, Qi et al. [5] proposed the PointNet, a type of neural network directly processing point clouds. PointNet is capable of internally extracting global features of the cloud and relating them to local features of individual points. Thus, it is well suited for correspondence weighting in 2-D/3-D registration.

In this paper, we propose to use a modified PointNet to learn global dependencies of local correspondence features according to the point-to-plane registration metric introduced by Wang et al. [4]. The main novelty lies in the use of our proposed objective function, which is composed of the motion estimation according to the PPC model and registration error computation steps. It allows us to learn a correspondence weighting strategy by minimizing the registration error, without the need of per-correspondence ground truth weights. We treat the correspondences as a point cloud with extended per-point features and train a modified PointNet to weight the correspondences using our objective function. Our method is evaluated on single-vertebra registration. We demonstrate a highly improved robustness compared to the original PPC registration.

## 2 Materials and Methods

Wang et al. [4] measure the local misalignment between the projection of the 3-D image $V$ and the 2-D image $I$ and compute a motion which compensates for this misalignment. Surface points are extracted from $V$ using the 3-D Canny detector. Apparent contour points $q_i$, i. e. surface points which correspond to contours in the projection of $V$, are projected onto the image plane as $p_i$. Additionally, gradient projection images of $V$ are generated and used to perform local patch matching to find correspondences for the $p_i$ in $I$. Assuming that motion along contours is not detectable, the authors perform patch matching only in the direction orthogonal to the contour. Therefore, the displacement of a matched point along the contour is not known, nor is its displacement along the viewing direction (image depth). These unknown directions span a plane $\pi_i$ with the normal $n_i$. After the registration, $q_i$ should be located on $\pi_i$. To minimize the point-to-plane distances $d(q_i, \pi_i)$, a linear equation is defined for each correspondence under the small-angle assumption. The resulting system of equations is solved for the differential motion $\delta v$, which contains both rotational and translational components. The method is applied iteratively over multiple resolution levels. To increase the robustness of the motion estimation, the maximum correntropy criterion for regression (MCCR) is used to solve the system of linear equations [4]. Schaffert et al. [6] extend the motion estimation to coordinate systems which are related to the camera coordinate system by a rigid transformation.
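
The small-angle linearization behind this system of equations can be sketched in NumPy. For a contour point $q_i$ with plane normal $n_i$, the identity $n_i^\top(\omega \times q_i) = \omega^\top (q_i \times n_i)$ yields a constraint row $[(q_i \times n_i)^\top, n_i^\top]$ for the motion vector (rotation, translation). All names and the toy data below are our own illustration, not the authors' implementation:

```python
import numpy as np

def ppc_rows(q, n):
    """Build the linear point-to-plane constraint matrix A for contour
    points q (N, 3) and plane normals n (N, 3) under the small-angle
    assumption: row i is [(q_i x n_i)^T, n_i^T]."""
    return np.hstack([np.cross(q, n), n])

# Sanity check: distances induced by a known small motion are recovered.
rng = np.random.default_rng(3)
q = rng.standard_normal((100, 3)) * 50.0
n = rng.standard_normal((100, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)      # unit plane normals
dv = np.array([0.002, -0.001, 0.003, 0.5, -0.2, 0.1])  # small motion
A = ppc_rows(q, n)
b = A @ dv                                          # point-to-plane distances
dv_hat, *_ = np.linalg.lstsq(A, b, rcond=None)      # recover the motion
```

With exact (noise-free) distances, the least-squares solve recovers the generating motion, which is the essence of the PPC motion update.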

### 2.1 Correspondence Weighting Learning

We aim to train the weighting of the local correspondences directly using the global criterion to minimize the registration error. To achieve this, we formulate motion estimation as well as error computation as part of our objective function. A modified PointNet [5] is trained using the objective function to weight the individual correspondences.

#### Weighted Motion Estimation

Motion estimation according to the PPC model is performed by solving a linear system of equations defined by $\mathbf{A} \in \mathbb{R}^{N_c \times 6}$ and $\mathbf{b} \in \mathbb{R}^{N_c}$, where each of the $N_c$ equations corresponds to one point-to-plane correspondence. We perform the motion estimation centered at the centroid of the apparent contour points. This allows us to use the regularized least-squares estimation

$$\hat{\delta v}(\mathbf{W}) = \operatorname*{arg\,min}_{\delta v} \left\| \mathbf{W} \left( \mathbf{A}\,\delta v - \mathbf{b} \right) \right\|_2^2 + \lambda \left\| \delta v \right\|_2^2 \quad (1)$$

in order to improve the robustness of the estimation. Here, $\lambda$ is the regularizer weight and the diagonal matrix $\mathbf{W}$ contains the correspondence weights $w_i$. As Eq. (1) is differentiable w. r. t. $\delta v$, we obtain the closed-form solution

$$\hat{\delta v}(\mathbf{W}) = \left( \mathbf{A}^\top \mathbf{W}^2 \mathbf{A} + \lambda \mathbf{I} \right)^{-1} \mathbf{A}^\top \mathbf{W}^2\, \mathbf{b}. \quad (2)$$
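
The weighted, regularized least-squares estimate described above can be sketched in a few lines of NumPy. The matrix `A`, the toy data and all names are our own illustration; in the actual method, the constraints come from the point-to-plane correspondences:

```python
import numpy as np

def estimate_motion(A, b, w, lam=1.0):
    """Closed-form weighted, regularized least-squares motion estimate.

    A   : (N, 6) point-to-plane constraint matrix
    b   : (N,)   point-to-plane distances
    w   : (N,)   per-correspondence weights
    lam : regularizer weight lambda
    """
    W2 = np.diag(w ** 2)                        # W^T W for a diagonal W
    lhs = A.T @ W2 @ A + lam * np.eye(6)        # A^T W^2 A + lambda I
    rhs = A.T @ W2 @ b                          # A^T W^2 b
    return np.linalg.solve(lhs, rhs)            # differential motion

# Toy example: exact constraints, with two gross outliers down-weighted.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 6))
v_true = np.array([0.01, -0.02, 0.005, 0.1, -0.3, 0.2])
b = A @ v_true
w = np.ones(50)
b[:2] += 5.0   # two gross outliers ...
w[:2] = 0.0    # ... suppressed by zero weights
v_hat = estimate_motion(A, b, w, lam=1e-6)
```

With the outliers down-weighted, the estimate matches the generating motion; with uniform weights it would be biased, which is exactly the motivation for learning the weights.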

#### Projection Error Computation

The projection error (PE) [7] corresponds to the visible misalignment in the images and therefore roughly correlates with the difficulty of finding correspondences by patch matching in the next registration iteration. It is computed as

$$\mathrm{PE}\left(\hat{\delta v}\right) = \frac{1}{N} \sum_{i=1}^{N} \left\| \mathrm{proj}\!\left(\hat{\delta v},\, t_i\right) - \mathrm{proj}\!\left(\delta v_{\mathrm{gt}},\, t_i\right) \right\|_2, \quad (3)$$

where a set of $N$ target points $t_i$ is used and $i$ is the point index. $\mathrm{proj}(\hat{\delta v}, t_i)$ is the projection of $t_i$ onto the image plane under the estimated motion $\hat{\delta v}$, and $\mathrm{proj}(\delta v_{\mathrm{gt}}, t_i)$ is the projection under the ground-truth registration.
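
A minimal numeric sketch of this error measure, assuming a simple pinhole camera model with an invented focal length (the real method uses the calibrated C-arm projection geometry):

```python
import numpy as np

def project(points, f=1000.0):
    """Pinhole projection of 3-D points (N, 3) onto the image plane."""
    return f * points[:, :2] / points[:, 2:3]

def apply_motion(points, dv):
    """Apply a small rigid motion dv = (rx, ry, rz, tx, ty, tz)
    using the small-angle approximation R ~ I + [r]_x."""
    r, t = dv[:3], dv[3:]
    rx = np.array([[0.0, -r[2], r[1]],
                   [r[2], 0.0, -r[0]],
                   [-r[1], r[0], 0.0]])
    return points + points @ rx.T + t

def projection_error(targets, dv_est, dv_gt):
    """Mean 2-D distance between projections under the estimated
    and the ground-truth motion (PE-style measure)."""
    p_est = project(apply_motion(targets, dv_est))
    p_gt = project(apply_motion(targets, dv_gt))
    return np.mean(np.linalg.norm(p_est - p_gt, axis=1))

targets = np.array([[10.0, 5.0, 500.0],
                    [-20.0, 8.0, 550.0],
                    [0.0, -15.0, 480.0]])
dv_gt = np.zeros(6)
pe_zero = projection_error(targets, dv_gt, dv_gt)                 # perfect estimate
pe_off = projection_error(targets, np.array([0, 0, 0, 5.0, 0, 0]), dv_gt)
```

A perfect motion estimate yields zero PE, while any residual motion produces a positive error, so minimizing PE drives the estimate toward the ground truth.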

#### Network Architecture

To learn directly on correspondence sets, we use a modified PointNet [5]. In its simplest variant, the PointNet consists of a Multi-Layer Perceptron (MLP) which transforms the given features for each point independently of the other points. To describe the global properties of the point set, the resulting local descriptors are combined by max-pooling over all points. To obtain per-point outputs, the resulting global descriptor is concatenated to the local descriptor of each point and further processed for each point independently by a second MLP. The input to our network is a set of per-correspondence feature vectors and the outputs are the respective per-correspondence weights. For our network, we choose MLPs which are smaller than those in the original network [5]. We enforce the output to lie in a bounded range by using a modified softsign activation function [8] in the last layer. Additionally, we introduce a global trainable weighting factor which is applied to all correspondences. This allows for an automatic adjustment of the strength of the regularization. During training, the number of input correspondences is fixed to 1024 to allow for efficient batch-wise computations. As the network can process inputs of variable size, all correspondences are used during the actual registration.
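
The forward pass of such a weighting network can be sketched as follows. The layer sizes, the random (untrained) weights, the output range $(0, 1)$ of the modified softsign, and all names are our own assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def mlp_params(sizes):
    """Random parameters for a small MLP (illustration only, untrained)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, params):
    """Apply an MLP to each row of x; ReLU on hidden layers."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

def weight_net(features, local_p, global_p, s=1.0):
    """PointNet-style weighting: per-point MLP, max-pool to a global
    descriptor, concatenate it to each local descriptor, apply a second
    MLP, squash to (0, 1), and scale by the global factor s."""
    local = mlp(features, local_p)              # (N, 64) local descriptors
    glob = local.max(axis=0)                    # max-pool over all points
    joint = np.concatenate(
        [local, np.broadcast_to(glob, local.shape)], axis=1)
    raw = mlp(joint, global_p)[:, 0]
    return s * (0.5 * raw / (1.0 + np.abs(raw)) + 0.5)  # modified softsign

feats = rng.standard_normal((1024, 8))          # 1024 correspondences
w = weight_net(feats, mlp_params([8, 64, 64]), mlp_params([128, 64, 1]))
```

Because the pooling is a symmetric function, the global descriptor is invariant to the ordering of the correspondences, which is what makes this architecture suitable for unordered correspondence sets.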

As we want to weight correspondences based on their geometrical relations as well as the image similarity, we use the following per-correspondence features:

$$f_i = \left( p_i^\top,\; n_i^\top,\; d(q_i, \pi_i),\; \mathrm{NGC}_i \right)^\top, \quad (4)$$

where $p_i$ is the projected apparent contour point, $n_i$ the normal of the corresponding point-to-plane constraint, $d(q_i, \pi_i)$ the point-to-plane distance, and $\mathrm{NGC}_i$ the Normalized Gradient Correlation of the correspondence, which is obtained by the local patch matching.

#### Training Objective

We now combine the motion estimation, the PE computation and the modified PointNet to obtain the objective function

$$\hat{\theta} = \operatorname*{arg\,min}_{\theta} \sum_{k} \mathrm{PE}\!\left( \hat{\delta v}\!\left( \phi_\theta\!\left( F_k \right) \right) \right), \quad (5)$$

where $\phi_\theta$ is our network, $\theta$ are the learned network parameters, $F_k$ is the set of per-correspondence feature vectors and $k$ is the training sample index. Note that using Eq. (5), we learn directly with the objective of minimizing the registration error, and no per-correspondence ground-truth weights are needed. Equation (2) is differentiable with respect to the weights $\mathbf{W}$ and Eq. (3) with respect to $\hat{\delta v}$. Therefore, gradient-based optimization can be performed on Eq. (5).
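
That the loss is indeed a differentiable function of the correspondence weights can be checked numerically. The sketch below chains the closed-form motion estimate through a PE-style error and takes finite differences with respect to the weights; the linear "projection" `T`, the toy data and all names are our own stand-ins (in practice, an autodiff framework computes these gradients):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 6))                 # toy constraint matrix
v_gt = rng.standard_normal(6) * 0.05             # ground-truth motion
b = A @ v_gt + rng.standard_normal(30) * 0.01    # noisy distances
T = rng.standard_normal((20, 6))                 # linear toy stand-in for proj()

def loss(w, lam=0.1):
    """PE-style loss of the closed-form motion estimate as a
    function of the correspondence weights w."""
    W2 = np.diag(w ** 2)
    dv = np.linalg.solve(A.T @ W2 @ A + lam * np.eye(6), A.T @ W2 @ b)
    return np.mean(np.abs(T @ (dv - v_gt)))

def num_grad(w, eps=1e-6):
    """Central finite differences of the loss w.r.t. the weights."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (loss(w + e) - loss(w - e)) / (2.0 * eps)
    return g

g = num_grad(np.ones(30))
```

The gradient is well-defined and non-trivial, so the registration error can be back-propagated through the motion estimation into the weighting network.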

#### Training Procedure

To obtain training data, a set of volumes is used, each with one or more 2-D images and a known ground-truth registration. For each pair of images, multiple random initial transformations with a uniformly distributed mean target registration error (mTRE) are generated [9]. The correspondence search is performed once and the precomputed correspondences are used during training.
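
The generation of initial transformations with a uniformly distributed mTRE can be sketched by rejection sampling into error bins, in the spirit of [9]. The bin edges, sampling ranges and names below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
targets = rng.standard_normal((50, 3)) * 20.0    # toy target points (mm)

def mtre(dv, points):
    """Mean target registration error of a small rigid motion dv
    under the small-angle approximation."""
    r, t = dv[:3], dv[3:]
    rx = np.array([[0.0, -r[2], r[1]],
                   [r[2], 0.0, -r[0]],
                   [-r[1], r[0], 0.0]])
    moved = points + points @ rx.T + t
    return np.mean(np.linalg.norm(moved - points, axis=1))

def sample_initializations(n_per_bin, bins=((0, 10), (10, 20), (20, 30))):
    """Rejection-sample random motions so that the mTRE is uniformly
    distributed over the given bins (in mm)."""
    out = []
    for lo, hi in bins:
        kept = []
        while len(kept) < n_per_bin:
            dv = np.concatenate([rng.uniform(-0.1, 0.1, 3),   # rotations (rad)
                                 rng.uniform(-30, 30, 3)])    # translations (mm)
            if lo <= mtre(dv, targets) < hi:
                kept.append(dv)
        out.extend(kept)
    return np.array(out)

inits = sample_initializations(5)
```

Each bin receives the same number of start transformations, so the training (and evaluation) sees initial errors spread evenly over the whole range.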

## 3 Experiments and Results

#### Compared Methods

We perform experiments for single-view registration of single vertebrae. To show the effectiveness of the correspondence weighting, we compare the proposed method (PPC-L) to the original PPC method. The compared methods differ in the computation of the correspondence weights and in the choice of the regularizer weight. In the case of learned correspondence weighting (PPC-L), the weights are predicted by the trained network. For PPC, the weights are the NGC values of the found correspondences, where any value below a threshold is set to zero, i. e. the correspondence is rejected. Additionally, the MCCR is used in the PPC method only. The lowest resolution level has a scaling of 0.25 and the highest a scaling of 1.0. For the PPC method, registration is first performed on the lowest resolution level without allowing motion in depth, which increases the robustness of the method. To differentiate between the effect of the correspondence weighting and that of the regularized motion estimation, we also consider registration using regularized motion estimation without learned weights. We use a variant where the data weight is matched to the regularizer weight automatically by using our objective function (PPC-R); the point weighting factors obtained for the different resolution levels lie in a similar range, so fixed values are used for all levels. Additionally, we empirically set the correspondence weights to a fixed value which increases the robustness of the registration while still allowing for a reasonable amount of motion (PPC-RM).

#### Data

We use clinical C-arm CT acquisitions from the thoracic and pelvic regions of the spine for training and evaluation. The ground truth is provided by the system calibration, which is highly accurate in terms of the projection error at the iso-center. We register the projection images (pixel size of 0.62 mm) to the reconstructed volumes. The training set consists of 19 acquisitions with a total of 77 vertebrae. For each vertebra, 8 different 2-D images are used. An additional validation set of 23 vertebrae from 6 acquisitions is used to monitor the training process. The registration is performed on a test set of 6 acquisitions. For each acquisition, 2 vertebrae are evaluated and registration is performed independently for both the anterior-posterior and the lateral view. In both training and test data, external devices are present in some of the 2-D images. Individual vertebrae are selected using a volume of interest around them. To simulate realistic conditions, we add Poisson noise to all 2-D images and rescale the intensities to better match fluoroscopic images.

#### Evaluation Metrics

To evaluate the registration, we follow the standardized evaluation methodology [9, 10] and compute the mean reprojection distance (mRPD), success rate (SR) and capture range (CR) with an mRPD success threshold of 2 mm. Additionally, we compute the gross success rate (GSR) [2] as well as a gross capture range (GCR) with a success criterion of 10 mm in order to further assess the robustness of the methods. We generate random start transformations in a range of 0 mm to 30 mm using a modified version of the method described by van de Kraats et al. [9].
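
These metrics can be computed from per-case results as sketched below. The capture-range definition used here (the largest initial error up to which every 1 mm bin reaches a 95 % success rate) is one common reading of [9]; the toy data and names are our own:

```python
import numpy as np

def success_rate(final_err, threshold=2.0):
    """Percentage of registrations with final error below the threshold."""
    return 100.0 * np.mean(np.asarray(final_err) < threshold)

def capture_range(init_mtre, final_err, threshold=2.0, bin_mm=1.0):
    """Largest initial-error distance d such that every bin of initial
    mTRE up to d has a success rate of at least 95 %."""
    init_mtre = np.asarray(init_mtre)
    final_err = np.asarray(final_err)
    cr = 0.0
    for lo in np.arange(0.0, init_mtre.max(), bin_mm):
        sel = (init_mtre >= lo) & (init_mtre < lo + bin_mm)
        if not sel.any() or np.mean(final_err[sel] < threshold) < 0.95:
            break
        cr = lo + bin_mm
    return cr

# Toy example: 5 registrations, two of which diverge.
sr = success_rate([0.5, 0.8, 1.0, 5.0, 6.0])
cr = capture_range([0.5, 1.5, 2.5, 3.5, 4.5], [0.5, 0.8, 1.0, 5.0, 6.0])
```

The GSR and GCR are obtained from the same functions with the 10 mm success criterion.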

#### Method Comparison

The accuracy (mRPD) as well as the robustness (SR, CR, GSR and GCR) of the compared methods are summarized in Tab. 1. We observe that PPC-L achieves the best SR of 94.3 % and CR of 13 mm. Compared to PPC (SR of 79.3 % and CR of 3 mm), PPC-R also achieves a higher SR of 88.1 % and CR of 6 mm. For the regularized motion estimation, the accuracy decreases with increasing regularizer influence (0.79 ± 0.22 mm for PPC-R and 1.18 ± 0.42 mm for PPC-RM), compared to PPC (0.75 ± 0.21 mm) and PPC-L (0.74 ± 0.26 mm). A sample registration result using PPC-L is shown in Fig. 1(d).

**Table 1.** Accuracy and robustness of the compared methods.

| Method | mRPD (mean ± std.) [mm] | SR [%] | CR [mm] | GSR [%] | GCR [mm] |
|---|---|---|---|---|---|
| PPC | 0.75 ± 0.21 | 79.26 | 3 | 81.83 | 3 |
| PPC-R | 0.79 ± 0.22 | 88.08 | 6 | 90.72 | 6 |
| PPC-RM | 1.18 ± 0.42 | 59.63 | 4 | 95.13 | 20 |
| PPC-L | 0.74 ± 0.26 | 94.34 | 13 | 96.28 | 22 |

For strongly regularized motion estimation, we observe a large difference between the GSR and the SR. While for PPC-R the difference is only 2.6 %, it is very high for PPC-RM: here, a GSR of 95.1 % is achieved, while the SR is only 59.6 %. This indicates that while the method is robust, its accuracy is low. Compared to the CR, the GCR is increased for PPC-L (22 mm vs. 13 mm) and especially for PPC-RM (20 mm vs. 4 mm). Overall, this shows that while some inaccurate registrations occur for PPC-L, they are very common for PPC-RM.

#### Single Iteration Evaluation

To better understand the effect of the correspondence weighting and the regularization, we investigate the registration results after one iteration on the lowest resolution level. In Fig. 2, the PE in pixels at the target points is shown for all cases in the validation set. As in training, 1024 correspondences are used per case for all methods. We observe that for PPC, the error has a high spread: for some cases it is decreased considerably, while for others it is increased. For PPC-R, most cases are below the initial error. However, the error is decreased only marginally, as the regularization prevents large motions. For PPC-L, we observe that the error is drastically decreased for most cases. This shows that PPC-L is able to estimate motion efficiently. An example of correspondence weighting in PPC-L is shown in Fig. 1(c), where we observe a set of consistent correspondences with high weights, while the remaining correspondences have low weights.

#### Method Combinations

We observed that while the PPC-RM method has a high robustness (GCR and GSR), it leads to low accuracy. For PPC-L, we observed an increased GCR compared to the CR. In both cases, this demonstrates that registrations with an mRPD of 2 mm to 10 mm are present. As PPC works reliably for small initial errors, we combine both methods with PPC by performing PPC on the highest resolution level instead of the respective method. We denote the resulting methods as PPC-RM+ and PPC-L+. We observe that PPC-RM+ achieves an accuracy of 0.74 ± 0.18 mm, an SR of 94.6 % and a CR of 18 mm, while PPC-L+ achieves an accuracy of 0.74 ± 0.19 mm, an SR of 96.1 % and a CR of 19 mm. While the results are similar, we note that for PPC-RM+ a manual weight selection is necessary. Further investigations are needed to clarify the better performance of PPC compared to PPC-L on the highest resolution level. However, this result may also demonstrate the strength of the MCCR for cases where the majority of correspondences are correct. We evaluate the convergence behavior of PPC-L+ and PPC-RM+ by only considering cases which were successful. For these cases, we investigate the error distribution after the first resolution level. The results can be found in Fig. 3. We observe that for PPC-L+, an mRPD below 10 mm is achieved for all cases, while for PPC-RM+, higher misalignments of around 20 mm mRPD are present. The result for PPC-L+ is achieved after an average of 7.6 iterations, while 11.8 iterations were performed on average for PPC-RM+ using the stopping criterion defined in [4]. In combination, this further substantiates our findings from the single-iteration evaluation and shows the efficiency of PPC-L as well as its potential for reducing the computational cost.

## 4 Conclusion

For 2-D/3-D registration, we propose a method to learn the weighting of local correspondences directly from the global criterion of minimizing the registration error. We achieve this by incorporating the motion estimation and error computation steps into the objective function. We train a modified PointNet to weight correspondences based on their geometrical properties and image similarity. We show that our method effectively improves the registration robustness for single-vertebra registration: the learning-based correspondence weighting greatly improves the robustness of the registration while maintaining high accuracy. While a high robustness can also be achieved by regularized motion estimation, registration using learned correspondence weighting is more efficient, does not need manual parameter tuning and achieves a high accuracy. The focus of future work is to further improve the weighting method, e. g. by including more information in the decision process and by optimizing the objective function for robustness and/or accuracy depending on the stage of the registration, e. g. the current resolution level.

Disclaimer: The concept and software presented in this paper are based on research and are not commercially available. Due to regulatory reasons its future availability cannot be guaranteed.

### References

- Markelj, P., Tomaževič, D., Likar, B., Pernuš, F.: A Review of 3D/2D Registration Methods for Image-Guided Interventions. Med. Image Anal. 16(3) (2012) 642–661
- Miao, S., Piat, S., Fischer, P., Tuysuzoglu, A., Mewes, P., Mansi, T., Liao, R.: Dilated FCN for Multi-Agent 2D/3D Medical Image Registration. In: AAAI. (2018)
- Schmid, J., Chênes, C.: Segmentation of X-ray Images by 3D-2D Registration based on Multibody Physics. In: ACCV. (2014)
- Wang, J., Schaffert, R., Borsdorf, A., Heigl, B., Huang, X., Hornegger, J., Maier, A.: Dynamic 2-D/3-D Rigid Registration Framework Using Point-To-Plane Correspondence Model. TMI 36(9) (2017) 1939–1954
- Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In: CVPR. (2017)
- Schaffert, R., Wang, J., Fischer, P., Borsdorf, A., Maier, A.: Multi-View Depth-Aware Rigid 2-D/3-D Registration. In: NSS/MIC. (To appear)
- Wang, J., Borsdorf, A., Heigl, B., Köhler, T., Hornegger, J.: Gradient-Based Differential Approach for 3-D Motion Compensation in Interventional 2-D/3-D Image Fusion. In: 3DV. (2014) 293–300
- Elliott, D.L.: A better activation function for artificial neural networks. Technical report (1993)
- van de Kraats, E.B., Penney, G.P., Tomaževič, D., van Walsum, T., Niessen, W.J.: Standardized Evaluation Methodology for 2-D-3-D Registration. TMI 24(9) (2005) 1177–1189
- Mitrović, U., Spiclin, Z., Likar, B., Pernuš, F.: 3D-2D Registration of Cerebral Angiograms: A Method and Evaluation on Clinical Images. TMI 32(8) (2013) 1550–1563