# 2D/3D Megavoltage Image Registration Using Convolutional Neural Networks

###### Abstract

We present a 2D/3D megavoltage (MV) image registration method based on a Convolutional Neural Network. Most traditional image registration methods are intensity-based and use optimization algorithms to maximize the similarity between two images. Although these methods can achieve good results for kilovoltage images, the same does not hold for megavoltage images because of their lower image quality. In addition, these methods often have a limited capture range. To deal with these problems, we propose the use of a Convolutional Neural Network. The experiments were performed using a dataset of 50 brain images. The results are promising compared to traditional image registration methods.

###### keywords:

2D/3D Image Registration, Megavoltage Image, Convolutional Neural Networks

## 1 Introduction

Correct patient positioning is essential for the success of radiotherapy treatment. During the planning phase, a three-dimensional planning CT (Computed Tomography) is used both to locate the tumor and to define the treatment parameters (number of sessions, dose, etc.). In subsequent treatment sessions, the patient must be positioned accurately so that the doses are delivered at the planned sites. Several techniques are used to guide patient positioning, such as body markings and laser systems, but positioning errors remain a reality. In fact, such errors can reach 10-20 mm in some types of cancer ramsey; SCHUBERT20091260. One of the most effective alternatives for reducing these errors is Image-Guided Radiation Therapy (IGRT), in which images are acquired and compared with the 3D planning CT to verify the patient's positioning before the treatment dose is applied. The comparison between the images aims to estimate the displacements needed to correct the patient's positioning. This task is called Image Registration (IR), and it can be performed by visually inspecting the overlapping images (manual registration) or by using algorithms that search for the best match between the images (automatic registration). Depending on the machine used, registration can be performed using two-dimensional MegaVoltage (MV) or KiloVoltage (KV) images, or using three-dimensional (Cone-beam CT) images. Typically, the same machine on which the treatment is performed can produce megavoltage images without any additional apparatus, which makes this approach still the most widely used. In this approach, a pair of two-dimensional MV images is compared to two-dimensional Digitally Reconstructed Radiographs (DRRs) extracted from the planning CT.

The most widely used methods for 2D/3D registration are intensity-based and use optimization algorithms to maximize the similarity between the 2D images and the DRRs. Although these methods can achieve good results for kilovoltage images, the same does not hold for megavoltage images because of their lower image quality. Furthermore, such methods do not have a good capture range, which implies the need for numerous maximization iterations. Since a 2D DRR is extracted from the CT at each iteration, such methods can hardly be executed in real time.

To deal with this problem, Miao et al. recently proposed a hierarchical regression method for real-time X-ray 2D/3D image registration miao2016cnn. The regression is performed by a Convolutional Neural Network, which is trained using two-dimensional artificial X-ray images. The trained model is then able to infer the displacements needed to correct positioning errors without the explicit definition of any similarity measure. Because inferring the displacements is an extremely complex problem, the authors proposed the use of different regressors, each specialized in a different range of displacement values. This hierarchical approach, together with iterative execution, proved to be quite effective, presenting results superior to those achieved by methods based on image similarity. Moreover, such results were achieved with few regression iterations (fewer than ten), which makes real-time use possible. In this work, we follow the same approach proposed by Miao et al. However, we focus on image registration using MV images. In addition, the regression models have significant differences in their architectures, which are described in the next sections.

## 2 Proposed 2D/3D Registration Method

The proposed method is based on the concept of regression. Unlike similarity-based methods, where an optimization algorithm is employed to find the displacements that maximize a given similarity measure, the proposed method uses a regression model whose parameters are learned through machine learning techniques. Figure 1 shows the diagram of this approach. Starting from an initial offset value, the method uses the planning CT to generate an artificial MV image with the current displacement values. This MV image is then processed by the regression model together with the test MV image, yielding the displacement required for the shifted image to conform to the test image. At each iteration, the result of the regression is used to define the system response. Given the current displacement values, $t_k$, and the value predicted by the regression model, $\delta t_k$, the new displacement, which is the response of the registration method at that iteration, is defined by the accumulation of these values: $t_{k+1} = t_k + \delta t_k$. As in traditional optimization methods, this process can be performed iteratively until a certain convergence condition is reached.
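The iterative scheme can be illustrated with a minimal sketch. It assumes a two-component displacement (lateral, longitudinal) and uses hypothetical stand-ins, `toy_render` and `toy_regressor`, in place of the artificial MV image generator and the trained regression model; only the accumulation of the predicted update onto the current displacement is taken from the description above.

```python
from typing import Callable, Tuple

Shift = Tuple[float, float]  # (lateral, longitudinal) displacement in mm


def register(
    render: Callable[[Shift], object],
    predict_update: Callable[[object, object], Shift],
    test_image: object,
    t0: Shift = (0.0, 0.0),
    n_iters: int = 3,
) -> Shift:
    """Accumulate the regression outputs onto the current displacement."""
    t = t0
    for _ in range(n_iters):
        synthetic = render(t)  # artificial MV image at the current shift
        dx, dy = predict_update(synthetic, test_image)  # predicted update
        t = (t[0] + dx, t[1] + dy)  # new displacement = current + update
    return t


# Hypothetical stand-ins: an "image" is just the shift that produced it,
# and the regressor recovers half of the remaining offset at each call.
def toy_render(t: Shift) -> Shift:
    return t


def toy_regressor(synthetic: Shift, target: Shift) -> Shift:
    return (0.5 * (target[0] - synthetic[0]), 0.5 * (target[1] - synthetic[1]))


estimate = register(toy_render, toy_regressor, test_image=(10.0, -20.0))
print(estimate)  # (8.75, -17.5) after three iterations
```

With an imperfect regressor, each pass shrinks the remaining offset, which is why a few iterations can improve on a single prediction.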

The regression model consists of a Convolutional Neural Network, whose parameters are learned with the purpose of estimating the displacement values necessary for two two-dimensional images to match. With this approach, it is not necessary to fix a similarity measure to be maximized. From training data, which define known displacement values for image examples, learning algorithms adjust the weights of the network in order to define a model that can be used for arbitrary images. Figure 1 shows the proposed network settings. Given the test image and the artificial MV image produced with the current displacement values, we first compute the difference between the images and extract from it the Region Of Interest (ROI) defined for the processing. The resulting image is then divided into a 4x4 grid, producing a total of 16 sub-images (or patches). The network input is defined by a three-dimensional image with 16 channels, one for each patch.
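The construction of the network input can be sketched with NumPy as follows. The ROI format `(row0, col0, size)`, the image sizes, and the row-major ordering of patches into channels are assumptions made for illustration; the text only specifies the difference image, the ROI crop, and the 4x4 split into 16 channels.

```python
import numpy as np


def build_network_input(test_img, synthetic_img, roi, grid=4):
    """Difference image -> ROI crop -> grid x grid patches stacked as channels.

    roi is (row0, col0, size): a hypothetical square ROI in pixels whose side
    must be divisible by `grid`.
    """
    r0, c0, size = roi
    if size % grid != 0:
        raise ValueError("ROI side must be divisible by the grid size")
    diff = np.asarray(test_img, dtype=np.float32) - np.asarray(synthetic_img, dtype=np.float32)
    crop = diff[r0:r0 + size, c0:c0 + size]
    p = size // grid  # patch side in pixels
    # (grid, p, grid, p) -> (p, p, grid, grid) -> (p, p, grid * grid)
    patches = crop.reshape(grid, p, grid, p).transpose(1, 3, 0, 2)
    return patches.reshape(p, p, grid * grid)


rng = np.random.default_rng(0)
x = build_network_input(rng.random((128, 128)), rng.random((128, 128)), roi=(32, 32, 64))
print(x.shape)  # (16, 16, 16): 16x16-pixel patches, 16 channels
```

Channel `i * 4 + j` holds the patch at grid row `i` and grid column `j`.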

The input is then processed by two convolutional blocks, each consisting of a convolutional layer with 20 filters of size 5x5, followed by a batch normalization layer and a pooling layer (2x2 pool with stride 2). Next, the network has a dense (fully connected) layer that maintains the volume of the data, which is then flattened and processed by a dense layer with 250 nodes that connects to the output layer, which defines the two predicted displacement values. Here, we use the ReLU (Rectified Linear Unit) activation function for the convolutional and dense layers.
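To make the layer sequence concrete, the sketch below traces the tensor shape through the network. Several details are assumptions, since the text does not state them: unpadded convolutions with stride 1, a hypothetical patch side of 64 pixels, and a shape-preserving interpretation of the dense layer that precedes flattening.

```python
def conv_out(size: int, kernel: int = 5) -> int:
    """Spatial size after an unpadded convolution with stride 1 (assumed)."""
    return size - kernel + 1


def pool_out(size: int, pool: int = 2, stride: int = 2) -> int:
    """Spatial size after pooling without padding."""
    return (size - pool) // stride + 1


def trace_shapes(patch_side: int, in_channels: int = 16, filters: int = 20):
    """Return (layer name, output shape) pairs for the described layers."""
    shapes = [("input", (patch_side, patch_side, in_channels))]
    side = patch_side
    for block in (1, 2):
        side = conv_out(side)
        shapes.append((f"conv{block} 5x5 + ReLU", (side, side, filters)))
        shapes.append((f"batchnorm{block}", (side, side, filters)))
        side = pool_out(side)
        shapes.append((f"pool{block} 2x2, stride 2", (side, side, filters)))
    shapes.append(("dense (volume kept)", (side, side, filters)))
    shapes.append(("flatten", (side * side * filters,)))
    shapes.append(("dense 250 + ReLU", (250,)))
    shapes.append(("output (2 displacements)", (2,)))
    return shapes


for name, shape in trace_shapes(64):  # 64 is a hypothetical patch side
    print(f"{name:28s} {shape}")
```

Under these assumptions a 64-pixel patch shrinks to 13x13 after the second pooling layer, so the flattened vector has 13 * 13 * 20 = 3380 entries.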

Using an image database with controlled shifts, the desired outputs for the data presented to the network are used to adjust the layer parameters. This adjustment was performed using the Stochastic Gradient Descent (SGD) algorithm, with the cost function defined by the Mean Squared Error (MSE). The algorithm was executed with a learning rate of 0.0001 (with a decay of the same value) and a momentum of 0.9. The parameters are initialized using the Glorot method glorot2010understanding.
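The update rule can be sketched in plain Python. The learning rate, decay, and momentum values come from the text; the time-based decay schedule `lr / (1 + decay * step)` is an assumption (it is the schedule used by some deep learning libraries), and the one-parameter linear model is a toy problem chosen only to exercise the update.

```python
def mse(pred, target):
    """Mean squared error over paired sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def sgd_momentum_step(w, grad, velocity, step, lr=1e-4, decay=1e-4, momentum=0.9):
    """One SGD update; `step` counts the updates already applied."""
    lr_t = lr / (1.0 + decay * step)  # assumed time-based decay schedule
    velocity = [momentum * v - lr_t * g for v, g in zip(velocity, grad)]
    w = [wi + vi for wi, vi in zip(w, velocity)]
    return w, velocity


# Toy problem (not from the paper): fit y = w * x with a single weight.
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w, v = [0.0], [0.0]
initial_loss = mse([w[0] * x for x in xs], ys)
for step in range(2000):
    # Gradient of the MSE with respect to the single weight.
    grad = [sum(2 * (w[0] * x - y) * x for x, y in zip(xs, ys)) / len(xs)]
    w, v = sgd_momentum_step(w, grad, v, step)
final_loss = mse([w[0] * x for x in xs], ys)
print(initial_loss, final_loss, w[0])
```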

## 3 Experiments

### 3.1 Dataset

In order to evaluate the proposed method in a scenario close to reality, artificial MV images generated from real CTs were used. These images were generated following the method proposed by Kieselmann et al. Kieselmann, in which the Hounsfield Units (HUs) present in the CT are mapped taking into account the differences in attenuation values between energies of 80 keV (the CT reference) and 0.3 MeV (the MV image reference). The model also describes beam scattering and the noise present in MV images. Figure 2 shows examples of DRRs extracted from CTs and artificial MV images generated by the model for brain and pelvic CTs.

In total, 50 brain CT images were used for the experiments: the training set of the regression model was composed of 25 images, and the other 25 were used for evaluation. The test set was produced by fixing the displacement values in the lateral and longitudinal directions, with displacement values of 5 mm, 10 mm and 20 mm. This gives a total of 6 possible displacements in each direction, generating 36 samples for each test CT. This configuration produced a total of 900 test registration cases. The training set was produced by randomly generating displacements in both directions, following a uniform distribution over [-20, 20] mm. For each CT, 100 displacements were created by fixing a direction with zero displacement, and 300 displacements were generated with non-zero values. For each CT, 500 training images were generated, giving a total of 12500 samples. Examples of these images are shown in Figure 3.
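The test-case count can be reproduced with a short sketch. It assumes that the six displacements per direction are the positive and negative values of 5, 10 and 20 mm, which is one reading consistent with the stated totals; the helper names are hypothetical.

```python
import itertools
import random

# Assumed reading of "6 possible displacements in each direction":
# both signs of 5, 10 and 20 mm.
SHIFTS_MM = (-20, -10, -5, 5, 10, 20)
N_TEST_CTS = 25


def fixed_shift_pairs():
    """All (lateral, longitudinal) displacement pairs for one test CT."""
    return list(itertools.product(SHIFTS_MM, repeat=2))


def random_training_shift(rng, limit_mm=20.0):
    """One training displacement drawn uniformly from [-limit, limit] mm."""
    return (rng.uniform(-limit_mm, limit_mm), rng.uniform(-limit_mm, limit_mm))


pairs = fixed_shift_pairs()
print(len(pairs))               # 36 samples per test CT
print(len(pairs) * N_TEST_CTS)  # 900 test registration cases

rng = random.Random(0)
sample = random_training_shift(rng)
```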

### 3.2 Comparison with State-of-the-Art Methods

The proposed method was compared with the more widely used similarity-based methods. Each of these methods can be described by the optimization algorithm applied and the similarity measure to be maximized. The optimization starts from an initial solution. At each iteration, new solutions are generated until the value being optimized converges, that is, until the change in the similarity value of the solution found does not exceed a certain tolerance threshold. In this work, we consider two optimization algorithms, the Powell method van2011evaluation and the Downhill Simplex method of Nelder-Mead rivest2012nonrigid, and three similarity measures: Mutual Information (MI) zollei20012d, Cross-Correlation (CC) knaan2003effective and Pattern Intensity (PI) russakoff2003evaluation. For both the six combinations of intensity-based methods and the proposed method, the initial solution was fixed at the origin, and the Region Of Interest (ROI) was defined as a central square with 10 cm sides. For the similarity-based methods, a fixed tolerance value was used.
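As an illustration of the similarity-based scheme, the sketch below maximizes a normalized cross-correlation with a tolerance-based stopping rule. It is not the Powell or Nelder-Mead method: a simple greedy neighbour search is swapped in to keep the example short, a Gaussian blob shifted with `np.roll` stands in for DRR generation, and all names and the tolerance value are hypothetical.

```python
import numpy as np


def normalized_cross_correlation(a, b):
    """Zero-mean normalized cross-correlation of two equally sized images."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def greedy_search(similarity, start=(0, 0), tol=1e-6, max_iters=100):
    """Move to the best 4-neighbour until the gain does not exceed tol."""
    current, best = start, similarity(start)
    for _ in range(max_iters):
        x, y = current
        neighbours = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        score, candidate = max((similarity(n), n) for n in neighbours)
        if score - best <= tol:  # converged: improvement within tolerance
            break
        current, best = candidate, score
    return current, best


# Toy data: the "acquired" image is the reference blob shifted by (3, -2) px.
yy, xx = np.mgrid[0:64, 0:64]
reference = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 6.0 ** 2))
acquired = np.roll(reference, shift=(3, -2), axis=(0, 1))


def cc_at(shift):
    """Similarity between the reference moved by `shift` and the acquired image."""
    moved = np.roll(reference, shift=shift, axis=(0, 1))
    return normalized_cross_correlation(moved, acquired)


found_shift, found_score = greedy_search(cc_at, start=(0, 0))
print(found_shift, found_score)
```

Each candidate evaluation regenerates the shifted reference, which mirrors the per-iteration cost of DRR extraction in the real setting.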

### 3.3 Evaluation Metrics

The gold standards are defined by the fixed displacements generated for the test set. We compute the distances between the displacement values produced by the methods and the expected values. Also, based on the evaluation of the margins of error applied in treatments suzuki2012uncertainty; poulsen2007residual, we set a cut-off threshold of 2 mm for the distance between the displacement and the gold standard. We then define that a false positive case occurs when this distance is above the threshold. The false positive rate is then defined as the ratio of the number of false positives to the number of test registration cases. Here, both the false positive rate and the distribution of distance values were used to compare the methods.
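A minimal sketch of these metrics follows. The Euclidean distance between two-component displacements is an assumption, since the text does not name the distance used, and the sample values are invented solely to exercise the functions.

```python
import math


def shift_deviation(estimate, gold):
    """Distance in mm between an estimated and a gold-standard displacement
    (Euclidean distance is assumed here)."""
    return math.hypot(estimate[0] - gold[0], estimate[1] - gold[1])


def evaluate(estimates, golds, threshold_mm=2.0):
    """Mean deviation and false positive rate for a set of registrations."""
    deviations = [shift_deviation(e, g) for e, g in zip(estimates, golds)]
    false_positives = sum(1 for d in deviations if d > threshold_mm)
    return {
        "mean_mm": sum(deviations) / len(deviations),
        "fpr_percent": 100.0 * false_positives / len(deviations),
    }


# Invented example values, not results from the paper.
golds = [(5.0, 10.0)] * 4
estimates = [(5.0, 10.0), (8.0, 10.0), (5.0, 6.0), (5.0, 10.0)]
summary = evaluate(estimates, golds)
print(summary)  # {'mean_mm': 1.75, 'fpr_percent': 50.0}
```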

### 3.4 Results

Table 1 shows the obtained results. Both the mean distance between the displacements obtained by the methods and the gold standard, and the False Positive Rate (FPR), are shown in this table. In addition, the distributions of the distances between the displacements and the gold standard are described by the 10th, 25th, 50th, 75th and 90th percentiles. The results for the similarity-based methods were obtained by running the optimization methods until the tolerance threshold was reached. For the proposed method, the results up to the third iteration are presented. The best results for both approaches are described in bold.

| Method | Shift deviation percentile (mm): 10th | 25th | 50th | 75th | 90th | Mean (mm) | FPR (%) |
|---|---|---|---|---|---|---|---|
| Simplex-PI | 0.00 | 0.21 | 0.33 | 0.53 | 25.00 | 5.22 | 16.22 |
| Simplex-MI | 0.38 | 0.64 | 1.12 | 2.40 | 21.41 | 5.28 | 27.78 |
| Simplex-CC | 0.34 | 0.49 | 0.68 | 0.91 | 1.08 | 0.70 | 0.00 |
| Powell-PI | 0.19 | 0.40 | 18.21 | 23.83 | 31.00 | 139.10 | 66.89 |
| Powell-MI | 0.34 | 0.66 | 1.41 | 20.18 | 27.53 | 10.66 | 44.56 |
| Powell-CC | 0.22 | 0.38 | 0.69 | 18.45 | 30.89 | 8.78 | 31.56 |
| CNN - 1 iter | 0.46 | 0.77 | 1.14 | 3.08 | 20.90 | 6.37 | 37.22 |
| CNN - 2 iters | 0.36 | 0.59 | 0.89 | 2.37 | 16.22 | 4.93 | 26.81 |
| CNN - 3 iters | 0.23 | 0.38 | 0.55 | 1.48 | 10.29 | 4.15 | 12.79 |

Among the optimization methods, the best results were obtained with the Downhill Simplex method. The Powell method proved unstable, with high deviations observed for some examples. Its worst results were observed with the PI similarity, where more than half of the registrations presented deviations close to 2 cm. With the Powell method, no similarity measure produced results superior to those achieved with the Simplex method.

Among the similarity measures, the best results were observed with the CC similarity. With both optimization methods, better accuracy was observed using this similarity. By combining the Simplex method with CC similarity, we obtained the best results, reaching a false positive rate of 0%. That is, in none of the tests did the deviation of the obtained displacement exceed 2 mm, and the mean deviation stayed below 1 mm. The second best result achieved by the similarity-based methods was that of Simplex-PI, with a false positive rate of 16.22%. The best performance achieved with the Powell optimization method was a false positive rate of 31.56% (Powell-CC).

The proposed method presented a clear performance gain at each new iteration. With a single iteration, an FPR of 37.22% was obtained. In addition, at least half of the test cases showed deviations of at most 1.14 mm, which indicates that the training of the regression model was effective. In the second iteration, the FPR fell to 26.81%, a relative reduction of approximately 28%, which makes the method competitive with Simplex-MI. In the third iteration, the FPR fell to 12.79%, a further relative reduction of 52.29%. In this case, more than 75% of the test cases showed a deviation of less than 2 mm. With an average deviation of 4.15 mm, this result was better than that of the Simplex-PI method. The proposed method can therefore achieve results competitive with similarity-based methods in very few iterations. Since similarity-based methods can require dozens of iterations, depending on the adopted tolerance value, a method capable of obtaining similar results with fewer iterations is quite attractive, especially from the point of view of speed.

## 4 Conclusions

In this paper, we presented a 2D/3D MV image registration method based on a Convolutional Neural Network. The experiments were performed using a dataset of 50 brain images, and the results were compared to traditional image registration methods. For the experiments using the CNN, only 3 iterations of the proposed model were performed because of computational limitations. However, the results are promising, since the FPR decreased by roughly 28% and 52% in the second and third iterations, respectively. Further investigation is needed to evaluate the full potential of the method.

## Acknowledgment

We would like to thank Helen Khoury, Silvio de Barros Melo, and Halisson Alberdan Cavalcanti Cardoso for valuable discussions. We also thank Dr. Ernesto Roesler and his team from Hospital Português, Lucas Delbem, Karen Pieri, and Thiago Fontana, for making the CT images available for this research.