Convolutional neural network based automatic plaque characterization from intracoronary optical coherence tomography images
Optical coherence tomography (OCT) can provide high-resolution cross-sectional images for analyzing superficial plaques in coronary arteries. Commonly, plaque characterization using intracoronary OCT images is performed manually by expert observers. This manual analysis is time-consuming, and its accuracy relies heavily on the experience of the human observers. Traditional machine learning based methods, such as the least squares support vector machine and random forest methods, have recently been employed to automatically characterize plaque regions in OCT images. These traditional methods commonly involve several processing steps, including feature extraction, informative feature selection, and final pixel classification. The final classification accuracy can therefore be jeopardized by errors or inaccuracies in any of these steps. In this study, we proposed a convolutional neural network (CNN) based method to automatically characterize plaques in OCT images. Unlike traditional methods, our method uses the image as a direct input and performs classification as a single-step process. Experiments on 269 OCT images showed that the average prediction accuracy of the CNN-based method was 0.866, which indicates promise for clinical translation.
Further author information: (Send correspondence to Hua Li)
Hua Li: E-mail: email@example.com, Telephone: 1 314 537 7145
Optical coherence tomography (OCT) can achieve high-resolution, cross-sectional imaging of the internal microstructure in materials and biologic systems by measuring backscattered and backreflected light. Commonly, for characterizing superficial plaques in coronary arteries, the acquired OCT images are manually differentiated into four types: lipid tissue (LT), fibrous tissue (FT), mixed tissue (MT) and calcified tissue (CA). However, this manual process is laborious and time-consuming, and its accuracy relies heavily on the experience of the human observers. To avoid these problems, methods for automatically characterizing plaque types in intracoronary OCT images should be developed.
Recently, traditional machine learning methodologies have been applied to automatically characterize plaques from intracoronary OCT images [3, 4]. For example, Xiaoya et al. proposed a least squares support vector machine (LS-SVM) based method that classifies only LT and FT for analyzing plaque thickness. They first employed a method based on Otsu’s thresholding to detect the whole plaque tissue area. Then, they selected informative gray level co-occurrence matrix (GLCM) and local binary pattern (LBP) features for each tissue pixel and fed them into an LS-SVM-based classifier for pixel classification. However, only 9 OCT images were processed in their experiment, so the reported prediction accuracy of 0.896 may reflect overfitting. In contrast, Athanasiou et al. employed a random forest (RF) based method to classify plaque tissue into all four types: LT, FT, MT and CA. Their method consists of several steps: tissue area selection with a method based on Otsu’s thresholding, pixel clustering with the K-means algorithm, informative feature selection based on wrapper feature selection (WRP), and pixel classification using an RF-based classifier. Although this method characterizes more tissue types in OCT images, its complex processing pipeline might limit its practical use.
Deep learning (DL) methods have had a profound impact on computer vision and image analysis applications, such as image classification [11, 12], segmentation, and image completion. Convolutional neural network (CNN) based deep neural networks, the most commonly employed DL method, have the advantage of automatically extracting features directly from images. In this study, we employ a CNN-based DL method to automatically characterize plaque tissues from intracoronary OCT images and address the issues that limit traditional methods.
2.1 Overview of CNN-based method
As shown in Figure 1, our CNN-based automatic plaque characterization method includes two steps: tissue area detection and CNN-based pixel classification. First, we used a method based on Otsu’s automatic thresholding to detect the tissue area in an OCT image. Second, we used a CNN-based classifier to classify each pixel in the tissue area into one of five tissue categories: LT, FT, MT, CA and background (BK). A BK pixel was defined as a pixel that did not belong to any of the other four tissue types. In the following subsections, we explain these two steps in detail.
2.2 Step 1: Tissue area extraction
The individual A-lines acquired by an OCT system contain the reflected optical energy as a function of time. These A-lines are stored sequentially in a 2-D polar OCT image, with each element corresponding to one polar intensity value. In a polar OCT image, the top part corresponds to the area near the gravitational center of the tissue, while the bottom part corresponds to the outer area outside the tissue. Each of these images contains some catheter artifact pixels located outside the tissue area. To reduce their interference with the accuracy of the pixel classification, we first need to remove these catheter pixels and keep only the tissue area. This tissue area extraction procedure includes two steps: lumen border detection and border expansion.
To detect the lumen border, we first performed Otsu’s automatic thresholding to remove catheter artifact pixels. This procedure yielded a binary image containing only zero and nonzero pixels. Afterwards, we scanned each column from the top (gravitational center) to the bottom (outer area) and stored the first nonzero pixel in each of these columns. Finally, these stored nonzero pixels were connected to form the detected lumen border.
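The two operations described above (thresholding, then a top-to-bottom scan of each column) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `otsu_threshold` and `detect_lumen_border` are hypothetical, the histogram uses 256 bins, and a column with no above-threshold pixel is marked with -1.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the threshold that maximizes the between-class variance (Otsu)."""
    hist, edges = np.histogram(image.ravel(), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    weight_low = np.cumsum(hist)                  # pixels at or below each bin
    weight_high = weight_low[-1] - weight_low     # pixels above each bin
    sum_low = np.cumsum(hist * centers)
    mean_low = sum_low / np.maximum(weight_low, 1)
    mean_high = (sum_low[-1] - sum_low) / np.maximum(weight_high, 1)
    between = weight_low * weight_high * (mean_low - mean_high) ** 2
    return centers[np.argmax(between)]

def detect_lumen_border(polar_image):
    """For each column, return the row of the first nonzero binary pixel.

    Assumption: columns are scanned from row 0 (top) downwards, and a column
    without any above-threshold pixel is reported as -1.
    """
    binary = polar_image > otsu_threshold(polar_image)
    first_row = np.argmax(binary, axis=0)         # index of first True per column
    first_row[~binary.any(axis=0)] = -1
    return first_row
```

Connecting the returned row indices column by column gives the lumen border as a curve over the polar image.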
After extracting the lumen border (inner border), we extended it by a fixed distance (specified in mm), as presented in the reference, from this border towards the bottom (outer area), and obtained another border (outer border). The area between these two borders in the original polar OCT image was considered the detected tissue area. Finally, in order to apply the CNN-based classifier to these polar OCT images for pixel classification, we transformed these images from polar to Cartesian coordinates. Due to the border extension, some background pixels were included in this tissue area. As a result, we classified all pixels in this tissue area into 5 types: LT, FT, MT, CA, and BK.
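A rough sketch of the border expansion and the polar-to-Cartesian transform is given below. Several details are assumptions made only for illustration: the expansion depth is passed in pixels as the hypothetical parameter `depth_px` (the conversion from mm depends on the scanner and is not given here), rows of the polar image are taken as radius and columns as angle, and nearest-neighbour resampling is used.

```python
import numpy as np

def tissue_mask(shape, inner_border, depth_px):
    """Mark, in every column, the rows between the inner and outer borders."""
    rows = np.arange(shape[0])[:, None]
    inner = np.asarray(inner_border)[None, :]
    mask = (rows >= inner) & (rows < inner + depth_px)
    return mask & (inner >= 0)        # columns without a border stay empty

def polar_to_cartesian(polar, out_size):
    """Nearest-neighbour resampling of a (radius, angle) image onto a square grid."""
    n_radii, n_angles = polar.shape
    centre = (out_size - 1) / 2.0
    ys, xs = np.mgrid[0:out_size, 0:out_size]
    dx, dy = xs - centre, ys - centre
    # Scale so that the largest radius touches the edge of the square.
    radius = np.hypot(dx, dy) * (n_radii - 1) / centre
    angle = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    r_idx = np.rint(radius).astype(int)
    a_idx = np.floor(angle / (2 * np.pi) * n_angles).astype(int) % n_angles
    inside = r_idx < n_radii          # corners beyond the last radius stay zero
    cart = np.zeros((out_size, out_size), dtype=polar.dtype)
    cart[inside] = polar[r_idx[inside], a_idx[inside]]
    return cart
```

Applying `polar_to_cartesian` to both the intensity image and the mask keeps the two aligned in Cartesian coordinates.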
2.3 Step 2: CNN-based pixel classification
Having the extracted tissue area, we next employed a CNN-based method to classify each pixel in this tissue area into one of the five tissue types: LT, FT, MT, CA and BK. As shown in Figure 1, the input of the classifier is an image patch with the to-be-classified pixel at the center of this patch, and the classifier’s outputs are five scores which denote the probabilities that each to-be-classified pixel belongs to the LT, FT, MT, CA, and BK classes, respectively.
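The patch-centred classification described above can be sketched as a loop over the tissue pixels. This is only an illustration under assumptions: `classify_pixels` and its `classifier` argument (any callable returning five scores) are hypothetical names, the patch size is assumed to be odd so that the pixel sits exactly at the centre, and reflect padding at the image edge is an assumption not stated in the text.

```python
import numpy as np

def classify_pixels(image, mask, patch_size, classifier):
    """Label every masked pixel from the patch centred on it (-1 elsewhere)."""
    half = patch_size // 2
    # Assumption: reflect padding so that patches near the edge are full-size.
    padded = np.pad(image, half, mode="reflect")
    labels = np.full(image.shape, -1, dtype=int)
    for r, c in zip(*np.nonzero(mask)):
        patch = padded[r:r + patch_size, c:c + patch_size]
        scores = classifier(patch)            # five probability-like scores
        labels[r, c] = int(np.argmax(scores)) # most probable class index
    return labels
```

In practice the patches would be classified in batches rather than one at a time; the loop form is kept here only for readability.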
Our CNN-based classifier can be modeled as a nonlinear function, $f$, which maps a 2-D image patch $x \in \mathbb{R}^{n \times n}$ to a vector $\hat{y} = (\hat{y}_1, \dots, \hat{y}_5)$, where $n \times n$ is the size of the OCT image patch. Here, each $\hat{y}_k$ denotes the probability of the current image patch belonging to the $k$-th tissue category. The mapping also depends on the set of parameters $\theta \in \mathbb{R}^{M}$, where $M$ is the total number of trainable parameters in our classifier.
The network architecture design, network training strategy and data preprocessing strategy of our CNN-based classifier are presented in Sections 2.3.1, 2.3.2 and 2.3.3, respectively. The network architecture design (in Section 2.3.1) determines the classifier mapping model and specifies the total number of trainable parameters. The network training strategy (in Section 2.3.2) describes how the values of all these parameters are configured. The data preprocessing strategy (in Section 2.3.3) introduces the way we generated the training samples for classifier training and the validation samples for classifier validation.
2.3.1 Architecture of the CNN-based classifier
Generally, a CNN-based deep neural network consists of a number of convolutional (CONV) layers followed by a number of fully connected (FC) layers. The CONV layers extract high-level features from an image patch, and the FC layers then perform the classification on these features. In this study, a trial-and-error approach was used to identify a CNN architecture that avoided the overfitting problem, and the numbers of CONV layers and FC layers were set to 9 and 2, respectively.
The network architecture employed in this study is shown in Figure 2. For convenience of description, we define a CONV block as a sequence of three layers: a CONV layer, a batch normalization layer, and a ReLU layer. As shown in Figure 2, our network architecture contained 9 CONV blocks and 2 FC layers. Two max pooling layers were placed after the 3rd and the 6th CONV blocks, respectively, and one global pooling layer was placed after the 9th CONV block. The same filter spatial support was used in every CONV layer. The first three CONV layers shared one filter count; in order to compensate for the information loss caused by max pooling, separate filter counts were set for the second and the third groups of three CONV layers. Two FC layers followed the global pooling layer. The first FC layer included 512 neurons and the second one included 5 neurons. A dropout layer with a dropout ratio of 0.5 was placed between these two FC layers to further avoid overfitting. A softmax layer was placed at the end of our classifier to produce probability scores. The input of our CNN-based classifier was an OCT image patch (described in Section 2.3.3). The outputs of the classifier were 5 probability-like scores.
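To make the layer ordering easier to follow, the architecture can be written out as a plain list of layers. The sketch below is framework-independent and purely illustrative: `build_layer_spec` is a hypothetical helper, and the filter counts are left as a caller-supplied argument rather than fixed values.

```python
def build_layer_spec(filters_per_group, n_classes=5):
    """List the layers in order: 9 CONV blocks, 2 max pools, 1 global pool, 2 FC."""
    if len(filters_per_group) != 3:
        raise ValueError("expected one filter count per group of three CONV blocks")
    layers = []
    for group, n_filters in enumerate(filters_per_group, start=1):
        for _ in range(3):
            # One CONV block = CONV layer + batch normalization + ReLU.
            layers += [("conv", n_filters), ("batch_norm", None), ("relu", None)]
        # Max pooling follows the 3rd and 6th blocks, global pooling the 9th.
        layers.append(("max_pool", None) if group < 3 else ("global_pool", None))
    layers += [("fc", 512), ("dropout", 0.5), ("fc", n_classes), ("softmax", None)]
    return layers
```

For example, `build_layer_spec((16, 32, 64))` (filter counts chosen arbitrarily for illustration) returns a list whose tenth entry is the first max pooling layer and whose last four entries are the two FC layers, the dropout between them, and the softmax.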
For this network architecture, the CNN mapping function and its total number of trainable parameters are then fixed.
2.3.2 Training strategy of the CNN-based classifier
Given a set of training data, the goal of classifier training is to find a set of parameters that minimizes a loss function that quantifies the average error between the true category of the training data and the category predicted by the classifier.
In this study, the training dataset consisted of $N$ image patches $\{x_i\}_{i=1}^{N}$. Each image patch $x_i$ was categorized as one of the five tissue types (BK, LT, FT, MT and CA) and corresponds to a one-hot label vector $y_i$ as defined in Table 1. The cross-entropy loss function was employed:
$$L(\theta) = -\frac{1}{N} \sum_{i=1}^{N} w_i \sum_{k=1}^{5} y_{i,k} \log f_k(x_i; \theta), \qquad (1)$$
where $f_k(x_i; \theta)$ is the predicted probability that $x_i$ belongs to class $k$, and $w_i$ is the weight for the $i$-th training data. For a given $x_i$ that belongs to class $c$, $w_i$ is defined in Eq. 2 in terms of $N_c$,
where $N_c$ is the number of training data that belong to class $c$. The weight was utilized to compensate for the fact that training data from minority classes have fewer opportunities to update the classifier parameters.
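A weighted cross-entropy of this kind can be sketched in NumPy as below. The weighting rule used in `class_weights` (inverse class frequency, $N / (K N_c)$ for $K$ classes) is an assumption chosen for illustration and is not necessarily the form of Eq. 2; both function names are hypothetical.

```python
import numpy as np

def class_weights(class_ids, n_classes=5):
    """Per-sample weights from an ASSUMED inverse-frequency rule N / (K * N_c)."""
    counts = np.bincount(class_ids, minlength=n_classes).astype(float)
    per_class = np.where(counts > 0,
                         len(class_ids) / (n_classes * np.maximum(counts, 1)),
                         0.0)
    return per_class[class_ids]

def weighted_cross_entropy(probs, one_hot, weights, eps=1e-12):
    """Mean over samples of w_i * (-sum_k y_ik * log p_ik)."""
    per_sample = -np.sum(one_hot * np.log(probs + eps), axis=1)
    return float(np.mean(weights * per_sample))
```

With such weights, a sample from a rare class contributes more to the loss than a sample from a frequent class, which is the compensation effect described above.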
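The training of our classifier can be defined as a nonlinear optimization problem:
$$\theta^{*} = \arg\min_{\theta} L(\theta), \qquad (3)$$
where $L(\theta)$ is the loss function. This problem was solved iteratively by stochastic gradient descent with momentum:
$$v_{t+1} = \mu v_t - \eta g_t, \qquad \theta_{t+1} = \theta_t + v_{t+1}, \qquad (4)$$
where $\theta_t$ denotes the value of $\theta$ at the $t$-th iteration, $v_t$ is the accumulated update, $\eta$ is the learning rate, which controls the speed of the update, and the momentum $\mu$ determines the degree to which the previous gradients are incorporated into the current update. $g_t$ is the gradient provided by one batch of training data at the $t$-th iteration, which can be calculated by use of the backpropagation algorithm. In this study, the learning rate and momentum were set to 0.0001 and 0.9, respectively. The batch size was set to 216.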
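One common form of the momentum update can be sketched as follows. The velocity formulation below is an assumption (frameworks differ in how they write the momentum term), and `sgd_momentum_step` is a hypothetical name; the default learning rate and momentum are the values stated above.

```python
import numpy as np

def sgd_momentum_step(theta, velocity, grad, learning_rate=1e-4, momentum=0.9):
    """One parameter update: blend the previous update with the new gradient."""
    velocity = momentum * velocity - learning_rate * grad
    return theta + velocity, velocity

# Toy usage on L(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
# A larger learning rate than in the text is used so the toy converges quickly.
theta = np.array([1.0, -2.0])
velocity = np.zeros_like(theta)
for _ in range(300):
    theta, velocity = sgd_momentum_step(theta, velocity, grad=theta,
                                        learning_rate=0.1)
```

In actual training the gradient would come from backpropagation over one batch of patches rather than from a closed-form expression.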
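Table 1: One-hot label vector of each class.
|Class (Tissue type)|Label|
|class 1 (BK)|[1, 0, 0, 0, 0]|
|class 2 (LT)|[0, 1, 0, 0, 0]|
|class 3 (FT)|[0, 0, 1, 0, 0]|
|class 4 (MT)|[0, 0, 0, 1, 0]|
|class 5 (CA)|[0, 0, 0, 0, 1]|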
The training and validation data employed in our classifier training were image patches generated from Cartesian-coordinate OCT images.
At each iteration of the parameter update defined in Eq. 4, we randomly extracted one fixed-size patch from each of the OCT images in the training set. Each image patch and its corresponding class label were paired as a training sample, and the generated training samples together formed the training batch used to update the parameters. To mitigate overfitting, we augmented the OCT images every 200 iterations by rotating them by a random angle.
Additionally, at each iteration, a set of validation samples was generated by randomly extracting image patches from each of the OCT images in the validation set. These validation samples were used for model selection during the classifier training. The training of our CNN-based classifier took about 3 million iterations. During this period, the parameters that resulted in the best prediction accuracy on the set of validation samples were considered the best parameters, and these parameters were used for the performance evaluation of the classifier.
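The per-iteration batch generation can be sketched as below. This is a minimal sketch under assumptions: `sample_training_batch` is a hypothetical name, the patch size is odd and supplied by the caller, patches are kept fully inside the image, and the label of a patch is taken to be the ground-truth class of its centre pixel.

```python
import numpy as np

def sample_training_batch(images, label_maps, patch_size, rng):
    """Draw one random patch per image; the label is the class of the centre pixel."""
    half = patch_size // 2
    patches, labels = [], []
    for image, label_map in zip(images, label_maps):
        rows, cols = image.shape
        r = rng.integers(half, rows - half)   # keep the patch inside the image
        c = rng.integers(half, cols - half)
        patches.append(image[r - half:r + half + 1, c - half:c + half + 1])
        labels.append(label_map[r, c])
    return np.stack(patches), np.array(labels)
```

Because one patch is drawn per training image, the batch size equals the number of images in the training set.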
3 Experimental results
The image set used in our experiment contained 269 OCT images acquired from 22 patients. Each OCT image had a ground truth counterpart, which indicated the class label for every pixel in this OCT image. These ground truth data were manually established by expert observers. The fractions of pixels in each class in the whole ground truth data are shown in Figure 3.
The training and validation of our CNN-based classifier were performed on an NVIDIA Titan X GPU with 12 GB of VRAM. Software packages used in our experiments included Python 3.4, Keras 2.0 and TensorFlow 1.0. In order to evaluate our CNN-based method, we first randomly shuffled the OCT images and evenly divided them into 5 non-overlapping subsets. Then we performed 5-fold cross validation on these image subsets to reduce the variance of the evaluation.
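The shuffle-and-split procedure can be sketched as follows. The generator name `five_fold_splits` and the fixed seed are assumptions for illustration; how the remaining four subsets are further divided into training and validation images is not specified here and is left to the caller.

```python
import numpy as np

def five_fold_splits(n_images, n_folds=5, seed=0):
    """Yield (remaining, held_out) image indices for each cross-validation fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_images), n_folds)
    for k in range(n_folds):
        held_out = folds[k]
        remaining = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield remaining, held_out
```

With 269 images, four of the subsets contain 54 images and one contains 53.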
In this study, we used a sensitivity metric to evaluate the classification of each tissue type, which was defined by:
$$\text{Sensitivity} = \frac{TP}{TP + FN},$$
where $TP$ is the number of true positive predictions and $FN$ is the number of false negative predictions. As shown in Figure 4, the average prediction sensitivities for the background and FT classes both exceed 0.9. For LT and MT, the average prediction sensitivities are over 0.6. The prediction sensitivity for the CA tissue type is the lowest, which might be due to the tiny ratio (shown in Figure 3) of CA pixels in the dataset.
Figure 5 gives two classification examples, which show that the characterization results of our proposed method are close to the ground truth.
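The per-class sensitivity can be computed from the ground-truth and predicted label maps as in the sketch below. `per_class_sensitivity` is a hypothetical helper; classes that do not occur in the ground truth are reported as NaN, which is a choice made here for illustration.

```python
import numpy as np

def per_class_sensitivity(true_labels, predicted_labels, n_classes=5):
    """Return TP / (TP + FN) for each class (NaN if the class never occurs)."""
    true_labels = np.asarray(true_labels).ravel()
    predicted_labels = np.asarray(predicted_labels).ravel()
    sensitivity = np.full(n_classes, np.nan)
    for k in range(n_classes):
        is_k = true_labels == k               # all pixels whose true class is k
        if is_k.any():
            true_positive = np.sum(predicted_labels[is_k] == k)
            sensitivity[k] = true_positive / is_k.sum()
    return sensitivity
```

Since the denominator counts every pixel whose true class is $k$, it equals $TP + FN$ for that class.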
In this study, we developed a CNN-based method for automatic plaque characterization on OCT images. Our method can extract informative features directly from OCT image patches for pixel classification. Experimental results showed that the average pixel prediction accuracy was 0.866. We also demonstrated that our proposed method can detect the background and FT tissue regions with a sensitivity of over 0.9. These results show that the CNN-based automatic tissue segmentation method holds promise for clinical translation. In the future, we will acquire more OCT images and use them to retrain our CNN classifier in order to improve the classification accuracy for the LT, MT and CA classes.
- Fujimoto, J. G., Pitris, C., Boppart, S. A., and Brezinski, M. E., “Optical coherence tomography: an emerging technology for biomedical imaging and optical biopsy,” Neoplasia 2(1-2), 9–25 (2000).
- Prati, F., Guagliumi, G., Mintz, G. S., Costa, M., Regar, E., Akasaka, T., Barlis, P., Tearney, G. J., Jang, I.-K., Arbustini, E., et al., “Expert review document part 2: methodology, terminology and clinical applications of optical coherence tomography for the assessment of interventional procedures,” European heart journal 33(20), 2513–2520 (2012).
- Guo, X., Tang, D., Molony, D., Yang, C., Samady, H., Zheng, J., Mintz, G. S., Maehara, A., Wang, L., Pei, X., Li, Z.-Y., Ma, G., and Giddens, D. P., “A segmentation method for intracoronary optical coherence tomography (oct) image based on least squares support vector machine: Vulnerable coronary plaque cap thickness quantification,” Proc. ICCM (2017).
- Athanasiou, L. S., Bourantas, C. V., Rigas, G., Sakellarios, A. I., Exarchos, T. P., Siogkas, P. K., Ricciardi, A., Naka, K. K., Papafaklis, M. I., Michalis, L. K., et al., “Methodology for fully automated segmentation and plaque characterization in intracoronary optical coherence tomography images,” Journal of biomedical optics 19(2), 026009–026009 (2014).
- Otsu, N., “A threshold selection method from gray-level histograms,” IEEE transactions on systems, man, and cybernetics 9(1), 62–66 (1979).
- Kekre, H., Thepade, S. D., Sarode, T. K., and Suryawanshi, V., “Image retrieval using texture features extracted from glcm, lbg and kpe,” International Journal of Computer Theory and Engineering 2(5), 695 (2010).
- Nanni, L., Lumini, A., and Brahnam, S., “Local binary patterns variants as texture descriptors for medical image analysis,” Artificial intelligence in medicine 49(2), 117–125 (2010).
- Suykens, J. A., Van Gestel, T., and De Brabanter, J., [Least squares support vector machines ], World Scientific (2002).
- Liaw, A., Wiener, M., et al., “Classification and regression by random forest,” R news 2(3), 18–22 (2002).
- Hall, M. A. and Holmes, G., “Benchmarking attribute selection techniques for discrete class data mining,” IEEE Transactions on Knowledge and Data engineering 15(6), 1437–1447 (2003).
- He, K., Zhang, X., Ren, S., and Sun, J., “Deep residual learning for image recognition,” in [Proceedings of the IEEE conference on computer vision and pattern recognition ], 770–778 (2016).
- Zhang, X., Vishwamitra, N., Hu, H., and Luo, F., “Crescendonet: A simple deep convolutional neural network with ensemble behavior,” arXiv preprint arXiv:1710.11176 (2017).
- Badrinarayanan, V., Kendall, A., and Cipolla, R., “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” arXiv preprint arXiv:1511.00561 (2015).
- Yeh, R., Chen, C., Lim, T. Y., Hasegawa-Johnson, M., and Do, M. N., “Semantic image inpainting with perceptual and contextual losses,” arXiv preprint arXiv:1607.07539 (2016).
- Fercher, A. F., “Optical coherence tomography–development, principles, applications,” Zeitschrift für Medizinische Physik 20(4), 251–276 (2010).
- Bottou, L., “Large-scale machine learning with stochastic gradient descent,” in [Proceedings of COMPSTAT’2010 ], 177–186, Springer (2010).
- Hecht-Nielsen, R. et al., “Theory of the backpropagation neural network,” Neural Networks 1(Supplement-1), 445–448 (1988).
- Refaeilzadeh, P., Tang, L., and Liu, H., “Cross-validation,” in [Encyclopedia of database systems ], 532–538, Springer (2009).