Automatic Plaque Detection in IVOCT Pullbacks Using Convolutional Neural Networks

Automatic Plaque Detection in IVOCT Pullbacks Using Convolutional Neural Networks

Nils Gessert, Matthias Lutz, Markus Heyder, Sarah Latus, David M. Leistner, Youssef S. Abdelwahed, and Alexander Schlaefer Both authors contributed equally.N. Gessert, M. Heyder, S. Latus and A. Schlaefer are with the Institute of Medical Technology, Hamburg University of Technology, Am-Schwarzenberg-Campus 3, 21073, Hamburg, Germany (e-mail: {nils.gessert,markus.heyder,sarah.latus,schlaefer} Lutz is with the Universitätsklinikum Schleswig-Holstein, Arnold-Heller-Straße 3, 24105 Kiel, Germany (e-mail: M. Leistner and Y. S. Abdelwahred are with the Charité – Universitätsmedizin Berlin, Hindenburgdamm 30, 12203 Berlin, Germany (e-mail: {david-manuel.leistner,youssef.abdelwahed} (c) 2017 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to

Coronary heart disease is a common cause of death despite being preventable. To treat the underlying plaque deposits in the arterial walls, intravascular optical coherence tomography can be used by experts to detect and characterize the lesions. In clinical routine, hundreds of images are acquired for each patient which requires automatic plaque detection for fast and accurate decision support. So far, automatic approaches rely on classic machine learning methods and deep learning solutions have rarely been studied. Given the success of deep learning methods with other imaging modalities, a thorough understanding of deep learning-based plaque detection for future clinical decision support systems is required. We address this issue with a new dataset consisting of in-vivo patient images labeled by three trained experts. Using this dataset, we employ state-of-the-art deep learning models that directly learn plaque classification from the images. For improved performance, we study different transfer learning approaches. Furthermore, we investigate the use of cartesian and polar image representations and employ data augmentation techniques tailored to each representation. We fuse both representations in a multi-path architecture for more effective feature exploitation. Last, we address the challenge of plaque differentiation in addition to detection. Overall, we find that our combined model performs best with an accuracy of , a sensitivity of and a specificity of . Our results indicate that building a deep learning-based clinical decision support system for plaque detection is feasible.

IVOCT, Deep Learning, Plaque Detection, Transfer Learning.

I Introduction

Coronary heart disease is a common cause of death despite being preventable and treatable. Early detection of atherosclerotic plaque deposits can be critical for appropriate treatment. Intravascular optical coherence tomography (IVOCT) has established itself as an alternative to intravascular ultrasound (IVUS) [1, 2] as an imaging modality for invasive coronary angiography. Compared to IVUS, IVOCT is capable of resolving microstructures in the arterial wall which is traded off against a smaller field of view (FOV) [3]. During the procedure, a catheter containing an OCT probe is inserted into a coronary artery. For acquisition, the probe is rotated and pulled back automatically, providing several hundred or thousands of high resolution cross-sectional images of the arterial walls which guide the practitioner’s treatment decision process during an intervention. The assessment of these images requires extensive training of the interventionalist. Also, the vast amount of images acquired in a single pullback cannot all be reviewed by the practitioner during clinical routine. Therefore, various approaches for automatic IVOCT data analysis have been proposed. For example, automatic lumen segmentation [4, 5] and stent detection [6, 7] have been proposed. Also, automatic detection of plaque deposits has been proposed. The general feasibility of inferring plaque characteristics from IVOCT data has been addressed by correlating the images to histology data [8]. IVOCT has been used to measure the thickness of fibrous caps in the arterial wall to assess the risk of ruptures and potential subsequent myocardial infarction [9, 10]. Also, calcified plaque regions have been quantified as they can increase the risk of stenosis [11, 12]. In general, the backscattering and attenuation coefficient of the OCT signal have been used frequently for identification of atherosclerotic tissue [13, 14, 15, 16, 17]. Also, automatic classification methods using texture and optical properties as features have been proposed for plaque detection and segmentation [18, 19]. The segmentation of calcified plaque regions has been proposed using k-means clustering and random forests [20]. More recently, deep learning methods have been proposed for IVOCT data processing, e.g. for tissue layer segmentation [21]. Deep learning models, in particular convolutional neural networks (CNNs), have shown remarkable success in image processing tasks such as classification [22] and segmentation [23]. Also for medical tasks, such as disease classification [24] or tumor segmentation [25], deep learning methods have shown significant improvment over classical methods [26]. For IVOCT-based learning problems, CNNs have been employed, mostly in preliminary studies for plaque segmentation. Patch-wise approaches have been presented where a CNN predicts the class of the patch center pixel [21, 27]. Following the trend of dense segmentation architectures for improved performance [23], SegNet [28] has been employed for calcified plaque segmentation [29].

Fig. 1: Example IVOCT images, highlighting the difficult nature of the modality. The bottom row shows polar representations, the upper row shows the corresponding, transformed, cartesian images.

Although these approaches provide promising initial results, there are various shortcomings to be addressed. First of all, it is unclear whether learning from polar or cartesian representations is the better choice. Using cartesian images [27] is more intuitive as their content resembles the anatomical structure of the artery. However, polar representations [21, 29] theoretically contain the same information while not being distorted by circular interpolation artifacts. For this reason, most classic approaches rely on polar images for image processing [30]. However, it is unclear whether either or both representations are more suitable for deep learning-based feature extraction. Also, IVOCT acquisition is often accompanied by various artifacts, see Fig. 1. Many approaches rely on extensive preprocessing, including lumen segmentation and flattening [31, 5] which is hindered by artifacts that block light penetration of the arterial walls. As these artifcats are common in daily clinical use, automatic procedures should be sufficiently robust. Moreover, previous approaches [27, 29, 21] are limited in terms of dataset size and model choice (e.g. AlexNet [22]). More recent standard architectures such as ResNet [32] and Inception [33] have been shown to significantly improve performance for medical learning tasks [26]. Also, transfer learning from ImageNet has been shown to be advantageous for medical learning problems [34]. For IVOCT-based tissue layer segmentation, rudimentary transfer learning with a partially frozen network has been shown [21]. Another approach considered full fine-tuning for plaque classification [35]. However, the optimal level of meaningful feature transfer and partial retraining has not been explored for IVOCT data. Previously, it has been shown that the optimal choice or transfer strategy varies for different modalities and problems [36] which motivates a thorough investigation for IVOCT data. Lastly, previous methods largely relied on segmentation which is a necessary step for treatment planning in many medical applications [26]. However, for IVOCT interventions, the key problem is the large number of slices in every pullback. Thus, a clinical decision support system needs to provide feedback on the existence and characteristics of plaque deposits on the pullback level. Overall, a goal of such a system is to ensure that no deposit region remains undetected. For this purpose, segmentation is not immediately required.

To overcome these shortcomings, we provide a thorough investigation of IVOCT-based deep learning using a newly built dataset with slice-level labels. Three trained experts provide a class for each slice, defining the plaque type that is present in the image. In this way, we can quickly build a large dataset containing a variety of challenging cases, as indicated in Fig. 1. Using this dataset, we investigate important aspects to be considered for deep learning with IVOCT data. First, we study the use of polar and cartesian image representations for deep feature learning. In particular, we use data augmentation techniques, tailored to each representation. To improve model performance, we employ transfer learning from ImageNet using the established models ResNet [32] and DenseNet [37]. As natural images are vastly different from IVOCT data, we provide an in-depth analysis of transfer learning, showing performance for different transfer strategies. Additionally, we build a new two-path architecture out of pretrained parts from other models that leverages features from both image representations. Overall, we consider binary labels ”plaque” and ”no plaque”, as well as a further differentiation of ”calcified plaque” from ”fibrous/lipid plaque”. This choice is also clinically motivated as calcified plaque deposits often require different tools for treatment, such as rotablation. This paper is structured as follows. First, we introduce our dataset and methods in Section II. Second, we provide results in Section III. Third, we discuss our results in Section IV and last, we conclude the paper in Section V.

Ii Methods

Fig. 2: The models we employ with our dataset. The upper model shows ResNet50-V2 [38], the lower model shows DenseNet121 [37]. In each ResBlock, input and output block, the number of output feature maps is given. denotes spatial downsampling with a stride of two. The ResBlocks within the DenseBlock do not use the last convolution, instead, the previous convolution outputs feature maps of size . Compress denotes a convolution that halves the number of feature maps. GAP denotes global average pooling [39]. Both architectures include dropout with a keep probability of . denotes a retraining point for transfer learning. Left of the point, weights are freezed, right of the point, weights are retrained.

Ii-a Dataset

We build a new dataset based on clinical IVOCT images that were acquired with a St. Jude Medical Ilumien OPTIS. The device is connected to a Dragonfly OPTIS imaging catheter which is inserted into the patient’s coronary vessels. Inside the vessel, images are acquired by rotating the OCT probe inside the catheter and pulling it backwards. In this way, a continuous M-Scan consisting of 1D depth scans (A-Scans) is created. The A-Scans from each turn are arranged in a B-Scan, which is the polar representation shown in Fig. 1. The polar image can be transformed into cartesian space with the transformation and . Applying interpolation in between the transformed pixels results in the cartesian representation , which resembles a cross-sectional view of the artery, see Fig. 1.

For ground-truth annotation, three trained experts with daily experience in IVOCT-assisted interventions determine the type of plaque that is present in a B-Scan. The experts provide labels on the B-Scan level, i.e. they assign the labels ”no plaque”, ”calcified plaque” or ”lipid/fibrous plaque” to each B-Scan presented to them. All experts were partially provided the same images and different images for labeling. For the same images, the final label is determined based on consensus between the experts, i.e. different opinions on the plaque type were resolved by asking the experts for a repetition of their evaluation. Providing different images to experts allows for a larger but potentially noisily labeled dataset. We sample our test set from the consensus set to ensure a meaningful evaluation. Before resolving conflicting evaluations we found an agreement of for binary differentiation and for multi-class differentiation.

In total, the dataset consists of 4000 images from 49 patients. We split off an independent test set of 742 images from 9 patients. We report our results based on this test set. We perform three-fold cross-validation on the training set in order to select relevant hyperparameters that are introduced in Section II-D. We use the ground-truth labels in two variants. For general results on plaque detection, we use binary labels, i.e. ”plaque” and ”no plaque”. We refer to this dataset as the binary dataset. Furthermore, we consider a multi-type plaque dataset where we divide the class ”plaque” further into ”calcified plaque” and ”lipid/fibrous plaque”. This dataset serves the purpose of showing the feasibility of further plaque differentiation with clinical relevance.

For the binary dataset, plaque and non-plaque images are approximately balanced. For the mutli-type dataset, roughly are labeled as ”calcified plaque”, are labeled as ”fibrous/lipid plaque” and are labeled as ”no plaque”. None of the images contains a stent as this would distort results. Our goal is to detect untreated plaque deposits and a deep learning model might learn to detect stents instead of the underlying disease.

Ii-B Preprocessing and Data Augmentation

The original polar images obtained from the OCT device have a resolution of pixels. Thus, the transformed cartesian images have a resolution of pixels. For training, we resize both to a resolution of , which matches the range used with standard architectures in the natural image domain [22, 32, 37]. Note, that this leads to the cartesian images having half the depth resolution of the polar images.

For data augmentation, an intuitive choice is to use random rotations for the cartesian images due to their circular strucutre. We apply random rotations with to cartesian images. Moreover, we apply random flipping along the and direction. For polar images, rotations are not meaningful. Instead, we shift the polar images randomly along the direction with . If A-Scans are shifted out of the image along the positive direction, they are added back on the other side. In this way, we achieve a transformation that matches a rotation in cartesian space. Additionally, we apply random flipping along the direction. For both image representations, we apply random cropping to an image size of . For evaluation, we do not use any rotations or flipping and we use a center crop of size .

Fig. 3: The two-path architecture for simultaneous use of polar and cartesian representations, shown for the ResNet model. We initialize all weights with the pretrained values from ImageNet training. The point denotes feature concatenation after the ResNet block. Here, is shown.

Ii-C Model Architectures

We employ two state-of-the-art architectures from the natural image domain, namely ResNet50-V2 [32] and DenseNet-121 [37]. The architectures’ concepts are shown in Fig. 2.

The ResNet model makes use of residual blocks for a better gradient flow and improved optimization. The key idea of this method is to learn a residual instead of the target mapping . The resulting skip connections have been shown to improve gradient propagation through the network [38]. Furthermore, bottlenecks are used which significantly reduce the number of parameters and the computational effort in the network. Here, a convolution first downsamples the input tensor along the feature map dimension. Then, the normal filter is applied on the lower dimensional embedding. Afterwards, a filter upsamples the tensor once again to the original feature map size. An efficient architecture is particularly important for the problem at hand, as our dataset is relatively small which increases the risk of overfitting.

The DenseNet model is focused on efficiency even more, as it relies on heavy feature reuse in each block. The idea of this architecture is to use all features from previous layers for current layer . I.e., the output of the layer is computed as . In this way, the output feature maps of all previous layers are reused. This allows for overall significantly smaller feature maps and a reduced number of parameters. Due to this structure, the number of feature maps grows linearily with a growth rate . Here, we use . Furthermore, the architecture uses compression layers in between dense blocks in order to keep the feature map sizes low. A convolution downsamples the feature maps by a factor . Here, is used. This architecture also makes use of bottlenecks, as described for the ResNet model.

As our dataset is small compared to, e.g., ImageNet [22], we make use of transfer learning which has been used successfully in the medical image domain [34]. Therefore, we consider different levels of weight freezing, both for the ResNet and DenseNet model. The retraining points are indicated in Fig. 2. E.g., for , we freeze all weights on the left of and retrain the rest. Thus, we consider two partial retraining scenarios, training from scratch and full retraining for both models. In all cases, we remove the last fully-connected layer and replace it with a layer that has a matching number of outputs for our binary or multi-type labels.

Furthermore, we consider a fusion of polar and cartesian images in a single architecture in order to investigate whether joint features from the two representations provide an overall benefit for the problem. Related to this approach, CNNs with multiple paths have been used for multi-resolution inputs [40] and differently deformed images [41]. So far, different image representations have not been combined in this way and there are no related approaches for IVOCT. The proposed architecture based on ResNet is shown in Fig. 3. Similarly, we construct the architecture for DenseNet. Each path receives the polar or cartesian representation as its input. We apply the same data augmentation techniques to each representation, as described in Section II-B. The point describes the concatenation point within the network, where the image features are fused. The concatenation is performed along the tensor’s feature map dimension, i.e. the features from each path are stacked together. Assume a tensor size of before concatenation where is the batch size, and are the spatial height and width and is the number of feature maps. The concatenated feature maps have the shape . For ResNet, the concatenation is performed in between two ResBlocks. Note, that we use the following reduction operation in the next ResBlock in order to reduce the feature map size back to the normal size that would occur in the single-path ResNet. For DenseNet, the concatenation is performed inside a DenseBlock, before the compression operation. Similar to ResNet, the compression block keeps the overall feature map size at a reasonable level despite the concatenation. This choice also enables us to directly use the pretrained weights for most parts of the architecture as most of the weights have the same shape. We initialize both paths before concatenation with the normal pretrained weights. After the concatenation, we can also reuse the weights due to our previously described downsampling strategy. Only the compression operation itself comes with a different weight shape, that cannot be adapted immediately. Consider two feature maps before concatenation from the two path, each with shape where is the batch size, and are the spatial height and width and is the number of feature maps. Thus, the downsampling convolution’s weight tensor has the shape where is the downsampled feature map size. We assign the original, pretrained weight tensor of shape both to the sliced tensor of shape and . We show that this initialization does have a significant impact on performance.

Ii-D Training

For model training we minimize the cross-entropy loss which is typically used for classification tasks, i.e.


where is the ground-truth label, is the softmax-normalized model prediction and the number of classes. For the multi-class cases, we weight the loss for ”calcified plaque” higher in order to counter class imbalance. For training, we use the Adam algorithm [42] with a starting learning rate of . To find the optimal schedule, we reduce the learning rate by a factor of two when the validation error saturates. We use a batch size of for single-path models and for two-path models. In total, we train each model for epochs. We find relevant hyperparameters with a restricted grid search with a coarse grid using the -score from threefold cross-validation. These hyperparameters include the initial learning rate, dropout rate, input image size and crop size. Relevant results are reported for the independent test set. We implement our models using Tensorflow [43].

Ii-E Experiments Overview

First, we present results for binary classification, both for the Resnet and Densenet model, as well as for polar and cartesian images. We show results with and without our data augmentation strategy. All models are pretrained on ImageNet.

Second, we investigate transfer learning by showing results without pretraining, and two partial retraining scenarios, both for Densenet and Resnet using cartesian images.

Third, we provide results for our two-path architecture, using both image representations. Results for Resnet and Densenet are presented. We show results for different concatenation points, investigating the optimal abstraction level for feature fusion. We present results for our weight initialization strategy at the concatenation point.

Last, we consider multi-class classification for both CNN models. Again, we consider both image representations and the use of data augmentation. All models are pretrained on ImageNet.

For evaluation of binary classification models, we report accuracy (Acc.), sensitivity (Sens.), specificity (Spec.), and the F1-score. For multi-class classification, we report the per-class weighted accuracy for each class and the F1-score with all classes. The per-class weighted accuracy is calculated by dividing the number of true positive examples by the number of all examples for that particular class.

Iii Results

Acc. Sens. Spec. F1-Score

Data Aug.

Densenet Cart.
Densenet Polar.
Resnet Cart.
Resnet Polar

No Data Aug.

Densenet Cart.
Densenet Polar.
Resnet Cart.
Resnet Polar
TABLE I: Results for binary plaque classification. All models are pretrained on ImageNet and all weights are fine-tuned.

For binary plaque classification, results are shown in Table I. The most evident difference is visible for data augmentation strategies. Also, the performance improvement through data augmentation is notably higher for cartesian images. Furthermore, we can observe that cartesian images lead to a better performance than polar images. The difference between the Resnet and Densenet model is comparatively small. Additionally, we show qualitative results for plaque detection along a pullback. The visualization is shown in Fig. 4. Although some slices are missclassified, the system overall provides a good overview on the presence of plaque along the pullback.

Fig. 4: Qualitative results shown with a rendered pullback. Top, the original pullback is shown. Mid, predicted plaque regions are shown along the pullback. Bottom, the ground-truth labels are shown along the pullback. The color red indicates regions with plaque and the color green indicates regions without plaque. The arrow indicates visibly incorrect predictions. Rendering was performed by stacking cartesian slices along the catheter’s center.

Furthermore, we investigate different transfer learning strategies. Table II shows the results for different levels of retraining. We consider training from scratch, full fine-tuning and two different partial freezing scenarios. Overall, fine-tuning all weights within the network performs best. This is followed by partially freezing some weights. The more weights are frozen, the more performance deteriorates. Performance is worst, when no pretraining is performed at all. We also report results for training from scratch with both cartesian and polar images. The difference between full retraining and training from scratch is larger for cartesian images.

Acc. Sens. Spec. F1-Score


From Scratch
From Scratch (Polar)


From Scratch
From Scratch (Polar)
TABLE II: Results for different transfer learning scenarios using cartesian images, unless indicated otherwise. Full refers to full fine-tuning of all weights. From scratch refers to training with random weight initialization instead of pretraining. denotes the point of weight freezing in the network, see Fig. 2.

Moreover, we consider the two-path architecture introduced in Section II-C. The results for the approach are shown in Table III. We show results for different concatenation points and for our weight initialization strategy at the concatenation point. The model with an intermediate concatenation point performs best, being close to . Moving the point closer to the input of the network deteriorates performance. Moreover, the proposed weight initialization strategy considerably improves performance.

Acc. Sens. Spec. F1-Score


No Init.


No Init.
TABLE III: Results for the two-path architecture concept, both for Densenet and Resnet. denotes the concatenation point in the network, see Fig. 3. No Init. refers to no use of our weight initialization strategy. All other weights are still initialized with their pretrained values. We demonstrate it with the best performing model with concatenation point .

Last, we show multi-class plaque classification results in Table IV. Overall, the classification is worse than for the binary classification case. Still, classification of calcified plaque achieves a slightly lower but similar performance as for the detection of other plaque types. The general trends in terms of performance are similar to binary classification. The use of cartesian images improves performance slightly which shows consistently for both models. No use of data augmentation significantly reduces classification performance.

W.A. W.A. W.A. . F1-Score

Data Aug.

Densenet Cart.
Densenet Polar.
Resnet Cart.
Resnet Polar

No Data Aug.

Densenet Cart.
Densenet Polar.
Resnet Cart.
Resnet Polar
TABLE IV: Results for multi-class plaque classification. The classes are ”calcified plaque” (), ”fibrous/lipid plaque” () and ”no plaque” (). All models are pretrained on ImageNet and all weights are retrained. W.A. refers to the weighted per-class accuracy.

Iv Discussion

With the recent success of deep learning methods for medical image analysis, future IVOCT-based clinical decision support systems require a profound understanding of deep learning with IVOCT data. Therefore, we present an in-depth investigation of deep learning-based plaque detection in IVOCT pullbacks. We built a new dataset that is specifically targeted at this task by using slice-level labels. This approach provides an assessment at the pullback level, which is particularly required for a potential clinical decision support system, see Fig. 4. This approach allows us to quickly build a large dataset which enables us to consider a variety of difficult IVOCT images containing various artifacts as shown in Fig. 1, which is particularly relevant for clinical application. A robust detection model would avoid the necessity of repeated pullbacks in case of, e.g., minor acquisition or blood artifacts. Overall, Fig. 4 shows that our dataset can be used for plaque detection along pullbacks, as most regions are classified correctly.

Polar and cartesian images. Using our dataset, we first investigate the use of polar and cartesian image representations for deep feature learning. For binary classification, we find that the best cartesian model achieves an accuracy of compared to for the best polar model. Overall, models using cartesian images appear to perform better, see Table I. This is a surprising results as polar images should be richer in tissue property information since we downsampled both images to the same size of . Due to the circular transformation, the depth scans in the cartesian images only possess half the resolution compared to polar images. This indicates, that the relevant features for plaque detection are easier to exploit from cartesian images for deep learning models. It should be noted, that practitioners also generally use cartesian images for assessment.

Data augmentation. We used extensive data augmentation, both for cartesian and polar images. In particular, rotations for cartesian images and shifting for polar images are well suited for IVOCT data. Considering the results in Table I, cartesian-based models benefit substantially more from data augmentation. Accuracy improves by for cartesian images and for polar images. This shows that the superior performance of cartesian images is also largely tied to data augmentation. Our dataset is still small compared to standard sets being used in the natural image domain [22] and thus dependence on data augmentation can be expected. We highlight that it is a key aspect to consider when building a deep learning-based decision support system as performance heavily depends on it.

Transfer learning. We provide an extensive evaluation of transfer learning for IVOCT-based deep learning. The results in Table II show that there is a significant difference in performance between training from scratch and fine tuning. For both models, a performance gain of approximately was achieved. Previous transfer learning approaches for other medical imaging modalities achieved similar performance improvements [34]. For IVOCT, it is still notable that pretraining on natural images works well, considering that the modalities are vastly different. For further investigation, we considered partially freezing early layers of the architectures which is motivated by the fact that earlier layers tend to learn more generic features[44]. While our results show that performance is still high compared to training from scratch, accuracy deteriorates the more weights are frozen during training. This indicates that IVOCT features are very different from natural image features and thus, full retraining should be performed. This extends previous findings on transfer learning and IVOCT where only partially frozen networks were considered [21]. Moreover, we compared how transfer learning affects performance for polar and cartesian representations. Pretraining appears to be more effective with cartesian images which might be related to the fact that the source domain images are also cartesian. Despite the significant difference between natural and IVOCT images, the shared cartesian structure appears to be beneficial for transfer learning.

Multi-path architecture. Besides representation-specific data augmentation we also considered a fusion of polar and cartesian image features for additional insights and performance gain. Overall, the two-path architectures with late feature concatenation slightly increase performance over the best single-path models by for Resnet and for Densenet, see Table I and Table III. First, this underlines that deep learning models are able to extract different features from polar and cartesian representations, as the performance increases when using both. Also, the learned features appear to differ earlier in the network, as early feature concatenation decreases performance. Second, we show an effective way of reusing pretrained weights for a modified architecture. As discussed before, transfer learning significantly boosts performance and thus pretraining is mandatory for high performance models. Our simple yet effective initialization strategy at the concatenation point shows that task-specific architecture design is still feasible when being tied to pretrained models. It should be noted that this concept works consistently across both the Resnet and Densenet model, although their structure and feature propagation mechanics are different.

Multi-class classification. Besides binary plaque detection, we consider a subdivision into calcified and fibrous/lipid plaque types. While fibrous and lipid plaques are relevant for risk assessment in terms of potential ruptures and thrombosis [9], detection of calcified regions is relevant in terms of tool selection for treatment. Often, the use of rotablation or a cutting balloon is required for calcified regions. Our results in Table IV show that calcified plaques can be distinguished from others. However, the overall F1-scores are significantly lower than for binary classification despite both approaches being based on the same image data. This indicates uncertainty about specific plaque types. Considering our labeling strategy, this indicates that plaque differentiation is subjective and dependent on the labeling expert. This is not surprising in the sense that plaque development over time is continuous. E.g., the decision whether a plaque deposit is still to be considered fibrous or already calcified is not always clear and might change even within the same plaque region. Our results highlight this well-known issue [2] and suggest extended research towards both experts’ labeling and multi-class learning approaches. The effect of continuous plaque changes along pullbacks is also visible in Fig. 4 for the binary case. The expert annotations indicate that a small area within a large region is free of plaque. As plaque develops continuous, the differences between slices are likely small. The experts made a fine-grained distinction which the model could not detect in this case which indicates that the expert’s chosen threshold might have differed between training cases and this particular test case.

For all parts of our investigation we considered two state-of-the-art deep learning models ResNet and Densenet which showed consistent results for all experiments which underlines the validity of the insights gained from our experiments.

V Conclusion

We present an in-depth analysis of plaque detection in IVOCT pullbacks using convolutional neural networks. For this purpose, we built a new database with IVOCT slices labeled by three trained experts. We employ two state-of-the-art deep learning models, Resnet and Densenet, for binary and multi-class plaque classification from cartesian and polar IVOCT images. We show consistently across models, that cartesian images appear to be advantageous for deep learning. We develop data augmentation strategies, tailored to each image representation which result in a considerable performance boost. Additionally, we investigate transfer learning which provides high performance with model fine-tuning. Furthermore, we develop a multi-path model that fuses features from both representations for improved accuracy and effectively reuses weights from different, pretrained architectures. Last, we consider differentiation of calcified plaque from other plaque types which works well but also highlights potential issues of plaque differentiation and expert judgement. Overall, our results show that a deep learning-based clinical decision support system providing plaque detection along pullbacks would be feasible. For future work, the dataset can be extended in size as well as the number of experts consulted for assessment. Based on this, additional analysis with respect to plaque differentiation could be performed. Additionally, a decision support system could be integrated into catheter laboratories for the assessment of usefulness in clinical practice.


  • [1] I.-K. Jang, B. E. Bouma, D.-H. Kang, S.-J. Park, S.-W. Park, K.-B. Seung, K.-B. Choi, M. Shishkov, K. Schlendorf, E. Pomerantsev et al., “Visualization of coronary atherosclerotic plaques in patients using optical coherence tomography: comparison with intravascular ultrasound,” Journal of the American College of Cardiology, vol. 39, no. 4, pp. 604–609, 2002.
  • [2] G. J. Tearney, E. Regar, T. Akasaka, T. Adriaenssens, P. Barlis, H. G. Bezerra, B. Bouma, N. Bruining, J.-m. Cho, S. Chowdhary et al., “Consensus standards for acquisition, measurement, and reporting of intravascular optical coherence tomography studies,” Journal of the American College of Cardiology, vol. 59, no. 12, pp. 1058–1072, 2012.
  • [3] A. S. Kini, Y. Vengrenyuk, T. Yoshimura, M. Matsumura, J. Pena, U. Baber, P. Moreno, R. Mehran, A. Maehara, S. Sharma et al., “Fibrous cap thickness by optical coherence tomography in vivo,” Journal of the American College of Cardiology, vol. 69, no. 6, pp. 644–657, 2017.
  • [4] S. Tsantis, G. C. Kagadis, K. Katsanos, D. Karnabatidis, G. Bourantas, and G. C. Nikiforidis, “Automatic vessel lumen segmentation and stent strut detection in intravascular optical coherence tomography,” Medical physics, vol. 39, no. 1, pp. 503–513, 2012.
  • [5] A. G. Roy, S. Conjeti, S. G. Carlier, P. K. Dutta, A. Kastrati, A. F. Laine, N. Navab, A. Katouzian, and D. Sheet, “Lumen segmentation in intravascular optical coherence tomography using backscattering tracked and initialized random walks,” IEEE journal of biomedical and health informatics, vol. 20, no. 2, pp. 606–614, 2016.
  • [6] H. Lu, M. Gargesha, Z. Wang, D. Chamie, G. F. Attizzani, T. Kanaya, S. Ray, M. A. Costa, A. M. Rollins, H. G. Bezerra et al., “Automatic stent detection in intravascular oct images using bagged decision trees,” Biomed Opt Express, vol. 3, no. 11, pp. 2809–2824, 2012.
  • [7] Z. Wang, M. W. Jenkins, G. C. Linderman, H. G. Bezerra, Y. Fujino, M. A. Costa, D. L. Wilson, and A. M. Rollins, “3-d stent detection in intravascular oct using a bayesian network and graph search,” IEEE Trans Med Imaging, vol. 34, no. 7, pp. 1549–1561, 2015.
  • [8] H. Yabushita, B. E. Bouma, S. L. Houser, H. T. Aretz, I.-K. Jang, K. H. Schlendorf, C. R. Kauffman, M. Shishkov, D.-H. Kang, E. F. Halpern et al., “Characterization of human atherosclerosis by optical coherence tomography,” Circulation, vol. 106, no. 13, pp. 1640–1645, 2002.
  • [9] Z. Wang, D. Chamie, H. G. Bezerra, H. Yamamoto, J. Kanovsky, D. L. Wilson, M. A. Costa, and A. M. Rollins, “Volumetric quantification of fibrous caps using intravascular optical coherence tomography,” Biomed Opt Express, vol. 3, no. 6, pp. 1413–1426, 2012.
  • [10] G. Zahnd, A. Karanasos, G. van Soest, E. Regar, W. Niessen, F. Gijsen, and T. van Walsum, “Quantification of fibrous cap thickness in intracoronary optical coherence tomography with a contour segmentation method based on dynamic programming,” Int J Comput Assist Radiol Surg, vol. 10, no. 9, pp. 1383–1394, 2015.
  • [11] E. Mehanna, H. G. Bezerra, D. Prabhu, E. Brandt, D. Chamié, H. Yamamoto, G. F. Attizzani, S. Tahara, N. Van Ditzhuijzen, Y. Fujino et al., “Volumetric characterization of human coronary calcification by frequency-domain optical coherence tomography,” Circulation Journal, vol. 77, no. 9, pp. 2334–2340, 2013.
  • [12] G. S. Mintz, “Intravascular imaging of coronary calcification and its clinical implications,” JACC: Cardiovascular Imaging, vol. 8, no. 4, pp. 461–471, 2015.
  • [13] C. Xu, J. M. Schmitt, S. G. Carlier, and R. Virmani, “Characterization of atherosclerosis plaques by measuring both backscattering and attenuation coefficients in optical coherence tomography,” Journal of biomedical optics, vol. 13, no. 3, p. 034003, 2008.
  • [14] G. Van Soest, T. P. Goderie, E. Regar, S. Koljenovic, A. G. J. van Leenders, N. Gonzalo, S. van Noorden, T. Okamura, B. E. Bouma, G. J. Tearney et al., “Atherosclerotic tissue characterization in vivo by optical coherence tomography attenuation imaging,” Journal of biomedical optics, vol. 15, no. 1, p. 011105, 2010.
  • [15] K. Vermeer, J. Mo, J. Weda, H. Lemij, and J. De Boer, “Depth-resolved model-based reconstruction of attenuation coefficients in optical coherence tomography,” Biomed Opt Express, vol. 5, no. 1, pp. 322–337, 2014.
  • [16] M. Gargesha, R. Shalev, D. Prabhu, K. Tanaka, A. M. Rollins, M. Costa, H. G. Bezerra, and D. L. Wilson, “Parameter estimation of atherosclerotic tissue optical properties from three-dimensional intravascular optical coherence tomography,” J Med Imaging, vol. 2, no. 1, p. 016001, 2015.
  • [17] S. Liu, Y. Sotomi, J. Eggermont, G. Nakazawa, S. Torii, T. Ijichi, Y. Onuma, P. W. Serruys, B. P. Lelieveldt, and J. Dijkstra, “Tissue characterization with depth-resolved attenuation coefficient and backscatter term in intravascular optical coherence tomography images,” Journal of biomedical optics, vol. 22, no. 9, p. 096004, 2017.
  • [18] G. J. Ughi, T. Adriaenssens, P. Sinnaeve, W. Desmet, and J. D’hooge, “Automated tissue characterization of in vivo atherosclerotic plaques by intravascular optical coherence tomography images,” Biomed Opt Express, vol. 4, no. 7, pp. 1014–1030, 2013.
  • [19] S. Celi and S. Berti, “In-vivo segmentation and quantification of coronary lesions by optical coherence tomography images for a lesion type definition and stenosis grading,” Medical image analysis, vol. 18, no. 7, pp. 1157–1168, 2014.
  • [20] L. S. Athanasiou, C. V. Bourantas, G. Rigas, A. I. Sakellarios, T. P. Exarchos, P. K. Siogkas, A. Ricciardi, K. K. Naka, M. I. Papafaklis, L. K. Michalis et al., “Methodology for fully automated segmentation and plaque characterization in intracoronary optical coherence tomography images,” Journal of biomedical optics, vol. 19, no. 2, p. 026009, 2014.
  • [21] A. Abdolmanafi, L. Duong, N. Dahdah, and F. Cheriet, “Deep feature learning for automatic tissue classification of coronary artery using optical coherence tomography,” Biomed Opt Express, vol. 8, no. 2, pp. 1203–1220, 2017.
  • [22] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds.   Curran Associates, Inc, 2012, pp. 1097–1105.
  • [23] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE CVPR, 2015, pp. 3431–3440.
  • [24] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 542, no. 7639, p. 115, 2017.
  • [25] S. Pereira, A. Pinto, V. Alves, and C. A. Silva, “Brain tumor segmentation using convolutional neural networks in mri images,” IEEE Trans Med Imaging, vol. 35, no. 5, pp. 1240–1251, 2016.
  • [26] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. van Ginneken, and C. I. Sánchez, “A survey on deep learning in medical image analysis,” Medical image analysis, vol. 42, pp. 60–88, 2017.
  • [27] S. He, J. Zheng, A. Maehara, G. Mintz, D. Tang, M. Anastasio, and H. Li, “Convolutional neural network based automatic plaque characterization for intracoronary optical coherence tomography images,” in Medical Imaging 2018: Image Processing, vol. 10574, 2018, p. 1057432.
  • [28] V. Badrinarayanan, A. Kendall, and R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.
  • [29] D. A. Oliveira, M. M. Macedo, P. Nicz, C. Campos, P. Lemos, and M. A. Gutierrez, “Coronary calcification identification in optical coherence tomography using convolutional neural networks,” in Medical Imaging 2018: Biomedical Applications in Molecular, Structural, and Functional Imaging, vol. 10578, 2018, p. 105781Y.
  • [30] A. Boi, A. D. Jamthikar, L. Saba, D. Gupta, A. Sharma, B. Loi, J. R. Laird, N. N. Khanna, and J. S. Suri, “A survey on coronary atherosclerotic plaque tissue characterization in intravascular optical coherence tomography,” Current atherosclerosis reports, vol. 20, no. 7, p. 33, 2018.
  • [31] J. J. Rico-Jimenez, D. U. Campos-Delgado, M. Villiger, K. Otsuka, B. E. Bouma, and J. A. Jo, “Automatic classification of atherosclerotic plaques imaged with intravascular oct,” Biomed Opt Express, vol. 7, no. 10, pp. 4069–4085, 2016.
  • [32] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE CVPR, 2016, pp. 770–778.
  • [33] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE CVPR, 2016, pp. 2818–2826.
  • [34] H.-C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and R. M. Summers, “Deep convolutional neural networks for computer-aided detection: Cnn architectures, dataset characteristics and transfer learning,” IEEE Trans Med Imaging, vol. 35, no. 5, pp. 1285–1298, 2016.
  • [35] N. Gessert, M. Heyder, S. Latus, M. Lutz, and A. Schlaefer, “Plaque classification in coronary arteries from ivoct images using convolutional neural networks and transfer learning,” in Computer Assisted Radiology and Surgery Proceedings of the 32nd International Congress and Exhibition.   Springer, 2018.
  • [36] N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. Gotway, and J. Liang, “Convolutional neural networks for medical image analysis: Full training or fine tuning?” IEEE Trans Med Imaging, vol. 35, no. 5, pp. 1299–1312, 2016.
  • [37] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, “Densely connected convolutional networks,” in Proceedings of the IEEE CVPR, vol. 1, no. 2, 2017, p. 3.
  • [38] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in ECCV, 2016, pp. 630–645.
  • [39] M. Lin, Q. Chen, and S. Yan, “Network in network,” in International Conference on Learning Representations, 2014.
  • [40] K. Kamnitsas, C. Ledig, V. F. J. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker, “Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation,” Medical Image Analysis, vol. 36, pp. 61–78, 2017.
  • [41] N. Gessert, J. Beringhoff, C. Otte, and A. Schlaefer, “Force estimation from OCT volumes using 3D CNNs,” International Journal of Computer Assisted Radiology and Surgery, vol. 13, no. 7, pp. 1073––1082, 2018.
  • [42] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in ICLR, 2014.
  • [43] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, and M. Devin, “Tensorflow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.
  • [44] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?” in Advances in Neural Information Processing Systems, 2014, pp. 3320–3328.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description