Automatic detection of lumen and media in the IVUS images using U-Net with VGG16 Encoder



Coronary heart disease is one of the leading causes of mortality worldwide, and it can be caused by plaque burden inside the arteries. Intravascular Ultrasound (IVUS) is a powerful imaging technology that captures real-time, high-resolution images of the coronary arteries and can be used to analyze these plaques. IVUS segmentation involves extracting two arterial wall components, namely the lumen and the media. In this paper, we investigate the effectiveness of Convolutional Neural Networks, including U-Net, for segmenting ultrasound scans of arteries. In particular, the proposed segmentation network is built on U-Net with a VGG16 encoder. Experiments evaluating the proposed segmentation architecture show promising quantitative and qualitative results.

Keywords: IVUS, Segmentation, Lumen, Media, Deep learning, U-Net.

1 Introduction

Intravascular Ultrasound (IVUS) is a diagnostic medical imaging technique in which an ultrasound probe is inserted into an artery to capture real-time, high-resolution cross-sectional scans of the vessel. Segmenting these images helps a practitioner locate not only the vessel membranes (lumen and media) but also the atherosclerotic plaques in the vessel walls. Early detection of these areas can help prevent further complications that can lead to myocardial infarction and ultimately death. This study is important because myocardial infarction is one of the leading causes of death, and [1] reported that there were 15.9 million deaths worldwide in . However, IVUS segmentation is a challenging task in medical image analysis because of the high volume of these images and the presence of different types of artifacts.

Several methods, ranging from probabilistic models to deep learning, have been proposed to address this problem. For instance, the authors of [2] segment the walls of the coronary artery using the Fast Marching Method (FMM), which combines two components: the texture gradient and the gray-level gradient. They also use a gamma probability density function to model the gray-level distributions in the arterial walls. Since artifacts in the image can greatly reduce segmentation accuracy, one advantage of the method in [2] is that it is reported to handle artifacts fairly well. However, its robustness still needs to be verified on a larger amount of data. One of the best results in IVUS segmentation has been reported in [3], where a region selection strategy is built on top of the recently proposed Extremal Regions of Extremum Levels (EREL) detector [4, 5] to segment the lumen and media. This deterministic method not only runs very fast but also segments 20 MHz IVUS frames accurately.

Several studies [6], [7], [8], [9], [11], [12] focus on neural networks. According to Google Trends [13], interest over the last five years indicates an increasing adoption of neural networks for medical image segmentation and analysis. This is due to the availability of appropriate hardware such as powerful NVIDIA GPUs, GPU-accelerated deep learning libraries such as NVIDIA cuDNN, and platforms and architectures built for medical image analysis such as NiftyNet [6], V-Net [7] and U-Net [14]. Su et al. [11] applied an Artificial Neural Network (ANN) to border detection in IVUS images, covering both the vessel membranes and the plaque type and burden. In the first phase, the borders are smoothed with a mean filter to obtain a region of interest that contains the vessel and the plaque inside it. In the second phase, a double ANN narrows the Region Of Interest (ROI) down to the vessel area; the borders are then smoothed again and the output is passed to the next ANN to detect the plaque border. The ANN used in [11] has layers and is evaluated on high-resolution IVUS images (up to um) from subjects. IVUS images were used to train the model and images were used as testing data for cross-validation, resulting in a mean absolute error of STD, which is fairly tolerable. The lumen cross-section area (LCSA) correlation on the testing data is highly accurate, with , and the vessel cross-section area (VCSA) correlation is , with a manual comparison value of . The plaque cross-section area (PCSA) correlation on the testing data is , and the manual comparison of PCSA, which focuses on the sensitivity of the shape of the curve, is .

In this paper, we use a variant of the fully convolutional neural network called U-Net [14]. We first investigate the effectiveness of U-Net for segmenting gray-scale IVUS images to identify the lumen and media. We then use a U-Net with a VGG16 [15] encoder, which relies heavily on data augmentation, to detect the lumen and media in IVUS scans. The rest of this paper is organized as follows. Section 2 describes the dataset and the proposed segmentation network. Section 3 presents the experimental results and discussion. Section 4 concludes the paper.

Figure 1: The VGG16-UNet architecture, a U-Net model with a VGG16 encoder.

2 Method

In this section, we first describe the IVUS dataset used in this research and then discuss the proposed segmentation method, which combines the VGG16 and U-Net architectures.

2.1 Dataset

We use a publicly available dataset [19] of ultrasound scans of the human coronary artery. The dimension of each scan is . The dataset has scans, each with its own lumen and media labels. The training set has IVUS scans and the test set has . The test set consists of in-vivo pullbacks of the human coronary artery acquired with the from Volcano Corporation using the MHz Eagle Eye monorail catheter (Volcano Corp., San Diego, CA).

2.2 Network Architecture

U-Net [14] is one of the most powerful convolutional network architectures for fast and precise image segmentation; it was first presented in 2015 for biomedical image segmentation. It consists of two parts, an encoder and a decoder, which give it a "U" shape. It predicts a pixel-wise segmentation map of the input image rather than classifying the image as a whole. U-Net passes the feature maps from each level of the contracting path to the corresponding level of the expanding path, which allows the classifier to consider features at various scales and complexities when making its decision. U-Net is capable of learning from a relatively small training set.

In this paper, we propose a deep learning method that combines the VGG16 [15] and U-Net architectures by using VGG16 as the encoder of the U-Net. We call the proposed architecture VGG16-UNet; it is shown in Figure 1.
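To make the encoder-decoder bookkeeping concrete, the following sketch traces feature-map shapes through a U-Net whose encoder uses the five standard VGG16 blocks (64, 128, 256, 512 and 512 filters). The number of blocks, the mirrored decoder widths and the 256×256 input size are our assumptions, since the exact configuration is not stated here; the function `trace_vgg16_unet` is a hypothetical helper that only tracks shapes and does not build a network.

```python
# Shape trace for a U-Net with a VGG16-style encoder (illustrative sketch).
# Assumptions: five encoder blocks with the standard VGG16 filter counts,
# 2x2 max pooling between blocks, a decoder that mirrors the encoder widths,
# and skip connections implemented as channel-wise concatenation.
VGG16_WIDTHS = [64, 128, 256, 512, 512]


def trace_vgg16_unet(height, width, widths=VGG16_WIDTHS):
    """Return (stage, height, width, channels) tuples from input to output mask."""
    n_pools = len(widths) - 1
    if height % 2 ** n_pools or width % 2 ** n_pools:
        raise ValueError("height and width must be divisible by 2**n_pools")
    trace, skips = [], []
    h, w = height, width
    for i, c in enumerate(widths):
        trace.append((f"enc{i + 1}", h, w, c))
        if i < n_pools:
            skips.append((h, w, c))   # feature map handed across to the decoder
            h, w = h // 2, w // 2     # max pooling halves the resolution
    c = widths[-1]
    for i, (sh, sw, sc) in enumerate(reversed(skips)):
        h, w = h * 2, w * 2           # upsampling doubles the resolution
        assert (h, w) == (sh, sw)     # concatenation needs matching sizes
        trace.append((f"dec{i + 1}", h, w, c + sc))  # channels right after concat
        c = sc                        # assumed decoder width: mirror the encoder
    trace.append(("output", h, w, 1))  # single-filter sigmoid mask
    return trace


for stage in trace_vgg16_unet(256, 256):
    print(stage)
```

Under these assumptions, a 256×256 input is processed at 16×16 in the deepest encoder block, and the first decoder block sees 1024 channels after concatenating the 512-channel skip connection; the divisibility check reflects the fact that every pooling step must be undone exactly by an upsampling step.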

The left-hand side of the network is the encoder and has blocks. It incorporates the convolutional layers of the original VGG16 [15]. After each convolution block, the red arrow indicates a MaxPooling operation, which halves the spatial dimensions of the feature maps. The right-hand side of the network is the decoder, which also has blocks. The green arrow after each block is an UpSampling operation, which restores the spatial dimensions: each UpSampling operation repeats every row and column of the feature map twice. The skip connections between the blocks (horizontal black arrows) carry the encoder feature maps across to the decoder and are implemented with a concatenate operation that combines the corresponding feature maps. Since this is a variant of the Fully Convolutional Network (FCN) [16] for semantic segmentation, the spatial information (height and width) of the image needs to be retained, which is why we use these skip connections. The last convolutional layer has only one filter, which plays a role similar to the final Dense layer in most other neural networks, and produces the binary mask prediction. In total, the network has about convolutional layers, each followed by a PReLU [17] activation, whose alpha parameter (the slope for negative inputs) is learned during training. In addition, the last convolutional layer uses a sigmoid activation function.
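The individual operations described above can be illustrated with NumPy on a single (H, W, C) feature map. This is a minimal sketch, not the authors' implementation: the 2×2 pooling window, nearest-neighbour upsampling and the 1×1 kernel of the final single-filter layer are our assumptions, and the function names are hypothetical.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W, C) array; H and W must be even."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def upsample_2x(x):
    """Nearest-neighbour upsampling: every row and column is repeated twice."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def skip_concat(decoder_feat, encoder_feat):
    """Skip connection: concatenate decoder and encoder feature maps along channels."""
    if decoder_feat.shape[:2] != encoder_feat.shape[:2]:
        raise ValueError("spatial sizes must match before concatenation")
    return np.concatenate([decoder_feat, encoder_feat], axis=-1)


def prelu(x, alpha):
    """PReLU: identity for positive inputs, slope alpha (learned in training) otherwise."""
    return np.where(x > 0, x, alpha * x)


def sigmoid_head(x, weights, bias):
    """Last layer: one 1x1 filter over (H, W, C) features, then a sigmoid -> (H, W) mask."""
    logits = x @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))


# Tiny end-to-end example on random features (shapes in comments).
features = np.random.default_rng(0).normal(size=(8, 8, 4))
pooled = max_pool_2x2(features)                      # (4, 4, 4)
merged = skip_concat(upsample_2x(pooled), features)  # (8, 8, 8)
mask = sigmoid_head(prelu(merged, 0.25), np.full(8, 0.125), 0.0)  # (8, 8), values in (0, 1)
```

Because pooling halves and upsampling doubles the spatial size, the decoder output at each level matches the encoder feature map it is concatenated with; the final 1×1 single-filter layer acts like a per-pixel Dense layer over the channels.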

2.3 Image Augmentation

The training set we used comprises only IVUS scans, which is not sufficient, since the performance of deep learning methods depends on the amount of training data. We therefore performed data augmentation using transformations that preserve the information in the image. The following augmentations are used in this paper:

  • Horizontal Flips - The images were randomly flipped along the horizontal axis.

  • Vertical Flips - Images were flipped along the vertical axis.

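  • Width Shift - Images were shifted along the horizontal axis by a fraction 30/width of the image width, i.e. by 30 pixels.

  • Height Shift - Images were shifted along the vertical axis by a fraction 30/height of the image height, i.e. by 30 pixels.

  • Rotation - Images were randomly rotated by an angle between 0 and 90 degrees.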
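The augmentations listed above must be applied identically to an image and its label mask so that the labels stay aligned. The sketch below does this in plain NumPy; it is not the authors' pipeline. We assume a 50% probability for each flip, integer shifts drawn uniformly from -30 to 30 pixels, rotation angles drawn uniformly from 0 to 90 degrees, zero filling of uncovered pixels, and nearest-neighbour sampling for rotation (Keras-style generators typically interpolate the image instead). All function names are hypothetical.

```python
import numpy as np


def shift(img, dy, dx):
    """Translate an image by (dy, dx) pixels, filling uncovered pixels with zeros."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out


def rotate_nearest(img, angle_deg):
    """Rotate about the image centre with nearest-neighbour sampling (zero fill)."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, look up its source pixel.
    src_y = np.rint(np.cos(theta) * (yy - cy) - np.sin(theta) * (xx - cx) + cy).astype(int)
    src_x = np.rint(np.sin(theta) * (yy - cy) + np.cos(theta) * (xx - cx) + cx).astype(int)
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.zeros_like(img)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


def augment_pair(image, mask, rng, max_shift=30, max_angle=90.0):
    """Apply identical random flips, shifts and rotation to an image and its mask."""
    if rng.random() < 0.5:  # horizontal flip (mirror left-right)
        image, mask = np.fliplr(image), np.fliplr(mask)
    if rng.random() < 0.5:  # vertical flip (mirror top-bottom)
        image, mask = np.flipud(image), np.flipud(mask)
    dy = int(rng.integers(-max_shift, max_shift + 1))
    dx = int(rng.integers(-max_shift, max_shift + 1))
    image, mask = shift(image, dy, dx), shift(mask, dy, dx)
    angle = rng.uniform(0.0, max_angle)
    return rotate_nearest(image, angle), rotate_nearest(mask, angle)
```

Calling `augment_pair` once per training example and epoch gives the network a different random variant on every pass; nearest-neighbour sampling keeps the mask strictly binary after rotation.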

3 Experimental Results and Discussion

Three sets of experiments were carried out, segmenting the lumen and media separately: a simple U-Net model, VGG16-UNet without data augmentation (VGG16-UNet without DA), and VGG16-UNet with data augmentation (VGG16-UNet with DA). In this section, we first describe the metrics used to evaluate the proposed networks and then present the experiments, their implementation details and their results.

3.1 Evaluation Criteria

Dice Similarity

The loss function used during training is based on the Dice similarity coefficient, also known as the Sørensen-Dice index, which is given in Eq. (1).

$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|} \qquad (1)$$

The Dice similarity coefficient measures the spatial overlap of two segmentation masks, A and B, where |A| and |B| denote the cardinalities of the sets A and B. The Dice score lies between 0 and 1, and values closer to 1 indicate greater similarity between the samples. The idea is to minimize 1 - Dice computed between the two samples: during training, the Dice score between the predicted binary mask and the ground-truth mask is computed and subtracted from 1, and this error is back-propagated and minimized.
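A minimal NumPy version of Eq. (1) and the corresponding loss is sketched below. During training the prediction is usually the sigmoid output (probabilities rather than hard 0/1 values), which keeps the loss differentiable; the small `smooth` term that avoids division by zero for empty masks is our assumption and is not specified in the text.

```python
import numpy as np


def dice_coefficient(y_true, y_pred, smooth=1e-7):
    """Soft Dice 2|A∩B| / (|A| + |B|) for masks with values in [0, 1]."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)  # |A∩B| (soft version)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)


def dice_loss(y_true, y_pred):
    """Training objective: 1 - Dice, which is 0 for a perfect overlap."""
    return 1.0 - dice_coefficient(y_true, y_pred)


gt = np.array([[1, 1, 0], [0, 1, 0]])
pred = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_coefficient(gt, pred))  # 2*2 / (3 + 3), about 0.667
```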

Jaccard Similarity

The evaluation metric for testing the model is the Jaccard similarity coefficient, also known as Intersection over Union (IoU). It is given in Eq. (2).

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (2)$$

The Jaccard measure computes the similarity of two samples, which makes it useful for evaluating the accuracy of the model's predictions. We compute the Jaccard score between each ground-truth mask and the corresponding predicted mask.
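A minimal sketch of the evaluation metric follows, assuming the sigmoid output is binarized at 0.5 before scoring (the threshold is our assumption). For binary masks, Dice and Jaccard are related by Dice = 2J / (1 + J), so either score can be recovered from the other; the helper `dice_from_jaccard` is hypothetical.

```python
import numpy as np


def jaccard_index(y_true, y_pred, threshold=0.5):
    """Jaccard / IoU |A∩B| / |A∪B| between a ground-truth mask and a thresholded prediction."""
    a = np.asarray(y_true) > 0.5       # ground truth as a boolean mask
    b = np.asarray(y_pred) > threshold  # binarized prediction
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return np.logical_and(a, b).sum() / union


def dice_from_jaccard(j):
    """For binary masks, Dice = 2J / (1 + J)."""
    return 2.0 * j / (1.0 + j)


gt = np.array([[1, 1, 0], [0, 1, 0]])
prob = np.array([[0.9, 0.2, 0.1], [0.0, 0.8, 0.7]])
print(jaccard_index(gt, prob))  # intersection 2, union 4 -> 0.5
```

For example, a Jaccard score of 0.9205 corresponds to a Dice score of about 0.9586, which is the pair reported for the noisy lumen sample of Exp 3 in Table 2.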

3.2 Experimental Models

Simple U-Net

In the first set of experiments, we implemented a simple U-Net architecture with fewer convolution filters than the original U-Net [14], although it has the same number of convolution blocks. This model was trained without data augmentation for epochs, with a batch size of , on images. The train-test split was set to . The optimizer was Adam with the learning rate set to . The model was tested on scans of the test set. Training took about to minutes.

VGG16-UNet without Data Augmentation

In the second set of experiments, we implemented the VGG16-UNet. We trained the model for epochs without data augmentation. The batch size was set to since the network is considerably larger, with more than million trainable parameters. The optimizer was SGD (Stochastic Gradient Descent) with the learning rate set to . The convolution kernel size is set to and the stride to in order to generate overlapping feature maps. The training images were resized to due to memory constraints. The train-test split was set to . The model was tested on the resized , scans of the test set. Training this model took about minutes.
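To give a sense of scale for the parameter count above, the convolutional part of a standard VGG16 (13 layers of 3×3 convolutions) can be counted by hand. This sketch counts only the encoder and assumes the standard VGG16 filter widths with a 3-channel input, as in the original VGG16; a single-channel input changes only the first layer, and the decoder adds further parameters, so the result is a lower bound on the full VGG16-UNet rather than its exact size.

```python
# Standard VGG16 convolutional configuration: (number of 3x3 conv layers, filters) per block.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def conv_params(in_ch, out_ch, k=3):
    """Weights plus biases of one k x k convolutional layer."""
    return (k * k * in_ch + 1) * out_ch


def vgg16_encoder_params(in_ch=3):
    """Trainable parameters of the VGG16 convolutional base (no dense layers)."""
    total, c = 0, in_ch
    for n_layers, filters in VGG16_BLOCKS:
        for _ in range(n_layers):
            total += conv_params(c, filters)
            c = filters
    return total


print(vgg16_encoder_params())          # 14714688
print(vgg16_encoder_params(in_ch=1))   # 14713536
```

The 3-channel count of 14,714,688 matches the size commonly reported for the VGG16 convolutional base without its fully connected layers.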

Figure 4: VGG16-UNet model performance plots for lumen and media. (a) Dice score plot for lumen. (b) Dice score plot for media.

VGG16-UNet with Data Augmentation

In the third set of experiments, we trained the VGG16-UNet with data augmentation. The model was trained for epochs on images with the batch size set to . As before, SGD (Stochastic Gradient Descent) was used, with the learning rate set to ; the learning rate was reduced if the validation loss did not improve for epochs. The train-test split was set to . As before, the model was tested on , scans of the test set. This model also takes about minutes to train. The Dice score and loss plots from training the proposed model for lumen and media segmentation are presented in Fig. 4(a) to (d).

3.3 Results

Two types of samples, for lumen and for media, are used to evaluate the three experimental models in this paper: the first sample contains more artifacts and the second is a regular image, so that the models are evaluated in different situations. The visual comparison of the segmentation results of the three experiments is presented in Table 1, and the quantitative results based on the evaluation criteria are presented in Table 2. Furthermore, Table 3 gives an overall comparison of the experiments on the test set IVUS scans, reporting the Jaccard and Dice measures averaged over all test images. Exp 1, Exp 2 and Exp 3 refer to the simple U-Net, VGG16-UNet without data augmentation and VGG16-UNet with data augmentation, respectively.

Simple U-Net

For the first experiment, Table 1 shows that the model does not provide good segmentation results for the noisy images. It does not generalize well, and artifacts in the scan such as shadowing greatly disturb the segmentation. However, the model gives good predictions for the regular images, where it classifies the pixels well since there is little distortion in the image. On average, the results in Table 2 show that this model is not very good for lumen segmentation; its best Jaccard similarity score, 0.8618, is obtained for the regular media sample. The average Jaccard and Dice scores of this experiment for lumen and media segmentation, reported in Table 3, are the lowest of the three models.

VGG16-UNet without Data Augmentation

Table 1 shows that this network is able to generalize quite well, but it is still unable to handle artifacts (in the noisy lumen and noisy media samples). This is probably because the model still classifies some dark pixels in the vicinity of the region as foreground. A post-processing step such as thresholding could be used to eliminate outliers that surround the region of interest. This model is better than the simple U-Net described earlier and provides better segmentation results, as shown in Table 2; its best result is for the regular media sample, with a Jaccard measure of 0.9147. However, it still does not perform well enough, because the network needs more data, and it remains susceptible to serious artifacts. The average Jaccard and Dice scores of this experiment for lumen and media segmentation in Table 3 are higher than those of the simple U-Net.
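The thresholding mentioned above can be combined with a connected-component step that removes isolated outliers. The sketch below is our assumption, not a step the authors report using: it binarizes the probability map at 0.5 and keeps only the largest 4-connected region, on the premise that the lumen (or the area inside the media border) typically forms a single connected region in a cross-section. The pure-Python flood fill keeps the example self-contained; `scipy.ndimage.label` would be a faster alternative on full-size images.

```python
from collections import deque

import numpy as np


def largest_component(prob, threshold=0.5):
    """Threshold a probability map and keep only its largest 4-connected region."""
    mask = np.asarray(prob) > threshold
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    sizes = [0]  # sizes[k] = pixel count of component k (index 0 = background)
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or labels[sy, sx]:
                continue
            label = len(sizes)
            labels[sy, sx] = label
            queue, count = deque([(sy, sx)]), 0
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                count += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = label
                        queue.append((ny, nx))
            sizes.append(count)
    if len(sizes) == 1:
        return np.zeros((h, w), dtype=bool)  # nothing above the threshold
    return labels == int(np.argmax(sizes[1:])) + 1
```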

[Image grid not reproduced. Rows: Noisy Lumen Sample, Lumen Sample, Noisy Media Sample, Media Sample. Columns: Original Image, Ground Truth, Simple U-Net, VGG16-UNet without DA, VGG16-UNet with DA.]
Table 1: Visual segmentation results of the different experiments on two sample images each for lumen and media (one regular and one noisy sample).

Test image          | Intersection area   | Union area          | Jaccard                | Dice
                    | Exp 1  Exp 2  Exp 3 | Exp 1  Exp 2  Exp 3 | Exp 1   Exp 2   Exp 3  | Exp 1   Exp 2   Exp 3
Noisy Lumen Sample  |  4539   1537   2168 |  6448   2309   2355 | 0.6532  0.6656  0.9205 | 0.7909  0.7992  0.9586
Lumen Sample        | 12773   5431   5959 | 18995   6379   6418 | 0.6724  0.8513  0.9284 | 0.8041  0.9191  0.9629
Noisy Media Sample  |  9285   2619   3534 | 17534   4323   4299 | 0.5280  0.6058  0.8220 | 0.6911  0.7545  0.9023
Media Sample        | 26314   9382   9860 | 30532  10256  10258 | 0.8618  0.9147  0.9553 | 0.9257  0.9554  0.9771

Table 2: Comparison of different experimental results on two sample images for Lumen and Media (one regular and one noisy sample) based on the quantitative evaluation criteria.
Test images | Average Jaccard        | Average Dice
            | Exp 1   Exp 2   Exp 3  | Exp 1   Exp 2   Exp 3
Lumen       | 0.5497  0.6965  0.7982 | 0.6931  0.8129  0.8846
Media       | 0.5754  0.7409  0.8085 | 0.7197  0.8393  0.8825
Table 3: General comparison of all the experiments based on the average quantitative evaluation criteria.

VGG16-UNet with Data Augmentation

The last column of Table 1 shows that the proposed model generalizes quite well and has learned to eliminate the outliers. Note that at this point we do not perform any pre-processing or post-processing. The segmentation accuracy improves significantly compared to the simple U-Net and VGG16-UNet without data augmentation, and some post-processing could further improve the boundary smoothness of the region of interest. As shown in Table 2, the Jaccard similarity between the ground truth and the prediction is very high, reaching 0.9553 for the regular media sample. Having seen many augmented examples, the model is able to generalize very well. Table 3 shows that the proposed VGG16-UNet improves on both previous models in the overall scores; its average Jaccard is slightly higher for media than for lumen (0.8085 vs. 0.7982), while its average Dice is marginally higher for lumen (0.8846 vs. 0.8825).

3.4 Discussion

We initially experimented with a simple U-Net, whose results were not very good: the model performs poorly and does not generalize well enough. In addition, the training set is small, so the model does not see many samples to learn from.

The VGG16-UNet generalizes very well, and no image from the test set is used to train the model. Its results are significantly better than those of the simple U-Net described before. In our literature review, the authors of [8], who implemented a Hough-CNN to segment MRI and ultrasound images of the brain, state that "deeper neural networks can work very well with small datasets". Our results with a deep U-Net structure with a VGG16 [15] encoder are consistent with this claim. Applying data augmentation to the training set improved the results significantly, since the model sees a wider variety of samples. We also ran an experiment to check whether the Adam [18] or the SGD optimizer gives better convergence and accuracy, and found that SGD works well for this type of problem; adaptive optimizers are not always the best choice. We used SGD with a high learning rate and reduced it if no improvement was seen after epochs, which greatly helped the model converge and improved the accuracy and the Dice score. During the inference phase, our model generates predictions for the test images in the test set in less than seconds.

Even though our method achieves accuracy for some predictions, other predictions are affected by serious artifacts. Moreover, the test set is noisier, with bifurcations, side branches and shadows caused by the catheter, and many test images contain more than one class of artifact. Post-processing the output images could further improve the accuracy.

4 Conclusion

This paper has presented a deep learning method for detecting the lumen and media in IVUS images, which is important for identifying plaque build-up in the walls of the coronary vessels. In particular, a deeper version of U-Net, a variant of the fully convolutional neural network that employs VGG16 as its encoder (VGG16-UNet), is used for IVUS image segmentation. The proposed method is evaluated on a dataset of IVUS scans, each with its own lumen and media labels. We show that the proposed VGG16-UNet performs quite well during training and testing and provides good segmentation results, both visually and quantitatively, during the inference phase.


  1. email: {cbalakri, dadashza, soltanin}


  1. Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990–2015: a systematic analysis for the Global Burden of Disease Study 2015. The Lancet, 388(10053):1545–1602, 2016.
  2. François Destrempes, Marie-Hélène Roy Cardinal, Louise Allard, Jean-Claude Tardif, and Guy Cloutier. Segmentation method of intravascular ultrasound images of human coronary arteries. Computerized Medical Imaging and Graphics, 38(2):91–103, 2014. Special Issue
  3. Faraji, M., Cheng, I., Naudin, I., Basu, A.: Segmentation of arterial walls in intravascular ultrasound cross-sectional images using extremal region selection. Ultrasonics 84, 356–365 (2018)
  4. Faraji, M., Shanbehzadeh, J., Nasrollahi, K., Moeslund, T.B.: Erel: extremal regions of extremum levels. In: Image Processing (ICIP), 2015 IEEE International Conference on. pp. 681–685. IEEE (2015)
  5. Faraji, M., Shanbehzadeh, J., Nasrollahi, K., Moeslund, T.B.: Extremal regions detection guided by maxima of gradient magnitude. IEEE Transactions on Image Processing 24 (12), 5401–5415 (2015)
  6. Eli Gibson, Wenqi Li, Carole H. Sudre, Lucas Fidon, Dzoshkun Shakir, Guotai Wang, Zach Eaton-Rosen, Robert Gray, Tom Doel, Yipeng Hu, Tom Whyntie, Parashkev Nachev, Dean C. Barratt, Sébastien Ourselin, M. Jorge Cardoso, and Tom Vercauteren. Niftynet: a deep-learning platform for medical imaging. CoRR, abs/1709.03485, 2017.
  7. F. Milletari, N. Navab, and S. A. Ahmadi. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pages 565–571, Oct 2016.
  8. Fausto Milletari, Seyed-Ahmad Ahmadi, Christine Kroll, Annika Plate, Verena Rozanski, Juliana Maiostre, Johannes Levin, Olaf Dietrich, Birgit Ertl-Wagner, Kai Bötzel, and Nassir Navab. Hough-CNN: Deep learning for segmentation of deep brain regions in MRI and ultrasound. Computer Vision and Image Understanding, 164:92–102, 2017. Deep Learning for Computer Vision.
  9. E. Smistad, A. Østvik, B. O. Haugen, and L. Løvstakken. 2D left ventricle segmentation using deep learning. In 2017 IEEE International Ultrasonics Symposium (IUS), pages 1–4, Sept 2017.
  10. S. Su, Z. Gao, H. Zhang, Q. Lin, W. K. Hau, and S. Li. Detection of lumen and media-adventitia borders in IVUS images using sparse auto-encoder neural network. In 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pages 1120–1124, April 2017.
  11. Shengran Su, Zhenghui Hu, Qiang Lin, William Kongto Hau, Zhifan Gao, and Heye Zhang. An artificial neural network method for lumen and media-adventitia border detection in IVUS. Computerized Medical Imaging and Graphics, 57:29–39, 2017. Recent Developments in Machine Learning for Medical Imaging Applications.
  12. N. Torbati, A. Ayatollahi, and A. Kermani. Ultrasound image segmentation by using a fir neural network. In 2013 21st Iranian Conference on Electrical Engineering (ICEE), pages 1–5, May 2013.
  13. Google Trends: Deep learning image segmentation,
  14. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. CoRR, abs/1505.04597, 2015.
  15. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
  16. Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. CoRR, abs/1411.4038, 2014.
  17. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. CoRR, abs/1502.01852, 2015.
  18. Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
  19. Balocco, S., Gatta, C., Ciompi, F., Wahle, A., Radeva, P., Carlier, S., Unal, G., Sanidas, E., Mauri, J., Carillo, X., et al.: Standardized evaluation methodology and reference database for evaluating ivus image segmentation. Computerized medical imaging and graphics 38 (2), 70–90 (2014)