Hybrid Forests for Left Ventricle Segmentation using only the first slice label

Hybrid Forests for Left Ventricle Segmentation using only the first slice label

Ismaël Koné and Lahsen Boulmane1 2MIA Research Group , LEM2A Lab,
Faculty of Sciences, Moulay Ismail University,
BP 11201, Avenue Zitoune
Meknes, Morocco
Abstract

Machine learning models produce state-of-the-art results in many MRI images segmentation. However, most of these models are trained on very large datasets which come from experts manual labeling. This labeling process is very time consuming and costs experts work. Therefore finding a way to reduce this cost is on high demand. In this paper, we propose a segmentation method which exploits MRI images sequential structure to nearly drop out this labeling task. Only the first slice needs to be manually labeled to train the model which then infers the next slice’s segmentation. Inference result is another datum used to train the model again. The updated model then infers the third slice and the same process is carried out until the last slice. The proposed model is an combination of two Random Forest algorithms: the classical one and a recent one namely Mondrian Forests. We applied our method on human left ventricle segmentation and results are very promising. This method can also be used to generate labels.

I Introduction

In clinical practice, doctors use cardiac cine-MRI for early detection of some cardiac attacks like heart failure with or without infarction. They manually delineate the Left Ventricle (LV) in each slice of the cine-MRI and use that for computing the Ejection Fraction percentage. Based on this metric, they diagnose the patient as presenting symptoms of a particular heart attack [1]. The bottleneck of this procedure is the manual delineation or labeling which is very time consuming. Therefore, automation is the way to face this problem. Recently, the field of Machine Learning, especially Deep Learning (DL) has shown state-of-the-art in this kind of task[2]. However it comes with a big price: the availability of huge amount of hand-labeled data for training the model. Thus we have two challenges:

• Automating the task of delineation.

• Reducing the burden of manual delineation to train a machine learning model.

To illustrate the first challenge, consider a doctor that receives daily 10 patients to diagnose a heart failure. We suppose each patient cardiac cine-MRI produces 10 slices, then the doctor has 10 x 10 = 100 slices to delineate on a daily basis. So this is a great part of the doctor’s time that’s lost on a redundant and tiring work. Therefore automation will be very helpful and allow the doctor to care more cases daily.

For the second challenge, following the first example, suppose we decide to use deep learning model automate the task. Now we have to prepare a training dataset of cardiac cine-MRI, say 100 patients’ cardiac cine-MRI where each patient cardiac cine-MRI produces 10 slices as before. So we must delineate the left ventricle on 100 patients x 10 slices, a total of 1000 slices to be able to train the deep learning model. This is obviously a very huge and costly work in terms of resources and time.

Here we propose a simple framework that exploits the sequential nature of MR images to reduce this labeling burden. We use this framework with a combination of classical machine learning algorithms, namely Standard Random Forests (RF)[3] and Mondrian Forests (MF)[4] and results are very promising. Our contribution are two folds:

• presenting a simple framework to MRI images segmentation using few labels (Sect. III-A).

• show its effectiveness in the case of LV segmentation.

We apply this method on the 2017 Automated Cardiac Diagnosis Challenge (ACDC) dataset [5] (Sect. IV).

Ii Dataset

The dataset used in this study comes from the Automated Cardiac Diagnosis Challenge (ACDC) associated with the previous Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017[5]. Data are Cine-MR images of the cardiac cycle of patients having some pathologies. They were fully anonymized and handled within the regulations set by the local ethical committee of the Hospital of Dijon (France). Labels in this dataset are segmentation of the heart in Right and Left Ventricles, Myocardium and the background. However in our experiments, we considered only the Left Ventricle segmentation which is mostly use for evaluating heat failure attacks[1]. Fig. 1 shows a sample of 3 slices from a single cardiac cine-MR of the dataset. The overlaid region in the first slice is the Left Ventricle (LV).

Iii Method

Iii-a Intuition

MRI images are a sequence of closed image sections of a part of the body that contains many organs. Consider looking at the image (Fig. 1), we notice strong resemblance between the 3 successive slices. When we observe the first slice and then we look at the second, we directly make associations between both slices. Specifically, an expert may tell us for instance, the blue overlay area in the first slice is the left ventricle (LV) (Fig. 1). Then if we look at the next slice, we immediately notice an area in the second slice that has approximately the same shape, appearance and position within the slice than the first slice’s LV. Therefore this area in the second slice is probably the LV. At the second slice, we update or refine our knowledge of how the LV looks like and use that to infer the third slice and the same process is carried out until the last slice. This is done based on visual features: LV’s position in the slice, its shape and gray intensities distribution over its surface which may present textures.

We can re-interpret this natural process in two steps:

• A learning or training phase which is the observation of the segmented LV from the first slice : the blue overlay (Fig. 1).

• An inference phase where we segment the second slice i.e we find the part which is likely to be the LV.

So our strategy consists in building a machine learning algorithm that follows this simple scheme in order to segment the LV from cardiac cine-MRI. Practically, the algorithm will loop over these two steps at each slice. This approach is based on our previous work [6].

Iii-B Formulation

Given representing the sequence of slices of a cardiac cine-MRI and considering a slice as a set of pixels, we aim at finding the subset of pixels namely in each slice that belongs to the LV. And we denote by (Background) the subset of pixels in slice that is outside . So we formalize the problem as a binary pixel-wise classification where we assign a label to each pixel such that:

 y={1,if x∈ LV (Left Ventricle).0,otherwise. (1)

We’re also given labels of the first slice.

We use algorithm 1 to segment the LV. Inputs are cardiac cine-MRI slices and its first slice manually segmented or labeled by an expert. We start by learning the left ventricle and the background from the first slice . We do it with two different machine learning models: MF and RF Classifiers[4, 3]. The parameters and represent knowledge acquired from learning respectively with MF and RF. From that point we loop over next slices to infer the LV part. For each slice, we infer the subset of pixels that are part of the LV with each model independently using the same criterion: the probability () of the pixel being in the LV of the current slice with the associated model knowledge must be greater than its probability of being in the background (). Then we apply some post-process operations on the segmentation results of each model in oder to correct some inference errors. The LV of the current slice is the combination of that has been inferred by each model. Now comes the update step which is one of the trickiest part. We use the inference or segmentation result of the MF Classifier () as a new data to train a new RF Classifier () and we let the MF as is. This trick is what gives us better accuracy among possible update scenarios. Finally the LV of the cardiac cine-MR is the set of slice-wise segmented LV.

Another important component in training a model is the choice of the features aka feature engineering. Following the previous part, we stated that our ability to recognize the LV in the next slice is based on the same shape, appearance and position within the previous slice. So here we choose the simple ones which are:

• the pixel value (visual appearance)

• the two coordinates of a pixel (location)

Iii-C Post-processing

In Fig. 2(a), red regions represent a typical result of a MF classifier inference. The green contour delimits the ground truth area. Comparing both, we notice misclassified pixels outside (isolated blobs), inside (holes) and at the boundaries of the ground truth LV. We propose two operations to resolve these issues:

• Finding the contour whose area maximally overlaps with previous slice LV. It allows to eliminate isolated blobs and holes.

• Computing the convex hull of the result to recover missing pixels at the boundaries.

Iii-C1 Finding the area of maximum overlapping with previous slice LV

the objective is to reclassify isolated blobs as background pixels and holes as LV pixels. For that, we apply an edge detection technique [7] to retrieve all contours of the segmentation result. Then we select the contour whose area has the largest overlap with the previous slice’s LV. This comes from the alignment property of MR images so the previous slice LV has the nearly the same location than the current slice one. Doing so discard isolated blobs’ contour and reconsider holes in the final result. Fig. 2(b) illustrates the process. We notice small holes are now red thus considered as the LV pixels and isolated blobs are white so discarded.

Iii-C2 Convex hull

After the previous operation, we still miss many pixels at boundaries. These missing pixels form concavities on the LV (see Fig. 2(b)). The objective is to close these concavities to get even closer to the ground truth and thus reduce errors. As the LV ground truth is approximatively convex, we use the convex hull to close concavities on the contour. In order to compute the convex hull, the contour is considered as a polygon where each contour point is a vertex and each segment joining two consecutive vertexes is a side. Briefly, the convex hull is determined by computing the angle formed by a vertex and adjacent sides. If the angle is less than or then we have a concavity so we suppress this vertex from the polygon. We do it using the method in [8]. Fig. 2(c) shows the final result. We notice the red area includes the concavities.

Iv Experiments and results

We used a Intel i7 CPU core laptop. We run experiments on Ubuntu 16.04 environment with python 2.7 language programming. We used RF implementation from sklearn library[9] and OpenCV library[10] python binding for post-processing operations. For the MF and RF classifiers, we set the number of estimators or trees to 50 and the minimum sample per leaf to 2.

We ran experiments in a couple of scenarios, each one starts by training each model i.e MF and RF with the first slice, then inferring subsequent slices following specific schemes:

• Experiment 1: basic inference.

• Experiment 2: with post-processing.

• Experiment 3: with post-processing and updating as shown in algorithm 1.

We randomly selected patient 11 of the ACDC challenge dataset and use the ED phase cine-MR to perform these experiments. The objective is to zoom in to analyze the method at the slice level. Then, we selected a sample of 45 patients and ran the same experiments.

Iv-a Qualitative results

Here, we present the visual results using a single patient (patient 11 of the ACDC challenge dataset) cardiac cine-MR. In Fig. 3, we present results of each experiment previously stated. We present only three slices to avoid overloading the page, the 2nd, 5th and 8th slice. The green contour delimits the LV ground truth, meaning everything inside it is part of the LV. The red overlay is the segmented LV of our approach with respect to the experiment settings.

Iv-A1 Fig. 3(a)

shows the result of experiment 1 where we don’t process model inferences nor updating them. We notice the first slice in the figure is very good but the next is less good. In the second slice, holes inside the LV and some of its boundary parts are excluded from the segmentation result. Even worse, the last slice has some green overlay parts around the LV thus the algorithm is really misclassified these pixels as part of the LV.

Iv-A2 Fig. 3(b)

on its turn present experiment 2 results. We clearly see the impact of the post-processing step. Now all slices are nearly equally good, the ground truth and segmented LV are very closed. Holes and boundaries are included in the LV conversely to experiment 1. Also we do not see any surrounding green overlay so they’ve been discarded.

Iv-A3 Fig. 3(c)

which presents the last experiment result doesn’t show a noticeable difference from experiment 2. But in fact it adds a small amount on the accuracy. It’s more evident in the next section where we use an evaluation metric.

Iv-B Quantitative results

To evaluate our results, we use the dice coefficient metric of two sets[11]:

 Dice(A,B)=2|A∩B||A|+|B|, (2)

where means the cardinal of the set . It’s a similarity measure between two sets and its varies between 0 and 1 where 0 means the two sets are totally different and 1 means a perfect match. In our case, we apply it to the segmented LV of our method and the ground truth LV so as to figure out how close we are to the correct result.

Fig. 4 presents this metric applied to each slice of the considered patient in the dataset with respect to each experiment’s setting. Additionally, we present the dice of each model: MF, RF, and the union both so as to see closely how each model performs. There are 9 slices where the first one is used to train the algorithm which then infers subsequent slices. So graphs in these figures are the dice score from slice 2 to 9.

Iv-B1 Patient 11

In the first experiment’s graph (Fig. 4(a)), dice scores for all models globally decrease as we move through slice except the Random Forest that gets up around middle slices but falls afterwards. This is consistent with the fact that distant slices less similar. But surprisingly dice scores are quite good for learning only from slice 1. This not only confirms our intuition but goes farther to the point that learning only from the first slice can give us good result. Dices score of overall slices are respectively: 0.869, 0.865 and 0.871 for the MF, RF and the combination of both. That’s a very good starting point. We notice that the combination of both models improves the dice score by a small fraction.

The second experiment’s graph (Fig. 4(b)) where we apply the post-processing step shows a great improvement in the dice score which now fluctuates between high values. Dices score of overall slices are now respectively: 0.947, 0.920 and 0.948 for the MF, RF and the combination of both. This is a huge jump. Here we notice that the combination of models performs slightly better than the MF one.

Concerning the third experiment’s graph (Fig. 4(c)), the RF result is close to that of MF. This normal as we are only updating the RF. A question that might raised is why have we only experimented the RF update only ? The answer is that updating the MF results in poor accuracy.

Iv-B2 The sample of 45 patients

TABLE I reports the mean variance of dice scores obtained after running the three experiments on a sample of 45 patients. We still notice a significant jump in accuracy after post-processing. But the update don’t show a significative improvement.

V Related work

The studied problem comes from a challenge where a total of 10 teams competed. The leaderboard 111$https://www.creatis.insa-lyon.fr/Challenge/acdc/miccai_{r}esults.html$ has better results than ours. But our approach is less data-hungry than theirs because we used approximately 10% of labels if we consider each MRI producing around 10 slices.This is very cost effective while competitors used CNN models that use all labels.

Some works use the same approach as ours like [12] where they developed a system for placenta segmentation from MR images using a single slice label with scribbles. We can’t compared their results with ours as images are different and they didn’t provide a way to get their dataset. However they were able to get state-of-the-art results for the placenta segmentation from fetal MRI. This shows that this strategy is very promising.

A drawback of our method is that it fails in case of misalignment of slices. Because features are mainly pixel coordinates. One solution could be to use other features. In our studies, we have in fact used features from Local Binary Pattern (LBP)[13] and Histogram of Gradients (HOG)[14]. However, results were very poor compared to the current one. Perhaps because, a single slice label is not enough for these features to expose significant structure to the used forest algorithms.

Vi Conclusion and Perspectives

We presented a method for segmenting the left ventricle in cardiac cine-MR images while reducing manual label effort. This segmentation is in practice used to detect some heart failures. This approach leverages similarities and alignment between slices to initiate the system by training it on a single manually labeled slice. Then the system follows an infer-and-learn scheme across all slices without additional manual label. We experiment this method using Mondrian and Random Forests and their combination. Results confirm that idea and show more. Indeed, after learning the first slice, the MF gives its best results when inferring subsequent slices without updates while its counterpart shows slighty better results with updates.

Recently, a new paradigm has been developed, namely data programming [15] which aims in reducing manual label effort. It has three steps:

1. Building a training set by writing function to generate labels.

2. Modeling this training set to denoise it.

3. Training a noise-aware discriminative model.

They found that in some cases, the discriminative model trained on the denoised training set performs better than it is trained directly with true labels. So our next research direction is to use this data-programming paradigm considering our approach as a labeling functions and results as noisy labels.

References

• [1] Alfakih K, Plein S, Thiele H, Jones T, Ridgway JP, Sivananthan MU. Normal human left and right ventricular dimensions for MRI as assessed by turbo gradient echo and steady-state free precession imaging sequences.. J Magn Reson Imaging 2003;17(3):323–9.
• [2] Litjens G., Kooi T., Bejnordi E. B., Setio A. A. A., Ciompi F., Ghafoorian M., Jeroen A.W.M. van der Laak, Bram van Ginneken, Sánchez C. I. A survey on deep learning in medical image analysis. Medical Image Analysis, Vol 42, p 60–88, 2017.
• [3] L. Breiman. Random Forests. Machine Learning, 45(1), 5–32, 2001.
• [4] B. Lakshminarayanan, D. M. Roy and Y. W. Teh. Mondrian forests: Efficient online random forests. In Adv. Neural Information Proc. Systems (NIPS), 2014.
• [5] O. Bernard, A. Lalande, C. Zotti, F. Cervenansky, O. Humbert, and P. Jodoin. Automated Cardiac Diagnosis Challenge (ACDC) - MICCAI. https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html MICCAI challenge 2017.
• [6] I. Kone, and L. Boulmane. A general scheme for MRI images’ segmentation based on slice-by-slice learning. Communication, Management and Information Technology - Sampaio de Alencar (Ed.) 2017 Taylor Francis Group, London, ISBN 978-1-138-02972-9
• [7] Satoshi Suzuki and Keiichi Abe. Topological structural analysis of digitized binary images by border following. Computer Vision, Graphics, and Image Processing, 30(1):32â46, 1985.
• [8] Jack Sklansky. Finding the convex hull of a simple polygon. Pattern Recognition Letters, 1(2):79â83, 1982.
• [9] Pedregosa et al. Scikit-learn: Machine Learning in Python. J of Machine Learning Research, 12:2825-2830, 2011.
• [10] Bradski, G. The OpenCV Library. Dr. Dobb’s Journal of Software Tools, 2000.
• [11] Lee, R., Dice: Measures of the amount of ecologic association between species. Ecology. 26(3):297â302, 1945.
• [12] Wang G. et al.: Slic-Seg: Slice-by-Slice Segmentation Propagation of the Placenta in Fetal MRI Using One-Plane Scribbles and Online Learning. In: Navab N., Hornegger J., Wells W., Frangi A. (eds) Medical Image Computing and Computer-Assisted Intervention â MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Springer, Cham.
• [13] Ahonen, T., Hadid, A. and Pietikainen, M.: Face Description with Local Binary Patterns: Application to Face Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Volume 28 Issue 12, 2037–2041, 2006.
• [14] Dalal, N. and Triggs, B.: Histograms of Oriented Gradients for Human Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, 886–893, 2005.
• [15] Ratner, A. De Sa, C., Wu, S., Selsam, D., RÃ©, C.: Data Programming: Creating Large Training Sets, Quickly. 30th Conference on Neural Information Processing Systems (NIPS), 3567–3575, 2016.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters