# Superpixels Based Segmentation and SVM Based Classification Method to Distinguish Five Diseases from Normal Regions in Wireless Capsule Endoscopy

Omid Haji Maghsoudi
Department of Bioengineering, College of Engineering, Temple University, Philadelphia, PA, USA, 19122
Tel.: +1-267-432-6386
Email: o.maghsoudi@temple.edu
###### Abstract

Wireless Capsule Endoscopy (WCE) is a relatively new technology for examining the entire GI tract. During an examination, it captures more than 55,000 frames. Reviewing all these images is time-consuming and prone to human error, so developing intelligent methods that assist physicians in reviewing the frames remains a challenge. The WCE frames are captured in 8-bit color depth, which provides a sufficient color range to detect abnormalities. Here, superpixel-based methods are proposed to segment five diseases: bleeding, Crohn’s disease, Lymphangiectasia, Xanthoma, and Lymphoid hyperplasia. Two superpixel methods are compared for semantic segmentation of these common diseases: simple linear iterative clustering (SLIC) and quick shift (QS). The segmented superpixels were classified into two classes (normal and abnormal) by a support vector machine (SVM) using texture and color features. For both superpixel methods (SLIC and QS), the accuracy, specificity, sensitivity, and precision were around 92%, 93%, 93%, and 88%, respectively. However, SLIC was dramatically faster than QS.

## 1 Introduction

Wireless capsule endoscopy (WCE) is a relatively new technology that can investigate the entire gastrointestinal (GI) tract without any pain. WCE has been used to detect small bowel abnormalities such as tumors, bleeding, ulcers, Crohn’s disease, and celiac disease. The device resembles a normal pill, as illustrated in Fig 1. During an 8 to 12 hour examination, it records more than 55,000 frames alizadeh2014segmentation. A physician needs to spend at least an hour to review all the recorded frames, which is a time-consuming process. In addition, this review process is prone to human error. Image processing has frequently been used for segmentation, classification, and tracking of objects in different applications li2009computer, penjweini2017investigating, maghsoudi2017superpixels; therefore, it can address this need.
Many methods have been presented to segment the regions containing abnormalities or to find the frames containing those regions. In other words, these methods can be divided into two classes based on the type of detection: region (pixel) based and frame based.
Numerous methods have been presented in recent years to detect bleeding in the WCE frames. The majority of these methods used color and texture features to differentiate the bleeding regions/frames from the normal regions/frames. The following combinations of features and classifiers were used to achieve this goal: features extracted from the HSI and RGB color spaces and classified by a probabilistic neural network pan2011bleeding; chrominance moments and uniform local binary patterns classified by a multilayer perceptron neural network li2009computer; and Haralick, Gabor, and Laws’ texture features classified using three neural networks maghsoudi2016detection.

Considering that 30% to 50% of adults might have polyps, they can be considered one of the most common abnormalities in the intestine. This rate can increase to 90% for people older than 50 years yuan2016improved. Most polyps are not cancerous, but if one becomes large enough, there is a chance of it turning into cancer. Therefore, it is necessary to detect polyps in their early stages.
In one of the most recent studies, a method was proposed to differentiate the frames containing polyps from the frames having only normal regions yuan2016improved. Single scale-invariant feature transform (SIFT), local binary pattern (LBP), uniform LBP, complete local binary pattern (CLBP), and histogram of oriented gradients (HOG) features were extracted from the WCE frames. Then, a support vector machine (SVM) and Fisher’s linear discriminant analysis were used to classify the frames. The highest accuracy, specificity, and sensitivity for detection of frames containing polyps were 93%, 94%, and 87%, respectively.
In another study, a pixel-based method was presented to segment the polyps in two steps: first, log Gabor filters combined with the SUSAN edge detector were used to create potential polyp segments, and then geometric features were extracted to outline the final polyp regions karargyris2011detection. In a separate study, edge detection techniques followed by the Hough transform were used to extract shape-based features. Texture features were added to these features, and finally, a cascade AdaBoost was used to classify the frames. A sensitivity of 91% and a specificity of 95% were reported silva2014toward.
There have been some studies to segment or detect other types of diseases in the WCE frames maghsoudi2016detection, maghsoudi2013detection, maghsoudi2017superpixel, li2011computer, kumar2012assessment, segui2016generic, omid2012segmentation. A technique was presented to detect Crohn’s disease in the WCE frames kumar2012assessment. The MPEG-7 edge histogram descriptor (edge features), the MPEG-7 dominant color descriptor, which clusters colors from the LUV color space (color features), and the MPEG-7 homogeneous texture descriptor using Gabor filters (texture features) were extracted. Finally, an SVM was used to classify the regions using these extracted features. The results indicated an accuracy of 93%, a precision of 91%, and a specificity of 91%.
To segment Crohn’s disease, Lymphangiectasia, Xanthoma, Lymphoid hyperplasia, and Stenosis, a simple method was presented that uses a sigmoid function to emphasize the region based on the intensity value omid2012segmentation. The reported sensitivity, specificity, and accuracy were 89%, 65%, and 75%, respectively.
In addition, methods were proposed to detect frames containing tumor regions in the WCE frames maghsoudi2016detection, li2012tumor.
Here, two superpixel-based methods, simple linear iterative clustering (SLIC) and quick shift (QS), are presented to segment the bleeding, Crohn’s disease, Lymphangiectasia, Xanthoma, and Lymphoid hyperplasia regions in the WCE frames. Then, an SVM is used to classify the segmented regions. The main contribution of this work is to demonstrate how SLIC and QS superpixels can be used to segment the disease regions in the WCE frames. The accurate segmentation leads to a superior classification of these regions, especially for some of these diseases, compared to the previous studies.

## 2 Method

### 2.1 SLIC Superpixel

Superpixels group perceptually uniform pixels in an image and have become popular for applications such as segmentation, object recognition, and tracking of objects in a video Ren03, Comaniciu02, Felzenszwalb04, Levinshtein09. The main idea of superpixels was introduced as a way to define perceptually uniform regions using the normalized cuts algorithm shi2000normalized, Ren03, Mori04, Li12. Superpixels create a more natural and perceptually meaningful representation of an image than many other segmentation methods.
Here, we used SLIC Achanta12 and QS vedaldi2008quick to evaluate how well superpixels segment the five disease regions in the WCE frames. SLIC can be considered a form of k-means clustering with two main differences: the number of distance calculations is reduced by limiting the search to a region that depends on the superpixel size, and color and spatial proximity are combined in a single distance measure that controls the size and compactness of the superpixels Mori04.
The key parameter for SLIC is the number of superpixels. First, initial centers are chosen as the cluster centers. Then, to avoid keeping a center on an edge (high gradient), the center is moved to the lowest gradient position in a neighborhood. Each of the pixels is associated with the nearest cluster center based on color information. Therefore, two coordinate components ($x$ and $y$) give the location of the segment and three color components (for example, the $R$, $G$, and $B$ intensities in the RGB color space) are derived. SLIC minimizes a distance function (a Euclidean norm in this 5D space) defined as follows:

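$$D_c = \sqrt{(R_j - R_i)^2 + (G_j - G_i)^2 + (B_j - B_i)^2}, \tag{1}$$

$$D_p = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}, \tag{2}$$

$$D = \sqrt{\left(\frac{D_c}{N_c}\right)^2 + \left(\frac{D_p}{N_p}\right)^2}. \tag{3}$$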

where $N_c$ and $N_p$ are the maximum distances within a cluster, used to normalize the color and spatial proximity. Note that SLIC keeps the size of the superpixels between half and twice the initially specified superpixel size. Therefore, the number of superpixels plays an important role in how the segmentation is performed. This effect is illustrated in Fig 3.
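
As a rough illustration of Eqs. (1) to (3), the following minimal Python sketch computes the combined distance for a single pixel and assigns the pixel to the nearest of two cluster centers. The function names, the example pixel and center values, and the normalizers `n_c` and `n_p` are hypothetical and chosen only for illustration; this is not the implementation used in this study.

```python
import math


def slic_distance(pixel, center, n_c, n_p):
    """Combined color/spatial distance between a pixel and a cluster center.

    pixel, center: (x, y, r, g, b) tuples.
    n_c, n_p: normalizers for the color and spatial terms (assumed given).
    """
    xj, yj, rj, gj, bj = pixel
    xi, yi, ri, gi, bi = center
    d_c = math.sqrt((rj - ri) ** 2 + (gj - gi) ** 2 + (bj - bi) ** 2)  # Eq. (1)
    d_p = math.sqrt((xj - xi) ** 2 + (yj - yi) ** 2)                   # Eq. (2)
    return math.sqrt((d_c / n_c) ** 2 + (d_p / n_p) ** 2)              # Eq. (3)


def nearest_center(pixel, centers, n_c, n_p):
    """Index of the cluster center with the smallest combined distance."""
    return min(
        range(len(centers)),
        key=lambda i: slic_distance(pixel, centers[i], n_c, n_p),
    )


# Hypothetical example: two centers and one reddish pixel close to the first.
centers = [(10, 10, 200, 30, 30), (40, 40, 120, 90, 80)]
pixel = (12, 11, 190, 35, 28)
label = nearest_center(pixel, centers, n_c=10.0, n_p=20.0)
```

A full SLIC implementation would repeat this assignment only for pixels near each center and then update the centers iteratively; the sketch covers the distance step alone.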

### 2.2 Features

Segmentation and detection of the five common diseases (bleeding, Crohn’s disease, Lymphangiectasia, Xanthoma, and Lymphoid hyperplasia) in the GI tract were the main goal of this study. Therefore, we needed to find the best possible features for differentiating these five disease regions from the normal landmarks in the frames. Because the shape of the diseases varied dramatically from one frame to another, the shape and the size of the superpixels cannot provide distinctive features. On the other hand, color and texture features can provide enough information for this distinction.
The local binary pattern has been widely used to extract texture features yuan2016improved, maghsoudi2016detection, ojala2002multiresolution, nawarathna2014abnormal. The advantage of uniform LBP over other texture feature methods is that it is invariant to rotation and gray-scale changes ojala2002multiresolution.

To calculate the LBP, a texture function $T$ can be defined for $P$ neighbors as follows:

$$T = t(g_c, g_0, \ldots, g_{P-1}) \tag{4}$$

where $g_c$ is the intensity of the center pixel and $g_p$ ($p = 0, \ldots, P-1$) is the intensity of the pixels located on a circle of radius $R$. The coordinates of these neighbors are given by $(x_c + R\cos(2\pi p/P),\ y_c - R\sin(2\pi p/P))$, in which $x_c$ and $y_c$ are the coordinates of the pixel located in the center of the block. If the intensity of the center is subtracted from the intensities of all the neighbors, the texture function can be written as:

$$T = t(g_c, g_0 - g_c, \ldots, g_{P-1} - g_c) \tag{5}$$

where $g_c$ shows the overall intensity. Keeping only the signs of the differences, the function can be redefined as follows:

$$T = t(s(g_0 - g_c), \ldots, s(g_{P-1} - g_c)) \tag{6}$$

Finally, the LBP feature can be calculated using the following equation:

$$LBP_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c) \times 2^p \tag{7}$$

where:

$$s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} \tag{8}$$

How LBP is calculated is illustrated in Fig 2 for an example block.
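
The LBP computation can be sketched in a few lines of Python for a single 3x3 block. This is a minimal sketch under stated assumptions: it uses 8 neighbors at radius 1, takes the diagonal neighbors directly from the block corners instead of interpolating them on the circle, and thresholds each difference with a step function that returns 1 for non-negative values. The function names and the example block are hypothetical. The helper `is_uniform` applies the usual definition of a uniform pattern (at most two 0/1 transitions in the circular bit string).

```python
def lbp_code(block):
    """Basic LBP code of the center pixel of a 3x3 block (8 neighbors, radius 1)."""
    g_c = block[1][1]
    # (row, column) offsets for p = 0..7: start to the right of the center
    # and move counter-clockwise (rows decrease upwards).
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    code = 0
    for p, (d_row, d_col) in enumerate(offsets):
        g_p = block[1 + d_row][1 + d_col]
        s = 1 if g_p - g_c >= 0 else 0   # threshold against the center
        code += s * (2 ** p)             # weight bit p by 2^p
    return code


def is_uniform(code, bits=8):
    """True if the circular bit pattern has at most two 0/1 transitions."""
    pattern = [(code >> p) & 1 for p in range(bits)]
    transitions = sum(
        pattern[p] != pattern[(p + 1) % bits] for p in range(bits)
    )
    return transitions <= 2


# Hypothetical example block of gray-level intensities.
block = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(block)
uniform = is_uniform(code)
```

For the example block, the neighbors with intensity at or above the center value are the upper-left, left, lower-left, lower, and lower-right pixels, which sets bits 3 to 7 of the code.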
In addition to LBP and uniform LBP, we extracted the following five measures from the gray scale image, LBP, and uniform LBP: mean, variance, skewness, kurtosis, and entropy.
To extract color features, the images were converted to the HSV color space. The same five measures were extracted from the hue channel and from the red, green, and blue channels.
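
The five measures can be illustrated with a short Python sketch that operates on the list of values belonging to one superpixel in one channel. The text does not give the exact estimators, so the choices here are assumptions: population (biased) variance, standardized third and fourth moments for skewness and kurtosis (non-excess kurtosis), and Shannon entropy in bits over the histogram of discrete values. The function name and the example values are hypothetical.

```python
import math
from collections import Counter


def five_measures(values):
    """Mean, variance, skewness, kurtosis, and entropy of a list of values."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # population variance
    std = math.sqrt(variance)
    if std == 0:
        # Convention for a constant region: no asymmetry, no tails.
        skewness, kurtosis = 0.0, 0.0
    else:
        skewness = sum(((v - mean) / std) ** 3 for v in values) / n
        kurtosis = sum(((v - mean) / std) ** 4 for v in values) / n
    # Shannon entropy (bits) of the empirical distribution of the values.
    counts = Counter(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return mean, variance, skewness, kurtosis, entropy


# Hypothetical superpixel with two equally frequent intensity values.
mean, variance, skewness, kurtosis, entropy = five_measures([0, 0, 255, 255])
```

Applying such a function to each of the seven inputs (LBP, uniform LBP, gray scale, hue, red, green, and blue) gives five values per input for every superpixel.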

The SVM has been one of the most popular classifiers in different applications, especially for classifying objects in the WCE frames kumar2012assessment, li2012tumor, maghsoudi2016tracker. Therefore, we used an SVM to classify the superpixels using the extracted features.

## 3 Results

To evaluate the proposed methods, 39 frames containing bleeding taken from 9 patients, 28 frames containing Crohn’s disease taken from 5 patients, 25 frames containing Lymphangiectasia taken from 4 patients, 19 frames containing Xanthoma taken from 3 patients, and 24 frames containing Lymphoid hyperplasia taken from 4 patients were gathered from the Shariati Hospital, Tehran, Iran.
To train the SVM, we randomly selected, for each of the diseases, one patient having at least 6 frames. Only for the bleeding class did we have to select two patients, because none of the patients had more than 6 frames in our database. The remaining frames were used for testing the trained SVM.
Five measures (mean, variance, skewness, kurtosis, and entropy) were extracted from the LBP, uniform LBP, gray scale image, hue, red, green, and blue channels. This process created 35 features. We used the Laplacian score test to reduce the number of features and find the most distinctive ones he2005laplacian.
As discussed in section 2, the main parameter affecting the superpixels was the number of superpixels. The frames collected from the hospital were in 8-bit color depth and with the resolution of . Five superpixel numbers (25, 50, 100, 250, and 500) were selected to evaluate the effect of superpixel size on segmentation and classification of the diseases. This effect is illustrated in Fig 3 and the results are shown in Fig 4. It should be noted that we trained five SVMs, one for each superpixel number, as the features differed with the number of superpixels.

To quantify the segmentation after classification, the accuracy, precision, sensitivity, and specificity were measured as follows:

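$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{9}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{10}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP} \tag{11}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{12}$$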

where TP, FN, TN, and FP denote, respectively, the number of pixels in abnormal regions that were correctly labeled, the number of pixels in abnormal regions that were incorrectly labeled as normal, the number of pixels in normal tissue regions that were correctly labeled, and the number of pixels in normal tissue regions that were incorrectly labeled as abnormal.
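
As a small worked example of Eqs. (9) to (12), the following Python sketch counts TP, FN, TN, and FP by comparing a predicted binary mask with a manually outlined mask, pixel by pixel, and then evaluates the four measures. The function names and the two tiny masks are hypothetical, and the sketch assumes that no denominator is zero.

```python
def count_pixels(predicted, truth):
    """Count TP, FN, TN, FP between two binary masks (1 = abnormal)."""
    tp = fn = tn = fp = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if t and p:
                tp += 1      # abnormal pixel labeled abnormal
            elif t and not p:
                fn += 1      # abnormal pixel labeled normal
            elif not t and not p:
                tn += 1      # normal pixel labeled normal
            else:
                fp += 1      # normal pixel labeled abnormal
    return tp, fn, tn, fp


def segmentation_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and precision, Eqs. (9)-(12)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "precision": tp / (tp + fp),
    }


# Hypothetical 2x4 masks: manual outline versus classifier output.
truth = [[1, 1, 0, 0],
         [1, 0, 0, 0]]
predicted = [[1, 0, 0, 0],
             [1, 1, 0, 0]]
tp, fn, tn, fp = count_pixels(predicted, truth)
metrics = segmentation_metrics(tp, fn, tn, fp)
```

In this example, two abnormal pixels are found, one is missed, and one normal pixel is wrongly flagged, so the sensitivity and precision are both 2/3 while the accuracy is 0.75.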

Table 1 quantifies how well the SLIC superpixel method segmented the regions and how the SVM labeled the segmented superpixels using the extracted features. Table 1 consists of six parts: the top-left part shows the results when all five diseases were considered as a single abnormal class, and the other five parts show the results for each of the five diseases. The labeled superpixels were compared to manually outlined regions in the frames. The results for labeling of all five diseases are illustrated in Fig 5.
Among the superpixel methods, the QS algorithm was selected for comparison with the SLIC results. A MacBook Pro with a 2.7 GHz Intel Core i5 processor and 8 GB of 1867 MHz DDR3 memory was used to run the methods in Python 3. Fig 6 compares the four measures (sensitivity, specificity, accuracy, and precision) for segmentation and classification of the superpixels between the SLIC and QS algorithms. In addition, to evaluate the speed of the methods, we compared the average time required for segmentation using the SLIC method with that of the QS approach; this comparison is illustrated in Fig 7.

## 4 Discussion

The need for segmentation and detection of diseases in the WCE frames has been discussed schwartz2007small, eliakim2013video. Accurately locating the abnormal regions also helps physicians review the frames. To address this need, it is vital to segment the regions accurately and then to classify the segmented regions. The SLIC and QS methods presented here can segment five diseases (bleeding, Crohn’s disease, Lymphangiectasia, Xanthoma, and Lymphoid hyperplasia) in the WCE frames. The segmented regions were then classified using trained SVMs.

Bleeding can be considered one of the common abnormalities in the GI tract, and it can lead to more severe disease. The majority of studies have been devoted to the segmentation and detection of bleeding in the WCE frames pan2011bleeding, li2009computer, maghsoudi2016detection, karargyris2011detection, guobing2011novel.
Most of these methods tried to find the frames containing bleeding regions (frame-based methods), and a few tried to detect the regions within the frames (pixel-based methods). For bleeding detection, a sensitivity of more than 92% was reported for a pixel-based method and more than 98% for a frame-based method pan2011bleeding. We achieved a sensitivity of 97% for the detection of bleeding regions, as reported in Table 1. This improvement of 5 percentage points will help us extend the methods to a frame-based approach in future work by gathering more frames.
To segment Crohn’s disease, a method proposed by Kumar kumar2012assessment achieved an accuracy of 93% and a precision of 91%. Another method tried to segment this type of disease, but the results showed an accuracy of 75% maghsoudi2012segmentation. The sensitivity, accuracy, precision, and specificity of our method were measured separately for each of the five diseases. These four measures for detection of Crohn’s disease were 91%, 91%, 91%, and 85%, respectively. Our proposed methods achieved slightly better results than the previous methods for this type of disease.
A method was also proposed to segment the Lymphangiectasia region in WCE frames cui2010detection. The method depended on the size of the disease region in the frames. The accuracy and sensitivity were reported as 94% and 55%, respectively. Our method achieved an accuracy and sensitivity of 91% and 90%, respectively, a substantially higher sensitivity than the previous method. A summary of available methods compared to our proposed methods is given in Table 2.

As Fig 3, Fig 4, and Table 1 show, the SLIC-based method achieved reliable results for detection of the five disease regions in the WCE frames. The results show that the trend of accuracy differed slightly from one disease to another as the number of superpixels changed. After reviewing the results and checking the frames, we found that the diseases with a larger area showed a more dramatic decrease in the results beyond a specific number of superpixels. This trend was seen in Crohn’s disease, Lymphoid hyperplasia, and bleeding, as they usually cover more than 25% of the image pixels. The other two diseases, Lymphangiectasia and Xanthoma, had smaller regions, and their results improved as the number of superpixels grew.

In addition to this trend, the precision was slightly higher when fewer, larger superpixels were used. The reason may be that larger superpixels carry more texture and color information for the detection and classification of regions. In other words, when the number of superpixels increased and their size decreased (getting closer to single pixels again), the detection error increased beyond a specific number of superpixels (this number was 100).
Fig 6 shows the comparison between the two superpixel algorithms, QS and SLIC. As seen, QS performed slightly better than SLIC when the number of superpixels was low. Fig 7 illustrates the average time required by these two methods to segment the frames, indicating that the SLIC method needed an average of 0.7 seconds to process a frame (all the steps including segmentation and classification).
For future studies, we will try to develop a frame-based method to detect abnormal frames by gathering more frames for each of the diseases. Superpixels, especially SLIC, can provide a wealth of information to segment the abnormal regions in the WCE frames. Other methods such as deep learning can be used to improve the results yuan2017deep.
