Simple Interactive Image Segmentation using Label Propagation through kNN graphs
Many interactive image segmentation techniques are based on semi-supervised learning. The user may label some pixels from each object and the SSL algorithm will propagate the labels from the labeled to the unlabeled pixels, finding object boundaries. This paper proposes a new SSL graph-based interactive image segmentation approach, using undirected and unweighted kNN graphs, from which the unlabeled nodes receive contributions from other nodes (either labeled or unlabeled). It is simpler than many other techniques, but it still achieves significant classification accuracy in the image segmentation task. Computer simulations are performed using some real-world images, extracted from the Microsoft GrabCut dataset. The segmentation results show the effectiveness of the proposed approach.
Instituto de Geociências e Ciências Exatas
Universidade Estadual Paulista “Júlio de Mesquita Filho” (UNESP)
Avenida 24A, 1515 – 13.506-900 – Rio Claro – SP – Brazil
1 Introduction
Image segmentation is considered one of the most difficult tasks in image processing [16]. It is the process of dividing an image into parts, identifying objects or other relevant information [24]. Fully automatic segmentation is still very difficult to accomplish and the existing techniques are usually domain-dependent. Therefore, interactive image segmentation, in which the segmentation process is partially supervised, has attracted increasing interest in recent decades [7, 17, 22, 5, 14, 13, 23, 21, 19, 1, 2, 27, 8, 10].
Semi-supervised learning (SSL) is an important field in machine learning, usually applied when unlabeled data is abundant but the process of labeling is expensive, time-consuming, and/or requires intensive work by human specialists [30, 11]. These characteristics make SSL an interesting approach to interactive image segmentation, which may be seen as a pixel classification process. In this scenario, there are often many unlabeled pixels to be classified. A human specialist can easily classify some of them, which are away from the borders, but the process of defining the borders manually is difficult and time-consuming.
Many interactive image segmentation techniques are, in fact, based on semi-supervised learning. The user may label some pixels from each object, away from the boundaries, where the task is easier. Then, the SSL algorithm iteratively propagates the labels from the labeled pixels to the unlabeled pixels, finding the boundaries. This paper proposes a different SSL-based interactive image segmentation approach. It is simpler than many other techniques, but it still achieves significant classification accuracy in the image segmentation task. In particular, it was applied to some real-world images, including some images extracted from the Microsoft GrabCut dataset [23]. The segmentation results show the effectiveness of the proposed approach.
1.1 Related work
The approach proposed in this paper may be classified in the category of graph-based semi-supervised learning. Algorithms in this category rely on the idea of building a graph whose nodes are data items (both labeled and unlabeled) and whose edges represent similarities between them. Label information from the labeled nodes is propagated through the graph to classify all the nodes [11]. Many graph-based methods [6, 29, 28, 3, 4, 18] are similar and share the same regularization framework [30]. They usually employ weighted graphs, and labels are spread globally. In the proposed approach, by contrast, the label spreading is limited to neighboring nodes and the graph is undirected and unweighted.
Another graph-based method, known as Label Propagation through Linear Neighborhoods [26], also uses a $k$-nearest neighbors graph to propagate labels. However, its edges have weights, whose calculation requires solving quadratic programming problems prior to the iterative label propagation process. The proposed approach, on the other hand, uses only unweighted edges.
1.2 Technique overview
In the proposed method, an unweighted and undirected graph is generated by connecting each node (data item) to its $k$-nearest neighbors. Then, in an iterative process, unlabeled nodes receive contributions from all their neighbors (either labeled or unlabeled) to define their own labels. The algorithm usually converges quickly, and each unlabeled node is labeled after the class from which it received the most contributions. Unlike many other graph-based methods, no calculation of edge weights or of a Laplacian matrix is required.
2 The Proposed Model
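In this section, the proposed technique is detailed. Given a two-dimensional digital image, the set of pixels is organized as $\mathfrak{X} = \{x_1, x_2, \dots, x_L, x_{L+1}, \dots, x_N\}$, such that $\mathfrak{X}_L = \{x_i\}_{i=1}^{L}$ is the labeled pixel subset and $\mathfrak{X}_U = \{x_i\}_{i=L+1}^{N}$ is the unlabeled pixel subset. $\mathfrak{L} = \{1, \dots, C\}$ is the set containing the labels. $y: \mathfrak{X} \rightarrow \mathfrak{L}$ is the function associating each $x_i \in \mathfrak{X}$ to its label $y(x_i)$ as the algorithm output. The algorithm will estimate $y(x_i)$ for each unlabeled pixel $x_i \in \mathfrak{X}_U$.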
2.1 $k$-NN Graph Generation
| # | Feature |
|---|---|
| 1 | Pixel row location |
| 2 | Pixel column location |
| 3 | Red (R) component of the pixel |
| 4 | Green (G) component of the pixel |
| 5 | Blue (B) component of the pixel |
| 6 | Hue (H) component of the pixel |
| 7 | Saturation (S) component of the pixel |
| 8 | Value (V) component of the pixel |
| 9 | ExR component of the pixel |
| 10 | ExG component of the pixel |
| 11 | ExB component of the pixel |
| 12 | Average of R on the pixel and its neighbors (MR) |
| 13 | Average of G on the pixel and its neighbors (MG) |
| 14 | Average of B on the pixel and its neighbors (MB) |
| 15 | Standard deviation of R on the pixel and its neighbors (SDR) |
| 16 | Standard deviation of G on the pixel and its neighbors (SDG) |
| 17 | Standard deviation of B on the pixel and its neighbors (SDB) |
| 18 | Average of H on the pixel and its neighbors (MH) |
| 19 | Average of S on the pixel and its neighbors (MS) |
| 20 | Average of V on the pixel and its neighbors (MV) |
| 21 | Standard deviation of H on the pixel and its neighbors (SDH) |
| 22 | Standard deviation of S on the pixel and its neighbors (SDS) |
| 23 | Standard deviation of V on the pixel and its neighbors (SDV) |
For measures 12 to 23, the pixel neighbors are the 8-connected neighborhood, except on the borders, where no wraparound is applied. All components are normalized to have mean 0 and standard deviation 1. They are also scaled by a vector of weights in order to emphasize or de-emphasize each feature during the graph generation. ExR, ExG, and ExB components are obtained from the RGB components using the method described in [20]. The HSV components are obtained from the RGB components using the method described in [25].
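The neighborhood measures and the normalization step can be illustrated with a minimal NumPy sketch. It is not the author's implementation: the function names are hypothetical, a 3×3 window (the pixel plus its 8-connected neighbors) is assumed, and border pixels simply use the neighbors that exist.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def neighborhood_stats(channel):
    """Mean and standard deviation over each pixel and its 8-connected neighbors.

    Border pixels use only the neighbors that exist (no wraparound).
    """
    # Pad with NaN so that missing neighbors are ignored by nanmean/nanstd.
    padded = np.pad(channel.astype(float), 1, mode="constant", constant_values=np.nan)
    windows = sliding_window_view(padded, (3, 3))  # shape: (rows, cols, 3, 3)
    return np.nanmean(windows, axis=(2, 3)), np.nanstd(windows, axis=(2, 3))


def standardize_and_weight(features, weights):
    """Scale each feature column to mean 0 and std 1, then apply per-feature weights."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # constant features become zero instead of dividing by zero
    return (features - mean) / std * weights


# Illustrative use on a random single-channel image (hypothetical data).
rng = np.random.default_rng(0)
red = rng.integers(0, 256, size=(4, 5))
rows, cols = np.indices(red.shape)
mean_r, std_r = neighborhood_stats(red)
features = np.column_stack(
    [rows.ravel(), cols.ravel(), red.ravel(), mean_r.ravel(), std_r.ravel()]
).astype(float)
weighted = standardize_and_weight(features, np.array([1.0, 1.0, 0.5, 0.5, 0.0]))
```

A weight of zero removes a feature from the distance computation entirely, which is one way to read "deemphasize" in the text.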
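The undirected and unweighted graph is defined as $\mathbf{G} = (\mathbf{V}, \mathbf{E})$, where $\mathbf{V} = \{v_1, v_2, \dots, v_N\}$ is the set of nodes, and $\mathbf{E}$ is the set of edges $(v_i, v_j)$. Each node $v_i$ corresponds to a pixel $x_i$. Two nodes $v_i$ and $v_j$ are connected if $v_j$ is among the $k$-nearest neighbors of $v_i$, or vice-versa, considering the Euclidean distance between the features of $x_i$ and $x_j$. Otherwise, $v_i$ and $v_j$ are disconnected.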
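The connection rule can be made concrete with a brute-force sketch. The function name `knn_graph` and the neighbor-set representation are assumptions made for illustration, and the quadratic distance matrix is only practical for very small inputs.

```python
import numpy as np


def knn_graph(features, k):
    """Return neighbor sets of the undirected, unweighted k-NN graph.

    Nodes i and j are linked if j is among the k nearest neighbors of i,
    or i is among the k nearest neighbors of j (Euclidean distance).
    """
    features = np.asarray(features, dtype=float)
    n = len(features)
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a node is not its own neighbor
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in nearest[i]:
            neighbors[i].add(int(j))
            neighbors[int(j)].add(i)  # "or vice-versa": the edge is symmetric
    return neighbors


# Four points on a line, k = 1: the two middle nodes end up with two neighbors each.
graph = knn_graph([[0.0], [1.0], [2.5], [10.0]], k=1)
```

Because an edge is kept when either endpoint selects the other, a node may have more than $k$ neighbors, as the middle nodes in the example do.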
2.2 Label Propagation
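For each node $v_i$, a domination vector $\mathbf{v}_i^{\omega}(t) = \{v_i^{\omega_1}(t), v_i^{\omega_2}(t), \dots, v_i^{\omega_C}(t)\}$ is created, where $C$ is the number of classes. Each element $v_i^{\omega_c}(t) \in [0, 1]$ corresponds to the domination level of class $c$ over the node $v_i$. The sum of the domination vector in each node is always constant: $\sum_{c=1}^{C} v_i^{\omega_c}(t) = 1$.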
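The domination levels are constant in nodes corresponding to labeled pixels, with full domination by the corresponding class. On the other hand, domination levels are variable in nodes corresponding to unlabeled pixels, and they are initially set equally among classes. Therefore, for each node $v_i$, the domination vector is set as follows:

$$
v_i^{\omega_c}(0) =
\begin{cases}
1 & \text{if } x_i \text{ is labeled and } y(x_i) = c \\
0 & \text{if } x_i \text{ is labeled and } y(x_i) \neq c \\
\frac{1}{C} & \text{if } x_i \text{ is unlabeled}
\end{cases}
$$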
In the iterative phase, at each iteration $t$, each unlabeled node gets contributions from all its neighbors to calculate its new domination levels. Thus, for each unlabeled node $v_i$, the domination levels are updated as follows:

$$
\mathbf{v}_i^{\omega}(t+1) = \frac{1}{|N(v_i)|} \sum_{v_j \in N(v_i)} \mathbf{v}_j^{\omega}(t)
$$

where $|N(v_i)|$ is the size of $N(v_i)$, and $N(v_i)$ is the set of neighbors of $v_i$. In this way, the new domination vector of $v_i$ is the arithmetic mean of its neighbors' domination vectors, regardless of whether they are labeled or unlabeled.
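As a small worked example with hypothetical values, consider two classes and an unlabeled node with three neighbors: two labeled nodes of class 1, each with domination vector $(1, 0)$, and one unlabeled node still at its initial value $(0.5, 0.5)$. Taking the arithmetic mean of the three vectors gives:

```latex
\[
\frac{1}{3}\Bigl[(1, 0) + (1, 0) + (0.5, 0.5)\Bigr]
  = \Bigl(\tfrac{2.5}{3}, \tfrac{0.5}{3}\Bigr)
  \approx (0.833, 0.167)
\]
```

The result still sums to 1, since the mean of vectors that each sum to 1 also sums to 1, so the constant-sum property of the domination vector is preserved by the update.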
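The average maximum domination level is defined as follows:

$$
\langle v^{\omega_m} \rangle = \frac{1}{n_u} \sum_{i} \max_{c} v_i^{\omega_c}(t)
$$

considering all $v_i$ representing unlabeled nodes, where $n_u$ is the number of such nodes. $\langle v^{\omega_m} \rangle$ is checked at a fixed interval of iterations, and the algorithm stops when its increase between checkpoints falls below a given threshold.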
At the end of the iterative process, each unlabeled pixel is assigned to the class that has the highest domination level on it:

$$
y(x_i) = \arg\max_{c} v_i^{\omega_c}(t)
$$
2.3 The Algorithm
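Overall, the proposed algorithm can be outlined as follows:
Algorithm 1
1. Build the $k$-NN graph from the weighted pixel features
2. Set the initial domination levels of all nodes
3. repeat
4.   for each unlabeled node do
5.     Update its domination levels with the mean of its neighbors' domination levels
6. until the stop criterion is satisfied
7. Assign each unlabeled pixel to the class with the highest domination level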
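The steps described in Section 2.2 can be assembled into a short Python sketch. This is an illustration under stated assumptions rather than the author's code: the function name `propagate_labels`, the neighbor-set graph representation, the use of -1 for unlabeled nodes, and the values of `check_every`, `tol`, and `max_iter` are all hypothetical.

```python
import numpy as np


def propagate_labels(neighbors, labels, n_classes, check_every=10, tol=1e-3, max_iter=10_000):
    """Label propagation over an unweighted graph given as a list of neighbor sets.

    labels[i] is a class index for labeled nodes and -1 for unlabeled ones.
    check_every, tol and max_iter are illustrative values, not the paper's settings.
    """
    labels = np.asarray(labels)
    labeled = np.flatnonzero(labels >= 0)
    unlabeled = np.flatnonzero(labels < 0)

    # Initial domination levels: one-hot for labeled nodes, uniform for unlabeled ones.
    dom = np.full((len(labels), n_classes), 1.0 / n_classes)
    dom[labeled] = 0.0
    dom[labeled, labels[labeled]] = 1.0
    if len(unlabeled) == 0:
        return labels.copy(), dom

    # Only unlabeled nodes with at least one neighbor are ever updated.
    to_update = [(i, sorted(neighbors[i])) for i in unlabeled if neighbors[i]]
    last_avg_max = dom[unlabeled].max(axis=1).mean()

    for iteration in range(1, max_iter + 1):
        new_dom = dom.copy()
        for i, idx in to_update:
            new_dom[i] = dom[idx].mean(axis=0)  # mean of the neighbors' vectors at time t
        dom = new_dom

        if iteration % check_every == 0:
            avg_max = dom[unlabeled].max(axis=1).mean()
            if avg_max - last_avg_max < tol:
                break
            last_avg_max = avg_max

    result = labels.copy()
    result[unlabeled] = dom[unlabeled].argmax(axis=1)
    return result, dom


# Chain 0 - 1 - 2 - 3 with the end nodes labeled as classes 0 and 1.
chain = [{1}, {0, 2}, {1, 3}, {2}]
final_labels, final_dom = propagate_labels(chain, [0, -1, -1, 1], n_classes=2)
```

In the chain example each unlabeled node ends up with the class of the closer labeled end, and every domination vector still sums to 1.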
3 Implementation
In order to reduce the computational resources required by the proposed method, the following implementation strategy is applied.
The iterative step of the algorithm is very fast in comparison with the graph generation step, i.e., graph generation dominates the execution time. Therefore, the graph is generated using the k-d tree method [15], so the algorithm runs in linearithmic time ($O(N \log N)$).
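A k-d tree based graph construction might look like the sketch below, which uses SciPy's `KDTree` as a stand-in for whatever k-d tree implementation was actually used. The function name `knn_graph_kdtree` is hypothetical, and the sketch assumes $k$ is smaller than the number of points.

```python
import numpy as np
from scipy.spatial import KDTree


def knn_graph_kdtree(features, k):
    """Undirected k-NN neighbor sets built with a k-d tree (assumes k < number of points)."""
    features = np.asarray(features, dtype=float)
    tree = KDTree(features)
    # Ask for k + 1 matches because each point is normally returned as its own nearest match.
    _, idx = tree.query(features, k=k + 1)
    neighbors = [set() for _ in range(len(features))]
    for i, row in enumerate(idx):
        others = [int(j) for j in row if j != i][:k]
        for j in others:
            neighbors[i].add(j)
            neighbors[j].add(i)  # keep the edge in both directions
    return neighbors


kd_graph = knn_graph_kdtree([[0.0], [1.0], [2.5], [10.0]], k=1)
```

Filtering out the query point by index, rather than dropping the first column, keeps the result correct when several pixels share identical feature vectors.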
In the iterative step, each iteration runs in $O(uk)$, where $u$ is the number of unlabeled nodes and $k$ is usually proportional to the number of neighbors each node has (not equal, because the graph is undirected and a node may therefore have more than $k$ neighbors). $u$ is usually a fraction of the total number of nodes $N$ in practical problems, and often $k \ll N$. Increasing $k$ also increases the execution time of each iteration. On the other hand, the number of iterations required to converge decreases, as the graph becomes more connected and the labels propagate faster, as was empirically observed in computer simulations.
The iterative steps are synchronous, i.e., the contributions any node receives to produce its domination vector at time $t+1$ refer to the domination levels its neighbors had at time $t$. Therefore, parallelization of this step, corresponding to the inner loop of Algorithm 1, is possible. Nodes can calculate their new domination vectors in parallel without running into race conditions. Synchronization is only required between iterations of the outer loop.
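One way to see why the synchronous update has no race conditions is to write a whole iteration as a single matrix product that reads only the time-$t$ values and writes into a new array. The sketch below uses a dense adjacency matrix for brevity; the function name and the dense representation are assumptions, and a sparse matrix would be the natural choice for real images.

```python
import numpy as np


def synchronous_step(adjacency, dom, unlabeled):
    """One iteration computed for all nodes at once from the time-t domination levels."""
    degree = adjacency.sum(axis=1)
    # Row i of the product is the sum of the neighbors' vectors; dividing gives the mean.
    averaged = adjacency @ dom / np.maximum(degree, 1)[:, None]
    # Labeled nodes and isolated nodes keep their current vectors.
    update = unlabeled & (degree > 0)
    return np.where(update[:, None], averaged, dom)


# Chain 0 - 1 - 2 - 3 with labeled end nodes (hypothetical example).
adjacency = np.array(
    [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float
)
dom_t = np.array([[1.0, 0.0], [0.5, 0.5], [0.5, 0.5], [0.0, 1.0]])
dom_next = synchronous_step(adjacency, dom_t, np.array([False, True, True, False]))
```

Since `dom_t` is never modified, each row of `dom_next` could equally be computed by a separate worker, with a barrier only between iterations.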
4 Experiments
The efficacy of the proposed technique is first tested using the real-world image shown in Fig. 1(a), extracted from [8]. A trimap providing seed regions is presented in Fig. 1(b). Black (0) represents the background, ignored by the algorithm; dark gray (64) is the labeled background; light gray (128) is the unlabeled region, whose labels will be estimated by the proposed method; and white (255) is the labeled foreground.
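The trimap encoding can be decoded into per-pixel labels with a few lines of NumPy. The gray levels come from the description above; the function name, the label codes (0, 1, -1), and the separate `active` mask are assumptions made for this sketch.

```python
import numpy as np

IGNORED, BACKGROUND_SEED, UNLABELED, FOREGROUND_SEED = 0, 64, 128, 255


def decode_trimap(trimap):
    """Map trimap gray levels to labels: 0 = background, 1 = foreground, -1 = unlabeled.

    Also returns a mask of the pixels that take part in the graph
    (everything except the ignored black background).
    """
    trimap = np.asarray(trimap)
    labels = np.zeros(trimap.shape, dtype=int)  # ignored and seeded background are both 0
    labels[trimap == FOREGROUND_SEED] = 1
    labels[trimap == UNLABELED] = -1
    active = trimap != IGNORED
    return labels, active


tri_labels, tri_active = decode_trimap([[0, 64], [128, 255]])
```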
The efficacy of the proposed technique is then verified through a series of computational experiments using nine images selected from the Microsoft GrabCut database [23].
For each image, $k$ and the vector of weights were optimized using the genetic algorithm available in the Global Optimization Toolbox of MATLAB, with its default parameters.
5 Results and Discussion
First, the proposed method was applied to the image shown in Fig. 1(a). The best segmentation result is shown in Fig. 1(c). By comparing this output with the segmentation result achieved in [8] for the same image, one can notice that the proposed method achieved slightly better results, eliminating some misclassified pixels and better defining the borders.
Then, the proposed method was applied to the nine images shown in Fig. 2, as described in Section 4. The best segmentation results achieved with the proposed method are shown in Fig. 5. Error rates are computed as the ratio between the number of incorrectly classified pixels and the total number of unlabeled pixels (light gray in the trimap images shown in Fig. 3). Notice that the ground truth images (Fig. 4) have a thin contour of gray pixels, which corresponds to uncertainty, i.e., pixels that received different labels from the different people who did the manual classification. These pixels are not used in the classification error calculation.
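The error measure can be sketched as follows. The function name and the boolean-mask inputs are hypothetical, and the sketch assumes that uncertain ground-truth pixels are excluded from both the numerator and the denominator, which is one reading of the description above.

```python
import numpy as np


def error_rate(predicted, truth, unlabeled, uncertain):
    """Fraction of misclassified pixels among the unlabeled ones, skipping uncertain pixels."""
    evaluated = unlabeled & ~uncertain
    if not evaluated.any():
        return 0.0
    return float((predicted[evaluated] != truth[evaluated]).mean())


# Four pixels: one seed, three unlabeled, one of which is uncertain in the ground truth.
rate = error_rate(
    predicted=np.array([0, 1, 1, 0]),
    truth=np.array([0, 1, 0, 0]),
    unlabeled=np.array([False, True, True, True]),
    uncertain=np.array([False, False, False, True]),
)
```

Here two pixels are evaluated and one of them is wrong, so the rate is 0.5.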
Segmentation error rates are also summarized in Table 2. Some results from other methods [14, 12, 8], extracted from the respective references, are included for comparison. By analyzing them, one can notice that the proposed method has comparable results.
It is also important to notice that the proposed method is deterministic. Given the same parameters, it will always output the same segmentation result on different executions. Other methods, like Particle Competition and Cooperation [8], are stochastic. Therefore, they may output different segmentation results on each execution.
The optimized parameter $k$ and feature weights are shown in Table 3. Considering the images evaluated in this paper, pixel location features (Row and Col) are the most important features, followed by the ExB component, intensity (V), and the mean of green (MG). The least important features were hue (H), saturation (S), and all those related to standard deviation. However, no single feature received a high weight in all images. The optimal weights and $k$ seem to be highly dependent on image characteristics.
6 Conclusions
In this paper, a new SSL graph-based approach is proposed to perform interactive image segmentation. It employs undirected and unweighted $k$NN graphs to propagate labels from nodes representing labeled pixels to nodes representing unlabeled pixels. Computer simulations with some real-world images show that the proposed approach is effective, achieving segmentation accuracy similar to that achieved by some state-of-the-art methods.
As future work, the method will be applied to more images, and more features may be extracted. Methods to automatically define the parameter $k$ and the feature weights may also be explored. Graph generation may also be improved to provide a further increase in segmentation accuracy.
Moreover, the proposed method works for multiple labels simultaneously at no extra cost, which is an interesting property not often exhibited by other interactive image segmentation methods. This feature will also be explored in future work.
The author would like to thank the São Paulo Research Foundation - FAPESP (grant #2016/05669-4) and the National Council for Scientific and Technological Development - CNPq (grant #475717/2013-9) for the financial support.
References
- [1] (2010-05) Improved random walker algorithm for image segmentation. In Image Analysis & Interpretation (SSIAI), 2010 IEEE Southwest Symposium on, pp. 89–92.
- [2] (2011-05) Interactive image segmentation using machine learning techniques. In Computer and Robot Vision (CRV), 2011 Canadian Conference on, pp. 264–269.
- [3] (2004) Regularization and semisupervised learning on large graphs. In Conference on Learning Theory, pp. 624–638.
- [4] (2005) On manifold regularization. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTAT 2005), New Jersey, pp. 17–24.
- [5] (2004) Interactive image segmentation using an adaptive GMMRF model. In Computer Vision - ECCV 2004, T. Pajdla and J. Matas (Eds.), Lecture Notes in Computer Science, Vol. 3021, pp. 428–441.
- [6] (2001) Learning from labeled and unlabeled data using graph mincuts. In Proceedings of the Eighteenth International Conference on Machine Learning, San Francisco, pp. 19–26.
- [7] (2001) Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, Vol. 1, pp. 105–112.
- [8] (2015-07) Interactive image segmentation using particle competition and cooperation. In 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–8.
- [9] (2015) Auto feature weight for interactive image segmentation using particle competition and cooperation. In Proceedings - XI Workshop de Visão Computacional WVC’2015, pp. 164–169.
- [10] (2015) Interactive image segmentation of non-contiguous classes using particle competition and cooperation. In Computational Science and Its Applications – ICCSA 2015, O. Gervasi, B. Murgante, S. Misra, M. L. Gavrilova, A. M. A. C. Rocha, C. Torre, D. Taniar and B. O. Apduhan (Eds.), Lecture Notes in Computer Science, Vol. 9155, pp. 203–216.
- [11] O. Chapelle, B. Schölkopf and A. Zien (Eds.) (2006) Semi-Supervised Learning. Adaptive Computation and Machine Learning, The MIT Press, Cambridge, MA.
- [12] (2008-12) Image segmentation as learning on hypergraphs. In Machine Learning and Applications, 2008. ICMLA ’08. Seventh International Conference on, pp. 247–252.
- [13] (2010) Interactive image segmentation using probabilistic hypergraphs. Pattern Recognition 43 (5), pp. 1863–1873.
- [14] (2014) Random walks in directed hypergraphs and application to semi-supervised image segmentation. Computer Vision and Image Understanding 120 (0), pp. 91–102.
- [15] (1977-09) An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. 3 (3), pp. 209–226.
- [16] (2008) Digital image processing (3rd edition). Prentice-Hall, Inc., Upper Saddle River, NJ, USA.
- [17] (2006-11) Random walks for image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on 28 (11), pp. 1768–1783.
- [18] (2003) Transductive learning via spectral graph partitioning. In Proceedings of International Conference on Machine Learning, pp. 290–297.
- [19] (2010-11) Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning. Geoscience and Remote Sensing, IEEE Transactions on 48 (11), pp. 4085–4098.
- [20] (2013) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences.
- [21] (2010-03) Fast semi-supervised image segmentation by novelty selection. In Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 1054–1057.
- [22] (2007-04) Interactive image segmentation via adaptive weighted distances. Image Processing, IEEE Transactions on 16 (4), pp. 1046–1057.
- [23] (2004-08) “GrabCut”: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23 (3), pp. 309–314.
- [24] (2001) Computer vision. Prentice Hall.
- [25] (1978) Color gamut transform pairs. In ACM Siggraph Computer Graphics, Vol. 12, pp. 12–19.
- [26] (2008-01) Label propagation through linear neighborhoods. IEEE Transactions on Knowledge and Data Engineering 20 (1), pp. 55–67.
- [27] (2008-12) Interactive image segmentation by semi-supervised learning ensemble. In Knowledge Acquisition and Modeling, 2008. KAM ’08. International Symposium on, pp. 645–648.
- [28] (2004) Learning with local and global consistency. In Advances in Neural Information Processing Systems, Vol. 16, pp. 321–328.
- [29] (2003) Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the Twentieth International Conference on Machine Learning, pp. 912–919.
- [30] (2005) Semi-supervised learning literature survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison.