Symbolic Segmentation Using Algorithm Selection
In this paper we present an alternative approach to symbolic segmentation: instead of implementing a new method, we treat symbolic segmentation as an algorithm selection problem. That is, given a set of available algorithms for symbolic segmentation, a selection mechanism uses input features and image attributes to select the best algorithm on a case-by-case basis. The selection mechanism is demonstrated within an algorithm framework in which the selection is made among a set of algorithm networks. Two sets of experiments are performed, and in both cases we demonstrate that algorithm selection considerably improves the result of the symbolic segmentation.
The research field of computer vision currently contains several very hard open problems. One of the problems being investigated is symbolic segmentation; in this task the algorithm must segment images into meaningful regions and then detect the objects present in the image. Both segmentation and object recognition have been studied extensively using various approaches. For instance, several dedicated resources exist for segmentation in various contexts [27, 10, 7]. Similarly, algorithms have been developed for various contexts, such as natural images [25, 38, 3, 24], medical images [37, 17, 29, 1] or biological images [2, 28]. Object recognition has received even more attention owing to very strong interest from industry. Some recent approaches to object recognition and detection include [20, 12, 14, 8, 6].
The combination of segmentation and recognition is, however, more difficult, and a relatively smaller number of studies and approaches have been proposed. For instance, semantic segmentation has been implemented as a combination of segmentation and recognition, with probabilistic models [40, 16], with convolutional networks, or with other approaches designed for specific conditions, for a unified framework, or for interleaved recognition and segmentation. Some of the main difficulties of semantic segmentation are:
- Segmentation by humans depends on recognition and on higher-level information.
- Recognition depends directly on the features and on the regions from which the features are extracted.
- The context of the image strongly modulates both segmentation and object recognition.
Consequently, symbolic segmentation is very complex because recognition and segmentation influence each other, and the algorithms designed for it are generally highly specific to particular features or contexts.
As can be seen in computer science and in other fields that rely on algorithms, it is very common for several algorithms to be implemented to solve the same or a similar problem in varying contexts, environments or types of input. The reason for this diversity and specificity is that real-world problems are much more complex and dynamic than current state-of-the-art software and hardware can handle. Consequently, several approaches have used algorithm selection to improve the results obtained for various problems.
In this paper we apply the algorithm selection approach to the problem of symbolic segmentation. We base our work on a previously proposed platform for algorithm selection. We show that algorithm selection, combined with high-level reasoning about the results of algorithm processing, makes it possible to iteratively improve the result of semantic segmentation. We analyze two different approaches to algorithm selection, using either a Bayesian Network (BN) or a Support Vector Machine (SVM). The main contributions of this paper are:
- Analysis of an iterative algorithm selection framework in the context of symbolic segmentation.
- Evaluation of two different machine learning approaches for selecting semantic segmentation algorithms.
- Demonstration that, despite the low precision of the algorithm selector, the resulting semantic segmentation is improved.
2 Previous Work and Background
Algorithm selection has been used previously in the area of image processing as well as in certain applications of computer vision. The general idea behind algorithm selection is to select a unique algorithm for a particular set of properties, attributes and features extracted from the data or obtained prior to processing. Algorithm selection was originally proposed by Rice for the problem of operating system scheduler selection. Since then it has been used in various problems, but it has never become a mainstream approach to problem solving.
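The general idea can be stated compactly. The notation below is chosen here for illustration and is not taken from the original text: $x$ is an input, $f(x)$ its feature vector in a feature space $\mathcal{F}$, $\mathcal{A}$ the set of available algorithms, and $p(a, x)$ a performance measure of algorithm $a$ on $x$. Under these assumptions, an ideal selector $S$ maps features to the best-performing algorithm:

```latex
S : \mathcal{F} \to \mathcal{A}, \qquad
S\bigl(f(x)\bigr) = \arg\max_{a \in \mathcal{A}} \; p(a, x)
```

In practice $p(a, x)$ is unknown at selection time, so $S$ has to be learned from training inputs for which the performance of every algorithm has been measured.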
The reason why algorithm selection is not mainstream is twofold: on the one hand it is necessary to find distinctive features, and on the other hand the problem studied should be difficult enough that extracting additional features from the input data is computationally advantageous.
The distinctive features might be too expensive (computationally) to obtain, so algorithm selection requires choosing the features that provide the highest quality of selection with the smallest number of features. This idea is illustrated in Figure 1. Figure 1a shows that when features are not well identified, algorithm selection cannot uniquely determine the best algorithm because the features are non-distinctive for the available algorithms. A counterexample using distinctive features is shown in Figure 1b.
The ratio of the computational effort required to extract additional features to the effort of the whole computation can be estimated by comparing their respective computation times. Earlier work showed that, for the task of image segmentation, the benefit of algorithm selection is directly proportional to the size of the processed region of the image. If the region to be segmented is too small, the tested algorithms produce segmentations with very similar f-values, and the fastest (computationally least expensive) algorithm can simply be selected. For larger regions, up to regions the size of the input image, algorithm selection is advantageous both computationally and in terms of the quality of the result.
In computer vision and image processing, algorithm selection was previously applied at various levels of algorithmic processing. For instance, image segmentation of artificial or biological images was successfully implemented using the algorithm selection approach. A set of features was found to be sufficient and allowed the areas of performance of the different algorithms to be clearly separated. These two approaches, however, focused on separating the available algorithms only with respect to the noise present in the image. Moreover, the algorithms used were single-level edge detectors such as Canny or Prewitt. More complex algorithms for image segmentation were studied in [21, 23]. Similarly to [41, 39], a method using machine learning for algorithm selection for the segmentation of natural real-world images was developed. Other approaches have studied parameter selection or the improvement of image processing algorithms using either machine learning or analytical methods, but they are in general confined to a single algorithm [15, 33, 35].
Methods and algorithms aimed at understanding real-world images generally have a quite limited extent of application. Currently there is a large amount of work combining segmentation and recognition, for example [16, 5]. One approach interleaves object recognition and segmentation so that the recognition seeds the segmentation and yields more precise contours of the detected objects. In another, objects are detected by combining part detection and segmentation in order to obtain better object shapes. More general approaches build a list of available objects and categories by learning them from data samples and reducing them to relevant information using a dictionary tool. However, this approach does not scale to arbitrary size because the labels are not structured and ultimately require complete knowledge of the whole world.
Another approach uses depth information to estimate whole-image properties such as occlusions, background and foreground isolation, and point of view in order to determine the type of objects in the image. All the modules of this approach are processed in parallel and integrated in a single final step. An airport apron analysis has also been performed in which the authors use motion tracking and understanding inspired by cognitive vision techniques. Finally, image understanding can also be approached more holistically, where the intent is only to estimate the nature of the image and to distinguish between mostly natural and mostly artificial content.
3 Algorithm Selection for Symbolic Segmentation
The framework used in these experiments was introduced previously. The schematic representation is shown in Figure 2. The processing starts by extracting features (1) from the input image; these are used by the algorithm selector (2) to determine the most appropriate algorithm. The input image is processed by the selected network of algorithms (3), which results in a symbolic segmentation of the input image. The symbolic segmentation result is interpreted by constructing a multi-relational graph representing the high-level description (4). The high-level description is analyzed for symbolic contradictions (5).
The contradiction is detected using a contradiction measure based on co-occurrence statistics obtained from training data. If a contradiction is detected, a new hypothesis about the region containing the contradiction is generated by taking the largest co-occurrence statistic while the symbolic segmentation of all but one region is kept fixed. Once the new hypothesis is generated, it is used as an additional input to the algorithm selector. Finally, features are extracted from the region of the contradiction. This new set of feature values and hypothesis attributes is used for a new algorithm selection.
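The hypothesis generation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: co-occurrence is counted per unordered label pair over training images, and the support of a candidate label is taken to be the sum of its pair counts with the fixed labels. The function names and the toy training data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(training_label_sets):
    """Count how often each unordered pair of labels appears in the same image."""
    counts = Counter()
    for labels in training_label_sets:
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    return counts


def pair_count(counts, a, b):
    """Look up the count of an unordered pair (0 if the pair was never seen)."""
    return counts[tuple(sorted((a, b)))]


def propose_hypothesis(counts, fixed_labels, candidate_labels):
    """Pick the candidate with the largest summed co-occurrence with the fixed labels.

    Summation is an assumption made for this sketch; the source only states that
    the largest co-occurrence statistic is used with all other regions fixed.
    """
    def support(candidate):
        return sum(pair_count(counts, candidate, other) for other in fixed_labels)

    return max(candidate_labels, key=support)


# Hypothetical training annotations: one list of labels per image.
training = [
    ["person", "horse"],
    ["person", "horse", "dog"],
    ["boat", "person"],
    ["sofa", "tvmonitor"],
]
counts = cooccurrence_counts(training)
hypothesis = propose_hypothesis(counts, ["person", "dog"], ["horse", "boat", "sofa"])
```

With these toy counts the contradicting region would be re-hypothesized as a horse, because that label co-occurs most often with the labels of the regions that are kept fixed.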
The newly selected algorithm processes the whole image and generates a new symbolic segmentation. The region that previously contained the contradiction is extracted from it and merged with the original high-level description (4). The new high-level description is analyzed and the cycle begins again. The processing stops when, for a given input, there are no more contradictions or when no more algorithms can be selected. This corresponds either to having no more errors in the high-level description or to being unable to generate new hypotheses. This platform will be referred to as Iterative Analysis (IA), as it incrementally changes the high-level description of the input image.
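The control flow of this cycle can be sketched as a short loop. All callables below (`select`, `run`, `find_contradiction`, `propose`, `merge`) are hypothetical stand-ins for the platform's modules, and both the `max_iters` safety bound and the bookkeeping of already tried algorithms are assumptions added for the sketch.

```python
def iterative_analysis(image, select, run, find_contradiction, propose, merge,
                       max_iters=10):
    """Generic IA-style loop; every callable is supplied by the caller."""
    tried = set()
    algorithm = select(image, None, tried)      # initial selection: features only
    if algorithm is None:
        return None
    tried.add(algorithm)
    description = run(algorithm, image)
    for _ in range(max_iters):
        region = find_contradiction(description)
        if region is None:                      # no contradiction left: stop
            break
        hypothesis = propose(description, region)
        algorithm = select(image, (region, hypothesis), tried)
        if algorithm is None:                   # no more algorithms to select: stop
            break
        tried.add(algorithm)
        new_result = run(algorithm, image)      # the whole image is reprocessed
        description = merge(description, new_result, region)
    return description


# Toy stand-ins: a "description" maps region ids to labels.
outputs = {
    "A": {"r1": "person", "r2": "boat"},
    "B": {"r1": "person", "r2": "horse"},
}


def select(image, hint, tried):
    remaining = [name for name in sorted(outputs) if name not in tried]
    return remaining[0] if remaining else None


def run(algorithm, image):
    return dict(outputs[algorithm])


def find_contradiction(description):
    # Pretend that a boat next to a person is flagged as contradictory.
    return "r2" if description.get("r2") == "boat" else None


def propose(description, region):
    return "horse"


def merge(description, new_result, region):
    merged = dict(description)
    merged[region] = new_result[region]         # replace only the contradicting region
    return merged


result = iterative_analysis("img", select, run, find_contradiction, propose, merge)
```

In the toy run the first algorithm labels region `r2` as a boat, the contradiction triggers a second selection, and only `r2` is replaced by the second algorithm's output.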
The output of a symbolic segmentation algorithm is a set of labeled regions. The high-level interpretation (description) consists of building a multi-relational graph that specifies the relations between the labeled regions in the resulting image. Using this graph, the result is checked for contradictions and, if necessary, a hypothesis about the recognized objects’ relations is generated. Currently, the IA platform uses co-occurrence statistics obtained from training data to estimate the contradiction and to propose the most viable hypothesis. The estimated relations are the relative position (left, right, above, below), the relative size (larger, smaller, same), background/foreground (in front, back) and one single-object property, the shape (Hough transform). Each of these properties is applied to either a pair of objects or an individual object, and the probability of contradiction is generated as a cumulative normalized product of all individual scores. An example of IA processing an image is shown in Figure 3.
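Two of the pairwise relations named above can be sketched from simple region statistics. This is an illustrative sketch only: regions are assumed to be summarized by a centroid and an area, image coordinates are assumed to grow rightward and downward, and the 10% tolerance for "same" size is an arbitrary assumption. The foreground/background and shape properties are not covered.

```python
def relative_position(a, b):
    """Position of region a relative to region b, decided by the dominant centroid offset."""
    dx = a["cx"] - b["cx"]
    dy = a["cy"] - b["cy"]
    if abs(dx) >= abs(dy):
        return "left" if dx < 0 else "right"
    return "above" if dy < 0 else "below"


def relative_size(a, b, tolerance=0.1):
    """Size of region a relative to region b, based on the area ratio."""
    ratio = a["area"] / b["area"]
    if abs(ratio - 1.0) <= tolerance:
        return "same"
    return "larger" if ratio > 1.0 else "smaller"


def build_relation_graph(regions):
    """Return {(i, j): relations of region i with respect to region j} for all ordered pairs."""
    graph = {}
    for i, a in enumerate(regions):
        for j, b in enumerate(regions):
            if i != j:
                graph[(i, j)] = {
                    "position": relative_position(a, b),
                    "size": relative_size(a, b),
                }
    return graph


# Hypothetical labeled regions of one segmentation result.
regions = [
    {"label": "person", "cx": 50, "cy": 40, "area": 2000},
    {"label": "horse", "cx": 60, "cy": 90, "area": 6000},
]
graph = build_relation_graph(regions)
```

Here the person region is above and smaller than the horse region, which is the kind of relation the co-occurrence statistics are then asked to judge.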
The verification is intended as an additional source of information; the reasoning over the recognized regions is performed only at the relational level, and thus our method is applicable only if two or more regions are detected.
To evaluate the proposed framework we used the VOC2012 data and three different algorithms for symbolic segmentation [16, 5, 11]. Each algorithm uses similar or no preprocessing, a different segmentation method, and similar machine-learning-based classification for object recognition. All three algorithms have been evaluated and tested on the VOC2012 data set.
As introduced in Section 3, the high-level verification requires multiple object detections in one image. Consequently, the testing and the training of the IA platform were carried out only on images that contain more than one distinct object. The training set requires not only that the ground truth of the input contains more than one object, but also that at least one of the algorithms used is able to detect at least two objects in the image. Otherwise the verification procedure is not triggered and the iterative process of improving the high-level understanding cannot start.
The experiments are carried out over various feature sets and terminating conditions. We evaluate two different algorithm selectors: a Bayesian Network (BN) and a Support Vector Machine (SVM). The motivation for using these two methods is, on the one hand, the ability of the BN to use a hierarchy of information and thus reduce the complexity of learning and, on the other hand, the simplicity and generally good learning results of the SVM.
4.1 Training of the Algorithm Selector
For SVM algorithm selection, two SVMs were initially trained: one for the initial algorithm selection using image features only, and one for selection using both features and hypothesis attributes. Such an approach is one of the possible solutions to the problem of missing values in the inputs of an SVM. However, experiments showed that the patching approach outperformed the two separate SVMs. With the patching approach, whenever the attributes of an image could not be obtained (the hypothesis was not generated, or it is unknown), the attribute values were replaced by the average of the available values.
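The patching step amounts to mean imputation of the missing attribute values. A minimal sketch is given below, assuming that a missing attribute is represented by `None` and that the averages are taken per attribute over the rows where that attribute is available; the function names are hypothetical.

```python
def column_means(rows):
    """Mean of each attribute over the rows where it is available (not None)."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        values = [row[c] for row in rows if row[c] is not None]
        # Fall back to 0.0 if an attribute is never observed (assumption of this sketch).
        means.append(sum(values) / len(values) if values else 0.0)
    return means


def patch(row, means):
    """Replace every missing attribute in a row by the corresponding mean."""
    return [m if v is None else v for v, m in zip(row, means)]


# Hypothetical attribute vectors; None marks an attribute that could not be obtained.
attribute_rows = [
    [1.0, 4.0],
    [3.0, None],
    [None, 8.0],
]
means = column_means(attribute_rows)
patched = [patch(row, means) for row in attribute_rows]
no_hypothesis = patch([None, None], means)
```

A sample for which no hypothesis exists thus receives the mean attribute vector, so a single SVM can be used for both the initial and the hypothesis-driven selection.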
The first training data set is equivalent to the VOC2012 training data set. In this case only features are extracted. The feature vector contains altogether 7856 feature values composed of histograms of various features. The features used include brightness, FFT, Gabor, wavelets, RGB intensity, acutance, and so on. The second training data set is created from bounding boxes around the semantic segmentations in the training set of the VOC2012 data set. The same features as in the first data set are extracted, and additionally a set of attributes is extracted from the region corresponding to the correct semantic segmentation using the Matlab regionprops function.
In the case of the BN, only a single training data set is used, as the BN is well suited to handling missing input values. However, the BN approach requires discrete input values (observations). Because most of the extracted features are continuous values within a certain range, it is necessary to cluster the data into discrete values. The clustering is done using ranges of equal size for each value, as given by (1).
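Since equation (1) is not reproduced here, the following is only a sketch of one common reading of equal ranges, namely equal-width binning between a minimum and a maximum value. The function name and the clamping of out-of-range values are assumptions of this sketch.

```python
def discretize(value, lo, hi, n_clusters):
    """Map a continuous value in [lo, hi] to a bin index in 0..n_clusters-1.

    Bins have equal width (hi - lo) / n_clusters; values outside the range
    are clamped to the first or last bin.
    """
    if hi <= lo:
        return 0
    width = (hi - lo) / n_clusters
    index = int((value - lo) / width)
    return min(max(index, 0), n_clusters - 1)


# Hypothetical feature values in [0, 1], discretized into six observable values.
observations = [discretize(v, 0.0, 1.0, 6) for v in (0.05, 0.4, 0.9, 1.0)]
```

Each continuous feature is then presented to the BN as one of `n_clusters` discrete observations.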
The BN structure is shown in Figure 4 and the inputs are specified by three categories: application specifications, hypothesis attributes and image features.
The application specification represents input information about the target application and other application-related information that is constant in the framework of this study. The attributes are regional properties extracted with the regionprops command in Matlab and represent the attributes of each of the available hypotheses. The hypotheses are the labels available for use in region labeling. Here the labels correspond to the 20 classes and the background from the VOC database. Each attribute for a class is calculated as the average of the values extracted from all objects of that class encountered in the training data set. The features extracted from the image are, together with the attributes, clustered as described in next subsection.
Both the training and the testing data, however, are fairly imbalanced, as can be seen in Table 1.
|ALE ||COMP6 ||CPMC |
The creation of the training sets follows different principles depending whether the training set is or . For the data set, each sample image is evaluated as with being all labels present in the ground truth of image I, and is the f-value of the symbolic segmentation of class in image I. In the evaluation of each algorithm is done only with respect to the region representing a single label fully enclosed in the bounding box provided by the VOCdevkit.
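The per-image evaluation can be illustrated with a short sketch. Two assumptions are made here because the exact formula is not reproduced above: the f-value is taken to be the pixel-wise F1 score of a class, and the per-class values are averaged over the labels present in the ground truth. The label maps are hypothetical 2-D lists.

```python
def f_value(predicted, truth, label):
    """Pixel-wise F1 score of one class; inputs are equally sized 2-D lists of labels."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def image_score(predicted, truth):
    """Mean f-value over all labels present in the ground truth (averaging is an assumption)."""
    labels = {t for row in truth for t in row}
    return sum(f_value(predicted, truth, label) for label in labels) / len(labels)


# Hypothetical 2x2 label maps.
truth = [["bg", "cat"], ["cat", "cat"]]
predicted = [["bg", "cat"], ["bg", "cat"]]
cat_f = f_value(predicted, truth, "cat")
score = image_score(predicted, truth)
```

Computing `image_score` for the output of every algorithm gives the values from which the best algorithm for an image is determined.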
Finally, the experimental results have shown that using all data for learning the algorithm selection is not well suited because many images have relatively close results of processing by more than one algorithm. Let, be an input image and are f-values calculated on the output of each algorithm applied to , let be the ordered set of then a is used for learning if . In most of the experiments in this paper was set to .
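The filtering of ambiguous training samples can be sketched as follows. The precise criterion and threshold are not reproduced above, so this sketch assumes that an image is kept only if the best f-value exceeds the second best by at least a margin; the margin of 0.1 and the f-values below are hypothetical.

```python
def training_label(f_values, margin):
    """Return the name of the best algorithm if it beats the runner-up by at least
    `margin`; return None if the image is too ambiguous to be used for learning."""
    ranked = sorted(f_values.items(), key=lambda item: item[1], reverse=True)
    (best_name, best), (_, second) = ranked[0], ranked[1]
    return best_name if best - second >= margin else None


# Hypothetical f-values of three algorithms on two images.
ambiguous = {"ALE": 0.71, "CPMC": 0.69, "COMP6": 0.40}
clear = {"ALE": 0.80, "CPMC": 0.55, "COMP6": 0.40}
labels = [training_label(ambiguous, 0.1), training_label(clear, 0.1)]
```

Only the second image would enter the training set, labeled with the algorithm that clearly performs best on it.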
4.2 Testing of the Platform
The system was tested on a subset of images from the VOC2012 validation data set, namely the images that contain at least two objects in the ground truth. First we evaluate the algorithm selector's ability to learn to classify the images according to which algorithm yields the best symbolic segmentation. To evaluate the classification power of both selectors we analyzed the results both for binary classification (with two different algorithms for semantic segmentation) and for multi-class classification (using all three available semantic segmenters). Then the whole system is analyzed by looking at the resulting data.
First we evaluated the BN for various levels of data clustering. Intuitively, the size of the BN depends directly on the number of values of the input observations; the conditional probability tables in the nodes to which the observable inputs are connected grow according to $k^n$, with $k$ being the number of observable values of the input variables and $n$ being the number of input variables connected to the node. The experiments with the BN were carried out in Matlab using the BNT package, and the BN was learned using the EM algorithm. The results of evaluating the BN classification power with respect to the number of observed data values are shown in Table 2.
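The growth of a conditional probability table can be made concrete with a one-line computation. The sketch assumes the standard table size for a discrete node, the product of the number of states of all its inputs; the function name and the optional `n_states` factor for the node's own states are introduced here for illustration.

```python
def cpt_entries(values_per_input, n_inputs, n_states=1):
    """Number of entries in the conditional probability table of a node with
    n_inputs parents, each taking values_per_input values, and n_states own states."""
    return n_states * values_per_input ** n_inputs


# Ten six-valued inputs connected to one node.
ten_six_valued = cpt_entries(6, 10)
# The same node with binary inputs, for comparison.
ten_binary = cpt_entries(2, 10)
```

Ten six-valued inputs already require about 60 million parent configurations, which illustrates why the number of input nodes and the number of clusters have to be kept small.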
|Clusters||BN Classification Error|
Notice that for the EM algorithm used for BN learning results in very high error rate of classification and for any the EM does not converge. The BN is fairly limited in the number of input nodes as well, because the conditional probability table in each node of the BN grows according to (1). Consequently, using the BNT Matlab package we were able to experiment with a BN having at best 10 sextenary input feature variables.
Because the BN requires the best features for high quality of classification we performed two different experiments of classification with BN: (a) search for best features for BN and (b) using clustered PCA features. The results using the data set are shown in Table 3.
|Task||Clusters||Number of Features||BN Error|
In contrast to the BN, the SVM uses continuous feature values and only normalization is required. Moreover, the SVM works well with large input vectors, which are in general reduced using PCA for increased speed and accuracy of classification. The results of testing the SVM classification using both training data sets are shown in Table 4. The evaluation was done using two data sets: one contained image regions (bounding boxes with individual semantic segmentations) and the other contained full images (denoted FI in Table 4).
|Task||Data set||SVM Classification Error|
The main result that can be seen in Table 4 is that the classification error rate is significantly smaller when attributes are used than when the SVM uses only features. Moreover, in all experiments where no attributes are used, the SVM is given the mean values of the attributes. When the SVM was used completely without the attributes and was trained exclusively on the features, the accuracy of selection was even lower. Consequently, all experiments on the IA platform were done using a single SVM that was given either features and mean values of attributes, or features and hypothesis attributes. Moreover, as can be expected, the classification error rate is significantly lower for two algorithms than for three.
We can see that both the learning on whole images and the learning on segments perform relatively poorly with both the SVM and the BN. However, the IA platform uses high-level verification, and thus it was tested with the better of the two algorithm selectors, the SVM.
To evaluate the IA platform, data from the VOC2012 trainval set was used. The average precision of the three algorithms and of the iterative analysis approach is shown in Table 5.
Some examples of processing are shown in Figure 5. Notice that despite the low accuracy a number of images are improved by selecting the regions from each algorithm.
To see how well the IA approach performs, we compare the average precision for each object class. The comparison of each algorithm’s results is shown in Table 6. As can be seen, because of the low level of learning our IA framework exceeded the highest per-class precision in only three classes of objects: boat, bus and dog. For the rest of the categories the IA approach was able to outperform all but one of the algorithms. This is due to the fact that the selection accuracy is relatively low.
Notice that, according to the schematic of the IA platform, the low accuracy of the algorithm selector could be compensated for by a stronger verification and reasoning mechanism. Consider the third row in Figure 5. A better reasoning procedure would lead to a result like the hypothetical, ideal case shown in Figure 3 rather than to the result shown in the last column of the third row in Figure 5. Even the simplest heuristic that prevents region replacements which directly reduce the f-value could increase the overall result without any significant computational overhead. Similar heuristics for the removal of improbable regions can also be implemented in parallel with the co-occurrence statistics. Thus even a relatively inaccurate algorithm selection combined with simple high-level verification would lead to better results.
In this paper we introduced a soft computing approach to the semantic segmentation problem. The method is based on an algorithm selection platform whose goal is to increase the quality of the result by reasoning about the content of the algorithms' outputs. The IA platform for image understanding iteratively improves the high-level understanding and, even with a very weak algorithm selector, can in many cases outperform the best algorithm by combining the best results of each available algorithm.
In the future, several direct extensions and improvements to the IA platform are planned. First, the algorithm selection accuracy must be improved. Second, the high-level verification requires a more robust method of contradiction detection and hypothesis generation. Co-occurrence statistics are not sufficient because of their dependence on the training data. Finally, the merging of results requires a more flexible and robust mechanism in order to avoid a decrease in result quality.
- [1] A. Ali, M. Couceiro, A. Hassanien, M. Tolba, and V. Snasel. Fuzzy c-means based liver CT image segmentation with optimum number of clusters. In P. Kroemer, A. Abraham, and V. Snasel, editors, Proceedings of the Fifth International Conference on Innovations in Bio-Inspired Computing and Applications IBICA 2014, volume 303 of Advances in Intelligent Systems and Computing, pages 131–139. Springer International Publishing, 2014.
- [2] R. Ali, M. Gooding, T. Szilagyi, B. Vojnovic, M. Christlieb, and M. Brady. Automatic segmentation of adherent biological cell boundaries and nuclei from brightfield microscopy images. 23(4):607–621, 2011.
- [3] P. Arbelaez. Boundary extraction in natural images using ultrametric contour maps. In Computer Vision and Pattern Recognition Workshop, 2006. CVPRW ’06. Conference on, page 182, June 2006.
- [4] P. Arbelaez, B. Hariharan, C. Gu, S. Gupta, L. Bourdev, and J. Malik. Finding animals: Semantic segmentation using regions and parts. In International Conference on Computer Vision and Pattern Recognition, 2012.
- [5] J. Carreira, F. Li, and C. Sminchisescu. Object recognition by sequential figure-ground ranking. International Journal of Computer Vision, 98(3):243–262, 2012.
- [6] D. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. Technical report, IDSIA, 2012.
- [7] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338, June 2010.
- [8] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 2010.
- [9] J. Ferryman, M. Borg, D. Thirde, F. Fusier, V. Valentin, F. Bremond, M. Thonnat, J. Aguilera, and M. Kampel. Automated scene understanding for airport aprons. In Proceedings of 18th Australian Joint Conference on Artificial Intelligence, Sydney, Australia, 2005. Springer-Verlag.
- [10] E. Gelasca, J. Byun, B. Obara, and B. Manjunath. Evaluation and benchmark for biological image segmentation. In Proceedings of the International Conference on Image Processing, 2008.
- [11] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. Simultaneous detection and segmentation. In European Conference on Computer Vision (ECCV), 2014.
- [12] J. Heikkila and O. Silven. A real-time system for monitoring of cyclists and pedestrians. Image and Vision Computing, 22(7):563–570, 2004. Visual Surveillance.
- [13] D. Hoiem, A. A. Efros, and M. Hebert. Closing the loop on scene interpretation. In Proc. Computer Vision and Pattern Recognition (CVPR), June 2008.
- [14] H. Jung, D. Kim, P. Yoon, and J. Kim. Structure analysis based parking slot marking recognition for semi-automatic parking system. In D.-Y. Yeung, J. Kwok, A. Fred, F. Roli, and D. de Ridder, editors, Structural, Syntactic, and Statistical Pattern Recognition, volume 4109 of Lecture Notes in Computer Science, pages 384–393. Springer Berlin Heidelberg, 2006.
- [15] V. Kolmogorov, Y. Boykov, and C. Rother. Applications of parametric maxflow in computer vision. In Proceedings of the International Conference on Computer Vision, 2007.
- [16] L. Ladicky, C. Russell, P. Kohli, and P. Torr. Graph cut based inference with co-occurrence statistics. In Proceedings of the 11th European Conference on Computer Vision, 2010.
- [17] G. Lathen. Segmentation methods for digital image analysis: Blood vessels, multi-scale filtering, and level set methods, 2010.
- [18] B. Leibe, A. Leonardis, and B. Schiele. Robust object detection with interleaved categorization and segmentation. International Journal of Computer Vision, 77:259–289, 2008.
- [19] L.-J. Li, R. Socher, and L. Fei-Fei. Towards total scene understanding: classification, annotation and segmentation in an automatic framework. In Computer Vision and Pattern Recognition (CVPR), 2009.
- [20] D. G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision, 1999.
- [21] M. Lukac and M. Kameyama. Adaptive functional module selection using machine learning: Framework for intelligent robotics. In Proceedings of the SICE, 2011.
- [22] M. Lukac, M. Kameyama, and K. Hiura. Natural image understanding using algorithm selection and high level feedback. In SPIE Intelligent Robots and Computer Vision XXX: Algorithms and Techniques, 2013.
- [23] M. Lukac, R. Tanizawa, and M. Kameyama. Machine learning based adaptive contour detection using algorithm selection and image splitting. Interdisciplinary Information Sciences, 18(2):123–134, 2012.
- [24] M. Maire, P. Arbelaez, C. Fowlkes, and J. Malik. Using contours to detect and localize junctions in natural images. In Conference on Computer Vision and Pattern Recognition, 2008.
- [25] J. Malik, S. Belongie, J. Shi, and T. Leung. Textons, contours and regions: Cue combination in image segmentation. In International Conference on Computer Vision, 1999.
- [26] H. Mallinson and A. Gammerman. Imputation using support vector machines, 2003.
- [27] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int’l Conf. Computer Vision, volume 2, pages 416–423, July 2001.
- [28] E. Meijering. Cell segmentation: 50 years down the road. 29(5):140–145, 2012.
- [29] A. Mharib, A. Ramli, S. Mashohor, and R. Mahmood. Survey on liver CT image segmentation methods. 37(2):83–95, 2012.
- [30] K. P. Murphy. The Bayes net toolbox for Matlab. Computing Science and Statistics, 33:2001, 2001.
- [31] A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145–175, 2001.
- [32] K. Pelckmans, J. De Brabanter, J. Suykens, and B. De Moor. Handling missing values in support vector machine classifiers. (18), 2005.
- [33] B. Peng and V. Veksler. Parameter selection for graph cut based image segmentation. In Proceedings of the British Conference on Computer Vision, 2008.
- [34] A. G. A. Perera, G. Brooksby, A. Hoogs, and G. Doretto. Moving object segmentation using scene understanding. In Conference on Computer Vision and Pattern Recognition, 2006.
- [35] B. Price, B. Morse, and S. Cohen. Geodesic graph cut for interactive image segmentation. In Proceedings of the International Conference on Computer Vision and Pattern Recognition, 2010.
- [36] J. Rice. The algorithm selection problem. Advances in Computers, 15:65–118, 1976.
- [37] N. Sharma and L. M. Aggarwal. Automated medical image segmentation techniques. 1(35):3–14, 2010.
- [38] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
- [39] S. Takemoto and H. Yokota. Algorithm selection for intracellular image segmentation based on region similarity. In Ninth International Conference on Intelligent Systems Design and Applications, 2009.
- [40] Z. Tu and S.-C. Zhu. Image segmentation by data-driven Markov chain Monte Carlo. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(5):657–673, May 2002.
- [41] X. Yong, D. Feng, and Z. Rongchun. Optimal selection of image segmentation algorithms based on performance prediction. In Proceedings of the Pan-Sydney Area Workshop on Visual Information Processing (VIP2003), 2003.
- [42] E. Zavitz and L. J. Baker. Higher order of image structure enables boundary segmentation in the absence of luminance or contrast cues. 4(14):1–14, 2014.