SemiContour: A Semi-supervised Learning Approach for Contour Detection
Abstract
Supervised contour detection methods usually require many labeled training images to obtain satisfactory performance. However, a large set of annotated data might be unavailable or extremely labor intensive to produce. In this paper, we investigate the usage of semi-supervised learning (SSL) to obtain competitive detection accuracy with very limited training data (three labeled images). Specifically, we propose a semi-supervised structured ensemble learning approach for contour detection built on structured random forests (SRF). To allow SRF to be applicable to unlabeled data, we present an effective sparse representation approach that captures the inherent structure in image patches by finding a compact and discriminative low-dimensional subspace representation in an unsupervised manner, enabling the incorporation of abundant unlabeled patches with their estimated structured labels to help SRF perform better node splitting. We re-examine the role of sparsity and propose a novel and fast sparse coding algorithm to boost the overall learning efficiency. To the best of our knowledge, this is the first attempt to apply SSL to contour detection. Extensive experiments on the BSDS500 segmentation dataset and the NYU Depth dataset demonstrate the superiority of the proposed method.
1 Introduction
Contour detection is a fundamental but challenging computer vision task. Although research on contour detection has gradually shifted from unsupervised to supervised learning in recent years, unsupervised approaches remain attractive, since they can be easily adapted to other image domains without demanding a large amount of labeled data. However, one of their significant limitations is the high computational cost [2, 35]. On the other hand, cutting-edge supervised contour detection methods, such as deep learning, rely on a huge amount of fully labeled training data, which often requires extensive human effort and domain expertise. Semi-supervised learning (SSL) [31, 23, 18] is an alternative technique to balance the trade-off between unsupervised and supervised learning. However, there currently exist no reports on SSL-based contour detection.
Supervised contour detection is often based on patch-to-patch or patch-to-pixel classification. Contours in local patches (denoted by sketch tokens [20]) contain rich and well-known patterns, including straight lines, parallel lines, curves, T-junctions, Y-junctions, etc. [27, 20]. One of the main objectives of the most recent supervised contour detection methods is to classify these patterns using structured learning [12, 13], sparse representation [24, 35], convolutional neural networks (CNN) [29, 16, 36], etc. In our method, we use unsupervised techniques to capture the patterns of unlabeled image patches, enabling the successful training of the contour detector with a limited number of labeled images. For notational convenience, we denote labeled patches as l-tokens and unlabeled patches as u-tokens.
The proposed semi-supervised structured ensemble learning approach is built on structured random forests (SRF) [17]. Inheriting from standard random forests (RF), SRF is popular because of its 1) fast prediction on high-dimensional data, 2) robustness to label noise [23], and 3) good support for arbitrarily sized structured outputs. However, similar to RF, SRF relies heavily on the amount of labeled data [18]. These properties make SRF a good candidate for SSL.
In this paper, we propose to train SRF in a novel semi-supervised manner, which requires only a small number of labeled training images. By analyzing the learning behavior of SRF, we observe that improving the node-splitting performance for data with structured labels is the key to successful training. To this end, we incorporate abundant u-tokens into a limited number of l-tokens to guide the node splitting, which is achieved by finding a discriminative low-dimensional subspace embedding, using sparse representation techniques to learn a basis dictionary of the subspace in an unsupervised manner.
To solve the sparse coding problem efficiently, we also propose a novel and fast algorithm to boost the overall learning efficiency. In addition, we demonstrate the max-margin properties of SRF, enabling us to use max-margin learning to dynamically estimate the structured labels of u-tokens inside tree nodes. The idea is illustrated in Figure 1. In the experimental section, we show the vulnerability of other supervised methods to a limited number of labeled images and demonstrate that, with only 3 labeled images, our newly developed contour detector matches or even outperforms these methods when they are fully trained over hundreds of labeled images.
2 Related Work
Recently, most advanced contour detection methods have been based on strong supervision. Ren et al. use sparse code gradients (SCG) [35] to estimate the local gradient contrast for gPb, which slightly improves the performance of gPb. Maire et al. [24] propose to learn a reconstructive sparse transfer dictionary for contour representation. These methods indicate the strong capability of sparse representation techniques to capture the contour structure in image patches. In the ensemble learning family, Lim et al. [20] propose sketch tokens, a mid-level feature representation, to capture local contour structure, and train an RF classifier to discriminate the patterns of sketch tokens. Dollár et al. [13] propose a structured edge (SE) detector that outperforms sketch tokens by training an SRF classifier instead. Several variants of SRF have also been successfully applied to image patch classification [29, 12, 25, 3, 22, 33]. Recently, CNNs have shown their strength in contour detection [29, 16, 4]; their success is attributed to complex, deep networks with new losses that capture contour structure. One major drawback of CNNs, shared with other supervised learning methods, is their high demand for labeled data.
Semi-supervised learning (SSL) has been studied to alleviate the aforementioned problems [8, 18, 23, 31]. Leistner et al. [18] treat unlabeled data as additional variables to be jointly optimized with RF iteratively. Liu et al. [23] instead use unlabeled data to help the node splitting of RF and obtain improved performance. However, it is difficult for these methods to avoid the curse of dimensionality. By contrast, this paper takes advantage of several properties of SRF to achieve an accurate contour detector with very few labeled training images. We address several critical problems required to learn SRF in a semi-supervised manner without sacrificing much training and testing efficiency: 1) estimating the structured labels of u-tokens lying in a complex, high-dimensional space, and 2) preventing the noise of the extensively incorporated u-tokens from misleading the entire learning process of SRF.
3 SSL Overview in Contour Detection
SSL uses a large number of unlabeled data to augment a small number of labeled data and learns a prediction mapping function f : X → Y. In the scenario of contour detection, we denote x ∈ X as a token and y ∈ Y as its corresponding structured label of a certain pattern.
The contour detection performance of supervised methods is determined not only by the number of l-tokens but also by the number of labeled images from which the l-tokens are sampled [12]. This is because the limited information in l-tokens sampled from a few labeled images is severely biased and cannot lead to a general classification model. On the contrary, sufficient u-tokens sampled from many unlabeled images contain abundant information that is easy to acquire. We apply SSL to take advantage of u-tokens to improve the supervised training of our contour detector. However, u-tokens always have large appearance variations, so it is difficult to estimate their structured labels directly in the high-dimensional structured label space.
We propose to estimate the structured labels of u-tokens by transferring the existing structured labels of l-tokens. Because the patterns of structured labels are limited and shared across images, and can be categorized into a finite number of classes (e.g., straight lines, parallel lines, and T-junctions), the structured labels of l-tokens from a few images are sufficient to approximate the structured labels of massive u-tokens from many images. We demonstrate this in Figure 2.
4 SSL via Structured Ensemble Learning
In this section, we describe the proposed semi-supervised ensemble learning approach for contour detection. The method is built on structured random forests (SRF), which have a learning procedure similar to that of the standard random forest (RF) [6]. The major challenge of training SRF is that structured labels usually lie in a complex, high-dimensional space, so the direct learning criteria for node splitting in RF are not well defined. Existing solutions [17, 12] can only handle fully labeled data and are not applicable to our case, which contains both unlabeled and labeled data. We start by briefly introducing SRF and analyzing several of its properties that are favorable for SSL, and then present the proposed SSL-based contour detection method.
4.1 Structured random forest
SRF is an ensemble learning technique with structured outputs, which combines independently trained decision trees into a forest. A robust SRF requires large diversity among trees, which is achieved by bootstrapping training data and features to prevent overfitting. Given a set of training data, starting from the root node, a decision tree attempts to propagate the data from top to bottom until data with different labels are categorized into leaf nodes.
Specifically, for all data in node j, a local weak learner h(x, θ_j) propagates a sample x to the left subtree if h(x, θ_j) = 1, and to the right subtree otherwise. The parameter θ_j is learned by maximizing the information gain I_j:

I_j = H(S_j) − Σ_{k∈{L,R}} (|S_j^k| / |S_j|) H(S_j^k),  (1)

where S_j denotes the training data in node j, S_j^L and S_j^R are the subsets sent to the left and right subtrees, and H is an impurity measure.
The optimization is driven by the Gini impurity or entropy [6]. Each y is a structured label with the same size as the training tokens. To enable the optimization of Eqn. (1) for structured labels, Dollár et al. [13] propose a mapping that projects structured labels into a discrete space and then follow the standard procedure. Training terminates (i.e., leaf nodes are reached) when a stopping criterion is satisfied [6]. The most representative y (i.e., closest to the mean) is stored in each leaf node as its structured prediction, i.e., the posterior.
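The mapping from structured labels to a discrete space can be sketched as follows. This is an illustrative reconstruction of the idea in Dollár et al. [13], not their exact implementation: binary edge masks are encoded by random pixel-pair comparisons, projected to one dimension with PCA, and quantized; the parameter values are assumptions.

```python
import numpy as np

def discretize_structured_labels(Y, n_pairs=128, n_classes=2, seed=0):
    """Map structured labels (binary edge masks) to discrete class ids.

    Sketch of the label mapping of [13]: each mask is encoded by
    comparing random pixel pairs, the binary codes are projected to
    1-D with PCA, and quantized into n_classes bins.
    """
    Y = np.asarray(Y, dtype=float)                     # (n, h, w) masks
    n, h, w = Y.shape
    rng = np.random.default_rng(seed)
    flat = Y.reshape(n, h * w)
    i = rng.integers(0, h * w, n_pairs)
    j = rng.integers(0, h * w, n_pairs)
    codes = (flat[:, i] == flat[:, j]).astype(float)   # pairwise-equality codes
    codes -= codes.mean(axis=0)                        # center for PCA
    _, _, vt = np.linalg.svd(codes, full_matrices=False)
    proj = codes @ vt[0]                               # 1-D PCA projection
    # quantile binning into n_classes discrete labels
    bins = np.quantile(proj, np.linspace(0, 1, n_classes + 1)[1:-1])
    return np.digitize(proj, bins)
```

Masks with the same pattern receive the same discrete id, which is all the node-splitting criterion needs.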
The overall prediction function of SRF ensembles the predictions from all T decision trees, which is defined as

F(x) = (1/T) Σ_{t=1}^{T} f_t(x),  (2)

where f_t(x) is the structured prediction of the t-th tree.
To obtain optimal performance, given a test image, we densely sample tokens at multiple scales so that a single pixel receives multiple predictions in total. The structured outputs enforce spatial continuity. The averaged prediction yields soft contour responses, which intrinsically alleviate noise effects and are a good sign for performing SSL in SRF.
Good features play an important role in the success of SRF. Shen et al. [29] improve the SE contour detector [13] by replacing the widely used HoG-like features with CNN features. In fact, this CNN classifier is itself a weak contour detector used to generate better gradient features. Inspired by this idea, we use a limited number of l-tokens from a few labeled images to first train a weak SE contour detector [13]. It produces efficient detections and provides prior knowledge for u-tokens to facilitate SSL; we will see its further usage subsequently. In our method, we use three color channels, two gradient magnitude channels (obtained from the weak SE detector), and eight orientation channels at two scales; the resulting feature configuration is similar to that in [20].
4.2 Semisupervised SRF learning
In our method, maximizing the information gain is achieved by minimizing the Gini impurity measure [11], defined as

G(S_j) = Σ_k p_k (1 − p_k) = 1 − Σ_k p_k²,  (3)
where p_k denotes the empirical frequency of the k-th discrete class among the tokens in node j with respect to a given feature dimension. We adopt the mapping function of [13] to map the structured labels of l-tokens to discrete labels. Intuitively, minimizing the Gini impurity seeks a separating threshold in one feature dimension (several feature dimensions can be used together to define a separating hyperplane [11]) that splits the data into the left and right subtrees so that the label purity on both sides is maximized [6]. Proposition 1 shows the close relationship of the Gini impurity to max-margin learning.
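The split criterion of Eqn. (3) can be made concrete with a small sketch: compute the Gini impurity of the discrete labels and scan one feature dimension for the threshold minimizing the size-weighted child impurity. SRF additionally randomizes the candidate feature dimensions, which is omitted here.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a set of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split_1d(feature, labels):
    """Exhaustively search one feature dimension for the threshold that
    minimizes the size-weighted Gini impurity of the two children."""
    order = np.argsort(feature)
    f, y = feature[order], labels[order]
    best_t, best_score = None, np.inf
    for s in range(1, len(f)):
        if f[s] == f[s - 1]:
            continue                      # no threshold separates ties
        left, right = y[:s], y[s:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_score, best_t = score, 0.5 * (f[s] + f[s - 1])
    return best_t, best_score
```

A perfectly separable dimension yields a weighted impurity of zero, which is exactly the case the weak learner searches for.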
Proposition 1.
Given the hinge loss function of max-margin learning, the Gini impurity function is a special case of it.
Proof.
Since , if , then we have:
where . Because , the Proposition holds. A generalized theorem is given in [18].
Incorporating Unlabeled Data. It is well known that a limited number of labeled data leads to biased max-margin estimation. We incorporate u-tokens into the limited set of l-tokens to improve the max-margin estimation of the weak learners in every node. However, the labels of u-tokens are unavailable for computing the Gini impurity. One solution to this problem [23] is to apply a kernel density estimator and use the Bayes rule to obtain posterior label estimates. In that approach, a proper bandwidth selection is not trivial; moreover, it can handle neither structured labels nor the high-dimensional space in which u-tokens lie. In our method, we propose to map tokens into a more discriminative low-dimensional subspace associated with discrete labels using a learned mapping, and to find a hyperplane there to estimate the labels of u-tokens. In this scenario, the goal is to calculate the bases of the subspace. The data correlation in the subspace is consistent with that in the original space, so the estimated labels will not mislead the weak learners. In Section 5, we demonstrate that this goal can be achieved using sparse representation techniques.
SRF Node-Splitting Behavior. During the training of SRF, tokens with various patterns are chaotic in the top-level nodes, and weak learners produce coarse splitting results; at the bottom-level nodes, the splitting becomes more subtle. For example, the weak learner in the root node intends to split foreground and background tokens into the left and right subtrees, respectively. Top-level nodes tend to separate straight-line patterns from broken-line patterns, whereas bottom-level weak learners tend to separate straight lines of different orientations, where the patterns are purer. Considering this property, we propose a novel dynamic structured label transfer approach to estimate the structured labels of u-tokens.
4.3 Dynamic structured label transfer
Because it is challenging to directly estimate high-dimensional structured labels for u-tokens, we transfer the existing structured labels of l-tokens to u-tokens. An important concern is to prevent inaccurate structured label estimates for u-tokens from destroying the learning of SRF. Suppose we have mapped the tokens in a node into the low-dimensional subspace; we first search for a max-margin hyperplane using a linear weighted binary support vector machine trained over the l-tokens with discrete labels in this node (so there are two discrete labels in our case). In this way, for each u-token we can estimate its discrete label from the learned hyperplane.
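The weighted max-margin step can be sketched in plain numpy. This is an illustrative stand-in for the weighted binary SVM described above, not the paper's exact solver: it minimizes a weighted hinge loss by sub-gradient descent, and the learning rate, regularization, and epoch count are assumptions.

```python
import numpy as np

def fit_linear_svm(X, y, weights=None, lr=0.1, lam=1e-3, epochs=500):
    """Tiny max-margin classifier via sub-gradient descent on the
    weighted hinge loss. y must be in {-1, +1}; `weights` lets scarce
    l-tokens count more, mirroring the weighted SVM in the text."""
    n, d = X.shape
    if weights is None:
        weights = np.ones(n)
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                      # points violating the margin
        grad_w = lam * w - (weights[active] * y[active]) @ X[active] / n
        grad_b = -np.sum(weights[active] * y[active]) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_svm(X, w, b):
    """Sign of the signed distance to the hyperplane."""
    return np.where(X @ w + b >= 0, 1, -1)
```

The predicted sign serves as the estimated discrete label for each u-token mapped into the same subspace.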
To estimate its structured label, we adopt a nearest-neighbor search to find the best match in the candidate pool of l-tokens that share the same discrete label as the u-token. The structured label transfer function is defined as

y(x_u) = y_{i*},  i* = argmax_i d(z_u, z_i),  (4)

where z_u and z_i denote the low-dimensional representations of the u-token x_u and the i-th candidate l-token,
and d is the cosine metric. In Section 5.2, we will see that the subspace mapping generates a very sparse low-dimensional representation of tokens, so that finding the hyperplane and performing the nearest-neighbor search are computationally efficient. Finally, we can easily map the u-tokens with their estimated structured labels back to the original space, and all tokens in the node are propagated to the child nodes.
Brute-force search at the top-level nodes may yield inaccurate structured label estimates for u-tokens due to the chaotic patterns and coarse discrete labels. In addition, it may lead to unnecessary computation because of redundant structured labels within one class. To tackle these problems, we dynamically update the transferred structured labels during the training of SRF. At the root node, we transfer initial structured labels to u-tokens. As the tree grows deeper, weak learners gradually purify the candidate pool by decreasing the token volume and pattern variety. Therefore, the dynamically estimated structured labels of u-tokens become more reliable in the bottom-level nodes. Since the number of u-tokens is much larger than that of l-tokens, some bottom-level nodes might contain few or no l-tokens; when a node does not contain enough l-tokens, we treat u-tokens with high-probability estimates as l-tokens. In addition, we randomly pick a subset instead of the entire candidate pool to perform the nearest-neighbor search in each individual node.
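The per-node transfer step described above can be sketched as a cosine nearest-neighbor lookup over sparse subspace codes; the optional random subset mirrors the per-node subsampling of the candidate pool. Function names and the subset mechanism here are illustrative, not the paper's exact interface.

```python
import numpy as np

def transfer_structured_label(z_u, Z_pool, Y_pool, subset=None, rng=None):
    """Assign a u-token the structured label of its nearest l-token.

    z_u: sparse code of the u-token; Z_pool / Y_pool: codes and
    structured labels of candidate l-tokens sharing its estimated
    discrete label. Cosine similarity follows the transfer rule of
    Eqn. (4); `subset` optionally restricts the search to a random
    subset of the pool.
    """
    rng = rng or np.random.default_rng(0)
    idx = np.arange(len(Z_pool))
    if subset is not None and subset < len(idx):
        idx = rng.choice(idx, size=subset, replace=False)
    Z = Z_pool[idx]
    sims = (Z @ z_u) / (np.linalg.norm(Z, axis=1) * np.linalg.norm(z_u) + 1e-12)
    return Y_pool[idx[np.argmax(sims)]]
```

As the tree deepens and the pool is purified, the same lookup is simply re-run on the smaller, cleaner pool, which is what makes the transfer dynamic.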
5 Sparse Token Representation
This section discusses the approach for finding the subspace mapping mentioned in Section 4.2. We first describe how to learn a token dictionary to construct the bases of the low-dimensional subspace, and then present a novel and fast sparse coding algorithm to accelerate the computation of the mapping.
5.1 Sparse token dictionary
Sparse representation has been proven effective for representing local image patches [24, 35, 21]. In our method, we pursue a compact set of low-level structural primitives to describe contour patterns by learning a token dictionary. Specifically, any token can be represented by a linear combination of a few bases selected from a dictionary D containing m bases; a sparse code is calculated to select the bases. Given a set of training tokens X, the dictionary D, as well as the associated sparse codes Z, is learned by minimizing the reconstruction error [1]:
(D, Z) = argmin_{D,Z} ‖X − DZ‖_F²,  (5)

s.t. ‖z_i‖_0 ≤ K, ∀i.  (6)

Encouraging mutual incoherence among the dictionary bases yields the MI-KSVD formulation:

(D, Z) = argmin_{D,Z} ‖X − DZ‖_F² + λ Σ_{i=1}^{m} Σ_{j=1, j≠i}^{m} |d_iᵀ d_j|,  s.t. ‖z_i‖_0 ≤ K, ∀i,  (7)
where ‖·‖_0 is the ℓ0 pseudo-norm, which ensures that each sparse code z_i has at most K nonzero entries, and ‖·‖_F is the Frobenius norm. Inspired by [24], we adopt MI-KSVD, a variant of the popular K-SVD, to solve Eqn. (7) for better sparse reconstruction [5]. However, the dictionary is learned in an unsupervised manner, so it is not task-specific, and its learning performance can be influenced by the large appearance variation in tokens from different images. In particular, we observe that cluttered background tokens (i.e., tokens containing no annotated contour) may cause unnecessary false positives. To ameliorate these problems, we introduce prior label knowledge as an extra feature channel in the dictionary to improve its learning performance.
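The alternating structure of dictionary learning can be sketched as follows. This is a simplified stand-in for the MI-KSVD solver of Eqn. (7): sparse codes come from greedy orthogonal matching pursuit, the dictionary update is a plain least-squares (MOD-style) step, the mutual-incoherence penalty is omitted, and the parameter values are assumptions.

```python
import numpy as np

def omp(D, x, K):
    """Greedy orthogonal matching pursuit: pick K atoms of D for x."""
    residual, support = x.copy(), []
    for _ in range(K):
        scores = np.abs(D.T @ residual)
        scores[support] = -np.inf              # don't reselect atoms
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    z = np.zeros(D.shape[1])
    z[support] = coef
    return z

def learn_dictionary(X, m=32, K=4, iters=10, seed=0):
    """Alternate sparse coding (OMP) and a least-squares dictionary
    update over data matrix X (one column per token)."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], m))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(iters):
        Z = np.stack([omp(D, x, K) for x in X.T], axis=1)
        D = X @ np.linalg.pinv(Z)              # MOD dictionary update
        D /= np.linalg.norm(D, axis=0) + 1e-12 # renormalize atoms
    return D
```

Replacing the MOD update with K-SVD's atom-by-atom update and adding the incoherence penalty recovers the MI-KSVD behavior referenced in the text.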
Specifically, for an RGB token, we apply the weak SE detector (Section 4.1) to generate its corresponding contours as the prior label knowledge, i.e., a patch with detected contours. In this way, the new featured token has four channels: the contour channel plus the R, G, and B channels. Background and foreground tokens are modeled separately. Figure 3 illustrates how the dictionary represents the structure in tokens.
In our method, both u-tokens and l-tokens, sampled from unlabeled and labeled images respectively, are used as training data for dictionary learning. Foreground tokens are extracted if they straddle any contour indicated by the ground truth; the rest are background tokens. Because ground truth is unavailable for u-tokens, we use the probability outputs of the weak SE detector to sample high-confidence foreground and background u-tokens.
5.2 Subspace mapping using fast sparse coding
As mentioned in Section 4.2, we use the subspace mapping to provide a compact and discriminative low-dimensional representation of a token x. Given the dictionary D learned in Section 5.1, the subspace representation of x is defined as

z(x) = argmin_z ‖x − Dz‖₂²,  s.t. ‖z‖_0 ≤ K.  (8)
It is well known that solving Eqn. (8) is NP-hard due to the ℓ0 pseudo-norm. One typical algorithm for this problem is orthogonal matching pursuit (OMP) [26]. Many other algorithms relax it to the tractable ℓ1-norm minimization problem. Yang et al. [37] show that the ℓ1 norm provides more classification-meaningful information than the ℓ0 norm. The main reason is that, unlike the ℓ0 norm, which only selects the dictionary bases, the ℓ1 norm also assigns weights to the selected bases to determine their contributions; high weights are often assigned to bases similar to the target data [34]. In this paper, we propose a novel and fast sparse coding algorithm that is scalable to a large number of target data.
Based on the above observation, we approximate the sparse coding computation in two steps: 1) basis selection, which measures the similarity score of each basis to the target data individually and then selects the bases with the largest scores; and 2) reconstruction error minimization, which assigns weights to the selected bases. The details are summarized in Algorithm 1. Given a target datum, we first compute a sequence of scores with respect to each basis (steps 1 to 3). Next, we select the bases with the largest scores to construct a small dictionary (step 4). We then solve a constrained least-squares problem to obtain the coefficients that weight the selected bases (step 5), with the regularization parameter set to a small value. Finally, the coefficients are mapped back to a full-length sparse code (steps 6 to 7).
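A compact sketch of this two-step procedure follows. It tracks the structure of Algorithm 1 but is not a transcription of it: the per-atom score is taken to be the absolute correlation with the target, and the value of the ridge parameter is an assumption where the text elides the exact settings.

```python
import numpy as np

def fast_sparse_code(D, x, K, lam=1e-4):
    """Two-step approximation of Eqn. (8).

    1) basis selection: score every atom by its absolute correlation
       with x and keep the K highest-scoring atoms;
    2) weighting: solve a small ridge-regularized least-squares problem
       on the selected atoms only (analytical, no iterations).
    """
    scores = np.abs(D.T @ x)                       # per-atom similarity
    support = np.argsort(scores)[-K:]              # K best atoms
    Ds = D[:, support]
    # (Ds^T Ds + lam I) w = Ds^T x
    w = np.linalg.solve(Ds.T @ Ds + lam * np.eye(K), Ds.T @ x)
    z = np.zeros(D.shape[1])
    z[support] = w                                 # scatter back to m dims
    return z
```

Because both steps are closed-form, coding a batch of tokens reduces to one matrix product plus many tiny K × K solves, and each token is independent, so the loop parallelizes trivially.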
Unlike many existing methods, our proposed algorithm decouples the sparse coding optimization into subproblems with analytical solutions, which need no iteration. Therefore, our algorithm is faster than methods that directly solve the ℓ0- or ℓ1-norm problems.
6 Experimental Results
In this section, we first evaluate the contour detection performance of the proposed method on two public datasets, and then compare the efficiency of the proposed sparse coding solver with several state-of-the-art solvers.
6.1 Contour detection performance
We test the proposed approach on the Berkeley Segmentation Dataset and Benchmark (BSDS500) [2] and the NYU Depth (NYUD) V2 dataset [30]. We measure contour detection accuracy using several criteria: F-measures with a fixed optimal threshold (ODS) and a per-image threshold (OIS), precision/recall (PR) curves, and average precision (AP) [2]. In all experiments, we use a fixed token size, based on the observation that a larger size significantly reduces the sparse representation performance, while a smaller size can hardly represent rich patterns. The same token size is adopted by SRF to train trees. A skeleton operation with non-maximal suppression is applied to the output contour images of the proposed SemiContour for quantitative evaluation.
Training Image Settings: We randomly split the training images into a labeled set and an unlabeled set. (To compensate for possibly insufficient foreground l-tokens, we duplicate images in the labeled set by histogram matching.) We use a fair and relatively large number of training tokens for all comparative methods, split between background and foreground. Tokens (both l-tokens and u-tokens) are evenly sampled from each image in both sets. The weak SE detector is trained over the labeled set and used to sample u-tokens from the unlabeled set. Three trials are performed, and average accuracies are reported as the final results.
Table 1. Contour detection results on BSDS500. Values in parentheses are obtained with all 200 labeled training images; the remaining values use 3 labeled images.

Method                ODS       OIS       AP
Human                 .80       .80       –
Canny [7]             .60       .64       .58
Felz-Hutt [15]        .61       .64       .56
Normalized Cuts [10]  .64       .68       .48
Mean Shift [9]        .64       .68       .56
Gb [19]               .69       .72       .72
gPb-owt-ucm [2]       .73       .76       .70
ISCRA [28]            –(.72)    –(.75)    –(.46)
Sketch Tokens [20]    .64(.73)  .66(.75)  .58(.78)
SCG [35]              .73(.74)  .75(.76)  .76(.77)
SE [13]               .66(.74)  .68(.76)  .69(.78)
SE-Var [12]           .69(.75)  .72(.77)  .74(.80)
SemiContour           .73       .75       .78
SemiContour-Seg       .74       .77       .76
Table 2. SemiContour results on BSDS500 with different numbers of labeled training images.

# of Labeled Images  ODS   OIS   AP
3                    .728  .747  .776
10                   .732  .753  .782
20                   .734  .755  .784
50                   .736  .758  .787
BSDS500: BSDS500 [2] has been widely used as a benchmark for contour detection methods; it contains training, validation, and test images. Our method uses 3 labeled training images in the labeled set; the remaining training images are included in the unlabeled set. Table 1 and Figure 5(a) compare our method with several other methods. (When retraining the comparative models, we carefully check every step and keep the other parameters at their defaults.)
To compare with supervised methods, we report performance with 3 labeled images as well as with all labeled training images (comparative results with 200 images are obtained from the authors' original papers). As we can see, the proposed SemiContour method produces results similar to supervised methods that use 200 training images, and it outperforms all unsupervised methods as well as the supervised methods given only 3 labeled training images. The performance of all supervised approaches except SCG decreases significantly with only 3 labeled training images. Specifically, compared with SE-Var (an improved version of SE), our method exhibits higher ODS and higher AP. gPb-owt-ucm and SCG, the latter of which merely replaces the local contrast estimation of the former and thus does not rely on many labeled images, show performance close to ours, but our PR curve still shows higher precision at the same recall rates. In terms of efficiency, our method is hundreds of times faster than these two: SemiContour processes an image within seconds, while gPb-owt-ucm and SCG take far longer. Several qualitative results are shown in Figure 4. In addition, we show results of the proposed method using different numbers of labeled images in Table 2.
We find that the estimated structured labels of u-tokens sometimes cause skewed localization of the exact contour position. However, our method is less likely to miss real contours, as shown in Figure 6. Precise contour localization is necessary but less important in applications such as object detection and scene understanding.
Table 3. Segmentation results (ODS/OIS under two evaluation settings).

Method           ODS  OIS    ODS  OIS
spectral [32]    .56  .62    .81  .85
gPb-owt-ucm [2]  .59  .65    .83  .86
DC [14]          .58  .63    .82  .85
SemiContour-Seg  .59  .64    .83  .85
We also test the proposed SemiContour method for segmentation. After contour detection with SemiContour, multiscale UCM [3] is applied to the generated contour images to produce segmentation results (denoted SemiContour-Seg in our experiments). We compare SemiContour-Seg with several state-of-the-art methods; the results are shown in Figure 4 and Table 3. SemiContour-Seg also improves the contour detection performance, as shown in Table 1.
NYUD: NYUD contains 1449 RGB-D images. We follow [13] for the experimental setup. The dataset is split into 381 training, 414 validation, and 654 testing images. To conduct RGB-D contour detection, we treat the depth image as an extra feature channel; thus each dictionary basis has five channels, and the feature channels for SRF are increased by 11 [13]. We use 10 images in the labeled set, with the rest of the training images in the unlabeled set. The comparison results are shown in Table 4 and Figure 5(b). We observe that SemiContour with only 10 labeled training images produces superior results to supervised methods trained with 10 images, and also provides results competitive with supervised methods trained using all 381 labeled images.
Table 4. Contour detection results on NYUD. Values in parentheses are obtained with all 381 labeled training images; the remaining values use 10 labeled images.

Method           ODS       OIS       AP
gPb-owt-ucm [2]  .63       .66       .56
Silberman [30]   –(.65)    –(.66)    –(.29)
SE-Var [13]      .66(.69)  .68(.71)  .68(.72)
SemiContour      .68       .70       .69
Table 5. Cross-dataset generalization results (labeled-set dataset / testing dataset).

Setting    Method       ODS  OIS  AP
NYUD/BSDS  SE-Var       .73  .74  .77
NYUD/BSDS  SemiContour  .73  .75  .78
BSDS/NYUD  SE-Var       .64  .66  .63
BSDS/NYUD  SemiContour  .65  .66  .63
6.2 Crossdataset generalization results
One advantage of the proposed SemiContour is that it can improve the generalization ability of contour detection by incorporating unlabeled data from the target dataset domain. To validate this, we perform a cross-dataset experiment on BSDS500 and NYUD. The two datasets exhibit significant visual differences: NYUD contains various indoor scenes under different lighting conditions, whereas BSDS500 contains outdoor scenes. We use one dataset as the labeled set and the other as the unlabeled set. The rest of the experimental setup is the same as for SE-Var [12]. We compare SemiContour with SE-Var in Table 5. (Later on, we conducted an extra experiment augmenting the 200 labeled training images of BSDS with 100 unlabeled images of NYUD to improve the testing results on BSDS. Our method achieves .752 ODS, .786 OIS, and .792 AP, compared with SE-Var's .743 ODS, .763 OIS, and .788 AP, both with 1 million training tokens in total.)
These experiments validate the strong generalization ability and robustness of the proposed SemiContour method, indicating strong noise resistance even when u-tokens are incorporated from a different image domain.
6.3 Efficiency of the proposed fast sparse coding
The running time of our novel sparse coding algorithm is determined by the basis selection and reconstruction error minimization steps. The former computes a similarity score for every dictionary basis and selects the K largest; the latter solves a small regularized least-squares system over only the selected bases. Since K is much smaller than the dictionary size in practice, the overall cost is dominated by the scoring step.
We compare our fast sparse coding solver with several algorithms in Figure 7. Most existing sparse coding algorithms suffer from computationally expensive iterations. We choose several popular ones to compare with: OMP [26], Batch-OMP [26], and its faster variant (Batch-OMP-fast). All of these comparative algorithms use highly optimized implementations, while ours is a simple Matlab implementation. Our fast sparse coding algorithm obtains the same final contour detection accuracy as the others but is significantly faster. Since the computation for each target datum is independent, an additional benefit is that the proposed algorithm can be easily parallelized. All algorithms are tested on a machine with an Intel i7 @ 3.60GHz (6 cores) and 32GB RAM.
7 Conclusions
In this paper, we present a novel semi-supervised structured ensemble learning method for contour detection. Specifically, our approach trains an effective contour detector based on structured random forests (SRF). We take advantage of unlabeled data to conduct better node splitting of SRF using sparse representation techniques, whose procedures are embedded in the overall SRF training. To increase the scalability of sparse coding to extensive target data, we propose a fast and robust sparse coding algorithm. Compared with much existing work, our method provides superior testing results.
References
 [1] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. on Signal Processing, 54(11):4311–4322, 2006.
 [2] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. PAMI, 33(5):898–916, 2011.
 [3] P. Arbelaez, J. Pont-Tuset, J. Barron, F. Marques, and J. Malik. Multiscale combinatorial grouping. In CVPR, pages 328–335, 2014.
 [4] G. Bertasius, J. Shi, and L. Torresani. DeepEdge: A multi-scale bifurcated deep network for top-down contour detection. arXiv preprint arXiv:1412.1123, 2014.
 [5] L. Bo, X. Ren, and D. Fox. Multipath sparse coding using hierarchical matching pursuit. In CVPR, pages 660–667, 2013.
 [6] L. Breiman. Random forests. Machine Learning, 45:5–32, 2001.
 [7] J. Canny. A computational approach to edge detection. PAMI, (6):679–698, 1986.
 [8] O. Chapelle, B. Schölkopf, A. Zien, et al. Semi-Supervised Learning. 2006.
 [9] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. PAMI, 24(5):603–619, 2002.
 [10] T. Cour, F. Benezit, and J. Shi. Spectral segmentation with multiscale graph decomposition. In CVPR, volume 2, pages 1124–1131, 2005.
 [11] A. Criminisi, J. Shotton, and E. Konukoglu. Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning, volume 7. 2012.
 [12] P. Dollár and C. Zitnick. Fast edge detection using structured forests. PAMI, 2015.
 [13] P. Dollár and C. L. Zitnick. Structured forests for fast edge detection. In ICCV, pages 1841–1848, 2013.
 [14] M. Donoser and D. Schmalstieg. Discrete-continuous gradient orientation estimation for faster image segmentation. In CVPR, pages 3158–3165, 2014.
 [15] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient graph-based image segmentation. IJCV, 59(2):167–181, 2004.
 [16] Y. Ganin and V. Lempitsky. N^4-Fields: Neural network nearest neighbor fields for image transforms. In ACCV, pages 536–551. 2014.
 [17] P. Kontschieder, S. R. Bulo, H. Bischof, and M. Pelillo. Structured class-labels in random forests for semantic image labelling. In ICCV, pages 2190–2197, 2011.
 [18] C. Leistner, A. Saffari, J. Santner, and H. Bischof. Semi-supervised random forests. In ICCV, pages 506–513, 2009.
 [19] M. Leordeanu, R. Sukthankar, and C. Sminchisescu. Generalized boundaries from multiple image interpretations. PAMI, 36(7):1312–1324, 2014.
 [20] J. J. Lim, C. L. Zitnick, and P. Dollár. Sketch tokens: A learned mid-level representation for contour and object detection. In CVPR, pages 3158–3165, 2013.
 [21] B. Liu, J. Huang, L. Yang, and C. Kulikowski. Robust tracking using local sparse appearance model and k-selection. In CVPR, pages 1313–1320, 2011.
 [22] F. Liu, F. Xing, Z. Zhang, M. Mcgough, and L. Yang. Robust muscle cell quantification using structured edge detection and hierarchical segmentation. In MICCAI, pages 324–331, 2015.
 [23] X. Liu, M. Song, D. Tao, Z. Liu, L. Zhang, C. Chen, and J. Bu. Semi-supervised node splitting for random forest construction. In CVPR, pages 492–499, 2013.
 [24] M. Maire, X. Y. Stella, and P. Perona. Reconstructive sparse code transfer for contour detection and semantic labeling. In ACCV, pages 273–287. 2014.
 [25] A. Myers, C. L. Teo, C. Fermüller, and Y. Aloimonos. Affordance detection of tool parts from geometric features. In ICRA, 2015.
 [26] Y. C. Pati, R. Rezaiifar, and P. Krishnaprasad. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Asilomar Conference on Signals, Systems and Computers, pages 40–44, 1993.
 [27] X. Ren, C. C. Fowlkes, and J. Malik. Figure/ground assignment in natural images. In ECCV, pages 614–627. 2006.
 [28] Z. Ren and G. Shakhnarovich. Image segmentation by cascaded region agglomeration. In CVPR, pages 2011–2018, 2013.
 [29] W. Shen, X. Wang, Y. Wang, X. Bai, and Z. Zhang. DeepContour: A deep convolutional feature learned by positive-sharing loss for contour detection. In CVPR, pages 3982–3991, 2015.
 [30] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. In ECCV, pages 746–760. 2012.
 [31] F. Tang, S. Brennan, Q. Zhao, and H. Tao. Co-tracking using semi-supervised support vector machines. In ICCV, pages 1–8, 2007.
 [32] C. J. Taylor. Towards fast and accurate segmentation. In CVPR, pages 1916–1922, 2013.
 [33] C. L. Teo, C. Fermüller, and Y. Aloimonos. Fast 2D border ownership assignment. In CVPR, pages 5117–5125, 2015.
 [34] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In CVPR, pages 3360–3367, 2010.
 [35] R. Xiaofeng and L. Bo. Discriminatively trained sparse code gradients for contour detection. In Advances in NIPS, pages 584–592, 2012.
 [36] S. Xie and Z. Tu. Holistically-nested edge detection. arXiv preprint arXiv:1504.06375, 2015.
 [37] J. Yang, L. Zhang, Y. Xu, and J.-y. Yang. Beyond sparsity: The role of the l1-optimizer in pattern classification. Pattern Recognition, 45(3):1104–1118, 2012.