Detecting Drivable Area for Self-driving Cars:
An Unsupervised Approach

Ziyi Liu, Siyu Yu, Xiao Wang and Nanning Zheng
Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, Shaanxi, P.R. China
National Engineering Laboratory for Visual Information Processing and Applications,
Xi'an Jiaotong University, Xi'an, Shaanxi, P.R. China
Correspondence: nnzheng@mail.xjtu.edu.cn
Abstract

It has been well recognized that detecting the drivable area is central to self-driving cars. Most existing methods attempt to locate the road surface using lane lines, which restricts them to drivable areas with clear lane markings. This paper proposes an unsupervised approach for detecting the drivable area that utilizes both image data from a monocular camera and point cloud data from a 3D LIDAR scanner. Our approach locates initial drivable areas based on a "direction ray map" obtained by image-LIDAR data fusion. In addition, fusion at the feature level is applied for more robust performance. Once the initial drivable areas are described by different features, the feature fusion problem is formulated as a Markov network, and a belief propagation algorithm is developed to perform the model inference. Our approach is unsupervised and avoids common assumptions, yet achieves state-of-the-art results on the ROAD-KITTI benchmark. Experiments show that our unsupervised approach is efficient and robust for detecting the drivable area for self-driving cars.

I Introduction


Fig. 1: The framework of our proposed unsupervised approach.

In the field of self-driving cars, road detection is a crucial requirement, and the topic has been attracting considerable research interest in recent years; significant achievements have been reported in the literature [1]. Although there are sample-training-based algorithms for road detection on well-marked roads, unsupervised road detection for unlabeled roads in inner-city and rural areas still remains a challenge on account of the high variability of traffic scenes and lighting conditions. So far, to the best of our knowledge, there is no robust solution to this challenge. Taking cues from human driving behavior, however, we observe that distinguishing the drivable area from the non-drivable area is the first priority for humans when they drive; roads are then found and driving decisions are made based on the drivable area.

Inspired by human driving behavior, we propose an unsupervised approach for detecting the drivable area by fusing image data and LIDAR points, as shown in Fig. 1. By aligning the image coordinate frame with the LIDAR coordinate frame, a Delaunay triangulation [2] is generated to describe the spatial relationships between points and is utilized to classify obstacle points, as detailed in Section III. Then an initial location of the drivable area is obtained by fusing a "direction ray map" with image superpixels; it serves as prior knowledge and narrows the range of detection. In Section IV, this initial location is used to autonomously learn the features that describe the final drivable area. In Section V, a feature fusion step is implemented leveraging a Markov network through belief propagation, and the final results are obtained.

Our approach is evaluated on the ROAD-KITTI benchmark [3]. Compared with similar fusion approaches, it achieves state-of-the-art results without training or making assumptions about shape or height, which demonstrates its robustness and generalization ability.

II Related Work

Reliably detecting road areas is a key requirement for self-driving cars. In recent years, many approaches have been proposed to address this challenge. They mainly differ in the type of sensor used to acquire the data, such as a monocular camera [4], a binocular camera [5], a LIDAR [6], or a fusion of multiple sensors [7].

For monocular camera based approaches, most road detection algorithms use cues such as color [8] and lane markings [9]. To cope with illumination variations and shadows, different color spaces have been introduced [10][11][12]. Moreover, leveraging deep learning, monocular vision based methods can achieve unprecedented results [13][14][15]. However, unlike visual concepts such as cat or dog, the concept of a road cannot be defined by appearance alone; whether a region is regarded as a road depends more on its physical attributes. Therefore, approaches relying only on monocular vision are not robust enough for real applications.

With the advent of LIDAR sensors, which measure distances accurately, many LIDAR-based road detection approaches have been developed. Such approaches use the spatial information of LIDAR points to analyse the scene and regard flat areas as roads. However, due to the sparsity of LIDAR points, it is hard to analyse the details between points; moreover, discarding the image information makes point classification more difficult.

Considering these drawbacks, we propose an unsupervised approach for detecting the drivable area by fusing image data and LIDAR points. Compared with other detection methods, our approach has three advantages. First, it does not need strong assumptions, training steps or manually labelled data, which ensures its generalization ability. Second, by fusing LIDAR and monocular camera data, our approach learns probabilistic models in a self-learning manner, making it robust to complex road scenarios and changing illumination. Finally, superpixels are used as basic processing units instead of pixels, and they have proved to be an easy and efficient way to combine sparse LIDAR points with image data. Therefore, we consider our unsupervised approach an efficient and robust way to detect drivable areas for self-driving cars.

III Preprocessing and Data Fusion

III-A Image Processing in Superpixel Scale

In our approach, superpixels are considered as elementary units instead of pixels, motivated by the observation that superpixels and LIDAR points are complementary. On the one hand, superpixels are dense and carry color information that a LIDAR sensor cannot capture; on the other hand, LIDAR points provide depth information that a monocular camera cannot obtain. Owing to advances in superpixel segmentation methods, image processing at the superpixel scale can cut down computational and memory consumption without much loss in accuracy. In our approach, "Sticky Edge Adhesive Superpixels" are detected and used [16][17][18]. Without the edge term, these superpixels are computed using an iterative approach like SLIC [19]; with the edge term added, the superpixels snap to edges, resulting in higher-quality boundaries. It can therefore be assumed that superpixels adhere well to object boundaries, which makes them a reasonable choice for shaping the initial location of drivable areas, as detailed in Section IV.

Besides, superpixels can be used to calculate local statistics so that the results are more robust. Thus, in our approach, superpixels serve as the elementary units instead of pixels, and assist in shaping the initial drivable area and accelerating the feature extraction process.
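As a concrete illustration of this preprocessing step, the following sketch segments an image into superpixels and computes a simple per-superpixel statistic. It uses SLIC from scikit-image as a stand-in for the "Sticky Edge Adhesive Superpixels" used in our pipeline; the file name and the segmentation parameters are placeholder values, not the settings used in our experiments.

```python
import numpy as np
from skimage import io
from skimage.segmentation import slic

# Segment an image into superpixels (SLIC as a stand-in for the
# edge-adhesive superpixels referenced in the text).
image = io.imread("frame.png")                 # H x W x 3 RGB image (placeholder path)
labels = slic(image, n_segments=600, compactness=10, start_label=0)

# Per-superpixel mean color: the kind of cheap local statistic that
# superpixels make convenient to compute.
n_sp = labels.max() + 1
mean_color = np.zeros((n_sp, 3))
for sp in range(n_sp):
    mean_color[sp] = image[labels == sp].mean(axis=0)
```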

III-B Illumination Invariant Color Space

To acquire color features regardless of lighting conditions or weather, RGB color images I_RGB are transformed into an illumination invariant color space, denoted I_ii. As presented in [12], the 3-channel image is converted into a 1-channel image with one parameter α related to the peak spectral responses of the camera:

1/λ_2 = α/λ_1 + (1 − α)/λ_3,   (1)

where λ_1, λ_2, λ_3 are the peak response wavelengths of the blue, green and red channels, and I_ii is obtained following

I_ii(x) = 0.5 + log G(x) − α log B(x) − (1 − α) log R(x),   (2)

where I_ii(x) is the pixel value of x in I_ii and R(x), G(x), B(x) are the R, G, B pixel values of x in I_RGB, respectively. We set α to 0.4706, as [12] suggests.
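A minimal sketch of the transform in (1)-(2) is given below, following [12]. The channel ordering (R, G, B), the normalization of 8-bit inputs and the small epsilon guarding the logarithm are implementation assumptions; α = 0.4706 is the value used in this work.

```python
import numpy as np

def illumination_invariant(rgb, alpha=0.4706, eps=1e-6):
    """Convert an RGB image to the 1-channel illumination invariant image
    of [12]: I = 0.5 + log G - alpha * log B - (1 - alpha) * log R."""
    rgb = rgb.astype(np.float64)
    if rgb.max() > 1.0:                      # assume 8-bit input, scale to [0, 1]
        rgb = rgb / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.5 + np.log(g + eps) - alpha * np.log(b + eps) - (1.0 - alpha) * np.log(r + eps)
```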

III-C Obstacle Classification via Data Fusion


Fig. 2: The whole process of obstacle classification, which contains three steps: data fusion, finding the corresponding surface and measuring the point's flatness. The red arrows indicate the main information obtained by each step.

The whole process of this subsection is shown in Fig. 2. To fuse LIDAR points with image pixels, the projection of 3D points is employed as presented in [20]. After the alignment, the LIDAR point set P = {p_1, ..., p_N} is obtained, where each point p_i has a LIDAR coordinate P_i = (x_i, y_i, z_i) and an image coordinate q_i = (u_i, v_i).

The objective of the obstacle classification step is to find the mapping Ω: P → {0, 1}. Ω(p_i) = 1 indicates that p_i is an obstacle point, while Ω(p_i) = 0 indicates that p_i is a non-obstacle point (as shown in Fig. 3). It is assumed that whether p_i is an obstacle point or not depends only on how flat its corresponding surface is in the physical world, so the problem is broken up into two sub-problems: how to define the corresponding surface of p_i, and how to measure its flatness, as shown in Fig. 2.

To define the corresponding surface of p_i, a Delaunay triangulation is utilized to establish the spatial relationships among the points in P, because it has the properties that each vertex has on average six surrounding triangles in the plane, and the nearest-neighbor graph is a subgraph of the Delaunay triangulation. The graph is generated as proposed in [21]. For each p_i, its image coordinate q_i is used in a planar Delaunay triangulation to generate an undirected graph G = (P, E), where E is the set of edges that defines the relationships among the points. An edge e_ij ∈ E is discarded if it does not satisfy

d(P_i, P_j) < δ,   (3)

where d(P_i, P_j) is the Euclidean distance between P_i and P_j and δ is a distance threshold.

Then, the "corresponding surfaces" of p_i are the surfaces (triangles) determined by p_i and N(p_i), where N(p_i) is the set of p_i's neighbor points in G. The flatness of p_i's corresponding surfaces can be measured by calculating their normal vectors, and the normal n_i of p_i is obtained by averaging the normal vectors of p_i's neighboring triangles. Let θ_i denote the deviation angle of n_i from the horizon, which is close to 90° for a flat, horizontal surface. Finally, Ω(p_i) is obtained as

Ω(p_i) = 1 if θ_i < θ_obs, and Ω(p_i) = 0 otherwise,   (4)

where θ_obs is the deviation angle from the horizon below which a point is regarded as an obstacle.

(a)
(b)
Fig. 3: The classification result of obstacle and non-obstacle. (a) is the original image. (b) shows obstacle points (red dots) and non-obstacle points (blue dots). It can be seen that LIDAR points are successfully classified.
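The sketch below illustrates the classification pipeline of this subsection under a few stated assumptions: triangles are built on the projected image coordinates with scipy's Delaunay, triangles with an over-long 3D edge stand in for the edge pruning of (3), and each point is labelled by thresholding the deviation angle of its averaged normal from the horizon as in (4). The values of `max_edge` and `theta_obs` are illustrative, not the thresholds used in our experiments.

```python
import numpy as np
from scipy.spatial import Delaunay

def classify_obstacles(pts_3d, pts_2d, max_edge=2.0, theta_obs=60.0):
    """pts_3d: (N, 3) LIDAR coordinates (z up); pts_2d: (N, 2) image coordinates.
    Returns a boolean array: True = obstacle point."""
    tri = Delaunay(pts_2d)                        # planar triangulation on image coordinates
    normals = np.zeros_like(pts_3d, dtype=np.float64)
    for simplex in tri.simplices:
        a, b, c = pts_3d[simplex]
        # Discard triangles with an over-long 3D edge (edge pruning, cf. (3)).
        if max(np.linalg.norm(a - b), np.linalg.norm(b - c), np.linalg.norm(c - a)) > max_edge:
            continue
        n = np.cross(b - a, c - a)
        if np.linalg.norm(n) < 1e-9:
            continue
        n /= np.linalg.norm(n)
        if n[2] < 0:                              # orient normals upward for consistency
            n = -n
        normals[simplex] += n                     # accumulate onto the three vertices
    # Deviation angle of the averaged normal from the horizon:
    # close to 90 degrees for flat ground, small for vertical structures.
    norm = np.linalg.norm(normals, axis=1)
    norm[norm < 1e-9] = 1.0
    theta = np.degrees(np.arcsin(np.clip(np.abs(normals[:, 2]) / norm, 0.0, 1.0)))
    return theta < theta_obs
```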

IV Drivable Area Detection

To locate the drivable area, we first obtain an initial estimate of its location and then refine it using the learned features, in a coarse-to-fine process.

IV-A Locating Initial Drivable Areas

Once the classification step is completed, the initial drivable area can be determined. To locate it, we first compute the "direction ray map" (DRM) and then fuse it with superpixels. The DRM is obtained as shown in Algorithm 1. First, a polar coordinate transformation is applied to the projected LIDAR points, taking the middle bottom pixel of the image as the origin, denoted o. Then, P is restructured as {P_k}, k = 1, ..., N_θ, where P_k contains the LIDAR points whose transformed image coordinates fall in the k-th angle range. Because P is sparse, two problems emerge: the first is how to transform the sparse rays into dense pixels; the second is how to overcome the "ray leak" problem shown in Fig. 4.

For the first problem, increasing N_θ is a natural solution, but it aggravates the "ray leak" problem. Therefore, the DRM is fused with superpixels to address this problem.

For the second problem, it should be noted that the width of a vehicle is not negligible. That is, whether an area is drivable or not depends on its width as well as on its flatness. Therefore, a minimum-length filtering method is used to filter out the leaked rays, and the final DRM is shown in Fig. 4(c).

Once the DRM has been obtained, the initial drivable area is generated by fusing it with the superpixels. Essentially, the initial drivable area is a set of superpixels, denoted S_init, and the set of LIDAR points within it is denoted P_init.

0:  Input: the set of LIDAR points P, partitioned into angle sectors P_1, ..., P_{N_θ}, with obstacle labels Ω;
0:  Output: direction ray map DRM
1:  Initialize DRM with the size of the image and zero elements
2:  for k = 1 to N_θ do
3:     Find the obstacle set O_k = {p ∈ P_k | Ω(p) = 1}
4:     if O_k ≠ ∅ then
5:        e_k ← the obstacle point in O_k nearest to the origin o
6:     else
7:        e_k ← the point in P_k farthest from the origin o
8:     end if
9:     Line the point o with the point e_k in DRM
10:  end for
Algorithm 1 Generating the Direction Ray Map.
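A sketch of Algorithm 1 in Python follows. Points are binned into angular sectors around the origin (the middle bottom pixel), each ray ends at the nearest obstacle point in its sector (or at the farthest point if the sector contains no obstacle), and the rays are rasterized with OpenCV. The interpretation of the minimum-length filter as a minimum filter over neighbouring sector lengths, as well as the values of `n_theta` and `win`, are assumptions for illustration.

```python
import numpy as np
import cv2
from scipy.ndimage import minimum_filter1d

def direction_ray_map(pts_2d, is_obstacle, img_shape, n_theta=360, win=9):
    """pts_2d: (N, 2) projected image coordinates (u, v); is_obstacle: (N,) bool.
    Returns a binary ray map of shape img_shape = (H, W)."""
    h, w = img_shape
    origin = np.array([w / 2.0, h - 1.0])                    # middle bottom pixel
    d = pts_2d - origin
    ang = np.arctan2(-d[:, 1], d[:, 0])                      # polar angle above the image bottom
    dist = np.linalg.norm(d, axis=1)
    sector = np.clip((ang / np.pi * n_theta).astype(int), 0, n_theta - 1)

    # Per-sector ray length: distance to the nearest obstacle point,
    # or to the farthest point if the sector has no obstacle (Algorithm 1).
    length = np.zeros(n_theta)
    for k in range(n_theta):
        idx = np.where(sector == k)[0]
        if idx.size == 0:
            continue
        obs = idx[is_obstacle[idx]]
        length[k] = dist[obs].min() if obs.size else dist[idx].max()

    # Minimum-length filtering over neighbouring sectors: a long ray "leaking"
    # through a narrow gap between obstacles is shortened to the neighbourhood
    # minimum, so the remaining free space is wide enough for a vehicle.
    length = minimum_filter1d(length, size=win)

    drm = np.zeros((h, w), dtype=np.uint8)
    o = (int(origin[0]), int(origin[1]))
    for k in range(n_theta):
        if length[k] <= 0:
            continue
        theta = (k + 0.5) / n_theta * np.pi
        end = origin + length[k] * np.array([np.cos(theta), -np.sin(theta)])
        cv2.line(drm, o, (int(end[0]), int(end[1])), 1, 1)
    return drm
```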
(a)
(b)
(c)
Fig. 4: The "ray leak" problem. (a) shows the "ray leak" problem over the obstacle classification result; every white line in (a) is perpendicular to a ray, and the middle point of the white line is the end point of the ray. (b) shows all rays, marked with white lines, before filtering. (c) is the result of minimum-length filtering.
0:  Input: the set of LIDAR points P, partitioned into angle sectors P_1, ..., P_{N_θ} and sorted by distance to the origin o within each sector, with obstacle labels Ω;
0:  Output: "level" feature l(p) for every point p
1:  Initialize l(p) with zero for every p ∈ P
2:  for k = 1 to N_θ do
3:     for j = 1 to |P_k| do
4:        if Ω(p_{k,j}) = 1 then
5:           for m = j to |P_k| do
6:              l(p_{k,m}) ← l(p_{k,m}) + 1
7:           end for
8:        end if
9:     end for
10:  end for
Algorithm 2 Getting the "Level" Feature.

IV-B Feature Extraction Based on the Initial Drivable Area

Once S_init is obtained, four features (the "level" feature, the normal feature, the color feature and the strength feature) are calculated superpixel by superpixel to describe the drivable area.

IV-B1 "Level" Feature

Our method focuses on detecting the drivable area. Therefore, a feature called "level" is proposed to describe the drivable degree, and Algorithm 2 shows the steps to calculate it. The LIDAR points in each P_k are arranged in accordance with their distances to the origin o, that is, every point p_{k,j} in P_k satisfies

d(p_{k,j}, o) ≤ d(p_{k,j+1}, o).   (5)

Then, the "level" feature of superpixel s_i is defined as

l_i = min{ l(p) | p ∈ s_i },   (6)

where l(p) is the level of point p from Algorithm 2. Because the level of a point counts the obstacle points that lie between it and the origin, a small l_i means a high drivable degree of the relevant area. A probability map is generated in the "level" feature space, where the probability distribution is represented by a Gaussian-like model with parameters μ_l and σ_l as

P_l(s_i) = exp( −(l_i − μ_l)² / (2σ_l²) ),   (7)

where P_l(s_i) is the probability that s_i belongs to the drivable area in the "level" feature space. The parameters μ_l and σ_l can be calculated over S_init in a self-learning manner without training.
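For illustration, the sketch below computes the point levels of Algorithm 2, aggregates them into a per-superpixel level feature, and evaluates the Gaussian-like model of (7) with μ_l and σ_l estimated from the superpixels in S_init. The aggregation by minimum within a superpixel and the handling of superpixels that contain no LIDAR point are assumptions of this sketch.

```python
import numpy as np

def point_levels(sector, dist, is_obstacle, n_theta):
    """Level of each point: incremented once for every obstacle point that is
    closer to the origin within the same angular sector (Algorithm 2)."""
    level = np.zeros(len(sector), dtype=int)
    for k in range(n_theta):
        idx = np.where(sector == k)[0]
        idx = idx[np.argsort(dist[idx])]             # sort by distance to the origin, cf. (5)
        level[idx] = np.cumsum(is_obstacle[idx].astype(int))
    return level

def level_probability(level, point_sp, n_sp, init_sp):
    """Per-superpixel level feature (6) and its probability under the
    Gaussian-like model (7), with parameters learned from S_init."""
    l_feat = np.full(n_sp, np.inf)
    for p, sp in enumerate(point_sp):                # superpixel index of each LIDAR point
        l_feat[sp] = min(l_feat[sp], level[p])
    finite = np.isfinite(l_feat)
    l_feat[~finite] = l_feat[finite].max()           # superpixels without points: worst level
    mu = l_feat[init_sp].mean()
    sigma = l_feat[init_sp].std() + 1e-6
    return np.exp(-((l_feat - mu) ** 2) / (2.0 * sigma ** 2))
```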

IV-B2 Normal Feature

The normal feature ν_i of each superpixel s_i is designed as the minimum value of θ among the LIDAR points inside s_i. As mentioned in Section III, θ represents the deviation angle of the corresponding surface's normal from the horizon. Thus, a larger ν_i means a higher drivable degree of s_i; namely, the larger ν_i is, the flatter the area will be. Similar to (7), a Gaussian-like model with parameters μ_ν and σ_ν is built as

P_ν(s_i) = exp( −(ν_i − μ_ν)² / (2σ_ν²) ),   (8)

where P_ν(s_i) is the probability that s_i belongs to the drivable area in the normal feature space. The estimation of μ_ν and σ_ν is the same as that of μ_l and σ_l mentioned above; similarly, no manual setting or training is needed.

IV-B3 Color Feature

The color feature c_i of superpixel s_i is calculated using I_ii. A Gaussian model with parameters μ_c and σ_c is built as

P_c(s_i) = exp( −(c_i − μ_c)² / (2σ_c²) ),   (9)

where P_c(s_i) is the probability that s_i belongs to the drivable area in the color feature space, and c_i is the color of s_i in the illumination invariant color space. μ_c and σ_c are calculated over S_init in the same way as μ_l and σ_l.

IV-B4 Strength Feature

The number of ray points of the DRM within superpixel s_i is counted to measure the smoothness of the relevant area and is defined as the strength feature t_i. Different from the Gaussian models above, the probability P_t(s_i) that s_i belongs to the drivable area is calculated as

(10)

where d_i denotes the Euclidean distance between s_i and the origin o in the image coordinate frame, and A_i denotes the area of s_i.

V Feature Fusion via Belief Propagation


Fig. 5: The Markov network used to model the positional relationship between adjacent superpixels. A circle node represents the state of a superpixel and a square node represents the observation of the corresponding superpixel.

Once all features are obtained as detailed in Section IV, they are fused to get the final results. The most straightforward fusion method is to use Bayes' rule to obtain the maximum a posteriori probability that each superpixel belongs to the drivable area. However, this ignores the positional relationships among superpixels, which are valuable in this task. To model the relationship between adjacent superpixels, a Markov network is used. As shown in Fig. 5, a circle node x_i represents the state of superpixel s_i and a square node y_i represents the corresponding observation. An undirected edge models the relationship between adjacent superpixels and is described by the potential compatibility function ψ(x_i, x_j); a directed edge models the observation process, described by the likelihood function φ(x_i, y_i).

To perform inference on the model, the belief propagation algorithm is used [22]. The local message passed from node x_i to node x_j is

m_ij(x_j) ← Σ_{x_i} ψ(x_i, x_j) φ(x_i, y_i) Π_{k ∈ N(i)\j} m_ki(x_i),   (11)

where N(i)\j is the set of s_i's adjacent superpixels except s_j (the green nodes in Fig. 5). The marginal posterior probability of x_i can be obtained by

b_i(x_i) ∝ φ(x_i, y_i) Π_{k ∈ N(i)} m_ki(x_i).   (12)

Then the fusion problem is formulated as designing the likelihood function φ(x_i, y_i) and the potential compatibility function ψ(x_i, x_j). φ(x_i, y_i) can be obtained by

φ(x_i, y_i) = P_l(s_i) · P_ν(s_i) · P_c(s_i) · P_t(s_i).   (13)

Noticing that ψ(x_i, x_j) represents the closeness of x_i and x_j, ψ is defined as

ψ(x_i, x_j) = exp( −(x_i − x_j)² / (2σ_ψ²) ),   (14)

which is similar to (8).

With (13) and (14), the messages m_ij are calculated iteratively following (11), and the fusion result is then obtained through (12).
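The sketch below illustrates the fusion step with binary superpixel states (drivable / not drivable): the likelihood of (13) is taken as the product of the four feature probabilities, the compatibility of (14) is a Gaussian-like function of the state difference, and synchronous loopy belief propagation follows (11)-(12). The adjacency list is assumed to be symmetric, and the number of iterations and the bandwidth σ_ψ are illustrative values.

```python
import numpy as np

def fuse_features(prob, adjacency, n_iter=20, sigma=0.5):
    """prob: (n_sp, 4) per-superpixel probabilities from the four feature spaces.
    adjacency: symmetric list of neighbour index lists. Returns P(drivable)."""
    n_sp = prob.shape[0]
    p_drive = np.clip(prob.prod(axis=1), 1e-6, 1.0 - 1e-6)     # likelihood (13) for x = 1
    phi = np.stack([1.0 - p_drive, p_drive], axis=1)           # phi(x_i, y_i), x_i in {0, 1}
    # Pairwise compatibility psi(x_i, x_j): Gaussian-like in the state difference (14).
    states = np.arange(2)
    psi = np.exp(-((states[:, None] - states[None, :]) ** 2) / (2.0 * sigma ** 2))

    msgs = {(i, j): np.ones(2) / 2.0 for i in range(n_sp) for j in adjacency[i]}
    for _ in range(n_iter):
        new = {}
        for (i, j) in msgs:
            others = [msgs[(k, i)] for k in adjacency[i] if k != j]
            incoming = np.prod(others, axis=0) if others else np.ones(2)
            m = psi.T @ (phi[i] * incoming)                    # message update (11)
            new[(i, j)] = m / m.sum()
        msgs = new

    belief = phi.copy()
    for i in range(n_sp):
        for k in adjacency[i]:
            belief[i] = belief[i] * msgs[(k, i)]
    belief /= belief.sum(axis=1, keepdims=True)                # marginal posterior (12)
    return belief[:, 1]
```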

VI Experimental Results and Discussion

In order to validate our approach, we test it on the ROAD-KITTI benchmark [3]. The results are evaluated in the bird's-eye view (BEV) with the metrics max F-measure (MaxF), average precision (AP), precision (PRE), recall (REC), false positive rate (FPR) and false negative rate (FNR) for three datasets: Urban Marked (UM), Urban Multiple Marked (UMM) and Urban Unmarked (UU). To show the superiority of the proposed algorithm ("Ours Test" in the tables below), we compare it with HybridCRF, MixedCRF and LidarHisto, the top three LIDAR-based methods on the ROAD-KITTI benchmark website (http://www.cvlibs.net/datasets/kitti/eval_road.php). Besides, our experimental results on the training set are also listed ("Ours Train" in the tables below), solely to demonstrate that our approach is training-free.

As TABLE I, II and III show, our approach achieves the highest AP on UM and UU, indicating that it is robust in different situations. We also obtain the best REC and FNR on UMM and UU, which indicates that the road areas are covered well by our results. TABLE IV lists the performance across the three datasets, and our approach obtains the best AP, REC and FNR. However, our FPR is higher than that of the other methods. This can be explained by the fact that the ground truth labels road areas, while our approach detects drivable areas, which contain the road as well as other flat areas (lawns, transition zones between sidewalk and road). In practical applications, self-driving cars should be able to use such flat areas as candidate roads in emergencies, for example when avoiding suddenly turning vehicles. Above all, our method is unsupervised and still achieves competitive performance compared with supervised methods.

UM MaxF AP PRE REC FPR FNR
HybridCRF 90.99 85.26 90.65 91.33 4.29 8.67
MixedCRF 90.83 83.84 89.09 92.64 5.17 7.36
LidarHisto 89.87 83.03 91.28 88.49 3.85 11.51
Ours Test 84.96 86.51 79.94 90.65 10.37 9.35
Ours Train 86.34 88.17 82.29 90.80 8.98 9.20
TABLE I: Comparison on UM (BEV).
UMM MaxF AP PRE REC FPR FNR
HybridCRF 91.95 86.44 94.01 89.98 6.30 10.02
MixedCRF 92.29 90.06 93.83 90.80 6.56 9.20
LidarHisto 93.32 93.19 95.39 91.34 4.85 8.66
Ours Test 92.22 92.23 91.70 92.74 9.23 7.26
Ours Train 92.74 93.87 92.51 92.96 8.20 7.04
TABLE II: Comparison on UMM (BEV).
UU MaxF AP PRE REC FPR FNR
HybridCRF 88.53 80.79 86.41 90.76 4.65 9.24
MixedCRF 82.79 69.11 79.01 86.96 7.53 13.04
LidarHisto 86.55 81.13 90.71 82.75 2.76 17.25
Ours Test 83.48 84.75 77.19 90.87 8.75 9.13
Ours Train 83.20 84.97 77.45 89.86 9.31 10.14
TABLE III: Comparison on UU (BEV).
URBAN MaxF AP PRE REC FPR FNR
HybridCRF 90.81 86.01 91.05 90.57 4.90 9.43
MixedCRF 89.46 83.70 88.52 90.42 6.46 9.59
LidarHisto 90.67 84.79 93.06 88.41 3.63 11.59
Ours Test 87.72 87.84 83.97 91.83 9.65 8.17
Ours Train 87.43 89.00 84.08 91.21 8.83 8.79
TABLE IV: Comparison on URBAN (BEV).
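For reference, the pixel-wise metrics reported in the tables above can be computed from a confusion matrix as in the sketch below (standard definitions, reported as percentages). On the benchmark, MaxF is the F-measure maximized over the classification threshold and AP averages precision over recall levels; both sweeps are omitted here for brevity.

```python
def road_metrics(tp, fp, fn, tn):
    """Pixel-wise road detection metrics (in %) from confusion-matrix counts."""
    pre = tp / (tp + fp)                      # precision (PRE)
    rec = tp / (tp + fn)                      # recall (REC)
    f = 2.0 * pre * rec / (pre + rec)         # F-measure at this operating point
    fpr = fp / (fp + tn)                      # false positive rate (FPR)
    fnr = fn / (fn + tp)                      # false negative rate (FNR)
    return {k: 100.0 * v for k, v in
            {"F": f, "PRE": pre, "REC": rec, "FPR": fpr, "FNR": fnr}.items()}
```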

Besides, in order to verify how much the feature fusion step boosts performance, we compare the final results with the results from each single feature space as well as from the initial drivable area S_init. All the results in TABLE V are obtained from experiments on the training set. As TABLE V shows, feature fusion achieves a significant boost in MaxF, AP and PRE. It should be noticed that S_init ("Initial" in the table below) performs outstandingly in REC and FNR, with an FPR similar to that of the Baseline, so it is reasonable to use S_init to estimate the model parameters as described in Section IV.

URBAN MaxF AP PRE REC FPR FNR
Baseline 77.95 82.47 72.83 83.88 20.03 16.11
Initial 81.23 67.64 70.75 96.25 21.35 3.75
Color 85.35 79.76 78.93 93.43 12.81 6.57
Strength 84.16 86.37 79.36 89.70 12.76 10.30
Level 86.14 76.62 80.58 92.96 11.65 7.04
Normal 87.04 79.23 83.48 91.28 8.98 8.70
Fusion 87.43 89.00 84.08 91.21 8.83 8.79
TABLE V: Comparison on URBAN Training Set (BEV).

VII Conclusion and Future Work

In this paper, an unsupervised approach for detecting drivable areas is proposed that fuses four features via belief propagation. Our approach combines pixel information and depth information to overcome the drawbacks of a single observation when faced with highly variable traffic scenes and lighting conditions. Without the need for strong assumptions, training steps or manually labelled data, our method proves to be a general approach for self-driving cars. The experiments on the ROAD-KITTI benchmark verify the efficiency and robustness of our approach. In future work, we will first focus on separating road areas from the drivable areas and locating candidate drivable areas for emergencies. Besides, a more suitable dataset with hierarchical labels for drivable areas is required. Finally, we intend to realize an FPGA implementation of our approach to achieve real-time operation on self-driving cars.

Acknowledgment

This research was partially supported by the National Natural Science Foundation of China (No. 61627811, L1522023) and the Programme of Introducing Talents of Discipline to University (No. B13043).

References

  • [1] A. Bar Hillel, R. Lerner, D. Levi, and G. Raz, “Recent progress in road and lane detection: a survey,” Machine vision and applications, pp. 1–19, 2014.
  • [2] D.-T. Lee and B. J. Schachter, “Two algorithms for constructing a delaunay triangulation,” International Journal of Computer & Information Sciences, vol. 9, no. 3, pp. 219–242, 1980.
  • [3] J. Fritsch, T. Kuehnl, and A. Geiger, “A new performance measure and evaluation benchmark for road detection algorithms,” in International Conference on Intelligent Transportation Systems (ITSC), 2013.
  • [4] T. Kuehnl, F. Kummert, and J. Fritsch, “Spatial ray features for real-time ego-lane extraction,” in Proc. IEEE Intelligent Transportation Systems, 2012.
  • [5] H. Badino, U. Franke, and R. Mester, “Free space computation using stochastic occupancy grids and dynamic programming,” in Workshop on Dynamical Vision, ICCV, Rio de Janeiro, Brazil, vol. 20.   Citeseer, 2007.
  • [6] C. Tongtong, D. Bin, L. Daxue, Z. Bo, and L. Qixu, “3d lidar-based ground segmentation,” in Pattern Recognition (ACPR), 2011 First Asian Conference on.   IEEE, 2011, pp. 446–450.
  • [7] L. Xiao, B. Dai, D. Liu, T. Hu, and T. Wu, “Crf based road detection with multi-sensor fusion,” in Intelligent Vehicles Symposium (IV), 2015 IEEE.   IEEE, 2015, pp. 192–198.
  • [8] A. Broggi and S. Berte, “Vision-based road detection in automotive systems: A real-time expectation-driven approach,” Journal of Artificial Intelligence Research, vol. 3, pp. 325–348, 1995.
  • [9] Z. Nan, P. Wei, L. Xu, and N. Zheng, “Efficient lane boundary detection with spatial-temporal knowledge filtering,” Sensors, vol. 16, no. 8, p. 1276, 2016.
  • [10] U. L. Jau, C. S. Teh, and G. W. Ng, “A comparison of rgb and hsi color segmentation in real-time video images: A preliminary study on road sign detection,” in Information Technology, 2008. ITSim 2008. International Symposium on, vol. 4.   IEEE, 2008, pp. 1–6.
  • [11] J. M. Á. Alvarez and A. M. Lopez, “Road detection based on illuminant invariance,” IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 1, pp. 184–193, 2011.
  • [12] W. Maddern, A. Stewart, C. McManus, B. Upcroft, W. Churchill, and P. Newman, “Illumination invariant imaging: Applications in robust vision-based localisation, mapping and classification for autonomous vehicles,” in Proceedings of the Visual Place Recognition in Changing Environments Workshop, IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, vol. 2, 2014, p. 3.
  • [13] V. Badrinarayanan, A. Handa, and R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling,” arXiv preprint arXiv:1505.07293, 2015.
  • [14] J. Alvarez, T. Gevers, Y. LeCun, and A. Lopez, “Road scene segmentation from a single image,” Computer Vision–ECCV 2012, pp. 376–389, 2012.
  • [15] G. L. Oliveira, W. Burgard, and T. Brox, “Efficient deep models for monocular road segmentation.” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016. [Online]. Available: http://lmb.informatik.uni-freiburg.de//Publications/2016/OB16b
  • [16] P. Dollár and C. L. Zitnick, “Structured forests for fast edge detection,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1841–1848.
  • [17] P. Dollár and C. L. Zitnick, “Structured forests for fast edge detection,” in ICCV, 2013.
  • [18] C. L. Zitnick and P. Dollár, “Edge boxes: Locating object proposals from edges,” in ECCV, 2014.
  • [19] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “Slic superpixels compared to state-of-the-art superpixel methods,” IEEE transactions on pattern analysis and machine intelligence, vol. 34, no. 11, pp. 2274–2282, 2012.
  • [20] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The kitti dataset,” International Journal of Robotics Research (IJRR), 2013.
  • [21] P. Y. Shinzato, D. F. Wolf, and C. Stiller, “Road terrain detection: Avoiding common obstacle detection assumptions using sensor fusion,” in 2014 IEEE Intelligent Vehicles Symposium Proceedings, June 2014, pp. 687–692.
  • [22] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael, “Learning low-level vision,” International journal of computer vision, vol. 40, no. 1, pp. 25–47, 2000.