# Detecting Drivable Area for Self-driving Cars: An Unsupervised Approach

###### Abstract

It is well recognized that detecting the drivable area is central to self-driving cars. Most existing methods attempt to locate the road surface by using lane lines, which restricts them to drivable areas that have clear lane markings. This paper proposes an unsupervised approach for detecting the drivable area that utilizes both image data from a monocular camera and point cloud data from a 3D-LIDAR scanner. Our approach locates initial drivable areas based on a “direction ray map” obtained by image-LIDAR data fusion. In addition, fusion at the feature level is applied for more robust performance. Once the initial drivable areas are described by different features, the feature fusion problem is formulated as a Markov network, and a belief propagation algorithm is developed to perform the model inference. Our approach is unsupervised and avoids common hypotheses, yet achieves state-of-the-art results on the ROAD-KITTI benchmark. Experiments show that our unsupervised approach is efficient and robust for detecting the drivable area for self-driving cars.

## I Introduction

In the field of self-driving cars, road detection is a crucial requirement, and the topic has attracted considerable research interest in recent years. Significant achievements in road detection have been reported in the literature [1]. Although some training-based algorithms exist for detecting well-marked roads, unsupervised road detection for unlabeled roads in inner-city and rural areas remains a challenge on account of the high variability of traffic scenes and lighting conditions. To the best of our knowledge, no robust solution to this challenge exists so far. However, human driving behavior offers a cue: when humans drive, their priority is to distinguish the drivable area from the non-drivable area. Roads are then found and driving decisions are made based on the drivable area.

Inspired by human driving behavior, we propose an unsupervised approach for detecting the drivable area by fusing image data and LIDAR points, as shown in Fig. 1. By combining the image coordinate frame with the LIDAR coordinate frame, a Delaunay triangulation [2] is generated to describe the spatial relationships between points and is used to classify obstacle points, as detailed in Section III. An initial location of the drivable area is then obtained by fusing the “direction ray map” with image superpixels; it serves as prior knowledge and narrows the range of detection. In Section IV, this initial location is derived, and the features used to describe the final drivable area are learned autonomously from it. In Section V, a feature fusion step is implemented with a Markov network through belief propagation, and the final results are obtained.

In the experiments, our approach is tested on the ROAD-KITTI benchmark [3]. Compared with similar fusion approaches, ours achieves state-of-the-art results without training or making assumptions about shape or height, which demonstrates its robustness and generalization ability.

## II Related Work

Reliably detecting road areas is a key requirement for self-driving cars. In recent years, many approaches have been proposed to address this challenge. They differ mainly in the type of sensor used to acquire data, such as a monocular camera [4], a binocular camera [5], LIDAR [6], or a fusion of multiple sensors [7].

For monocular camera based approaches, most road detection algorithms use cues such as color [8] and lane markings [9]. To cope with illumination variations and shadows, different color spaces have been introduced [10][11][12]. In addition, by leveraging deep learning, monocular vision based methods can achieve unprecedented results [13][14][15]. However, unlike other visual concepts, such as cat or dog, the concept of a road cannot be defined by appearance alone. Whether a region is regarded as a road depends more on its physical attributes. Therefore, approaches that rely only on monocular vision are not robust enough for real applications.

With the advent of LIDAR sensors, which can measure distances accurately, many LIDAR-based road detection approaches have been developed. Such approaches use the spatial information of the LIDAR points to analyse the scene and regard flat areas as roads. However, due to the sparsity of LIDAR points, it is hard to analyse the details between points. Moreover, discarding image information increases the difficulty of point classification.

Considering the drawbacks of the above methods, we propose an unsupervised detection approach for the drivable area that fuses image data and LIDAR points. Compared with other detection methods, the advantages of our approach are reflected in three aspects. First, it does not need strong hypotheses, training steps, or manually labelled data, which ensures its generalization ability. Second, by fusing LIDAR and monocular camera data, our approach can learn probabilistic models in a self-learning manner, making it robust to complex road scenarios and changing illumination. Finally, superpixels are used as the basic processing units instead of pixels, and they are found to be an easy and efficient way to combine sparse LIDAR points with image data. Therefore, we consider our unsupervised approach an efficient and robust way to detect drivable areas for self-driving cars.

## III Preprocessing and Data Fusion

### III-A Image Processing in Superpixel Scale

In our approach, superpixels are considered as elementary units instead of pixels, motivated by the observation that superpixels and LIDAR points are complementary. On the one hand, superpixels are dense and carry color information that a LIDAR sensor cannot capture; on the other hand, LIDAR points provide depth information that a monocular camera cannot obtain. Owing to advances in superpixel segmentation methods, image processing at the superpixel scale can cut down computation and memory consumption without much loss in accuracy. In our approach, “Sticky Edge Adhesive Superpixels” are detected and used [16][17][18]. Without the edge term, these superpixels are computed using an iterative approach like SLIC [19]; with the edge term added, the superpixels snap to edges, resulting in higher-quality boundaries. It can therefore be assumed that superpixels adhere well to object boundaries, which makes them a reasonable choice for shaping the initial location of drivable areas, as detailed in Section IV.

In addition, superpixels can be used to calculate local statistics, which makes the results more robust. They thus assist both in shaping the initial drivable area and in accelerating the feature extraction processes.

### III-B Illumination Invariant Color Space

To acquire color feature regardless of lighting condition or weather, RGB color images () are transformed into an illumination invariant color space, noted as . As presented in [12], a 3-channel image is converted into a 1-channel image with one parameter related to peak spectral responses of the camera:

(1) |

where ,, are wavelengths and the is obtained following:

(2) |

where is the pixel value of in and , , are , , pixel values of in , respectively. We set to 0.4706 as[12] suggested.
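The conversion described in this subsection can be sketched in a few lines. This is a minimal sketch, assuming that (2) takes the log-chromaticity form commonly associated with [12], namely 0.5 + log(G) - α log(B) - (1 - α) log(R); the function names, the channel scaling to (0, 1], and the small `eps` guard are illustrative assumptions, not taken from the paper.

```python
import math

ALPHA = 0.4706  # camera-dependent parameter; the value the text adopts from [12]


def illumination_invariant(r, g, b, alpha=ALPHA):
    """Map one RGB pixel (channels scaled to (0, 1]) to a single value.

    Assumed form: 0.5 + log(G) - alpha * log(B) - (1 - alpha) * log(R).
    """
    eps = 1e-6  # guard against log(0) on fully dark channels (assumption)
    r, g, b = (max(c, eps) for c in (r, g, b))
    return 0.5 + math.log(g) - alpha * math.log(b) - (1.0 - alpha) * math.log(r)


def to_invariant_image(rgb_rows, alpha=ALPHA):
    """Convert a 3-channel image (rows of (r, g, b) tuples) to 1 channel."""
    return [[illumination_invariant(r, g, b, alpha) for (r, g, b) in row]
            for row in rgb_rows]
```

Under this assumed form, scaling all three channels by the same factor leaves the output unchanged, because the log coefficients sum to zero. That is the sense in which the single-channel image is insensitive to overall brightness.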

### III-C Obstacle Classification via Data Fusion

The whole process of this subsection is shown as Fig.2. To fuse LIDAR points with image pixels, the projection of 3D points is employed as presented in [20]. After the alignment, the LIDAR points set noted as is gained, where . is ’s LIDAR coordinate and is its image coordinate.

The objective of the obstacle classification step is to find the mapping relations . indicates that is an obstacle point while indicates that is a non-obstacle point (as shown in Fig.3). It is assumed that whether is an obstacle point or not merely depends on how flat its corresponding surface is in the physical world, so the problem is broken up into two sub-problems: how to define the corresponding surface of and how to measure its flatness, as shown in Fig.2.

To define the corresponding surface of , Delaunay triangulation is utilized to establish the spatial relationships among , because it has the properties that each vertex has on average six surrounding triangles in the plane and that the nearest neighbor graph is a subgraph of the Delaunay triangulation. The graph is generated as proposed in [21]. For each , its image coordinate is used in a planar Delaunay triangulation to generate an undirected graph . represents the set of edges which defines the relationships among . The edge is discarded if it does not satisfy:

(3) |

where is the Euclidean distance of and .

Then, the ”corresponding surfaces” are the surfaces (triangles) determined by , where is the set of ’s neighbor points. Then, the flatness of ’s corresponding surfaces can be measured by calculating the normal vectors of them, and is obtained by averaging the normal vectors of ’s neighboring triangles. Finally, is gained:

(4) |

where is the minimum deviation angle from horizon of an obstacle point.
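The flatness test can be illustrated with a small sketch. It assumes that the triangles around a point are already available as triples of 3D LIDAR coordinates (for example from a planar Delaunay triangulation of the image coordinates), that the z axis points upward, and that a point counts as an obstacle when its averaged normal tilts away from vertical by more than a threshold. The function names, the 30-degree default, and this angle convention are illustrative assumptions and may differ from the convention used in (4).

```python
import math


def triangle_normal(a, b, c):
    """Unit normal of triangle (a, b, c), flipped so that its z component is >= 0."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    length = math.sqrt(sum(k * k for k in n))
    if length == 0.0:
        return None  # degenerate (collinear) triangle
    sign = -1.0 if n[2] < 0 else 1.0
    return tuple(sign * k / length for k in n)


def tilt_from_vertical_deg(point_triangles):
    """Angle in degrees between the averaged triangle normal and the z axis."""
    normals = [n for n in (triangle_normal(*t) for t in point_triangles) if n]
    if not normals:
        return None
    mean = [sum(n[i] for n in normals) / len(normals) for i in range(3)]
    norm = math.sqrt(sum(k * k for k in mean))
    if norm == 0.0:
        return 90.0  # horizontal normals cancelled out: treat as vertical surface
    cos_t = max(-1.0, min(1.0, mean[2] / norm))
    return math.degrees(math.acos(cos_t))


def is_obstacle(point_triangles, max_tilt_deg=30.0):
    """Hypothetical rule: obstacle if the local surface is steeper than the threshold."""
    tilt = tilt_from_vertical_deg(point_triangles)
    return tilt is not None and tilt > max_tilt_deg
```

For example, a triangle lying in the ground plane yields a tilt of 0 degrees, while a triangle on a vertical wall yields 90 degrees and is flagged as an obstacle.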

## IV Drivable Areas Detection

To locate the drivable area, we first obtain an initial location and then refine it using features, which is a coarse-to-fine process.

### IV-A Locating Initial Drivable Areas

Once the classification step is completed, the corresponding drivable areas are determined. To locate initial drivable areas, we first get the ”direction ray map ()”, and then fuse it with superpixels. is obtained as shown in Algorithm.1. First, polar coordinate transformation is employed to , taking the middle bottom pixel of as the origin, noted as . Then, is restructured as where . means a LIDAR point whose transformed image coordinate is in the -th angle range. Because is sparse in laser coordinate frame, two problems emerge: the first is how to transform the sparse rays into dense pixels; the second is how to overcome the ”ray leak” problem shown in Fig.4.

As for the first problem, increasing is a natural solution; however, it aggravates the “ray leak” problem. Therefore, is fused with superpixels to address this problem.

For the second problem, it should be noted that the width of a vehicle is not negligible. That is, whether an area is drivable depends on its width as well as its flatness. Therefore, a minimum length filtering method is used to filter the leaked rays, and the final is shown in Fig.4(c).

Once has been obtained, the initial drivable area is generated by fusing with superpixels. Essentially, it is a set of superpixels noted as , and the set of LIDAR points within is noted as .
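The construction of a ray map in polar coordinates can be sketched as follows. This is only an illustration under stated assumptions, not Algorithm 1 itself: each ray is assumed to stop at the nearest obstacle point in its angular bin, and the “minimum length filtering” is assumed to be a sliding-window minimum over neighbouring bins, so that a ray leaking through a gap between sparse obstacle points is clipped to the length of its neighbours. The names `direction_ray_map`, `min_length_filter`, `n_bins`, and `half_window` are hypothetical.

```python
import math


def direction_ray_map(points, origin, n_bins=180):
    """Ray length per angular bin over [0, pi], measured from `origin`.

    points: iterable of (u, v, is_obstacle) in image coordinates.
    origin: (u0, v0), e.g. the middle bottom pixel of the image.
    A bin without any obstacle point keeps an infinite ray length.
    """
    u0, v0 = origin
    rays = [float("inf")] * n_bins
    for u, v, obstacle in points:
        if not obstacle:
            continue
        dx, dy = u - u0, v0 - v  # image rows grow downward, so flip the sign
        if dy < 0:
            continue  # point lies below the origin row
        theta = math.atan2(dy, dx)  # in [0, pi]
        k = min(int(theta / math.pi * n_bins), n_bins - 1)
        rays[k] = min(rays[k], math.hypot(dx, dy))
    return rays


def min_length_filter(rays, half_window=1):
    """Clip each ray to the shortest ray within +/- half_window bins."""
    n = len(rays)
    return [min(rays[max(0, i - half_window):min(n, i + half_window + 1)])
            for i in range(n)]
```

Widening `half_window` corresponds to requiring a wider free corridor, which reflects the remark that the vehicle width cannot be neglected.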

### IV-B Feature Extraction Based on Initial Drivable Area

Once is obtained, four features (”level” feature, normal feature, color feature, strength feature) are all calculated superpixel by superpixel to describe .

#### IV-B1 “Level” Feature

Our method focuses on detecting the drivable area. Therefore, a feature called ”level” is proposed to describe the drivable degree, and Algorithm.2 shows steps to calculate it. LIDAR points in are arranged in accordance with the distances to , that is, every point in satisfies:

(5) |

Then, the ”level” feature of superpixel is defined as:

(6) |

Because corresponds to , a small means a high drivable degree of the relevant area. A probability map is generated in the ”level” feature space, where the probability distribution is represented by a Gaussian-like model with parameters and as:

(7) |

where is the probability that belongs to the drivable area in the ”level” feature space. The parameter and can be calculated throughout in a self-learning manner without training.
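The self-learning step can be illustrated with a short sketch: the two parameters are estimated from the feature values observed inside the initial drivable area, and every superpixel is then scored against them. The unnormalised form exp(-(x - μ)² / (2σ²)) is an assumption standing in for the “Gaussian-like” model of (7), and the function names are hypothetical.

```python
import math


def fit_gaussian(values):
    """Estimate mean and standard deviation from samples in the initial area."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var)


def gaussian_score(x, mu, sigma):
    """Unnormalised Gaussian-like score in (0, 1]; equals 1 at x == mu."""
    if sigma == 0.0:
        return 1.0 if x == mu else 0.0
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
```

The same two functions could be reused for any feature whose model is described as Gaussian, since only the input values change.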

#### IV-B2 Normal Feature

The normal feature of each superpixel is designed as the minimum value of the among . As mentioned in Section III, represents the angle deviation of the relevant spatial triangle from the horizon. Thus, a larger means a higher drivable degree of . Namely, the larger is, the more flat the area will be. Similar to (7), a Gaussian-like model with parameters and is built as:

(8) |

where is the probability that belongs to the drivable area in the normal feature space. The estimation of and is the same as and mentioned above. Similarly, no manual setting or training is needed.

#### IV-B3 Color Feature

The color feature of is calculated using . A Gaussian model with parameters and is built as:

(9) |

where is the probability that belongs to the drivable area in the color feature space. And is the color of in illumination invariant color space. and are calculated throughout like and .

#### IV-B4 Strength Feature

The number of ray points within superpixel is counted to measure the smoothness of the relevant area, and is defined as strength feature . Different from above Gaussian models, the probability that belongs to the drivable area is calculated as:

(10) |

where presents the Euclidean distance between and in image coordinate frame, and presents the area of .

## V Feature Fusion via Belief Propagation

Once all features are obtained as detailed in Section IV, they are fused to get the final results. The most straightforward fusion method uses Bayes' rule to get the maximum a posteriori probability that each superpixel belongs to the drivable area. But this fusion method ignores the positional relationships among superpixels, which are valuable in this task. To model this kind of relationship between adjacent superpixels, a Markov network is used. As shown in Fig.5, a circle node represents the state of a superpixel and a square node represents the observation of the corresponding superpixel. An undirected line models the relationship between adjacent superpixels and is calculated by the potential compatibility function . A directed line models the observation process.

To perform the inference of the model, the belief propagation algorithm is used[22]. The local message passing from node to node is:

(11) |

where is the set of ’s adjacent superpixels except (green nodes in Fig.5). The marginal posterior probability of can be obtained by

(12) |

Then the fusion problem is formulated as designing the likelihood function and potential compatibility function . can be obtained by:

(13) |

Noticing that represents the closeness of and , so is defined as:

(14) |

which is similar to (8).
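The message passing scheme can be illustrated with a generic sum-product sketch in the style of [22] on a small graph of binary nodes (0 = non-drivable, 1 = drivable). It is a sketch under assumptions rather than the paper's implementation: the likelihood of each superpixel is passed in as a pair `unary[i]`, the compatibility is supplied by a hypothetical `pairwise(i, j)` callback returning a 2x2 table indexed as `psi[x_i][x_j]`, all potentials are assumed strictly positive, and messages are updated synchronously for a fixed number of iterations.

```python
def belief_propagation(unary, edges, pairwise, n_iters=10):
    """Sum-product belief propagation for binary nodes.

    unary:    {node: [phi(x=0), phi(x=1)]} local evidence per superpixel.
    edges:    list of undirected (i, j) pairs between adjacent superpixels.
    pairwise: function (i, j) -> 2x2 table psi[x_i][x_j] (assumed positive).
    Returns {node: [belief(x=0), belief(x=1)]}, each normalised to sum to 1.
    """
    neighbors = {i: set() for i in unary}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)

    # One message per directed edge, initialised to uniform.
    msgs = {(i, j): [1.0, 1.0] for i in unary for j in neighbors[i]}

    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in msgs:
            psi = pairwise(i, j)
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    prod = unary[i][xi] * psi[xi][xj]
                    # Multiply in messages from all neighbours of i except j.
                    for k in neighbors[i]:
                        if k != j:
                            prod *= msgs[(k, i)][xi]
                    total += prod
                out.append(total)
            s = sum(out)
            new_msgs[(i, j)] = [o / s for o in out]
        msgs = new_msgs

    beliefs = {}
    for i in unary:
        b = list(unary[i])
        for k in neighbors[i]:
            b = [b[x] * msgs[(k, i)][x] for x in (0, 1)]
        s = sum(b)
        beliefs[i] = [v / s for v in b]
    return beliefs
```

On a tree-structured graph this procedure gives exact marginals once messages have propagated across the graph; on a superpixel adjacency graph with loops it is the usual loopy approximation.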

## VI Experimental Results and Discussion

To validate our approach, we test it on the ROAD-KITTI benchmark [3].
The results are evaluated in bird's-eye view (BEV) with the metrics max F-measure (MaxF), average precision (AP), precision (PRE), recall (REC), false positive rate (FPR), and false negative rate (FNR) on three datasets: Urban Marked (UM), Urban Multiple Marked (UMM), and Urban Unmarked (UU).
To show the superiority of the proposed algorithm (“Ours Test” in the tables below), we compare it with HybridCRF, MixedCRF and LidarHisto, which are the top three LIDAR-based methods on the ROAD-KITTI benchmark website (http://www.cvlibs.net/datasets/kitti/eval_road.php).
In addition, our experimental results on the training set are listed (“Ours Train” in the tables below) only to demonstrate that our approach is training-free.

As TABLES I, II, and III show, our approach achieves the highest AP on UM and UU, indicating that it is robust across different situations. We also obtain the best REC and FNR on UMM and UU, which indicates that the road areas are covered well by our results. TABLE IV lists the performance across the three datasets, where our approach obtains the best AP, REC, and FNR. However, our FPR is higher than that of the other methods. This can be explained by the fact that the ground truth represents road areas, while our approach detects drivable areas, which contain the road as well as other flat areas (lawns, transition zones between sidewalk and road). In practical applications, self-driving cars should treat such flat areas as candidate roads for emergencies, such as avoiding vehicles that turn suddenly. Overall, our method is unsupervised yet remains competitive with supervised methods.

| UM | MaxF | AP | PRE | REC | FPR | FNR |
|---|---|---|---|---|---|---|
| HybridCRF | 90.99 | 85.26 | 90.65 | 91.33 | 4.29 | 8.67 |
| MixedCRF | 90.83 | 83.84 | 89.09 | 92.64 | 5.17 | 7.36 |
| LidarHisto | 89.87 | 83.03 | 91.28 | 88.49 | 3.85 | 11.51 |
| Ours Test | 84.96 | 86.51 | 79.94 | 90.65 | 10.37 | 9.35 |
| Ours Train | 86.34 | 88.17 | 82.29 | 90.80 | 8.98 | 9.20 |

| UMM | MaxF | AP | PRE | REC | FPR | FNR |
|---|---|---|---|---|---|---|
| HybridCRF | 91.95 | 86.44 | 94.01 | 89.98 | 6.30 | 10.02 |
| MixedCRF | 92.29 | 90.06 | 93.83 | 90.80 | 6.56 | 9.20 |
| LidarHisto | 93.32 | 93.19 | 95.39 | 91.34 | 4.85 | 8.66 |
| Ours Test | 92.22 | 92.23 | 91.70 | 92.74 | 9.23 | 7.26 |
| Ours Train | 92.74 | 93.87 | 92.51 | 92.96 | 8.20 | 7.04 |

| UU | MaxF | AP | PRE | REC | FPR | FNR |
|---|---|---|---|---|---|---|
| HybridCRF | 88.53 | 80.79 | 86.41 | 90.76 | 4.65 | 9.24 |
| MixedCRF | 82.79 | 69.11 | 79.01 | 86.96 | 7.53 | 13.04 |
| LidarHisto | 86.55 | 81.13 | 90.71 | 82.75 | 2.76 | 17.25 |
| Ours Test | 83.48 | 84.75 | 77.19 | 90.87 | 8.75 | 9.13 |
| Ours Train | 83.20 | 84.97 | 77.45 | 89.86 | 9.31 | 10.14 |

| URBAN | MaxF | AP | PRE | REC | FPR | FNR |
|---|---|---|---|---|---|---|
| HybridCRF | 90.81 | 86.01 | 91.05 | 90.57 | 4.90 | 9.43 |
| M-CRF | 89.46 | 83.70 | 88.52 | 90.42 | 6.46 | 9.59 |
| LidarHisto | 90.67 | 84.79 | 93.06 | 88.41 | 3.63 | 11.59 |
| Ours Test | 87.72 | 87.84 | 83.97 | 91.83 | 9.65 | 8.17 |
| Ours Train | 87.43 | 89.00 | 84.08 | 91.21 | 8.83 | 8.79 |
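The relation between the precision, recall, and F-measure columns can be checked with a short calculation. The sketch below assumes that the reported PRE and REC are taken at the operating point of the maximum F-measure, so that MaxF is the harmonic mean 2·PRE·REC / (PRE + REC); the function name is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# "Ours Test" on UM: PRE = 79.94, REC = 90.65
ours_um = f_measure(79.94, 90.65)

# HybridCRF on UM: PRE = 90.65, REC = 91.33
hybrid_um = f_measure(90.65, 91.33)
```

Rounded to two decimals, both values agree with the MaxF entries of the corresponding UM rows (84.96 and 90.99), which is consistent with the assumption above.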

In addition, to examine how much the feature fusion step boosts the performance, we compare the final results with the results from each single feature space as well as from the initial drivable area. All the results in TABLE V are obtained from experiments on the training set. As TABLE V shows, feature fusion achieves the best MaxF, AP, PRE, and FPR. It should be noted that the initial drivable area (“Initial” in the table below) performs outstandingly in REC and FNR, with PRE and FPR similar to the Baseline, so it is reasonable to use it to estimate parameters as described in Section IV.

| URBAN | MaxF | AP | PRE | REC | FPR | FNR |
|---|---|---|---|---|---|---|
| Baseline | 77.95 | 82.47 | 72.83 | 83.88 | 20.03 | 16.11 |
| Initial | 81.23 | 67.64 | 70.75 | 96.25 | 21.35 | 3.75 |
| Color | 85.35 | 79.76 | 78.93 | 93.43 | 12.81 | 6.57 |
| Strength | 84.16 | 86.37 | 79.36 | 89.70 | 12.76 | 10.30 |
| Level | 86.14 | 76.62 | 80.58 | 92.96 | 11.65 | 7.04 |
| Normal | 87.04 | 79.23 | 83.48 | 91.28 | 8.98 | 8.70 |
| Fusion | 87.43 | 89.00 | 84.08 | 91.21 | 8.83 | 8.79 |

## VII Conclusion and Future Work

In this paper, an unsupervised approach for detecting drivable areas is proposed that fuses four features via belief propagation. Our approach combines pixel information and depth information to overcome the drawbacks of using a single observation when faced with highly variable traffic scenes and lighting conditions. Without the need for strong hypotheses, training steps, or manually labelled data, our method is shown to be a general approach for self-driving cars. The experiments on the ROAD-KITTI benchmark verify the efficiency and robustness of our approach. In future work, we will first focus on separating road areas from the drivable areas and locating candidate drivable areas for emergencies. A more suitable dataset with hierarchical labels for the drivable area is also required. Finally, we intend to realize an FPGA implementation of our approach to achieve real-time application for self-driving cars.

## Acknowledgment

This research was partially supported by the National Natural Science Foundation of China (No. 61627811, L1522023) and the Programme of Introducing Talents of Discipline to University (No. B13043).

## References

- [1] A. Bar Hillel, R. Lerner, D. Levi, and G. Raz, “Recent progress in road and lane detection: a survey,” Machine vision and applications, pp. 1–19, 2014.
- [2] D.-T. Lee and B. J. Schachter, “Two algorithms for constructing a Delaunay triangulation,” International Journal of Computer & Information Sciences, vol. 9, no. 3, pp. 219–242, 1980.
- [3] J. Fritsch, T. Kuehnl, and A. Geiger, “A new performance measure and evaluation benchmark for road detection algorithms,” in International Conference on Intelligent Transportation Systems (ITSC), 2013.
- [4] T. Kuehnl, F. Kummert, and J. Fritsch, “Spatial ray features for real-time ego-lane extraction,” in Proc. IEEE Intelligent Transportation Systems, 2012.
- [5] H. Badino, U. Franke, and R. Mester, “Free space computation using stochastic occupancy grids and dynamic programming,” in Workshop on Dynamical Vision, ICCV, Rio de Janeiro, Brazil, vol. 20. Citeseer, 2007.
- [6] C. Tongtong, D. Bin, L. Daxue, Z. Bo, and L. Qixu, “3D LIDAR-based ground segmentation,” in Pattern Recognition (ACPR), 2011 First Asian Conference on. IEEE, 2011, pp. 446–450.
- [7] L. Xiao, B. Dai, D. Liu, T. Hu, and T. Wu, “CRF based road detection with multi-sensor fusion,” in Intelligent Vehicles Symposium (IV), 2015 IEEE. IEEE, 2015, pp. 192–198.
- [8] A. Broggi and S. Berte, “Vision-based road detection in automotive systems: A real-time expectation-driven approach,” Journal of Artificial Intelligence Research, vol. 3, pp. 325–348, 1995.
- [9] Z. Nan, P. Wei, L. Xu, and N. Zheng, “Efficient lane boundary detection with spatial-temporal knowledge filtering,” Sensors, vol. 16, no. 8, p. 1276, 2016.
- [10] U. L. Jau, C. S. Teh, and G. W. Ng, “A comparison of RGB and HSI color segmentation in real-time video images: A preliminary study on road sign detection,” in Information Technology, 2008. ITSim 2008. International Symposium on, vol. 4. IEEE, 2008, pp. 1–6.
- [11] J. M. Á. Alvarez and A. M. Lopez, “Road detection based on illuminant invariance,” IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 1, pp. 184–193, 2011.
- [12] W. Maddern, A. Stewart, C. McManus, B. Upcroft, W. Churchill, and P. Newman, “Illumination invariant imaging: Applications in robust vision-based localisation, mapping and classification for autonomous vehicles,” in Proceedings of the Visual Place Recognition in Changing Environments Workshop, IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, vol. 2, 2014, p. 3.
- [13] V. Badrinarayanan, A. Handa, and R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling,” arXiv preprint arXiv:1505.07293, 2015.
- [14] J. Alvarez, T. Gevers, Y. LeCun, and A. Lopez, “Road scene segmentation from a single image,” Computer Vision–ECCV 2012, pp. 376–389, 2012.
- [15] G. L. Oliveira, W. Burgard, and T. Brox, “Efficient deep models for monocular road segmentation.” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016. [Online]. Available: http://lmb.informatik.uni-freiburg.de//Publications/2016/OB16b
- [16] P. Dollár and C. L. Zitnick, “Structured forests for fast edge detection,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1841–1848.
- [17] P. Dollár and C. L. Zitnick, “Structured forests for fast edge detection,” in ICCV, 2013.
- [18] C. L. Zitnick and P. Dollár, “Edge boxes: Locating object proposals from edges,” in ECCV, 2014.
- [19] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “SLIC superpixels compared to state-of-the-art superpixel methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274–2282, 2012.
- [20] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” International Journal of Robotics Research (IJRR), 2013.
- [21] P. Y. Shinzato, D. F. Wolf, and C. Stiller, “Road terrain detection: Avoiding common obstacle detection assumptions using sensor fusion,” in 2014 IEEE Intelligent Vehicles Symposium Proceedings, June 2014, pp. 687–692.
- [22] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael, “Learning low-level vision,” International journal of computer vision, vol. 40, no. 1, pp. 25–47, 2000.