Road Crack Detection Using Deep Convolutional Neural Network and Adaptive Thresholding

Rui Fan, Mohammud Junaid Bocus, Yilong Zhu,
Jianhao Jiao, Li Wang, Fulong Ma, Shanshan Cheng, Ming Liu
R. Fan, J. Jiao and M. Liu are with the Robotics and Multi-Perception Laboratory in Robotics Institute at the Hong Kong University of Science and Technology, Hong Kong. M. J. Bocus is with the Visual Information Institute at the University of Bristol, Bristol, United Kingdom. Y. Zhu and F. Ma are with Unity-Drive Technology Inc, Shenzhen, China. L. Wang and S. Cheng are with the National Engineering Research Center of Road Maintenance Technologies, Beijing, China. These two authors contributed equally to this work and therefore are joint first authors. Corresponding author: Rui Fan. Email:

Cracks are among the most common road distresses and may pose road safety hazards. Generally, crack detection is performed by either certified inspectors or structural engineers. This task is, however, time-consuming, subjective and labor-intensive. In this paper, we propose a novel road crack detection algorithm based on deep learning and adaptive image segmentation. Firstly, a deep convolutional neural network is trained to determine whether an image contains cracks or not. The images containing cracks are then smoothed using bilateral filtering, which greatly reduces the number of noisy pixels. Finally, we utilize an adaptive thresholding method to extract the cracks from the road surface. The experimental results illustrate that our network can classify images with an accuracy of 99.92%, and the cracks can be successfully extracted from the images using our proposed thresholding algorithm.

I Introduction

Road damages in the form of cracks may reduce the road performance and pose potential road safety hazards [1]. Every year, government bodies across the globe allocate funds to enhance the quality of their road networks [2]. Road safety should be taken very seriously and authorities are fully aware of the need for suitable road inspection and maintenance techniques [3]. Crack detection is an essential part of road maintenance systems, and it has attracted growing interest from researchers in this field over the past few years [4]. Traditional manual road crack detection approaches are known to be very time-consuming, dangerous, labor-intensive and subjective [5]. Therefore, the slow and subjective traditional methods have been gradually replaced by automated crack detection systems which provide fast and reliable analysis in intelligent transportation systems (ITS) [6]. Automated crack detection systems can effectively assess the quality of the road surfaces and help governments plan and prioritize the maintenance of the road network, thereby keeping the roads in good condition and extending their service life [7].

With the development of image analysis techniques, road crack detection and recognition have been widely investigated over the past few decades [8, 9, 10, 11]. The traditional framework for crack detection consists of defining a variety of gradient features using gradient filters, such as Sobel [12, 13], for each image pixel, and then using a binary classifier to determine whether an image pixel is part of a crack region or not [8]. In early methods, such as [14] and [15], the authors used threshold-based approaches to find crack regions based on the assumption that a pixel lying in a crack area is consistently darker than others [7, 16]. Furthermore, many researchers [10, 11, 17, 18] tried to suppress the interference of noise by considering additional local features, such as the mean and the standard deviation of an image region. However, these methods are still very sensitive to noise because only the brightness features are taken into consideration.

In recent years, some novel algorithms, such as minimal path selection (MPS) [19, 20, 21], minimum spanning tree (MST) [22, 23] and crack fundamental element (CFE) [24, 25], have been proposed to improve the existing crack detection approaches. In addition, Hu and Zhao [26] proposed a crack detection algorithm based on local binary patterns (LBP), whereas the authors of [27] utilized a Gabor filter for the same purpose. In [22], an automated crack detection algorithm based on a tree structure, referred to as CrackTree, was introduced. Moreover, Oliveira et al. [7, 28] utilized a comprehensive set of image analysis algorithms to detect and characterize cracks from road pavements. Although the above-mentioned algorithms have been widely used in crack detection and perform well on high-quality datasets [22, 28, 29], they are not accurate enough to distinguish cracks from the complex background in low-quality images.

Fig. 1: The structure of the proposed deep neural network for image classification.
Fig. 2: The structure of the green block in Fig. 1.

Furthermore, some machine learning-based crack detection approaches [30, 31, 32, 33, 34, 35] have been proposed in recent years, and the features produced by neural networks are very likely to replace the local features utilized in traditional methods [36]. For example, restricted Boltzmann machines (RBMs), autoencoders and their variants are capable of detecting cracks when the training samples are limited [37]. In addition, deep convolutional neural networks (DCNNs) are popular for feature learning and supervised classification [38]. Zhang et al. [36] trained a neural network to determine whether the patches in an image contain cracks or not. Hence in this paper, we build on the recent successful application of deep neural networks to image classification and train a convolutional neural network (CNN) to find the images that contain cracks. Then, we present a novel thresholding method to extract cracks from the classified color images.

The remainder of this paper is structured as follows: Section II introduces the proposed crack detection algorithm. In Section III, we present our experimental results and discuss the performance of the proposed method. Finally, Section IV concludes the paper and provides some recommendations for future work.

II Methodology

The proposed crack detection method consists of two steps: image classification and image segmentation. For notational convenience, images showing the presence and absence of cracks are referred to as positive and negative images, respectively. Firstly, an image is classified as either positive or negative using a deep convolutional neural network. The positive images are then processed using an adaptive thresholding method. The cracks in the positive images can therefore be extracted. The rest of this section gives a detailed description of these two steps.

II-A Image Classification

The structure of the proposed deep convolutional neural network is shown in Fig. 1, where ReLU represents a rectified linear unit, the most popular activation function for deep neural networks because it performs better than both the sigmoid function and the hyperbolic tangent function in terms of training and evaluation [38]. A CNN is generally considered as a hierarchical feature extractor [36]. A convolutional layer performs a convolution operation on the image input and passes the extracted features to the next layer [38]. Batch normalization is then performed on the output of the convolutional layer, whereby the extracted features are normalized by adjusting and scaling the activations [39]. The structure of the green block in Fig. 1 is shown in Fig. 2. Max pooling downsamples the input representations [38], whereas the softmax function translates a vector into a probability distribution. Finally, a fully connected layer computes the score of each class and infers the category of the input image [36]. Therefore, the proposed network is also referred to as a fully connected network (FCN). More details on the training process are provided in Section III.
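As a rough illustration of three of the layer operations named above (ReLU, max pooling and softmax), the following pure-Python sketch applies them to toy inputs. The function names, the 2x2 pooling window and the toy values are assumptions made for this example; they are not taken from the trained network.

```python
import math


def relu(values):
    """Rectified linear unit: negative activations are clipped to zero."""
    return [max(0.0, v) for v in values]


def max_pool_2x2(feature_map):
    """Downsample a 2D feature map by keeping the maximum of each 2x2 block."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


def softmax(scores):
    """Translate class scores into a probability distribution."""
    shift = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical class scores for (negative, positive).
probabilities = softmax([1.0, 3.0])
pooled = max_pool_2x2([[1, 2, 0, 1],
                       [3, 4, 1, 0],
                       [0, 0, 5, 6],
                       [1, 0, 7, 8]])
```

The class with the larger score receives the larger probability, and the probabilities sum to one, which is what allows the final layer to infer the image category.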

II-B Image Segmentation

Fig. 3: Bilateral filtering and image segmentation; (a) original positive image; (b) filtered positive image; (c) segmentation result.

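Since the images have already been classified using our proposed deep neural network, only the positive images are considered for processing in this subsection. Before performing image segmentation, we first utilize a bilateral filter [40, 41] to smooth the input images. Bilateral filter outperforms other image filters in terms of edge preservation [40]. A general expression for bilateral filtering is as follows:

$$ i^{\text{new}}(x,y) = \frac{\sum_{a=x-\rho}^{x+\rho}\sum_{b=y-\rho}^{y+\rho} \omega_s(a,b)\,\omega_c(a,b)\,i(a,b)}{\sum_{a=x-\rho}^{x+\rho}\sum_{b=y-\rho}^{y+\rho} \omega_s(a,b)\,\omega_c(a,b)} \tag{1} $$

where

$$ \omega_s(a,b) = \exp\left\{-\frac{(a-x)^2+(b-y)^2}{\sigma_s^2}\right\} \tag{2} $$

$$ \omega_c(a,b) = \exp\left\{-\frac{\big(i(a,b)-i(x,y)\big)^2}{\sigma_c^2}\right\} \tag{3} $$

$i(x,y)$ represents the intensity of a pixel at $(x,y)$ in the input image. $i^{\text{new}}(x,y)$ denotes the intensity of a pixel at $(x,y)$ in the filtered image. $\omega_s$ and $\omega_c$ are based on spatial distance and color similarity, respectively. Their values are controlled by two parameters $\sigma_s$ and $\sigma_c$, respectively. In our experiments, the values of $\sigma_s$ and $\sigma_c$ are set to 300 and 0.1, respectively. $\rho$ is set to 5. The filtered image is shown in Fig. 3.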
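A minimal sketch of such a bilateral filter is given below, assuming exponential weights on squared spatial distance and squared intensity difference, intensities normalized to [0, 1], and a window that is simply clipped at the image border. The parameter names `rho`, `sigma_s` and `sigma_c` are illustrative labels for the window half-size and the two weighting parameters (5, 300 and 0.1 in the experiments); the border handling is an assumption of this example.

```python
import math


def bilateral_filter(image, rho=5, sigma_s=300.0, sigma_c=0.1):
    """Smooth a grayscale image (list of rows, values in [0, 1]) while keeping edges."""
    rows, cols = len(image), len(image[0])
    filtered = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            weighted_sum = 0.0
            weight_total = 0.0
            # Assumption: the (2*rho+1) x (2*rho+1) window is clipped at the border.
            for a in range(max(0, x - rho), min(rows, x + rho + 1)):
                for b in range(max(0, y - rho), min(cols, y + rho + 1)):
                    # Spatial weight: decays with distance from the centre pixel.
                    w_s = math.exp(-((a - x) ** 2 + (b - y) ** 2) / sigma_s ** 2)
                    # Similarity weight: decays with intensity difference.
                    w_c = math.exp(-((image[a][b] - image[x][y]) ** 2) / sigma_c ** 2)
                    weighted_sum += w_s * w_c * image[a][b]
                    weight_total += w_s * w_c
            # The centre pixel always contributes weight 1, so the total is never zero.
            filtered[x][y] = weighted_sum / weight_total
    return filtered


# Toy image: dark region (0.2) next to a bright region (0.8), with one noisy dark pixel.
toy = [[0.2, 0.2, 0.2, 0.8, 0.8, 0.8] for _ in range(5)]
toy[2][1] = 0.25
smoothed = bilateral_filter(toy)
```

In this toy case the noisy pixel is pulled towards its dark neighbors, while the bright region is left essentially unchanged because pixels across the edge receive a near-zero similarity weight.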

To further reduce noise in the filtered image, the latter is downsampled as shown in Fig. 4. The downsampled image is approximately nine times smaller than the original filtered image and it is utilized to determine the threshold for image segmentation. Note that the pixel intensities in the downsampled image are normalized.
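One way to obtain an image that is roughly nine times smaller is to average non-overlapping 3x3 blocks, as sketched below. This is an assumption made for illustration: the text does not specify the downsampling scheme of Fig. 4, and the normalization by `max_value` (255 for 8-bit images) is likewise a guess at what "normalized" means here.

```python
def downsample_3x3(image, max_value=255.0):
    """Average each non-overlapping 3x3 block and scale the result to [0, 1].

    Rows and columns that do not fill a complete block are dropped.
    """
    rows, cols = len(image) // 3, len(image[0]) // 3
    return [
        [sum(image[3 * r + dr][3 * c + dc]
             for dr in range(3) for dc in range(3)) / (9.0 * max_value)
         for c in range(cols)]
        for r in range(rows)
    ]


# A 227x227 input yields a 75x75 output under this scheme.
small = downsample_3x3([[0] * 227 for _ in range(227)])
```

With this scheme a 227x227 image shrinks to 75x75, i.e. about 9.2 times fewer pixels, which is consistent with "approximately nine times smaller".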

Fig. 4: Image downsampling.

The proposed thresholding method hypothesizes that the downsampled image is composed of two parts: foreground (cracks) and background (road surface), and they can be separated using one threshold $T$. Furthermore, we assume that a pixel lying in a crack area is consistently darker than others. To find the best threshold $T_o$, we formulate the thresholding problem as a 2D vector quantization problem, where each pixel $\mathbf{p}$ and its neighborhood system $\mathcal{N}_{\mathbf{p}}$ provide a vector $\mathbf{v} = [i(\mathbf{p}), g(\mathbf{p})]^\top$, where $i(\mathbf{p})$ represents the intensity of $\mathbf{p}$, and $g(\mathbf{p})$ denotes the mean intensity of $\mathcal{N}_{\mathbf{p}}$. The vectors are stored in a 2D histogram, as shown in Fig. 5. The relationship between $i$ and $g$ is as follows:

$$ g(\mathbf{p}) = \frac{1}{(2\nu+1)^2}\sum_{\mathbf{q}\in\mathcal{N}_{\mathbf{p}}} i(\mathbf{q}) \tag{4} $$

where $\nu$ dictates the size of the neighborhood system. The threshold can be determined by partitioning the vectors into two clusters $\mathcal{S} = \{\mathcal{S}_1, \mathcal{S}_2\}$, where $\mathcal{S}_1$ and $\mathcal{S}_2$ correspond to the foreground and background, respectively.
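The construction of the 2D vectors can be sketched as follows, pairing each pixel intensity with the mean intensity of a square window around it. The parameter name `nu`, the inclusion of the centre pixel in the window, and the clipping of the window at the image border are assumptions of this example.

```python
def neighborhood_vectors(image, nu=1):
    """Return one (intensity, neighborhood mean) pair per pixel.

    The neighborhood is a (2*nu+1) x (2*nu+1) window centred on the pixel;
    near the border the window is clipped, so the mean uses fewer pixels.
    """
    rows, cols = len(image), len(image[0])
    vectors = []
    for r in range(rows):
        for c in range(cols):
            window = [image[a][b]
                      for a in range(max(0, r - nu), min(rows, r + nu + 1))
                      for b in range(max(0, c - nu), min(cols, c + nu + 1))]
            vectors.append((image[r][c], sum(window) / len(window)))
    return vectors


# An isolated bright pixel: its own intensity is high, its neighborhood mean is low.
spike = [[0.0, 0.0, 0.0],
         [0.0, 0.9, 0.0],
         [0.0, 0.0, 0.0]]
centre_vector = neighborhood_vectors(spike)[4]
```

A pixel whose intensity differs strongly from its neighborhood mean, like the isolated spike here, lands far from the principal diagonal of the 2D histogram, which is what makes such vectors identifiable as noise.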

According to the Markov random field [42], the intensity of a pixel which is not located near the boundary between foreground and background is similar to that of its neighbors in all directions. Hence, we search for the threshold along the principal diagonal of the 2D histogram using k-means clustering [43]. Given a threshold $T$, the 2D histogram can be divided into four regions, as shown in Fig. 5. Regions 1 and 2 store the vectors of foreground and background, respectively. On the other hand, regions 3 and 4 store the noisy vectors. In our method, the vectors in regions 3 and 4 are excluded from the clustering process. The best threshold $T_o$ is computed by minimizing the within-cluster sum of squares as follows:

$$ T_o = \underset{T}{\arg\min} \sum_{j=1}^{2}\sum_{\mathbf{v}\in\mathcal{S}_j} \left\|\mathbf{v}-\boldsymbol{\mu}_j\right\|_2^2 \tag{5} $$

where $\boldsymbol{\mu}_j$ denotes the mean of the vectors in cluster $\mathcal{S}_j$. The within-cluster sum of squares with respect to different $T$ is shown in Fig. 5, and the corresponding segmentation result is shown in Fig. 3, where the crack and road surface are in white and black, respectively. The performance of the crack detection algorithm is evaluated in Section III.
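The threshold search described above can be sketched as an exhaustive scan over candidate thresholds. In this hedged example, region 1 (foreground) holds vectors whose intensity and neighborhood mean are both at or below the threshold, region 2 (background) holds vectors with both above it, and all remaining vectors are treated as noise and skipped. The uniform grid of `levels` candidates and the requirement that both clusters be non-empty are assumptions of this sketch, not details given in the text.

```python
def within_cluster_sum_of_squares(cluster):
    """Sum of squared Euclidean distances from each 2D vector to the cluster mean."""
    n = len(cluster)
    mean_i = sum(v[0] for v in cluster) / n
    mean_g = sum(v[1] for v in cluster) / n
    return sum((v[0] - mean_i) ** 2 + (v[1] - mean_g) ** 2 for v in cluster)


def best_threshold(vectors, levels=256):
    """Scan thresholds on a uniform grid and keep the first one with minimal cost."""
    best_t, best_cost = None, float("inf")
    for k in range(1, levels):
        t = k / levels
        foreground = [v for v in vectors if v[0] <= t and v[1] <= t]  # region 1
        background = [v for v in vectors if v[0] > t and v[1] > t]    # region 2
        # Vectors with exactly one component above t (regions 3 and 4) are ignored.
        if not foreground or not background:
            continue  # assumption: both clusters must contain at least one vector
        cost = (within_cluster_sum_of_squares(foreground)
                + within_cluster_sum_of_squares(background))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


# Toy vectors: a dark cluster, a bright cluster and two noisy off-diagonal vectors.
toy_vectors = [(0.10, 0.10), (0.10, 0.10), (0.12, 0.12),
               (0.80, 0.80), (0.80, 0.80), (0.78, 0.78),
               (0.10, 0.78), (0.80, 0.12)]
threshold = best_threshold(toy_vectors)
```

For these toy vectors any threshold between the two clusters gives the same minimal cost, and the scan returns the first such value; pixels at or below it would then be labeled as crack.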

III Experimental Results

In our experiments, the proposed deep neural network is trained on an NVIDIA GTX 1080 Ti GPU, which has 3584 CUDA cores and 11 GB GDDR5X memory. The GPU memory bandwidth is 484 GB/s. The training is implemented in MATLAB R2018b. Our trained neural network is publicly available at:

Fig. 5: 2D histogram and the within-cluster sum of squares with respect to different values of the threshold T.

The dataset utilized for training the proposed network was created by researchers from Middle East Technical University. The dataset contains 40000 RGB images (resolution: 227×227), of which 20000 are positive and 20000 are negative.

Fig. 6: Experimental results of image classification. (a) True positive images. (b) True negative images.
Fig. 7: The failed classification results. (a) False positive images. (b) False negative images.
Fig. 8: Experimental results of image segmentation; (a) input images; (b) filtered images; (c) results obtained using Otsu’s thresholding method; (d)-(g) results obtained using the proposed method when the neighborhood size parameter ν is set to 1, 2, 3 and 4, respectively; (h) ground truth.

In our practical experiments, we randomly select 15000 positive images and 15000 negative images from the dataset, to train the neural network. The rest of the images are utilized to evaluate the performance of the proposed approach. The initial learning rate, the maximum number of epochs and the validation frequency are set to 0.01, 16 and 60, respectively. The stochastic gradient descent with momentum (SGDM) is utilized as the optimizer, and the value of momentum is set to 0.9.
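For readers unfamiliar with SGDM, the sketch below shows one common formulation of the update rule with the learning rate (0.01) and momentum (0.9) stated above, applied to a toy one-parameter problem. The function name and the toy objective are assumptions of this example; it is not the training code used in the experiments.

```python
def sgdm_step(params, grads, velocity, learning_rate=0.01, momentum=0.9):
    """One SGDM update: the velocity accumulates past gradients, then moves the parameters."""
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


# Toy objective f(w) = w^2 with gradient 2w, starting from w = 1.
params, velocity = [1.0], [0.0]
for _ in range(200):
    grads = [2.0 * params[0]]
    params, velocity = sgdm_step(params, grads, velocity)
```

The momentum term reuses a fraction of the previous step, which damps oscillations and usually speeds up convergence compared with plain gradient descent at the same learning rate.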

To quantify the accuracy of our proposed image classifier, we compute $n_{tp}$, $n_{fp}$, $n_{fn}$ and $n_{tn}$, which represent the number of testing images that are true positive, false positive, false negative and true negative, respectively. The precision, recall, accuracy and F-measure can be computed using the following equations:

$$ \text{precision} = \frac{n_{tp}}{n_{tp}+n_{fp}}, \qquad \text{recall} = \frac{n_{tp}}{n_{tp}+n_{fn}} \tag{6} $$

$$ \text{accuracy} = \frac{n_{tp}+n_{tn}}{n_{tp}+n_{tn}+n_{fp}+n_{fn}} \tag{7} $$

$$ \text{F-measure} = 2\times\frac{\text{precision}\times\text{recall}}{\text{precision}+\text{recall}} \tag{8} $$

The values of precision, recall, accuracy and F-measure achieved using the proposed network are all 99.92% (the numbers of false positive images and false negative images are both four). The true positive and true negative examples are shown in Fig. 6, while the false positive and false negative results are shown in Fig. 7. The image classification takes about 4.8 ms on an Intel Core i7-8700K CPU using a single core (3.7 GHz).
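The four metrics can be checked with a few lines of arithmetic. The sketch below assumes the test set consists of the 5000 positive and 5000 negative images left over after training, with four false positives and four false negatives as stated above; the function and variable names are illustrative.

```python
def classification_metrics(n_tp, n_fp, n_fn, n_tn):
    """Return (precision, recall, accuracy, F-measure) from the four counts."""
    precision = n_tp / (n_tp + n_fp)
    recall = n_tp / (n_tp + n_fn)
    accuracy = (n_tp + n_tn) / (n_tp + n_fp + n_fn + n_tn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f_measure


# 5000 positive test images with 4 missed, 5000 negative test images with 4 false alarms.
metrics = classification_metrics(n_tp=4996, n_fp=4, n_fn=4, n_tn=4996)
```

Because the numbers of false positives and false negatives are equal and the test set is balanced, all four metrics coincide at 4996/5000 = 0.9992.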

Furthermore, we evaluate the performance of the image segmentation at pixel level. Some experimental results of image filtering and segmentation are shown in Fig. 8. Since the dataset we use does not contain pixel-level ground truth, we manually label the crack areas in a set of images and use these ground truth images to quantify the performance of our proposed segmentation method. Moreover, we compare our method with Otsu’s thresholding method [44] with respect to pixel-level precision, recall, accuracy and F-measure. Here, the counts of true positives, false positives, false negatives and true negatives in (6), (7) and (8) refer to pixels rather than images. The comparison between these two methods is shown in Table I. Note that crack areas with fewer than 100 pixels are ignored in our experiments. From Table I, it can be observed that our method achieves a higher precision than Otsu’s thresholding method for every tested neighborhood size, and a higher accuracy and F-measure when the neighborhood size parameter ν is set to 1 or 2, whereas its recall is slightly lower. The proposed segmentation method achieves its highest accuracy and F-measure when ν is set to 1.

Method                 Precision   Recall   Accuracy   F-measure
Otsu’s thresholding    0.9590      0.9339   0.9848     0.9462
Proposed (ν = 1)       0.9774      0.9331   0.9870     0.9548
Proposed (ν = 2)       0.9854      0.9246   0.9867     0.9541
Proposed (ν = 3)       0.9967      0.9046   0.9848     0.9484
Proposed (ν = 4)       0.9955      0.8952   0.9831     0.9427
TABLE I: Comparison between our proposed method and Otsu’s thresholding method.

IV Conclusion and Future Work

A novel crack detection approach was proposed in this paper. The main novelties include a fully connected neural network for image classification and a k-means clustering-based image segmentation algorithm. Firstly, our neural network classified the input images as either positive (crack present) or negative (crack absent). The positive images were then processed using a bilateral filter, which not only reduced the number of noisy pixels but also preserved the edges between the cracks and road surface. Finally, the filtered images were downsampled, and an adaptive threshold was computed by minimizing the within-cluster sum of squares. The cracks can therefore be detected by segmenting the filtered images using the adaptively determined threshold. The experimental results showed that the precision of the image classification is 99.92% and the pixel-level segmentation accuracy is around 98.70%.

Although the proposed image segmentation algorithm performs better than Otsu’s thresholding method in terms of distinguishing between foreground (cracks) and background (road surface), some color images with a large number of noisy pixels cannot be properly segmented. Therefore, as future work, a deep neural network can be trained to segment the positive images into a set of semantically meaningful regions, i.e., cracks and road surface.


This work is supported by grants from the Research Grants Council of the Hong Kong SAR Government, China (No. 11210017 and No. 21202816) awarded to Prof. Ming Liu. This work is also supported by grants from the Shenzhen Science, Technology and Innovation Commission, JCYJ20170818153518789, and National Natural Science Foundation of China (No. 61603376) awarded to Dr. Lujia Wang.


  • [1] R. Fan, Y. Liu, X. Yang, M. J. Bocus, N. Dahnoun, and S. Tancock, “Real-time stereo vision for road surface 3-d reconstruction,” in 2018 IEEE International Conference on Imaging Systems and Techniques (IST).   IEEE, 2018, pp. 1–6.
  • [2] H. Oliveira and P. L. Correia, “Automatic road crack segmentation using entropy and image dynamic thresholding,” in Signal Processing Conference, 2009 17th European.   IEEE, 2009, pp. 622–626.
  • [3] R. Fan, X. Ai, and N. Dahnoun, “Road surface 3D reconstruction based on dense subpixel disparity map estimation,” IEEE Transactions on Image Processing, vol. 27, no. 6, pp. 3025–3035, Jun. 2018.
  • [4] T. S. Nguyen, M. Avila, and S. Begot, “Automatic detection and classification of defect on road pavement using anisotropy measure,” in Signal Processing Conference, 2009 17th European.   IEEE, 2009, pp. 617–621.
  • [5] R. Fan, “Real-time computer stereo vision for automotive applications,” Ph.D. dissertation, University of Bristol, 2018.
  • [6] R. Fan, J. Jiao, J. Pan, H. Huang, S. Shen, and M. Liu, “Real-time dense stereo embedded in a uav for road inspection,” arXiv preprint arXiv:1904.06017.
  • [7] H. Oliveira and P. L. Correia, “Automatic road crack detection and characterization,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 1, pp. 155–168, 2013.
  • [8] H. Oh, N. W. Garrick, and L. E. Achenie, “Segmentation algorithm using iterative clipping for processing noisy pavement images,” in Imaging Technologies: Techniques and Applications in Civil Engineering, Second International Conference. Engineering Foundation; and Imaging Technologies Committee of the Technical Council on Computer Practices, American Society of Civil Engineers, 1998.
  • [9] M. Petrou, J. Kittler, and K. Song, “Automatic surface crack detection on textured materials,” Journal of materials processing technology, vol. 56, no. 1-4, pp. 158–167, 1996.
  • [10] Y. Huang and B. Xu, “Automatic inspection of pavement cracking distress,” Journal of Electronic Imaging, vol. 15, no. 1, p. 013017, 2006.
  • [11] M. Gavilán, D. Balcones, O. Marcos, D. F. Llorca, M. A. Sotelo, I. Parra, M. Ocaña, P. Aliseda, P. Yarza, and A. Amírola, “Adaptive road crack detection system by pavement classification,” Sensors, vol. 11, no. 10, pp. 9628–9657, 2011.
  • [12] U. Ozgunalp, R. Fan, X. Ai, and N. Dahnoun, “Multiple lane detection algorithm based on novel dense vanishing point estimation,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 3, pp. 621–632, 2017.
  • [13] R. Fan, V. Prokhorov, and N. Dahnoun, “Faster-than-real-time linear lane detection implementation using soc dsp tms320c6678,” in 2016 IEEE International Conference on Imaging Systems and Techniques (IST).   IEEE, 2016, pp. 306–311.
  • [14] M. S. Kaseko and S. G. Ritchie, “A neural network-based methodology for pavement crack detection and classification,” Transportation Research Part C: Emerging Technologies, vol. 1, no. 4, pp. 275–291, 1993.
  • [15] Q. Li and X. Liu, “Novel approach to pavement image segmentation based on neighboring difference histogram method,” in Image and Signal Processing, 2008. CISP’08. Congress on, vol. 2.   IEEE, 2008, pp. 792–796.
  • [16] R. Fan, M. J. Bocus, and N. Dahnoun, “A novel disparity transformation algorithm for road segmentation,” Information Processing Letters, vol. 140, pp. 18–24, 2018.
  • [17] N. Tanaka and K. Uematsu, “A crack detection method in road surface images using morphology.” MVA, vol. 98, pp. 17–19, 1998.
  • [18] H. Oliveira and P. L. Correia, “Supervised strategies for cracks detection in images of road pavement flexible surfaces,” in Signal Processing Conference, 2008 16th European.   IEEE, 2008, pp. 1–5.
  • [19] J. Jiao, R. Fan, H. Ma, and M. Liu, “Using dp towards a shortest path problem-related application,” arXiv preprint arXiv:1903.02765, 2019.
  • [20] M. Avila, S. Begot, F. Duculty, and T. S. Nguyen, “2d image based road pavement crack detection by calculating minimal paths and dynamic programming,” in Image Processing (ICIP), 2014 IEEE International Conference on.   IEEE, 2014, pp. 783–787.
  • [21] R. Amhaz, S. Chambon, J. Idier, and V. Baltazart, “Automatic crack detection on 2d pavement images: An algorithm based on minimal path selection, accepted to ieee trans,” Intell. Transp. Syst, 2015.
  • [22] Q. Zou, Y. Cao, Q. Li, Q. Mao, and S. Wang, “Cracktree: Automatic crack detection from pavement images,” Pattern Recognition Letters, vol. 33, no. 3, pp. 227–238, 2012.
  • [23] K. Fernandes and L. Ciobanu, “Pavement pathologies classification using graph-based features,” in Image Processing (ICIP), 2014 IEEE International Conference on.   IEEE, 2014, pp. 793–797.
  • [24] Y.-C. Tsai, C. Jiang, and Y. Huang, “Multiscale crack fundamental element model for real-world pavement crack classification,” Journal of Computing in Civil Engineering, vol. 28, no. 4, p. 04014012, 2012.
  • [25] Y. J. Tsai, C. Jiang, and Z. Wang, “Implementation of automatic crack evaluation using crack fundamental element,” in Image Processing (ICIP), 2014 IEEE International Conference on.   IEEE, 2014, pp. 773–777.
  • [26] Y. Hu and C.-x. Zhao, “A novel lbp based methods for pavement crack detection,” Journal of pattern Recognition research, vol. 5, no. 1, pp. 140–147, 2010.
  • [27] M. Salman, S. Mathavan, K. Kamal, and M. Rahman, “Pavement crack detection using the gabor filter,” in 16th international IEEE conference on intelligent transportation systems (ITSC 2013).   IEEE, 2013, pp. 2039–2044.
  • [28] H. Oliveira and P. L. Correia, “Crackit-an image processing toolbox for crack detection and characterization,” in Image Processing (ICIP), 2014 IEEE International Conference on.   IEEE, 2014, pp. 798–802.
  • [29] S. Varadharajan, S. Jose, K. Sharma, L. Wander, and C. Mertz, “Vision for road inspection,” in Applications of Computer Vision (WACV), 2014 IEEE Winter Conference on.   IEEE, 2014, pp. 115–122.
  • [30] H. R. Roth, L. Lu, J. Liu, J. Yao, A. Seff, K. Cherry, L. Kim, and R. M. Summers, “Improving computer-aided detection using convolutional neural networks and random view aggregation,” IEEE transactions on medical imaging, vol. 35, no. 5, pp. 1170–1181, 2016.
  • [31] D. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, “Deep neural networks segment neuronal membranes in electron microscopy images,” in Advances in neural information processing systems, 2012, pp. 2843–2851.
  • [32] D. C. Cireşan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, “Mitosis detection in breast cancer histology images with deep neural networks,” in International Conference on Medical Image Computing and Computer-assisted Intervention.   Springer, 2013, pp. 411–418.
  • [33] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
  • [34] Y. Zhang, K. Sohn, R. Villegas, G. Pan, and H. Lee, “Improving object detection with deep convolutional networks via bayesian optimization and structured prediction,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 249–258.
  • [35] J. Kivinen, C. Williams, and N. Heess, “Visual boundary prediction: A deep neural prediction network and quality dissection,” in Artificial Intelligence and Statistics, 2014, pp. 512–521.
  • [36] L. Zhang, F. Yang, Y. D. Zhang, and Y. J. Zhu, “Road crack detection using deep convolutional neural network,” in Image Processing (ICIP), 2016 IEEE International Conference on.   IEEE, 2016, pp. 3708–3712.
  • [37] Y. Xu, S. Li, D. Zhang, Y. Jin, F. Zhang, N. Li, and H. Li, “Identification framework for cracks on a steel structure surface by a restricted boltzmann machines algorithm based on consumer-grade camera images,” Structural Control and Health Monitoring, vol. 25, no. 2, p. e2075, 2018.
  • [38] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no. 7553, p. 436, 2015.
  • [39] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
  • [40] R. Fan and N. Dahnoun, “Real-time stereo vision-based lane detection system,” Measurement Science and Technology, vol. 29, no. 7, p. 074005, 2018.
  • [41] R. Fan, Y. Liu, M. J. Bocus, L. Wang, and M. Liu, “Real-time subpixel fast bilateral stereo,” arXiv preprint arXiv:1807.02044.
  • [42] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, Nov. 2001.
  • [43] A. Ahmad and L. Dey, “A k-mean clustering algorithm for mixed numeric and categorical data,” Data & Knowledge Engineering, vol. 63, no. 2, pp. 503–527, 2007.
  • [44] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62–66, Jan. 1979.