Road Surface 3D Reconstruction Based on Dense Subpixel Disparity Map Estimation


Rui Fan, Xiao Ai, Naim Dahnoun

Rui Fan is with the Visual Information Group, the University of Bristol, BS8 1UB, UK. Xiao Ai is with the Quantum Technology Enterprise Centre, Nanoscience and Quantum Information Building, the University of Bristol, Bristol, BS8 1FD, UK. Naim Dahnoun is with the Department of Electrical and Electronic Engineering, Merchant Venturers Building, the University of Bristol, BS8 1UB, UK.

Various 3D reconstruction methods have enabled civil engineers to detect damage on a road surface. To achieve the millimetre accuracy required for road condition assessment, a disparity map with subpixel resolution is needed. However, none of the existing stereo matching algorithms is specifically suited to reconstructing the road surface. Hence, in this paper we propose a novel dense subpixel disparity estimation algorithm with high computational efficiency and robustness. The perspective view of the target frame is first transformed into the reference view, which not only increases the accuracy of block matching for the road surface but also improves the processing speed. The disparities are then estimated iteratively using our previously published algorithm, in which the search range is propagated from three estimated neighbouring disparities. Since the search range is obtained from the previous iteration, errors may occur when the propagated search range is insufficient. A correlation maxima verification is therefore performed to rectify this issue, and subpixel resolution is achieved by parabola interpolation. Furthermore, a novel global disparity refinement approach developed from Markov Random Fields and Fast Bilateral Stereo is introduced to further improve the accuracy of the estimated disparity map: disparities are updated iteratively by minimising an energy function related to their interpolated correlation polynomials. The algorithm is implemented in C with near real-time performance. The experimental results show that the absolute error of the reconstruction varies from 0.1 mm to 3 mm.

3D reconstruction, road condition assessment, subpixel disparity estimation, parabola interpolation, Markov Random Fields, Fast Bilateral Stereo.

I Introduction

The condition assessment of asphalt and concrete civil infrastructures, e.g., bridges, tunnels and pavements, is essential to ensure their usability while still providing maximum safety for the users. It also allows the government to allocate the limited resources for maintenance and appraise long-term investment schemes [1]. The manual visual inspections performed by either structural engineers or certified inspectors are cost-intensive, time-consuming and cumbersome [2]. In 2014, a one-off investment of £12bn was suggested by the Asphalt Industry Alliance to improve the road condition across England and Wales [3]. Over the last decade, various technologies such as remote sensing, vibration sensing and computer vision have been increasingly applied in civil engineering to assess the physical and functional condition of infrastructure and to detect defects such as potholes and cracking.

The remote sensing methods which have been used in satellites, aeroplanes, unmanned aerial vehicles or multi-purpose survey vehicles have indeed reduced the workload of inspectors. However, the traditional geotechnical methods can never be entirely replaced by the remote sensing approaches [4]. Using accelerometers and GPS for data acquisition, vibration-based methods always cause distress misdetection in spite of their advantages of small storage requirements, cost-effectiveness and real-time performance [2]. As for the approaches based on 2D computer vision, the spatial structure of the road surface cannot be illustrated explicitly [2]. Therefore, 3D reconstruction-based methods are more feasible to overcome these disadvantages and simultaneously provide an enhancement in terms of detection accuracy and processing efficiency.

3D reconstruction methods can be classified as laser scanner-based, Microsoft Kinect-based and passive sensor-based. The laser scanner collects the reflected laser pulse from an object to construct its accurate 3D model [4]. Although it provides accurate modelling results, the laser scanner equipment used for road condition analysis is still costly [2]. As for the methods based on the Microsoft Kinect sensor, the depth measurement for the outdoor environment is somewhat ineffective, especially for materials which strongly absorb the infrared light [5]. Therefore, the passive sensor-based methods, e.g., stereo vision, are more capable of reconstructing the 3D road surface for condition assessment or damage detection.

Fig. 1: Stereo vision-based road surface 3D reconstruction system workflow.

To reconstruct a real-world environment with passive sensing techniques, multiple camera views are required [6]. Images from different viewpoints can be captured using either a single moveable camera or an array of cameras [7]. In this paper, we use a ZED stereo camera to acquire a pair of images for road surface 3D reconstruction. Since the stereo rig is assumed to be well-calibrated, the main work performed in this paper is the disparity estimation. The algorithms for disparity estimation can be classified as local, global and semi-global. Local algorithms simply match a series of blocks and select the correspondence with the lowest cost or the highest correlation. This optimisation is also known as winner-take-all (WTA). Unlike local algorithms, global algorithms process the stereo matching using some more sophisticated optimisation techniques, e.g., Graph Cut (GC) [8] and Belief Propagation (BP) [9]. These algorithms are commonly developed based on the Markov Random Fields (MRF) [10], where finding the best disparities is formulated as a probability maximisation problem. This is later addressed by energy minimisation approaches. Semi-global matching (SGM) [11] approximates the MRF inference by performing cost aggregation along all directions in the image and this greatly improves the accuracy and efficiency of stereo matching. However, the occlusion problem always makes it difficult to find the optimum value for the smoothness parameters: over-penalising the smoothness term can help avoid the ambiguities around discontinuities but on the other hand can lead to errors for continuous areas [12]. Therefore, some authors have proposed to break down the global problem into multiple local problems, each of which is affected by uncertainties to a lesser extent [13]. For instance, one alternative way of setting smoothness parameters is to group pixels in the image into different slanted planes [14, 15, 13]. 
Disparities in different plane groups are estimated with local constraints. However, this results in high computational complexities, making real-time performance challenging.

In order to further improve the trade-off between speed and accuracy, seed-and-grow local algorithms have been used extensively. In these algorithms, the disparity map is grown from a selection of seeds to minimise expensive computations and reduce mismatches caused by ambiguities. For example, the authors of [16, 17, 18] presented an efficient quasi-dense stereo matching algorithm, named growing correspondence seeds (GCS), to estimate disparities iteratively with the search range propagated from a collection of reliable seeds. Similarly, various Delaunay triangulation-based stereo matching algorithms (DTSM) have been proposed in [19, 20, 21] to estimate tunable semi-dense disparity maps with the support of a piecewise planar mesh. Our previous algorithm [22, 23] also provides an efficient strategy for local stereo matching whereby the search range on row $v$ is propagated from three estimated neighbouring disparities on row $v+1$. Our algorithm performs better than GCS and DTSM in terms of estimating dense disparity maps for road scenes where the road disparities decrease gradually from the bottom to the top, while the disparities of obstacles remain the same. The aim of this paper is to reconstruct the road scenes for pothole detection. In this regard, the proposed disparity estimation algorithm is developed based on our previous work in [23]. To assess the condition of a road surface, millimetre accuracy is desired in 3D reconstruction and thus disparities in subpixel resolution are required. Therefore, the correlation costs around the initial disparity are interpolated into a parabola and the position of its extremum is selected as the subpixel disparity.

However, the subpixel disparity maps obtained from parabola interpolation are still unsatisfactory because the correlation costs of neighbourhood systems are not aggregated before finding the best disparities. To aggregate neighbouring costs adaptively, some authors have proposed to filter the whole cost volume with a bilateral filter since it provides a feasible solution for the initial message passing problem on a fully connected MRF [12]. These algorithms are also known as Fast Bilateral Stereo (FBS) [24, 25, 26]. However, the heavy computation required to filter the whole cost volume severely impacts the processing speed. We therefore argue that only the candidates around the best disparities need to be processed, and propose a novel disparity refinement approach in this work. The workflow of our stereo vision-based road surface 3D reconstruction system is depicted in Fig. 1.

Firstly, the perspective view of the road surface in the target image is transformed into its reference view, which greatly enhances the similarity of the road surface between the two images. Since the propagated search range is sometimes insufficient, the desirable disparities have to be further verified to ensure they possess the highest correlation costs. The latter ensures the feasibility of parabola interpolation-based subpixel enhancement. To further optimise the obtained subpixel disparity map, the interpolated parabola functions are set as the labels in the MRF because they contain the information of both disparity values and correlation costs. By updating the parabola functions and subpixel disparities iteratively, a disparity in a continuous area becomes smooth but it is preserved when discontinuities occur. Finally, each 3D point on the road surface is computed based on its projections on the left and right images. The reconstruction accuracy is evaluated using three sample models (see section VI-A for more details). Our datasets are publicly available at:

The rest of the paper is structured as follows: section II presents a novel perspective transformation (PT) method. In section III, we describe a subpixel disparity estimation algorithm. A disparity map global refinement approach is introduced in section IV. In section V, the disparity map is post-processed and the 3D road surface is reconstructed. In section VI, the experimental results are illustrated and the performance of the proposed algorithm is evaluated. Finally, section VII summarises the paper and provides some recommendations for future work.

II Perspective Transformation

In this paper, the proposed algorithm focuses entirely on the road surface which can be treated as a ground plane (GP). To enhance the accuracy of stereo matching, we first draw on the concept of ground plane constraint in [27] and [28] to transform the perspective views of two images before estimating their disparities. GP constraint is commonly used in a wide range of obstacle detection systems, where the image on one side is set as the reference and the other image is transformed into the reference view. Pixels arising from the GP satisfy the same affine transformation while an object above the GP will not be transformed successfully [27]. Referring to the experimental results in [28], pixels from an obstacle are distorted in the transformed image. Nevertheless, the GP in the transformed image looks more similar to its reference view. Therefore, a perspective transformation makes the obstacle areas noisy and unreliable but greatly enhances the similarity of the road surface between two images. In this paper, the road surface is defined as:


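$$\mathbf{n}^\top \mathbf{p}^W + \beta = 0, \tag{1}$$

where $\mathbf{p}^W = [x^W, y^W, z^W]^\top$ is an arbitrary 3D point on the road surface. Its projections on the left image $\pi_l$ and the right image $\pi_r$ are $\mathbf{p}_l = [u_l, v_l]^\top$ and $\mathbf{p}_r = [u_r, v_r]^\top$, respectively. $\mathbf{n} = [n_0, n_1, n_2]^\top$ is the normal vector of the road surface. The planar transformation between $\mathbf{p}_l$ and $\mathbf{p}_r$ is given in Eq. 2 [6]. Here, $\tilde{\mathbf{p}} = [u, v, 1]^\top$ denotes the homogeneous coordinate of $\mathbf{p} = [u, v]^\top$.

$$\tilde{\mathbf{p}}_r = \mathbf{H}\,\tilde{\mathbf{p}}_l \tag{2}$$

$\mathbf{H}$ denotes a homography matrix, which is generally used to distinguish obstacles from the road surface [27]. It can be decomposed as [6]:

$$\mathbf{H} = \mathbf{K}_r\left(\mathbf{R} - \frac{1}{\beta}\,\mathbf{t}\,\mathbf{n}^\top\right)\mathbf{K}_l^{-1}, \tag{3}$$

where $\mathbf{R}$ is an SO(3) matrix and $\mathbf{t}$ is a translation vector. A point $\mathbf{p}^W$ in the left camera coordinate system can be transformed to $\mathbf{p}'^W$ in the right camera coordinate system according to $\mathbf{p}'^W = \mathbf{R}\mathbf{p}^W + \mathbf{t}$. $\mathbf{K}_l$ and $\mathbf{K}_r$ are the intrinsic matrices of the two cameras. For a well-calibrated stereo system, $\mathbf{R}$, $\mathbf{t}$, $\mathbf{K}_l$ and $\mathbf{K}_r$ are already known. We only need to estimate $\mathbf{n}$ and $\beta$ for $\mathbf{H}$. Generally, $\mathbf{H}$ can be estimated with at least four pairs of correspondences $\mathbf{p}_l$ and $\mathbf{p}_r$ [6]. Hattori et al. proposed a pseudo-projective camera model where several assumptions are made about road geometry to simplify the estimation of $\mathbf{H}$ [27]. In this paper, we improve on their algorithm by considering the following hypotheses: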

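  • the intrinsic matrices $\mathbf{K}_l$ and $\mathbf{K}_r$ are identical.

  • the rotation matrix $\mathbf{R}$ is an identity matrix.

  • the translation vector $\mathbf{t}$ is in the same direction as the $x^W$-axis.

  • the road surface is a horizontal plane: $n_1 y^W + \beta = 0$.

  • rotation of the stereo rig is only about the $x^W$-axis.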

For a perfectly-calibrated stereo rig, $v_l = v_r = v$. The disparity is defined as $d = u_l - u_r$. The projection of a horizontal plane on the v-disparity map is a linear pattern [29]:

$$d = \alpha_0 + \alpha_1 v, \qquad \alpha_0 = \frac{T_c}{h}\left(f\sin\theta - v_0\cos\theta\right), \qquad \alpha_1 = \frac{T_c}{h}\cos\theta, \tag{4}$$

where $\theta$ is the pitch angle between the stereo rig and the road surface (an example can be seen in Fig. 7 (a)), $f$ is the focal length of the cameras, $T_c$ is the baseline, $h$ is the height of the stereo rig above the road surface (see Fig. 7), and $(u_0, v_0)$ is the principal point in pixels. When $\theta = \pi/2$, $d$ is a constant. Otherwise, $d$ varies linearly with $v$ [29]. This implies that a perspective distortion always exists for the GP in two images, which further affects the accuracy of block matching. Therefore, the PT aims to make the GP in the transformed image similar to that in the reference frame.

Fig. 2: BRISK-based on-road keypoints detection and matching between the left and right images.

Now, the PT can be straightforwardly realised using the parameters $\boldsymbol{\alpha} = [\alpha_0, \alpha_1]^\top$ of the linear pattern in Eq. 4. The proposed PT is detailed in algorithm 1. $\boldsymbol{\alpha}$ can be estimated by solving a least squares problem with a set of reliable correspondences $\mathbf{Q}_l$ and $\mathbf{Q}_r$. In this paper, we use BRISK (Binary Robust Invariant Scalable Keypoints) to detect and match $\mathbf{Q}_l$ and $\mathbf{Q}_r$. It runs faster than SIFT (Scale-Invariant Feature Transform) and SURF (Speeded-Up Robust Features) while yielding approximately the same number of correspondences [30]. An example of on-road keypoints detection and matching is illustrated in Fig. 2.

Data: and
1 detect and match the keypoints in and ;
2 if  or  then
3       remove and from and , respectively;
5 estimate using the least squares fitting;
6 all points in the target image are shifted pixels to the reference view;
Algorithm 1 Perspective transformation.

Since outliers can severely affect the accuracy of least squares fitting, we first remove the less reliable correspondences before estimating $\boldsymbol{\alpha}$, where the threshold $\epsilon$ used in algorithm 1 is set to 1. For the left disparity map estimation, each point on row $v$ in $\pi_r$ is shifted $\alpha_0 + \alpha_1 v - \delta$ pixels to the right, where $\delta$ is a constant set to 20 (for datasets 1 and 2) or 30 (for dataset 3) to guarantee that all the disparities are positive. Similarly, each point in $\pi_l$ is shifted $\alpha_0 + \alpha_1 v - \delta$ pixels to the left when $\pi_r$ serves as the reference. An example of perspective transformation is presented in Fig. 3. The performance improvements achieved by using the PT will be discussed in section VI.
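The fitting step of algorithm 1 can be sketched in C as follows. This is a minimal illustration only: the `Match` struct, the function names and the exact outlier rule (a vertical offset above `eps` or a non-positive disparity) are assumptions made for the example, and the line $d = a_0 + a_1 v$ is fitted with closed-form ordinary least squares.

```c
#include <math.h>
#include <stddef.h>

/* One matched keypoint pair (hypothetical layout). */
typedef struct { double ul, vl, ur, vr; } Match;

/* Fit d = a0 + a1 * v by least squares over the pairs that pass the
 * reliability test. Returns the number of pairs used, or 0 on failure. */
size_t fit_alpha(const Match *m, size_t count, double eps,
                 double *a0, double *a1)
{
    double sv = 0.0, sd = 0.0, svv = 0.0, svd = 0.0;
    size_t n = 0;
    for (size_t i = 0; i < count; ++i) {
        double d = m[i].ul - m[i].ur;
        /* Assumed outlier rule: rows disagree or disparity is not positive. */
        if (fabs(m[i].vl - m[i].vr) > eps || d <= 0.0) continue;
        sv += m[i].vl;
        sd += d;
        svv += m[i].vl * m[i].vl;
        svd += m[i].vl * d;
        ++n;
    }
    double det = (double)n * svv - sv * sv;
    if (n < 2 || fabs(det) < 1e-12) return 0;
    *a1 = ((double)n * svd - sv * sd) / det;
    *a0 = (sd - *a1 * sv) / (double)n;
    return n;
}

/* Shift applied to row v of the target image; delta keeps disparities positive. */
double row_shift(double a0, double a1, double v, double delta)
{
    return a0 + a1 * v - delta;
}
```

With the fitted parameters, `row_shift` gives the number of pixels by which each row of the target image is moved towards the reference view.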

Fig. 3: Perspective transformation. (a) left image. (b) right image. (c) transformed right image. (d) transformed left image. (a) and (c) are used as the input left and right images for the left disparity map estimation. (d) and (b) are used as the input left and right images for the right disparity map estimation.

III Subpixel Disparity Map Estimation

As compared to many other stereo matching algorithms which aim at automotive applications, the trade-off between speed and precision has been greatly improved in our previous work [22] and [23]. The subpixel accuracy can be achieved by conducting a parabola interpolation for the correlation costs around the initial disparity [24]. The subpixel disparity global refinement will be discussed in section IV.

III-A Stereo Matching

In this paper, our previous algorithm [22] is utilised to estimate integer disparities, where the NCC (Normalised Cross-Correlation) is used to compute the matching costs, and the search range for the pixel at $(u, v)$ is propagated from three estimated neighbouring disparities on row $v+1$. To accelerate the NCC execution, we rearrange the NCC equation as follows:

$$c(u,v,d) = \frac{1}{n\,\sigma_l\,\sigma_r}\left(\sum_{x=u-\rho}^{u+\rho}\ \sum_{y=v-\rho}^{v+\rho} i_l(x,y)\, i_r(x-d,y) - n\,\mu_l\,\mu_r\right), \tag{5}$$

where $c(u,v,d)$ is defined as the correlation cost between two square blocks selected from $\pi_l$ and $\pi_r$, and a higher $c$ corresponds to a better match and vice versa. $i_l(x,y)$ or $i_r(x,y)$ is the intensity of a pixel in $\pi_l$ or $\pi_r$. The edge length of the square block is $2\rho+1$, and $n = (2\rho+1)^2$ represents the number of pixels in it. $(u,v)$ and $(u-d,v)$ are the centres of the left and right blocks, respectively. $\mu_l$ and $\mu_r$ denote the means of the intensities within the two blocks. $\sigma_l$ and $\sigma_r$ are their standard deviations.

From Eq. 5, $\mu$ and $\sigma$ only matter for each independent block selected from $\pi_l$ or $\pi_r$, and $d$ determines a pair of blocks for matching. Therefore, the calculation of $\mu_l$, $\mu_r$, $\sigma_l$ and $\sigma_r$ will always be repeated in conventional NCC-based stereo matching algorithms. In [23], we propose to pre-calculate the values of $\mu$ and $\sigma$ and store them in static program storage for direct indexing. Thus, the computation of the NCC reduces to a dot product, making stereo matching more efficient. More details on the implementation procedure are available in [23].

III-A1 Search Range Propagation (SRP)

Since the concept of "local coherence constraint" was proposed in [31], many researchers have turned their focus to seed-and-grow algorithms for stereo matching. Either semi-dense or quasi-dense disparity maps can be estimated efficiently with the guidance from a collection of reliable feature points [16, 17, 18, 19, 20, 21]. In this paper, the road surface is treated as a GP whose disparities change gradually from the bottom of the image to its top, which makes our previous algorithm [22] more efficient than other methods in terms of estimating an accurate dense disparity map. The proposed algorithm propagates the search range iteratively row by row from the bottom of the image to its top. In the first iteration, the disparity estimation uses the full search range. Then, the search range $SR$ at $(u, v)$ is propagated from three estimated neighbouring disparities using Eq. 6, where $\tau$ is the bound of $SR$ and is set as 1 in this paper. The left and right disparity maps, $\ell^{lf}$ and $\ell^{rt}$, are shown in Fig. 4 (a) and (b), respectively.

$$SR = \bigcup_{k=u-1}^{u+1} \left\{\, sr \;\middle|\; sr \in \left[\ell(k, v+1) - \tau,\ \ell(k, v+1) + \tau\right] \right\} \tag{6}$$
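A possible C sketch of the propagation step is given below. It is not the authors' code: the `Range` type and function name are hypothetical, invalid disparities are assumed to be stored as negative values, and the union of the three neighbour intervals is simplified to its enclosing interval.

```c
/* Inclusive disparity interval [lo, hi] (hypothetical type). */
typedef struct { int lo, hi; } Range;

/* Search range for pixel (u, v), propagated from the three neighbours
 * (u-1, v+1), (u, v+1), (u+1, v+1) in a row-major disparity map.
 * tau widens each neighbour's disparity; invalid neighbours (< 0) are skipped.
 * Returns the hull of the per-neighbour intervals, clamped to [0, d_max];
 * falls back to the full range when no neighbour is valid. */
Range propagate_range(const int *disp, int width, int height,
                      int u, int v, int tau, int d_max)
{
    Range r = { d_max, 0 };
    int found = 0;
    if (v + 1 < height) {
        for (int k = u - 1; k <= u + 1; ++k) {
            if (k < 0 || k >= width) continue;
            int d = disp[(v + 1) * width + k];
            if (d < 0) continue;
            if (d - tau < r.lo) r.lo = d - tau;
            if (d + tau > r.hi) r.hi = d + tau;
            found = 1;
        }
    }
    if (!found) { r.lo = 0; r.hi = d_max; return r; }
    if (r.lo < 0) r.lo = 0;
    if (r.hi > d_max) r.hi = d_max;
    return r;
}
```

For the bottom row there is no row below, so the function returns the full range, which matches the full search performed in the first iteration.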


III-A2 Correlation Maxima Verification (CMV)

Since the search range propagates using Eq. 6, errors may occur in subpixel enhancement when $c(u,v,d-1)$ or $c(u,v,d+1)$ is not computed and compared with $c(u,v,d)$. Therefore, CMV will run until the correlation cost of the disparity is a local maximum. More details are provided in algorithm 2.

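Data: disparity map $\ell$
Result: correlation maxima verified disparity map $\ell$
if $c(u,v,d) > c(u,v,d-1)$ and $c(u,v,d) > c(u,v,d+1)$ then
      $\ell(u,v) \leftarrow d$;
else if $c(u,v,d-1) > c(u,v,d+1)$ then
      repeat
            $d \leftarrow d-1$; compute $c(u,v,d-1)$;
      until $c(u,v,d) > c(u,v,d-1)$;
      $\ell(u,v) \leftarrow d$;
else
      repeat
            $d \leftarrow d+1$; compute $c(u,v,d+1)$;
      until $c(u,v,d) > c(u,v,d+1)$;
      $\ell(u,v) \leftarrow d$;
end if
Algorithm 2 Correlation maxima verification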
Fig. 4: Subpixel disparity map estimation. (a) left disparity map. (b) right disparity map. (c) left disparity map processed with the LRC check. (d) subpixel disparity map.
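The verification loop of algorithm 2 can be sketched in C as below. The callback type, the function names and the explicit disparity bounds are assumptions made for this illustration; `toy_cost` is a stand-in cost with a single peak, used only to exercise the loop.

```c
/* Callback returning the correlation cost c(u, v, d); higher is better. */
typedef double (*CostFn)(int u, int v, int d, void *ctx);

/* Move d towards the better neighbour until c(d) is not smaller than
 * either neighbour, or a bound is reached. Returns the verified disparity. */
int verify_maximum(CostFn cost, void *ctx, int u, int v,
                   int d, int d_min, int d_max)
{
    double c0 = cost(u, v, d, ctx);
    while (d > d_min && d < d_max) {
        double cm = cost(u, v, d - 1, ctx);
        double cp = cost(u, v, d + 1, ctx);
        if (c0 >= cm && c0 >= cp) break;      /* local maximum found */
        if (cm > cp) { d -= 1; c0 = cm; }
        else         { d += 1; c0 = cp; }
    }
    return d;
}

/* Toy cost with one peak at *(int *)ctx, for demonstration only. */
double toy_cost(int u, int v, int d, void *ctx)
{
    int peak = *(int *)ctx;
    (void)u; (void)v;
    return -(double)((d - peak) * (d - peak));
}
```

Each step moves to a strictly higher cost, so the loop cannot cycle and stops at the nearest local maximum inside the bounds.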

III-B Left-Right Consistency (LRC) Check

Since each pair of correspondences between the two images is unique, if we select an arbitrary pixel $(u, v)$ from the left disparity map $\ell^{lf}$, there should exist at most one correspondence in the right disparity map $\ell^{rt}$ [12]:

$$\ell^{lf}(u,v) = \ell^{rt}\big(u - \ell^{lf}(u,v),\, v\big) \tag{7}$$

Pixels that are only visible in one disparity map are marked as uncertainties. An LRC check is performed to remove these half-occluded areas. Although the LRC check doubles the computational complexity by re-projecting the estimated disparities from one disparity map to the other, most of the infeasible conjugate pairs and outliers in the disparity map can be removed. The left disparity map after the LRC check is illustrated in Fig. 4 (c).
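A minimal C sketch of such a check is shown below. The function name, the use of negative values for invalid pixels and the tolerance `tol` are assumptions for this example; setting `tol` to 0 enforces exact agreement between the two maps.

```c
#include <stdlib.h>

/* Invalidate left disparities that disagree with the right map.
 * Maps are row-major; negative values mark invalid pixels.
 * tol is the accepted absolute difference (0 means exact agreement).
 * Returns the number of pixels removed. */
int lrc_check(int *left, const int *right, int width, int height, int tol)
{
    int removed = 0;
    for (int v = 0; v < height; ++v)
        for (int u = 0; u < width; ++u) {
            int d = left[v * width + u];
            if (d < 0) continue;
            int ur = u - d;                     /* matching column on the right */
            if (ur < 0 || right[v * width + ur] < 0 ||
                abs(right[v * width + ur] - d) > tol) {
                left[v * width + u] = -1;
                ++removed;
            }
        }
    return removed;
}
```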

III-C Subpixel Enhancement

In this paper, the road surface application requires millimetre accuracy in 3D reconstruction. A disparity error larger than one pixel may result in a non-negligible difference in the reconstructed road surface [32]. Therefore, subpixel resolution is necessary to achieve a highly accurate result.

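For each pixel $(u, v)$ whose disparity is $d$, we fit a parabola to three correlation costs $c(u,v,d-1)$, $c(u,v,d)$ and $c(u,v,d+1)$ around the initial disparity $d$. The centreline of the parabola is selected as the subpixel disparity $d^s$ as follows [26]:

$$d^s = d + \frac{c(u,v,d-1) - c(u,v,d+1)}{2c(u,v,d-1) + 2c(u,v,d+1) - 4c(u,v,d)} \tag{8}$$

Since the CMV guarantees that $c(u,v,d)$ is larger than both $c(u,v,d-1)$ and $c(u,v,d+1)$, $d^s$ will be between $d-1$ and $d+1$. Fig. 4 (c) after the subpixel enhancement is given in Fig. 4 (d).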
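The interpolation is the standard three-point parabola vertex and can be written as a short C function. The function name and the fallback for collinear costs are choices made for this sketch.

```c
/* Subpixel disparity from the costs at d-1, d and d+1.
 * c0 is expected to be the largest of the three (ensured by the CMV).
 * Falls back to d when the three costs are collinear. */
double subpixel_disparity(int d, double cm, double c0, double cp)
{
    double denom = 2.0 * cm + 2.0 * cp - 4.0 * c0;
    if (denom == 0.0) return (double)d;
    return (double)d + (cm - cp) / denom;
}
```

For example, costs sampled from a parabola whose peak lies at 10.3 give back 10.3 when evaluated at the integer disparities 9, 10 and 11.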

IV Disparity Map Global Refinement

IV-A Markov Random Fields and Fast Bilateral Stereo

Unlike the principle of WTA applied in local stereo matching algorithms, the matching costs from neighbouring pixels are also taken into account in global algorithms, e.g., GC and BP. The MRF is a commonly used graphical model in these global algorithms. An example of the MRF model is depicted in Fig. 5.

The graph is a set of vertices connected by edges , where and . Two edges sharing one common vertex are called a pair of adjacent edges [33]. Since the MRF is considered to be undirected, and refer to the same edge here. is a neighbourhood system for .

For stereo vision problems, is a disparity map and is a vertex (or node) at the site of with a label of disparity . Because more candidates taken into consideration usually make the inference of a true disparity intractable, only the neighbours adjacent to are considered for stereo matching [10]. This is also known as a pairwise MRF. In this paper, and is a four-connected neighbourhood system. , , and are adjacent edges sharing the vertex . The disparity of tends to have a strong correlation with its vicinities, while it is linked implicitly to any other random nodes in the disparity map. In [10], the joint probability of the MRF is written as:


where represents the intensity differences, expresses the compatibility between possible disparities and the corresponding intensity differences, and expresses the compatibility between and its neighbourhood system. Now, the aim of finding the best disparity is equivalent to maximising the probability in Eq. 9. This can be realised by formulating Eq. 9 as an energy function [10]:

Fig. 5: Markov random fields.

and are two energy functions. corresponds to the matching cost and determines the aggregation from the neighbours. In the MRF model, the method to formulate an adaptive is important because the intensity in discontinuous areas usually varies greatly from that of its neighbours [34]. Since Tomasi et al. introduced the bilateral filter in [35], many authors have investigated its applications to aggregate the matching costs [24, 25, 26]. These methods are also grouped into fast bilateral stereo, where both intensity difference and spatial distance provide a weight to adaptively constrain the aggregation of discontinuities. A general representation of the cost aggregation in FBS is represented as follows:


where is based on the spatial distance and is based upon the colour similarity. The costs within a square block are aggregated adaptively to obtain .

Although the FBS has shown a good performance in terms of matching accuracy, it usually takes a long time to process the whole cost volume. Therefore, we propose an improved adaptive aggregation method to optimise the subpixel disparity map iteratively.

IV-B Subpixel Disparity Refinement with Energy Minimisation

In this paper, the local algorithm proposed in section III greatly improves the trade-off between accuracy and speed. A precise subpixel disparity map can be estimated with near real-time performance. Compared to conventional MRF-based algorithms, our global refinement method only aggregates the costs around the best disparity and updates the disparity map more efficiently. The proposed disparity refinement algorithm is developed based on the following assumptions:

  • the subpixel disparity map obtained in section III is acceptable.

  • for an arbitrary pixel, its neighbours (excluding discontinuities) in all directions have similar disparities.

  • the interpolated parabola in section III-C is locally smooth.

Before going into further details about our disparity refinement approach, we first rewrite the energy function in Eq. 10 in a more general way as follows [36]:


where the term penalises the solutions that are inconsistent with the observed data, enforces the piecewise smoothness and is the smoothness parameter. For conventional MRF-based stereo matching algorithms, denotes the matching cost and is the cost aggregation from the neighbourhood system. By minimising the global energy of the whole random field, a disparity map can be estimated.

In section III-C, we fit a parabola to three correlation costs , and to get the subpixel disparity . The parabola function contains the information of both subpixel disparity and correlation costs. Since is assumed to be locally smooth, the neighbouring pixels tend to have similar parabola parameters. However, when an abrupt change occurs, they vary significantly and in this case, the condition for uniform smoothness is no longer valid. Therefore, we use function as the label in MRF. By adaptively aggregating functions of the neighbourhood system to , is updated iteratively.

In order to ensure energy minimisation rather than energy maximisation as widely presented in literature, the term is defined as:


has a value of in this paper. Using the same strategy of adaptive aggregation in FBS, we define the smoothness energy as the adaptive sum of negative interpolated parabolas of spatially varying horizontal and vertical nearest neighbours:




The weighting coefficient is determined by both the spatial distance between and and the difference between and . and are two parameters used to control and they are respectively set to and in this paper. If is similar to , the weight for cost aggregation is higher. The energy function with respect to the correlation costs is updated iteratively. The subpixel disparity map is optimised by approximating the minima of the updated energy functions. In this paper, the proposed process is iterated three times, and the result after the third iteration is shown in Fig. 6 (a).

Fig. 6: Disparity map global refinement and post-processing. (a) subpixel disparity map after the third iteration. (b) post-processed disparity map.

V Post-Processing and 3D Reconstruction

Fig. 7: Extrinsic rotations. (a) pitch angle . (b) roll angle . (c) yaw angle . is the height of the proposed binocular system.

Because the perspective views have been transformed in section II, the shift applied to each row $v$ has to be added back to the estimated subpixel disparities to obtain the post-processed disparity map, which is illustrated in Fig. 6 (b). Then, the intrinsic and extrinsic parameters of the stereo system are used to compute each 3D point $\mathbf{p}^W$ from its projections $\mathbf{p}_l = [u_l, v_l]^\top$ and $\mathbf{p}_r = [u_r, v_r]^\top$ on the left and right images, where $v_l$ is equivalent to $v_r$, and $u_l$ is associated with $u_r$ by the disparity $d$.
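For a rectified pair with identical intrinsics, this back-projection reduces to the standard triangulation formulas, sketched in C below. The names are hypothetical, lens distortion is assumed to have been removed, and the disparity passed in must be the post-processed one.

```c
/* 3D point in the left camera frame (units follow the baseline). */
typedef struct { double x, y, z; } Point3;

/* Back-project pixel (u, v) with disparity d for a rectified stereo pair.
 * f: focal length in pixels; tc: baseline; (u0, v0): principal point.
 * Returns 0 when d is not positive, 1 otherwise. */
int reconstruct_point(double u, double v, double d, double f, double tc,
                      double u0, double v0, Point3 *out)
{
    if (d <= 0.0) return 0;
    out->z = f * tc / d;
    out->x = (u - u0) * out->z / f;
    out->y = (v - v0) * out->z / f;
    return 1;
}
```

Since depth is inversely proportional to disparity, a fixed disparity error produces a larger depth error for distant points, which is why subpixel resolution matters for millimetre accuracy.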

For many state-of-the-art road model estimation algorithms, the effects caused by the non-zero roll angle (Fig. 7 (b)) are always ignored because the stereo cameras will not change significantly over time [37]. However, the experimental set-up in this paper is installed manually and the roll angle may introduce a distortion on the v-disparity histogram. Therefore, the roll angle needs to be estimated for the initial frame to minimise its impact on the perspective transformation for the rest of the sequences. As in [37], the roll angle can be estimated by fitting a linear plane () to a small patch from the near field in the disparity map and . The pitch angle can be estimated by rearranging Eq. 4 as Eq. 16, where the parameters have been approximated in section II. The yaw angle shown in Fig. 7 (c) is assumed to be .


Each 3D point can be transformed into using Eq. 17 [38]. The rotation matrix is a SO(3) matrix. The rotation with makes pothole detection much easier. The 3D reconstruction of Fig. 3 (a) is illustrated in Fig. 8.



Fig. 8: Road surface 3D reconstruction.

VI Experimental Results

In this section, we evaluate the performance of our proposed road surface 3D reconstruction algorithm both qualitatively and quantitatively. The algorithm is implemented in C and runs on an Intel Core i7-4720HQ CPU using a single thread. The following subsections detail the experimental set-up and the performance evaluation.

VI-A Experimental Set-up

In our experiments, a state-of-the-art stereo camera from ZED Stereolabs is used to capture 1080p videos at 30 fps or 2.2K videos at 15 fps [39]. The baseline is 120 mm. With its ultra sharp six element all-glass dual lenses and 16:9 native sensors, the video is wide-angle and able to cover the scene up to 20 m. An example of the experimental set-up is shown in Fig. 9. The stereo camera is calibrated manually using the stereo calibration toolbox from MATLAB R2017a. The overall calibration mean error in pixels is 0.335.

Fig. 9: Experimental set-up.
Fig. 10: Designed 3D sample models. The unit is millimetre.
Fig. 11: Experimental results. The first and third columns are the input left images. The second and fourth columns are the subpixel disparity map without post-processing.
Sample model Design size (mm × mm × mm) Actual size (mm × mm × mm)
Model Groove Model Groove
C n/a n/a
TABLE I: Design size and actual size of the sample models.

To quantify the accuracy of the proposed algorithm, we designed three sample models A, B and C with different sizes. They are printed with a MakerBot Replicator 2 Desktop 3D Printer whose layer resolution is from 0.1 mm to 0.3 mm. Their top views and the stereogram of model A are illustrated in Fig. 10, where A and B are designed with grooves to simulate potholes. To get the ground truth for our experiments, we measured the actual size of these models using an electronic vernier caliper. Both the design and actual sizes of the models are presented in Table I. Since the models are printed in a single colour, which produces homogeneous areas, we covered them with a sheet of paper printed with a road surface texture to avoid ambiguities during stereo matching, as can be seen in Fig. 9.

Using the above experimental set-up, we create three datasets (91 stereo image pairs) for road surface 3D reconstruction. Datasets 1 and 2 contain road scenes, and dataset 3 contains the sample models to help researchers quantify their reconstruction results. The datasets are available at:

The following subsections analyse the performance of our algorithm in terms of disparity accuracy, reconstruction accuracy and processing speed.

VI-B Disparity Evaluation

Fig. 12: Comparison between SRP and PT+SRP in terms of the average of the highest correlation costs.

Some examples of the disparity maps are illustrated in Fig. 11. Before estimating the disparity map, we transform the target image into its reference view, which greatly eliminates the perspective distortion for a GP between two images. Since the GP in the left and right images now looks similar to each other, the average of the highest correlation costs goes higher, which is depicted in Fig. 12. For stereo matching with only SRP, the average of the highest correlation increases gradually from () to (). However, when goes above , keeps decreasing. If we pre-process the input image pairs with the PT, the average of the highest correlation costs in the SRP stereo will grow gradually between and . In this paper, our datasets are created with high-resolution images, and is proposed to be . Compared with the conventional SRP stereo, the PT improves the average correlation cost with an increase of .

Fig. 13: Evaluation of subpixel enhancement and disparity global refinement.
Fig. 14: Experimental results of the KITTI stereo 2012 dataset. The first row shows the left images, where areas in magenta are our manually selected road surface. The second row shows the disparity ground truth. The third row shows the results obtained from the proposed algorithm.
Target Measurement range (mm)
Model A height
Model B height
Model C height
Groove A depth
Groove B depth
TABLE II: 3D reconstruction measurement range.

Furthermore, we select one row from the disparity map to evaluate the performance of the subpixel enhancement and global refinement (see Fig. 13). The integer disparity oscillates along the selected row and drops abruptly when a discontinuity occurs. After the subpixel enhancement, the disparity is replaced with a better one between and . The iterative global refinement further optimises the subpixel disparity map. After the third iteration, the disparities vary more smoothly within a continuous area but still change abruptly at a discontinuity.

Since the datasets we created only contain the ground truth for 3D reconstruction, the KITTI stereo 2012 dataset [40] is used to further evaluate the disparity accuracy of our algorithm. Some experimental results are illustrated in Fig. 14. Because the proposed algorithm only aims at reconstructing the road surface, we manually select a region of interest (see the magenta areas in the first row) in each image to evaluate its performance. The corresponding disparity results in the region of interest are shown in the third row. The percentage of error pixels (threshold: two pixels) is around and the average error in pixels is about .
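The two error measures quoted for the region of interest can be computed along the following lines. This is a hedged sketch only: the type `disparity_error`, the function `evaluate_disparity`, the flat array layout and the convention that a ground-truth value of zero or less marks a pixel without ground truth are all assumptions made for illustration.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    double bad_pixel_percent;  /* share of pixels with error > threshold */
    double mean_abs_error;     /* average absolute error in pixels */
    size_t evaluated;          /* number of pixels that were compared */
} disparity_error;

/* Compare an estimated disparity map with the ground truth inside a mask.
 * est, gt : disparity maps stored as flat arrays of n pixels
 * roi     : non-zero where the pixel belongs to the region of interest
 * Assumption: gt[i] <= 0 means that no ground truth is available. */
disparity_error evaluate_disparity(const float *est, const float *gt,
                                   const unsigned char *roi, size_t n,
                                   float threshold)
{
    disparity_error r = {0.0, 0.0, 0};
    size_t bad = 0;
    double sum = 0.0;

    for (size_t i = 0; i < n; ++i) {
        if (!roi[i] || gt[i] <= 0.0f)
            continue;                       /* outside ROI or no ground truth */
        double e = fabs((double)est[i] - (double)gt[i]);
        sum += e;
        if (e > threshold)
            ++bad;
        ++r.evaluated;
    }

    if (r.evaluated > 0) {
        r.bad_pixel_percent = 100.0 * (double)bad / (double)r.evaluated;
        r.mean_abs_error = sum / (double)r.evaluated;
    }
    return r;
}
```

Calling `evaluate_disparity(est, gt, roi, width * height, 2.0f)` would correspond to the two-pixel threshold used above.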

Fig. 15: Sample model 3D reconstruction. (a) left image. (b) subpixel disparity map with post-processing. (c) reconstructed scenery. (d) selected 3D point cloud which includes model B.

VI-C Reconstruction Evaluation

To further evaluate the accuracy of the reconstruction results, we created dataset 3 (see section VI-A for details) with three different sample models. An example of the left image is illustrated in Fig. 15 (a). The corresponding subpixel disparity map and 3D reconstruction are depicted in Fig. 15 (b) and (c), respectively. We select a rectangular region which includes one of the sample models from Fig. 15 (a), and the 3D reconstruction of this region can be seen in Fig. 15 (d). A surface is fitted to four corners , , and of the selected region. Then, we select a set of random points on the surface of the model and estimate the distances between them and the fitted road surface. The spread of these distances gives the measurement range of the model height. Similarly, the groove depth can be estimated by computing the distances between a group of points in a groove and the model surface. Table II details the range of the measured model height and groove depth, where represents the approximated distance from the camera to sample models.

From Table II, the maximal absolute error of the 3D reconstruction is approximately 3 mm, and it increases slightly when increases. The reconstruction precision is inversely proportional to the depth [41]. Furthermore, since the baseline of the ZED camera is fixed and cannot be increased to further improve the precision, we mount the camera at a relatively low height and keep it as perpendicular as possible to the road surface to reduce the average depth, which helps to maintain a high reconstruction accuracy.
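The dependence of the reconstruction error on depth follows from the standard triangulation relation for a rectified stereo rig. The derivation below is a textbook sketch, not taken from this paper; the symbols $f$ (focal length in pixels), $b$ (baseline), $d$ (disparity), $Z$ (depth) and $\Delta d$ (disparity error) are notation assumed here.

```latex
Z = \frac{f\,b}{d}
\quad\Longrightarrow\quad
\frac{\partial Z}{\partial d} = -\frac{f\,b}{d^{2}} = -\frac{Z^{2}}{f\,b}
\quad\Longrightarrow\quad
|\Delta Z| \approx \frac{Z^{2}}{f\,b}\,|\Delta d|
```

For a fixed baseline and a given subpixel disparity error, the depth error therefore grows with the depth, which is why reducing the average depth improves the accuracy.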

VI-D Processing Speed

Fig. 16: Comparison between SRP and PT+SRP in terms of the runtime.

The algorithm is implemented in C on an Intel Core i7-4720HQ CPU (2.6 GHz) using a single thread. After the PT, each point on row in the target image is shifted pixels to obtain a reference view, which greatly reduces the search range for stereo matching. The evaluation of the PT with respect to the runtime is illustrated in Fig. 16. The PT accelerates the processing speed of the SRP stereo when using different block sizes. When , the processing speed is increased by over . The runtime of different datasets is shown in Table III. Although the proposed algorithm does not run in real time, the authors believe that its speed can be increased in the future by exploiting parallel computing architectures.
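The row-wise shift performed by the PT can be pictured with the sketch below. It is only an illustration under stated assumptions: the per-row shift is passed in as an integer array `shift` (how the paper derives it is not reproduced here), a positive value moves pixels towards larger column indices, pixels shifted in from outside the image are set to zero, and the function name `shift_rows` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Shift every row v of an 8-bit grey-scale image horizontally by shift[v]
 * pixels. src and dst are row-major buffers of width*height bytes and
 * must not overlap. */
void shift_rows(const unsigned char *src, unsigned char *dst,
                size_t width, size_t height, const int *shift)
{
    for (size_t v = 0; v < height; ++v) {
        const unsigned char *in = src + v * width;
        unsigned char *out = dst + v * width;
        int s = shift[v];

        memset(out, 0, width);               /* pixels with no source stay 0 */
        if (s >= (int)width || s <= -(int)width)
            continue;                        /* whole row shifted out of view */

        if (s >= 0)
            memcpy(out + s, in, width - (size_t)s);
        else
            memcpy(out, in + (size_t)(-s), width - (size_t)(-s));
    }
}
```

Because corresponding road points are brought close together by such a shift, the subsequent block matching only needs to search a small residual disparity range.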

Dataset Frames Resolution Runtime (s)
Dataset 1 35 0.71
Dataset 2 35 0.84
Dataset 3 21 2.23
TABLE III: Algorithm Runtime.

VII Conclusion and Future Work

The main novelties of this paper include the PT, the CMV, and the disparity map global refinement. We created three datasets and made them publicly available to contribute to 3D reconstruction-based pothole detection. The PT not only enhances the similarity of the GP between two images but also reduces the search range for stereo matching. This helps the SRP stereo perform more accurately and efficiently. The CMV further compensates for insufficient propagation in the SRP stereo and guarantees the feasibility of parabola interpolation in the subpixel enhancement phase. By iteratively minimising the energy with respect to the interpolated parabolas, the subpixel disparity map is optimised. The disparities in a continuous area become smoother, while discontinuities are preserved. The maximal absolute error of the 3D reconstruction is around 3 mm, which satisfies the requirement of millimetre accuracy for on-road damage detection. Furthermore, owing to the high precision of the proposed system, users can apply it to road surface SLAM (Simultaneous Localisation and Mapping) for many smart city applications.

However, the propagation strategy in the proposed algorithm makes it difficult to fully exploit the parallel computing architecture of graphics cards to estimate disparity maps. Therefore, we aim to develop a more efficient SRP strategy which can be adapted to different platforms. Furthermore, errors in stereo calibration can dramatically affect the precision of stereo matching. Hence, we aim to design a self-calibration algorithm to enhance the robustness of our proposed stereo vision system, and the reconstructed scenes will be used for 3D pothole detection.


  • [1] Christian Koch, Kristina Georgieva, Varun Kasireddy, Burcu Akinci, and Paul Fieguth, “A review on computer vision based defect detection and condition assessment of concrete and asphalt civil infrastructure,” Advanced Engineering Informatics, vol. 29, no. 2, pp. 196–210, 2015.
  • [2] Taehyeong Kim and Seung-Ki Ryu, “Review and analysis of pothole detection methods,” Journal of Emerging Trends in Computing and Information Sciences, vol. 5, no. 8, pp. 603–608, 2014.
  • [3] BBC News, Councils in England face huge road repair bills, Accessed: 2015-01-06.
  • [4] E Schnebele, BF Tanyu, G Cervone, and N Waters, “Review of remote sensing methodologies for pavement management and assessment,” European Transport Research Review, vol. 7, no. 2, pp. 1–19, 2015.
  • [5] L Cruz, L Djalma, and V Luiz, “Kinect and rgbd images: Challenges and applications graphics,” in 2012 25th SIBGRAPI Conference on Patterns and Images Tutorials (SIBGRAPI-T), 2012.
  • [6] Richard Hartley and Andrew Zisserman, Multiple view geometry in computer vision, Cambridge university press, 2003.
  • [7] Firooz Sadjadi and Evan Ribnick, “Passive 3d sensing, and reconstruction using multi-view imaging,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on. IEEE, 2010, pp. 68–74.
  • [8] Yuri Boykov, Olga Veksler, and Ramin Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Transactions on pattern analysis and machine intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.
  • [9] Alexander T Ihler, John W Fisher III, and Alan S Willsky, “Loopy belief propagation: Convergence and effects of message errors,” Journal of Machine Learning Research, vol. 6, no. May, pp. 905–936, 2005.
  • [10] Marshall F Tappen and William T Freeman, “Comparison of graph cuts with belief propagation for stereo, using identical mrf parameters,” in Proceedings Ninth IEEE International Conference on Computer Vision. IEEE, 2003, p. 900.
  • [11] Heiko Hirschmuller, “Stereo processing by semiglobal matching and mutual information,” IEEE Transactions on pattern analysis and machine intelligence, vol. 30, no. 2, pp. 328–341, 2008.
  • [12] Mikhail G Mozerov and Joost van de Weijer, “Accurate stereo matching by two-step energy minimization,” IEEE Transactions on Image Processing, vol. 24, no. 3, pp. 1153–1163, 2015.
  • [13] Sudipta N Sinha, Daniel Scharstein, and Richard Szeliski, “Efficient high-resolution stereo matching using local plane sweeps,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1582–1589.
  • [14] Michael Bleyer, Christoph Rhemann, and Carsten Rother, “Extracting 3d scene-consistent object proposals and depth from stereo images,” Computer Vision–ECCV 2012, pp. 467–481, 2012.
  • [15] Koichiro Yamaguchi, David McAllester, and Raquel Urtasun, “Efficient joint segmentation, occlusion labeling, stereo and flow estimation,” in European Conference on Computer Vision. Springer, 2014, pp. 756–771.
  • [16] Radim Sara, “Finding the largest unambiguous component of stereo matching,” Computer Vision-ECCV 2002, pp. 900–914, 2002.
  • [17] Radim Sara, “Robust correspondence recognition for computer vision,” in Compstat 2006-Proceedings in Computational Statistics, pp. 119–131. Springer, 2006.
  • [18] Jan Cech and Radim Sara, “Efficient sampling of disparity space for fast and accurate matching,” in Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on. IEEE, 2007, pp. 1–8.
  • [19] Robert Spangenberg, Tobias Langner, and Raúl Rojas, “Weighted semi-global matching and center-symmetric census transform for robust driver assistance,” in International Conference on Computer Analysis of Images and Patterns. Springer, 2013, pp. 34–41.
  • [20] Ondrej Miksik, Yousef Amar, Vibhav Vineet, Patrick Pérez, and Philip HS Torr, “Incremental dense multi-modal 3d scene reconstruction,” in Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE, 2015, pp. 908–915.
  • [21] Sudeep Pillai, Srikumar Ramalingam, and John J Leonard, “High-performance and tunable stereo reconstruction,” in Robotics and Automation (ICRA), 2016 IEEE International Conference on. IEEE, 2016, pp. 3188–3195.
  • [22] Zhen Zhang, Xiao Ai, and Naim Dahnoun, “Efficient disparity calculation based on stereo vision with ground obstacle assumption,” in 21st European Signal Processing Conference (EUSIPCO 2013). IEEE, 2013, pp. 1–5.
  • [23] Rui Fan and Naim Dahnoun, “Real-time implementation of stereo vision based on optimised normalised cross-correlation and propagated search range on a gpu,” in Imaging Systems and Techniques (IST), 2017 IEEE International Conference on. IEEE, 2017, pp. 1–6.
  • [24] Qingxiong Yang, Liang Wang, Ruigang Yang, Henrik Stewénius, and David Nistér, “Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 3, pp. 492–504, 2009.
  • [25] Asmaa Hosni, Christoph Rhemann, Michael Bleyer, Carsten Rother, and Margrit Gelautz, “Fast cost-volume filtering for visual correspondence and beyond,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 2, pp. 504–511, 2013.
  • [26] Zhen Zhang, Xiao Ai, Nishan Canagarajah, and Naim Dahnoun, “Local stereo disparity estimation with novel cost aggregation for sub-pixel accuracy improvement in automotive applications,” in Intelligent Vehicles Symposium (IV), 2012 IEEE. IEEE, 2012, pp. 99–104.
  • [27] Hiroshi Hattori and Atsuto Maki, “Stereo without depth search and metric calibration,” in Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on. IEEE, 2000, vol. 1, pp. 177–184.
  • [28] Hiroaki Nakai, Nobuyuki Takeda, Hiroshi Hattori, Yasukazu Okamoto, and Kazunori Onoguchi, “A practical stereo scheme for obstacle detection in automotive use,” in Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on. IEEE, 2004, vol. 3, pp. 346–350.
  • [29] Zhencheng Hu, Francisco Lamosa, and Keiichi Uchimura, “A complete uv-disparity study for stereovision based 3d driving environment analysis,” in Fifth International Conference on 3-D Digital Imaging and Modeling (3DIM’05). IEEE, 2005, pp. 204–211.
  • [30] Stefan Leutenegger, Margarita Chli, and Roland Y Siegwart, “Brisk: Binary robust invariant scalable keypoints,” in 2011 International conference on computer vision. IEEE, 2011, pp. 2548–2555.
  • [31] Sébastien Roy, “Stereo without epipolar lines: A maximum-flow formulation,” International Journal of Computer Vision, vol. 34, no. 2-3, pp. 147–161, 1999.
  • [32] Istvan Haller and Sergiu Nedevschi, “Design of interpolation functions for subpixel-accuracy stereo-vision systems,” IEEE Transactions on image processing, vol. 21, no. 2, pp. 889–898, 2012.
  • [33] Andrew Blake, Pushmeet Kohli, and Carsten Rother, Markov random fields for vision and image processing, Mit Press, 2011.
  • [34] Stan Z Li, Markov random field modeling in computer vision, Springer Science & Business Media, 2012.
  • [35] Carlo Tomasi and Roberto Manduchi, “Bilateral filtering for gray and color images,” in Computer Vision, 1998. Sixth International Conference on. IEEE, 1998, pp. 839–846.
  • [36] Richard Szeliski, Ramin Zabih, Daniel Scharstein, Olga Veksler, Vladimir Kolmogorov, Aseem Agarwala, Marshall Tappen, and Carsten Rother, “A comparative study of energy minimization methods for markov random fields with smoothness-based priors,” IEEE transactions on pattern analysis and machine intelligence, vol. 30, no. 6, pp. 1068–1080, 2008.
  • [37] Umar Ozgunalp, Rui Fan, Xiao Ai, and Naim Dahnoun, “Multiple lane detection algorithm based on novel dense vanishing point estimation,” IEEE Transactions on Intelligent Transportation Systems, vol. PP, pp. 1–12, 2016.
  • [38] Gregory G Slabaugh, “Computing euler angles from a rotation matrix,” Retrieved on August, vol. 6, no. 2000, pp. 39–63, 1999.
  • [39] STEREOLABS, STEREOLABS PRODUCTS, Accessed: 2017-05-29.
  • [40] Andreas Geiger, Philip Lenz, and Raquel Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.
  • [41] David F Llorca, Miguel A Sotelo, Ignacio Parra, Manuel Ocaña, and Luis M Bergasa, “Error analysis in a stereo vision-based pedestrian detection sensor for collision avoidance applications,” Sensors, vol. 10, no. 4, pp. 3741–3758, 2010.

Rui Fan received his B.Sc. degree in control science and engineering from the Harbin Institute of Technology in 2015. He is currently working towards his Ph.D. degree with the Visual Information Laboratory at the University of Bristol. His research interests include multi-view geometry, real-time depth measurement, high performance computing and automotive applications, e.g., lane detection, obstacle detection, pothole detection and visual tracking. He is currently a student member of IEEE.

Xiao Ai received his B.Sc. degree in electrical and electronics engineering from University of Bristol, U.K., in 2007 and he received the Ph.D. degree in electrical and electronics engineering from University of Bristol, U.K., in 2012. He is a Postdoctoral Researcher with University of Bristol, Bristol, U.K. His Ph.D. specialised in 3-D imaging techniques and applications. His current research interests include embedded real-time signal processing, optoelectronics for spaceborne remote sensing, and automotive obstacle detection applications. He also has extensive experience in machine vision.

Naim Dahnoun received his Ph.D. degree in biomedical engineering from the University of Leicester, Leicester, U.K., in 1990. He was with the Leicester Royal Infirmary as a Researcher on blood flow measurements for femoral bypass grafts and then with University of Leicester as a Lecturer in digital signal processing (DSP). In 1993, he started new research in optical communication at the University of Manchester Institute of Science and Technology, Manchester, U.K., on wideband optical communication links before joining the Department of Electrical and Electronic Engineering, University of Bristol, Bristol, U.K., in 1994, where he is a Reader in Learning and Teaching of DSP. His main research interests include real-time digital signal processing applied to biomedical engineering, video surveillance, automotive, and optics. In 2003, in recognition of the important role played by universities in educating engineers in new technologies such as real-time DSP, Texas Instruments (NYSE:TXN) presented the first Texas Instruments DSP Educator Award to Dr. Dahnoun for his outstanding contributions to furthering education in DSP technology.
