Simultaneous drone localisation and wind turbine model fitting
during autonomous surface inspection
Abstract
We present a method for simultaneous localisation and wind turbine model fitting for a drone performing an automated surface inspection. We use a skeletal parameterisation of the turbine that can be easily integrated into a nonlinear least squares optimiser, combined with a pose graph representation of the drone’s 3D trajectory, allowing us to optimise both sets of parameters simultaneously. Given images from an onboard camera, we use a CNN to infer projections of the skeletal model, enabling correspondence constraints to be established through a cost function. This is then coupled with GPS/IMU measurements taken at key frames in the graph to allow successive optimisation as the drone navigates around the turbine. We present two variants of the cost function, one based on traditional 2D point correspondences and the other on direct image interpolation within the inferred projections. Results from experiments on simulated and realworld data show that simultaneous optimisation provides improvements to localisation over only optimising the pose and that combined use of both cost functions proves most effective.
I Introduction
Over the operational lifetime of a wind turbine, it will likely incur a wide range of structural damage [1] due to adverse weather and events such a lightning strikes and bird collisions. For operational efficiency, this damage needs to be detected and fixed as soon as possible [2]. This currently relies on manual inspections using climbing equipment, or the use of telephoto lenses from the ground. From the danger incurred during manual inspection, to the inability to get full coverage of the turbine surface using ground based cameras, it is evident that both methods have downsides.
One way of addressing the problem is the use of unmanned autonomous vehicles (UAVs) – or drones – to perform inspections [3]. This provides a number of benefits. It is safe, as no human personnel are required to climb the turbine. Also, as it is able to navigate freely around the turbine, it is able to obtain a complete view of the surface, including from above. Another significant benefit is the abilitiy to perform consistent and repeatable inspections.
Key to the drone’s inspection performance is its ability to navigate around the turbine. To enable this, it needs an accurate estimate of its 3D pose w.r.t the turbine, as well as an accurate estimate of the turbine’s structure and configuration. Having these allows faster and safer navigation, more consistent inspections, and provides important information needed for postprocessing operations such a image stitching and automatic default detection. Although the drone has access to a global positioning system (GPS) and inertial measurement units (IMUs) to aid in localisation, it has been shown that incorporating image measurements can improve accuracy significantly [4, 5].
We present a method that is able to simultaneously optimise the drone’s pose, as well as fit a model of the turbine during inspection. These two processes are complementary, in that the estimation of the turbine model is needed for accurate localisation, and localisation is required for fitting the model. The joint optimisation is achieved using a skeletal parameterisation of the turbine structure, accompanied with a set of functions that are able to instantiate the model. These functions, along with a pose graph representation of the drone’s localisation are integrated into a nonlinear least squares optimiser. We also present a novel cost function based on direct image interpolation that is better suited to this application than the typical cost based on 2D corresponding points. The work builds on our localisation method previously described in [5].
There are two main contributions. The first is the integration of the turbine model fitting process into the nonlinear least squares optimiser. The skeletal representation and set of functions were carefully chosen so that they could be easily integrated into the optimiser as well as providing a model that fully describes the main structures of the turbine and is sufficiently general so as to be used with a range of turbine configurations and types. The second contribution is the use of an interpolationbased cost measure, which proves to be effective for optimising over line structures in the skeletal model and complements the point based approach described in [5].
In Section II we highlight some of the relevant literature related to this work. In Section III, we give a detailed description of the proposed method. This includes the parameterisation of the model and accompanying functions in Section IIIA, the production of the image measurements using a CNN in Section IIIB, and the joint optimisation of the pose and model parameters in Section IIIC. In Section IV, we show our evaluation of the method. This was done using both simulated data in Section IVA as well as real data in Section IVB. In Section V we present some conclusions as well as potential avenues for future work.
Ii Previous Work
Likely due to the relative novelty of the application and the difficulty in acquiring data, there is little published research on the autonomous inspection of wind turbines using drones. Moreover, the works that tackle this problem have a number of important omissions. In [6], a method is presented for the prediction of the configuration of the wind turbine during an initial approach of the drone. They use a combination of the Hough Transform to extract edges and estimate the state of the turbine and a Kalman Filter to keep track of the drone’s position. One problem with this work is that it only addresses the initial approach of the drone, and does not attempt to estimate the parameters during the remaining – and most important – part of the inspection. In [7], the authors suggest the use of LiDAR sensors to both estimate the state of the turbine and the position of the drone. The turbine model is represented as a voxel grid which is filled in using a Bayesian update scheme during the inspection process. The grid is then used for path planning. One drawback of this work is that it is only applied in simulation. Furthermore, the authors make no attempt at the simultaneous localisation of the drone.
Although not directly related to wind turbine inspection, there are a number of methods that attempt simultaneous localisation and model fitting. These works are mainly focussed on the use of prior knowledge to improve the performance of simultaneous localisation and mapping (SLAM). In [8], a method using RGBD data is proposed which makes use of a partial scene model that can constrain the optimiser, improving performance when using noisy and inaccurate depth data. Similarly, the work by Loesch et. al. [9], describes the integration of known objects into a SLAM process. Using an edgletbased feature description of the object, the method is able to constraint the SLAM process and reduce absolute drift in the localisation estimates.
Research into articulated object pose detection is relevant to the model fitting part of our method. Pauwels et. al. [10] propose an Iterative Closest Point (ICP) method that is tailored to articulated models. This is achieved using a set of constraints on the objects which restrict the possible movements between object parts based on the current view. Constraints are also used in [11], in the form of a kinematic chain. This allows the authors to fully localise multi part objects by estimating only the parameters of the chain rather than the full 6DoF poses of the parts. To aid in this they make use of a random forest to predict object part probabilities and object coordinates. Katz et. al. [12] also make use of a kinematic chain, extracting and tracking visual features as a robot is interacting with articulated objects. Using clustering, tracked features are grouped into sets that represent different parts of the articulated object. From these, trajectories are established which define the types of constraints needed by different object parts.
The work presented here is an extension of the localisation method described in [5], which optimises the pose of the drone using a fixed skeletal turbine model. It used a convolutional neural network (CNN) to infer the projection of the model in images captured onboard, which are then integrated into a pose graph optimisation framework. We follow a similar approach in this work, except that we simultaneously estimate the turbine model parameters, producing improved localisation results and providing greater flexibility in the use of the method. In the following, we omit some of the detail regarding the CNN due to space constraints and refer readers to [5] for a fuller description.
Iii Method
The joint turbine model fitting and pose estimation process is performed throughout the duration of an inspection flight. We model the problem as a nonlinear least squares optimisation over the set of model parameters and pose parameters. In Section IIIA we detail the parameterisation of the wind turbine model, and describe the functions that instantiate the model. In Section IIIB we describe our CNNbased model projection estimation method. The projection estimates allow us to easily find correspondences between the model and image features. In Section IIIC we describe the optimisation process, detailing the cost function and how it is optimised.
Iiia Turbine Model Parameterisation
There are two aims determining the selection of the parameterisation of the wind turbine model. The first is that it should be as minimal as possible to simplify the optimisation process. The second is that the parameterisation must be complete, in that it fully describes the structure of the turbine. This enables the drone’s navigation system to make path planning decisions based on it. The parameterisation we have chosen is based on the following values:

The 2D location of the turbine’s base on the plane:

The height of the turbine’s tower:

The heading of the turbine relative to the drone’s coordinate system:

The length of the turbine nacelle which joins the top of the turbine tower with the blade centre:

The rotation angle of the turbine blades:

The length of the turbine blades:
Using the parameterisation above , a number of different turbine models can be instantiated. During the optimisation process we make use an instantiation based on points and an instantiation based on lines .
IiiA1 Turbine Point Model
The point model consists of a set of six points with . Each point represents a key location on the wind turbine structure. These are the turbine base , the top of the turbine tower , the centre of the turbine blades , and the three tips of the blades where . To instantiate a point model given a set of parameters , we make use of a “default” model . This “default” model consists of the full set of model points , initialised to the following values
(1) 
Note that in the “default” model, the tower is of unit height, the blades are of unit length, and the heading is oriented along the axis. We now define the set of functions that take the points defined in and the turbine parameters to create an instance of the turbine point model model
(2) 
The function that computes the location of the point at the base of the turbine is defined as
(3) 
where maps into . To compute the location of the point at the top of the turbine tower , we use
(4) 
where scales the input point along the axis. The point at the centre of the blades is computed using
(5) 
where scales the input point along the axis, and is the rotation matrix that rotates the scaled points around the axis by an angle of . The turbine blade tips are computed in the following manner
(6) 
where scales the length of the blades along the axis. The rotation is the matrix that rotates the blades around the axis by . For the three blade tips, the values for each are . These values produce three blade tips, rotated from each other. A diagram showing an instantiated point model can be seen in Figure 2.
IiiA2 Turbine Line Model
The second type of model we use is a line model . These lines are defined by their endpoints, which are taken from the previously defined point model
(7) 
The line connects the base of the tower to the top of the tower. The line , connects the top of the tower and the centre of the blades. The lines connect the centre of the blades with their tips. To instantiate an instance of given a set of parameters , we use the set of functions defined in (2), applying them to each of the line endpoints. This gives us a set of lines running along the centre of the main structural pieces of the wind turbine. An example of can be seen in the connecting lines in Figure 2.
The reason for using a line model in combination with a point model is that during the inspection, there a numerous instances when only a small section of the wind turbine is in view. As these models are used to constrain our optimiser, if we were to rely solely on a , very few constraints would be added. By using in combination with , as long as some of the turbine structure is in view of the drone, constraints can be established.
IiiB Turbine Projection Estimation using CNNs
To enable the nonlinear least squares optimisation, we need to be able to establish correspondences between model points, and their 2D locations on images of the wind turbine. Due to the wide variation in the visual appearance of wind turbines, accurately establishing correspondences using the raw image data is difficult. To address this we use a CNN to infer the projection of our skeletal model in the images captured by the drone, i.e. the projection of and into the camera coordinates, from the drone’s current view of the turbine. This results in a set of projection images . Important to note are the superscripts referring to the type of model, the subscript associating the images with a vertex in the pose graph (see Section IIIC) and the second subscript referring to the type of structure within the model. The set of projection images are used to establish correspondences between the models, and locations in the images. Example outputs of the CNN can be seen in Figure 3. The network is trained using manually labelled images of wind turbines. Readers are referred to [5] for a full description of the CNN and training process.
IiiC Optimisation
The parameter estimation problem is solved using a nonlinear least squares optimisation over a pose graph and a set of turbine parameters . The graph vertices , where is the total number of vertices in the graph, each contain a full 6DoF pose , with the orientation represented by a quaternion and a translation . The edges connect the vertices sequentially , mirroring the sequential addition of new vertices to the graph during inspection. The total set of parameters that we are optimising is then . We define the cost function to be optimised as
(8) 
The two unary terms measure the accuracy of the current parameter estimates based on the image data using points from and points from . The pairwise term regularises the cost function by penalising poses that differ from pose measurements obtained by the GPS/IMU.
IiiC1 Unary Cost Function Term
Both unary terms in (8) are represented by the same function. We use the superscript to designate that one is computed using the points from and one computed using the points from . We will use the notation to equal either or . In this section we present two different forms of ; one based on 2D point correspondences following the work in [5] , and a new method based on direct evaluation of the pixel values in the projection images . Necessary for both these formulations of the unary term is the following projection function
(9) 
Typical in monocular tracking applications, this function allows us to transform a point from the world frame into camera image coordinates using the transformation and held in a vertex , and which represents the camera intrinsics. The first version of the unary term we present is based on 2D point correspondences and is formulated as
(10) 
The value is the 2D correspondence to the model point , on the projection image . This term transforms the model point into the world frame using the functions , and then projects it into the image using (9). One important process is establishing these correspondences prior to optimising. Depending on the type of model, correspondences are found in different ways.
The first uses the point model . We take each of the current estimates of the poses in the graph and the current estimate of the parameters and project each of the model points into the image.
(11) 
We then search the image pixels in around and find the pixel with the largest value. This pixel, if the value is above a threshold is selected as . This process is illustrated in Figure 4.
The second method using the line model is somewhat more involved. The reason for using the line model in the optimiser is that as the drone is often close to the turbine during the inspection, the points in are not always visible. Using allows us to establish correspondences inbetween the line end points. To do this we split each of the lines in into points , where and use these to establish correspondences.
Similar to the points in , for each split point , we find the 2D projection using (11). Using the projected location, we then perform a perpendicular line search from . The pixel along the line in with the highest value above a threshold is then chosen as . This process is illustrated in Figure 4. For a more detail on the process of establishing correspondences, please see [5].
The second formulation of the unary cost function is based on directly sampling the pixel values in using the projection of the model points into the image and interpolating the values during optimisation. The output of the CNN can be viewed as a heat map, with pixel values ranging between . The higher the pixel value, the great the expectation that the corresponding turbine feature lies at that location. We take this as a probability, and optimising the cost function becomes a process of finding the parameters that project the point model to the locations on the image with the highest probability, corresponding to the minimum of the sum of the negative log over all points, i.e.
(12) 
The function uses bicubic interpolation to evaluate the pixel value under the projected point using Cubic Hermite Splines. Importantly, this type of interpolation is differentiable, allowing us to integrate the function into the nonlinear optimiser.
In this application, there are a number of benefits of using the direct image interpolation method over the 2D correspondence method . It means we don’t need to establish correspondences prior to optimisation, reducing the complexity of the process. More importantly, the interpolation method allows the model point projections to move freely in image space during optimisation. This is especially important for the points from . If we use the 2D point correspondences as defined in , the optimiser is discouraged from using update steps that will move line points in the same direction as the corresponding line in the projection image . This is unwanted, as the cost incurred by a line point should be equal anywhere along the length of the corresponding line in .
IiiC2 Pairwise Cost Function Term
The second term of the cost function is used to constrain the pose parameters during optimisation such that they don’t differ too far from the original estimates of the drone’s pose measured using GPS/IMU. To enable this we compute the relative poses in the following way
(13) 
where is the rotation matrix formed from the quaternion rotation . Using relative measurements as constraints rather than absolute measurements is beneficial in that it allows the optimised poses to drift over time – correcting for drift in the GPS/IMU – while harshly penalising abrupt differences between successive key frames. The full pairwise term is then defined as
(14) 
where ‘Vec’ corresponds to the vector component of the quaternion. The matrix is an information matrix, which encodes the certainty about the relative pose measurements obtained by the GPS/IMU. Due to limitations on our system, we don’t have access to this data during the flight. Instead we fix this matrix using a set of reasonable values. Further details can be found in [5].
Iv Evaluation
To evaluate our work we present two types of experiments. The first in Section IVA relies on simulated data, allowing us to empirically evaluate the performance of our method against a known ground truth. The second set in Section IVB applies our work to real wind turbine inspection data. As this data does not have a known ground truth, we only provide a qualitative evaluation of the results.
Iva Simulated Data
To provide a qualitative evaluation of our method we applied it to a set of simulated inspection data. To generate the synthetic data, we extracted pose graphs and turbine parameter sets from a number of real inspection flights which we use as pseudo ground truths. This provides us with a representative set of data for evaluation. For each of these ground truths we applied varying levels of noise to the data. For the pose graphs, we sequentially added noise to the position and orientations, allowing us to simulate the sort of drift one might expect from using GPS/IMU readings during a flight. For the turbine parameters we added error from a Gaussian distribution to each of the values. Due to the different scales for the parameters in , the values for each parameter were different. Using this data we evaluated a number of different aspects of our method. An example of the output can be seen in Figure 5.
IvA1 Unary cost function comparison
The first aspect we evaluated was the difference in performance of the two unary cost terms we defined in Section IIIC. These are the one based on 2D point correspondences and the one based on direct image interpolation . As we have two different sets of model points that these can be applied to – the points from and the points from – we compared a total of 4 different configurations. We applied each of these configurations to our set of synthetic data and recorded the mean error in position and orientation prior to optimisation, and the same values after optimisation. The results of the experiments can be seen in Figure 6. As the plots show, the method that uses for and the for performs best for both position and orientation. The method that performs the worse is when this configuration is reversed. This performance is expected due to the factors highlighted in Section IIIC. As the points in correspond to actual 2D correspondences in the image, using the point correspondence cost makes sense. As the points in should contribute a similar amount to the cost at any location along the line of the corresponding turbine feature, the more relaxed function is appropriate. Given these results, for the remainder of the experiments we use for and for .
IvA2 Combined Optimisation Comparison
The second aspect of our method we evaluated was the benefit of simultaneously optimising both the pose and turbine parameters over optimising just the pose. To do this we applied our method to the set of synthetic data twice; once where both sets of parameters were optimised, once where only the pose was optimised. We then evaluated the error of both methods. The results of this experiment can be seen in Figure 7. As these figures show, when we are optimising both sets of parameters, the method is significantly improved over optimising the pose alone. The reason for this is that errors in the turbine parameters will propagate into errors in the pose of the drone, if the model parameters are fixed during optimisation. The most obvious instance of this is when the value – which sets the rotation of the blades – is wrong. During optimisation, this will propagate into error in the roll as the drone seeks to align the incorrect model with the image data. We can see this in Figure 9.
IvA3 Recovering the Turbine Parameters
The final set of synthetic evaluations we performed was to establish the ability of the method to optimise the parameters of the turbine model. To do this we applied the set of functions to the ground truth parameters, the initial parameters (those with noise added) and the optimised parameters. We then computed the mean distance error across all the points in for the initial model and the optimised model and plotted them against each other. The results of this can be seen in Figure 8. We can see that as expected, when there is more error in the initial turbine parameters, the optimiser is less able to recover to the true pose. We do see however, that the method is able to recover around half the error introduced into the initial parameters.
IvB Real Data
To give an example of the performance of our method we applied it to some recordings of real inspection flights. For this data, the initial pose estimates are measured using GPS/IMU readings and the turbine model parameters were initialised by eye. In Figure 9, we give an example of the benefit of optimising both model and pose parameters simultaneously. As we can see, when there is error in blade rotation value , if we are not able to correct for that error during the flight, the optimiser will be encouraged to correct for it by incorrectly adding roll to the orientation.
In Figure 10, we show a series of images showing the reprojection of the turbine model. We can see that our method is able to correct for a large proportion of the error in the initial estimate of the turbine and pose.
V Conclusions
In this work we have presented a method for the simultaneous optimisation of a drone’s pose as well the parameters of a wind turbine model during an inspection flight. We present a set of functions that allow us to integrate the model parameter fitting into a nonlinear least squares optimiser. In addition we also present a new cost function term which is better suited to the image data than the traditional corresponding points method. In our evaluations, we have compared the two types of unary cost terms, shown the benefit of the simultaneous optimisation using both simulated and real data, and measured how effective our method is at correcting errors in the model parameters.
There are a number of avenues open for future work. One of these is the incorporation of extra sensor measurements into the optimisation. This could take the form of depth data from a LiDAR or from stereo cameras. We also aim to properly integrate the GPS/IMU measurements using a visual inertial odometry method.
Acknowledgements
The authors acknowledge the support of Innovate UK (project number 104067). The work described herein is the subject of UK patent applications GB1815864.2 and GB1902475.1.
References
 [1] J. Ribrant and L. M. Bertling, “Survey of failures in wind power systems with focus on Swedish wind power plants during 19972005,” IEEE Transactions on Energy Conversion, vol. 22, no. 1, pp. 167–173, mar 2007.
 [2] F. P. García Márquez, A. M. Tobias, J. M. Pinar Pérez, and M. Papaelias, “Condition monitoring of wind turbines: Techniques and methods,” Renewable Energy, vol. 46, pp. 169–178, oct 2012.
 [3] L. Wang and Z. Zhang, “Automatic Detection of Wind Turbine Blade Surface Cracks Based on UAVTaken Images,” IEEE Transactions on Industrial Electronics, vol. 64, no. 9, pp. 7293–7309, sep 2017.
 [4] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale, “Keyframebased visualinertial odometry using nonlinear optimization,” International Journal of Robotics Research, vol. 34, no. 3, pp. 314–334, mar 2015.
 [5] O. MoolanFeroze, K. Karachalios, D. N. Nikolaidis, and A. Calway, “Improving drone localisation around wind turbines using monocular modelbased tracking,” feb 2019.
 [6] M. Stokkeland, K. Klausen, and T. A. Johansen, “Autonomous visual navigation of Unmanned Aerial Vehicle for wind turbine inspection,” in 2015 International Conference on Unmanned Aircraft Systems, ICUAS 2015. IEEE, jun 2015, pp. 998–1007.
 [7] B. E. Schäfer, D. Picchi, T. Engelhardt, and D. Abel, “Multicopter unmanned aerial vehicle for automated inspection of wind turbines,” in 24th Mediterranean Conference on Control and Automation, MED 2016. IEEE, jun 2016, pp. 244–249.
 [8] K. Melbouci, S. N. Collette, V. GayBellile, O. AitAider, and M. Dhome, “Model based RGBD SLAM,” in Proceedings  International Conference on Image Processing, ICIP, vol. 2016Augus. IEEE, sep 2016, pp. 2618–2622.
 [9] A. Loesch, S. Bourgeois, V. GayBellile, O. Gomez, and M. Dhome, “Localization of 3D objects using modelconstrained SLAM,” Machine Vision and Applications, vol. 29, no. 7, pp. 1041–1068, oct 2018.
 [10] K. Pauwels, L. Rubio, and E. Ros, “Realtime modelbased articulated object pose detection and tracking with variable rigidity constraints,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, jun 2014, pp. 3994–4001.
 [11] F. Michel, S. Gumhold, E. Brachmann, M. Y. Yang, C. Rother, and A. Krull, “Pose Estimation of Kinematic Chain Instances via Object Coordinate Regression,” in Procedings of the British Machine Vision Conference 2015. British Machine Vision Association, 2015, pp. 181.1–181.11.
 [12] D. Katz, M. Kazemi, J. Andrew Bagnell, and A. Stentz, “Interactive segmentation, tracking, and kinematic modeling of unknown 3D articulated objects,” in Proceedings  IEEE International Conference on Robotics and Automation. IEEE, may 2013, pp. 5003–5010.
 [13] D. W. Marquardt, “An Algorithm for LeastSquares Estimation of Nonlinear Parameters,” vol. 11, no. 2, pp. 431–441, jun 2005.
 [14] S. Agarwal, K. Mierle, and Others, “Ceres solver,” http://ceressolver.org.