An Effective Multi-Cue Positioning System for Agricultural Robotics

An Effective Multi-Cue Positioning System for Agricultural Robotics


The self-localization capability is a crucial component for Unmanned Ground Vehicles (UGV) in farming applications. Approaches based solely on visual information or on low-cost GPS are easily prone to fail in such scenarios. In this paper, we present a robust and accurate 3D global pose estimation framework, designed to take full advantage of heterogeneous sensory data. By modeling the pose estimation problem as a pose graph optimization, our approach simultaneously mitigates the cumulative drift introduced by motion estimation systems (wheel odometry, visual odometry, …), and the noise introduced by the raw GPS readings. Along with a suitable motion model, our system also integrates two additional types of constraints: (i) a Digital Elevation Model (DEM) and (ii) a Markov Random Field (MRF) assumption. We demonstrate how using these additional cues substantially reduces the error along the altitude axis, and this benefit spreads to the other components of the state. We report exhaustive experiments combining several sensor setups, showing improvements from 37% to 76% of the localization accuracy with respect to using only the GPS. We also show that our approach provides accurate results even if the GPS temporarily change positioning mode. We released our C++ open-source implementation and two challenging datasets with their relative ground truth.


ParameterParameters \SetKwInOutFunctionFunctions

1 Introduction

It is commonly believed that the exploitation of autonomous robots in agriculture represents one of the applications with the greatest impact on food security, sustainability, reduction of chemical treatments, and minimization of the human effort. In this context, an accurate global pose estimation system is an essential component for an effective farming robot in order to successfully accomplish several tasks: (i) navigation and path planning; (ii) autonomous ground intervention, e.g, selective spraying or mechanical removal of accurately detected weeds; (iii) acquisition of relevant semantic information, such as weed distribution and positions, fertilizer demand, crop health and global field status. However, self-localization inside an agricultural environment is a complex task: the scene is rather homogeneous, visually repetitive and often poor of distinguishable reference points. For this reason, conventional landmark based localization approaches can easily fail. Currently, most systems rely on high-end Real-Time Kinematic Global Positioning Systems1 (RTK-GPSs2) to localize the UGV on the field with high accuracy [1, 2]. Unfortunately, such RTK-GPSs are typically very expensive and, moreover, they require at least one nearby geo-localized ground station to work properly. On the other side, consumer-grade GPSs usually provide noisy data, thus not guaranteeing enough accuracy and reliability for safe and effective operations. Moreover, a GPS cannot provide the full state estimation of the vehicle, i.e. its attitude, that is an essential information to perform a full 3D reconstruction of the environment.

Figure 1: (Left) The Bosch BoniRob farm robot used in the experiments; (Center) Example of a trajectory (Dataset B, see Sec. 4) optimized by using our system: the optimized pose graph can be then used, for example, to stitch together the images acquired from a downward looking camera; (Right) The obtained trajectory (red solid line) with respect to the trajectory obtained using only the raw GPS readings (blue solid line). Both trajectories have been over-imposed on the actual field used during the acquisition campaign.

In this paper we present a robust and accurate 3D global pose estimation system for UGVs (Unmanned Ground Vehicles) designed to address the specific challenges of an agricultural environment. Our system effectively fuses several heterogeneous cues extracted from low-cost, consumer grade sensors, by leveraging the strengths of each sensor and the specific characteristics of the agricultural context. We cast the global localization problem as a pose graph optimization problem (Sec. 2): the constraints between consecutive nodes are represented by motion estimations provided by the UGV wheel odometry, local point-cloud registration, and a visual odometry (VO) front-end that provides a full 6D ego-motion estimation with small cumulative drift3. Noisy but drift-free GPS readings (i.e., the GPS pose solution), along with a pitch and roll estimation extracted by using a MEMS Inertial Measurement Units (IMU), are directly integrated as priors of the nodes. Driven by the fact that both GPS and visual odometry provide poor estimates along the -axis, i.e. the axis parallel to the gravity vector, we propose to improve the state estimation by introducing two additional altitude constraints:

  1. An altitude prior, provided by a Digital Elevation Model (DEM);

  2. A smoothness constraint for the altitude of adjacent nodes4.

Both the newly introduced constraints are justified by the assumption that in an agricultural field the altitude varies slowly, i.e. the terrain can be approximated by piece-wise smooth surfaces. The smoothness constraints exploit the fact that a farming robot traverses the field by following the crop rows, hence, by using the Markov assumption, the built pose graph can be arranged as a Markov Random Field. The motion of the UGV is finally constrained using an Ackermann motion model extended to the non-planar motion case. The integration of such constraints not only improves the accuracy of the altitude estimation, but it also positively affects the estimate of the remaining state components, i.e. and (see Sec. 4).
The optimization problem (Sec. 3) is then iteratively solved by exploiting a graph based optimization framework [3] in a sliding-window fashion (Sec. 3.3), i.e., optimizing the sub-graphs associated to the most recent sensor readings. The sliding-window optimization allows to obtain on-line global localization results that approximate the results achievable by an off-line optimization over the whole dataset.
In order to validate our approach (Sec. 4), we used and make publicly available with this paper two novel challenging datasets acquired using a Bosch BoniRob UGV (Fig. 1, left) equipped with, among several others calibrated sensors, two types of low-cost GNSSs: a Precise Point Positioning (PPP) GPS and a consumer grade RTK-GPS. We report exhaustive experiments with several sensors setups, showing remarkable results: the global localization accuracy has been improved up to percentages of 37% and 76% compared with the raw localization obtained by using only the raw RTK-GPS and PPP-GPS readings, respectively (e.g., Fig. 1). We also show that our approach enables to accurately localize the UGV even if the GPS changes positioning mode for some time (e.g., from float to DGPS mode).
All the datasets and an open-source C++ implementation of the proposed system are freely available for download at: .

1.1 Related Work

The problem of global pose estimation for UGVs has been intensively studied in the past, especially with regard to the localization of self driving vehicles and outdoor autonomous robots moving in an urban scenario. All these systems rely on GPS readings, often coupled with position priors estimated, e.g., from IMUs [4], cameras [5, 6, 7, 8, 9], or LIDARs [10]. Moreover, a given map can be used to constrain the robot motion (e.g. 2D road maps [11] or Digital Elevation Models [12]). The sensors fusion is usually based on parametric [11] or discrete [10] filtering, pose graph optimization [7, 8], set-membership positioning [12], or hybrid topological/filtering [5].
Localization in an agricultural setting, in some respects, is even more challenging: most of the approaches presented above can’t be directly applied to such scenario. The environment is poorly or no structured, with a very strong sensor aliasing. Actually, most of the available farming robots currently mainly rely on high-end RTK-GPSs to localize themselves on the field [13, 14, 1, 2] with high accuracy. Some interesting attempts to remove the need for an almost perfect GPS use vision based crop rows detection and tracking during navigation [15].

1.2 Contributions

In this paper, we take advantage of several contributions to provide a robust, unified multi-cue positioning framework for agricultural applications that aims to achieve high level accuracy with low cost GPSs. We locally rely on visual odometry as in [5], depth information as in [10] and inertial information as in [4]. We exploit maps priors as in [12] and defined motion priors as in [15]. We infer the full 6D position of the robot solving an on-line graph based problem as in [7], proposing a Markov Random Field assumption to constrain the pose graph. We evaluate our system with exhaustive experiments that highlight the contribution of each employed cue. We also provide an open-source implementation of our code and two challenging datasets acquired with the Bosch BoniRob farm robot with relative ground truth.

2 Multi-Cue Pose Graph

The challenges that must be addressed to design a robust and accurate global pose estimation framework for farming applications are twofold: (i) the agricultural environment appearance, being usually homogeneous and visually repetitive; (ii) The high number of cues that have to be fused together. In this section we describe how we formulate a robot pose estimation procedure able to face both these issues.

Figure 2: Illustration of an edge connecting two nodes and . The error is computed between the measurement and the predicted measurement . In addition, each edge encodes an information matrix that represents the uncertainty of the measurement.

The proposed system handles the global pose estimation problem as a pose graph optimization problem. A pose graph is a special case of factor graph5 where the factors are only connected to variables (i.e., nodes) pairs, and variables are only represented by robot poses. For this reason, it is common to represent each factor with an edge. Solving a factor graph means finding a configuration of the nodes for which the likelihood of the actual measurements is maximal. Since we assume that all the involved noises follow a Gaussian distribution, we can solve this problem by employing an iterative least square approach.
We define as the vector of graph nodes that represents the robot poses at discrete points in time, where each is represented by the full 3D pose in terms of a translation vector and, using the axis-angle representation, an orientation vector , both in . This pose is defined with respect to a global reference centered in 6. We define as the sensor measurements that can be related to pairs or single nodes. Let be a relative motion measurement between nodes and , while be a global pose measurement associated to the node . Additionally, let and represent the information matrices encoding the reliability of such measurements, respectively.

From the poses of two nodes and , it is possible to compute the expected relative motion measurement and the expected global measurement (see Fig. 2). We formulate the errors between those quantities as:


Thus, for a general sensor providing a relative information, we can characterize an edge (i.e., a binary factor ) by the error and the information matrix of the measurement, as described in [16]. In other words, an edge represents the relative pose constraint between two nodes (Fig. 2). In order to take into account also global pose information, we use unary constraints, namely a measurement that constraints a single node. Hence, for a general sensor providing an absolute information, we define as the prior edge (i.e., an unary factor) induced by the sensor on node . Fig. 3 depicts a portion of a pose graph highlighting both unary and binary edges. Each edge acts as an directed spring with elasticity inversely proportional to the relative information matrix associated with the measurement that generates the link.
Our pose graph is built by adding an edge for each sensor reading, for both relative (e.g., wheel odometry readings) and global (e.g., GPS readings) information. In addition, we propose to integrate other prior information that exploit both the specific target environment and the assumptions we made. In the following, we report the full list of edges exploited in this work divided between local (relative) and global measurements (we report in brackets the acronyms used in Fig. 3 and described in the following of this section):

  • Local measurements:

    • Wheel odometry measurements (WO)

    • Visual odometry estimations (VO)

    • Elevation constraints between adjacent nodes (MRF)

    • Ackermann motion model (AMM)

    • Point-clouds local registration (LID)

  • Global measurements:

    • GPS readings (GPS)

    • Digital Elevation Model data (DEM)

    • IMU readings (IMU)

We define as the relative constraint induced by a visual odometry algorithm, as the relative constraint induced by the wheel odometry, and as the relative constraint obtained by aligning the local point-clouds perceived by the 3D LIDAR sensor.

Figure 3: Overview of the built pose graph. Solid arrows represent graph edges, that encode conditional dependencies between nodes. For the sake of clarity, we show here only the edges directly connected with the node , by representing only one instance for each class of edge: (i) the binary non directed MRF constraint as ; (ii) the binary directed edge induced from sensor ; (iii) the unary edge induced by sensor . We superimposed the graph on a cultivated field to remark the relationship between the graph structure and the crop rows arrangement in an agricultural environment: dotted arrows represent temporal relationships between consecutive nodes.

Oftentimes, GPS and visual odometry provide a poor estimates of the robot position along the -axis (i.e, the axis that represents its elevation). In the GPS case, this low accuracy is mainly due to the Dilution of Precision, multipath or atmospheric disturbances, while in the visual odometry this is due to the 3D locations of the tracked points. In particular, in a typical agricultural scenario most of the visual features belong to the ground plane. Hence, the small displacement of the features along the z-axis may cause a considerable drift.

On the other hand, agricultural fields usually present locally flat ground levels and, moreover, a farming robot usually traverses the field by following the crop rows. Driven by these observations, we enforce the local ground smoothness assumption by introducing an additional type of local constraints that penalizes the distance along the -coordinate between adjacent nodes. Therefore, the built pose graph can be augmented by a 4-connected Markov Random Field (MRF) [17]: each node is conditionally connected with the previous and the next node in the current crop row, and with the nearest nodes that belong to the previous and next crop rows, respectively. We refer to this constraint as (e.g., the set in Fig. 3).

We then add a further type of local constraint based on a standard motion model, namely the Ackermann steering model. Actually, it assumes that the robot is moving on an ideal plane, that is not our case. In this work, we exploit this model by assuming local planar motion between temporal adjacent nodes. Such a motion plane is updated with the attitude estimation for the subsequent node. We integrate this constraint by means of a new type of edge, namely .

The aforementioned local constraints are intrinsically affected by a small cumulative drift: to overcome this problem, we integrate drift-free global measurements in the graph as position prior information. In particular, we define a GPS prior and an IMU prior with the associated information matrices, respectively and .

Finally, we introduce an additional global measurement by means of an altitude prior, provided by a Digital Elevation Model. A DEM is a special type of DTM (Digital Terrain Model) that represents the elevation of the terrain at some location, by means of a regularly spaced grid of elevation points [18]. The DEM maps a 2D coordinate to an absolute elevation. Since we assume that the altitude varies slowly, we can use the current position estimate (i.e., the and components) to query the DEM for a reliable altitude estimation , with associated information matrix .
The cost function is then assembled as follows:


where and represent respectively the set of binary and unary constraints defined above (see Fig. 3), and stands for the 4-connected neighborhood of the node .

3 Pose Graph Optimization

In this section, we focus on the solution of the cost function reported in Eq. 2, describing the errors computation, the employed weighting factors assignment procedure and the on-line and off-line versions of the optimization. We finally report some key implementation insights that let the framework reach the desired level of accuracy.

3.1 Error Computation

For each measurement , given the current graph configuration, we need to provide a in order to compute errors in Eq. 2 that represents the expected measurement given a configuration of the nodes which are involved in the constraint. Usually, for a binary constraint, this prediction is the relative transformation between the nodes and , while for unary constraint is just the full state or a subset of its components. We define as a general homogeneous transformation matrix related to the full state of the node (e.g., the homogeneous rigid body transformation generated from and ) and as a generic mapping function from to a vector, we can express and as:


In this work not all the constraints belong to : indeed, most of used sensors (e.g., WO, IMU, etc…) can only observe a portion of the full state encoded in . Therefore, in the following, we will show how we obtain the expected for each involved cue (for some components, we omit the subscripts and by using the relative translations and rotations between adjacent nodes):

VO and LID: these front-ends provide the full 6D robot motion. Hence we build and by computing the relative transformation between the two connected nodes as in Eq. 3a;

WO: the robot odometry provides only the planar motion by means of a roto-translation . Therefore we build as , where the subscript used after specifies that the map to the vector involves only such components;

MRF and DEM: both measurements constraint only the altitude of the robot, we obtain the estimated measurements as:


GPS: this sensor only provides the robot position:


IMU: from this measurement we actually exploit only the roll and pitch angles, being the rotation around the axis provided by the IMU affected by a cumulative drift. Therefore, we obtain ;

AMM: we formulate such a constraint by a composition of two transformation matrices. The first one encodes a roto-translation of the robot around the so called Instantaneous Center of Rotation (ICR). We follow the same formulation presented in [19]:


where is the norm of the translation along and . Additionally, we add a further rotation along those two axis, taking also into account the ground slope by rotating the ideal plane on which the vehicle steers by following the Ackermann motion model:


Hence, we obtain as .

3.2 Dynamic Weights Assignment

The impact of each constraint in the final cost function (Eq. 2) is weighted by its relative information matrix. As a consequence, such information matrices play a crucial role in weighting the measurements, i.e. giving much reliability to a noisy sensor can lead to errors in the optimization phase. We tackle this problem by dynamically assigning the information matrix for each component as follows:

WO: we use as information matrix the inverse of the covariance matrix of the robot odometry scaled by the magnitude of the distance and rotation traveled between the nodes and , as explained in [20];

VO: we use the inverse of the covariance matrix provided as output by the visual odometry front-end, weighting the rotational and translational sub-matrices ( and ) with two scalars and , experimentally tuned;

MRF: we set the information matrix . The weight is proportional to the distance in the plane between the two nodes, while has been experimentally tuned;

GPS: we use as information matrix the inverse of the covariance matrix provided as output by the GPS sensor;

AMM: we use as information matrix an identity matrix scaled by the magnitude of the traveled distance between the nodes and , similarly to the wheel odometry constraint. This allows to model the reliability of such a constraint as inversely proportional to the traveled distance;

IMU: we use as information matrix the inverse of the covariance matrix provided by the IMU front-end;

DEM: we set the information matrix , where is empirically tuned;

LID: From the full state provided by the Point-cloud registration front-end, in this work we exploit only the components , , and (i.e., roll, pitch and elevation). This is mainly due to the lack of geometric structure in a farming scenario that can’t provided reliable constraints along the other components. Hence, we set information matrix as , where all the parameters are empirically tuned.

3.3 Sliding-Window Optimization

A re-optimization of the whole pose graph presented above, every time a new node is added, cannot guarantee the real-time performances required for on-line field operations, especially when the graph contains a large amount of nodes and constraints. We solve this issue by employing a sliding-window approach, namely performing the optimization procedure only on a sub-graph that includes a sequence of recent nodes. More specifically, each time a new node associated with the most recent sensor readings is added to the graph, the sub-graph is updated by adding the new node and removing its oldest one, in a sliding-window fashion. The optimization process is performed only on the current sub-graph, while older nodes maintain the state assigned during the last optimization where they were involved. In order to preserve the MRF constraints, the size of the sub-graph is automatically computed so that any adjacent nodes in the previous row are included (see Fig. 4).A global optimization of the whole pose graph is then carried out off-line using as initial guess the node states assigned on-line using the sliding-window approach.

Figure 4: Sliding-window sub-graph optimization: nodes that belong to the current sub-graph are painted in red, old nodes no more optimized are painted in blue, while nodes that will be added in future are painted in white.

3.4 Implementation Details

Temporal Synchronization: In the previous sections we tacitly assumed that measurements associated with a node (e.g., WO and GPS readings) share the same time stamp. Usually, in realistic contexts, this might not happen since measurements arrive at different time. We solve this issue by interpolating the wheel odometry motion estimations, being the sensor with the highest frame-rate, to the time stamp of the other sensors.
Visual Odometry Failures: Another central challenge is introduced by the VO failures. We mitigate this problem by introducing a consistency check that exploits the local reliability of the wheel odometry: when the difference between WO and VO is grater than a given threshold, we assume a failure of the latter. In this case, in the optimization, we decrease the influence of the VO by scaling down its information matrix.
Point-cloud registration: Point-clouds acquired by a 3D LIDAR are typically too sparse to perform a robust alignment: thus, we accumulate different LIDAR readings into a single point-cloud by using the motion estimations provided by the visual odometry. The point-clouds registration is finally performed using the Iterative Closest Point (ICP) algorithm.
Graph Optimization: We perform both the on-line and off-line pose graph optimization (Sec. 3.3) using the Levenberg-Marquardt algorithm implemented in the g2o graph optimization framework [3].

4 Experiments

In order to analyze the performance of our system, we collected two datasets7 with different UGV steering modalities. In Dataset A the robot follows 6 adjacent crop rows by constantly maintaining the same global orientation, e.g. half rows has been traversed by moving the robot backward, while in Dataset B the robot is constantly moving in the forward direction.

Both datasets include data from the same set of sensors: (i) wheel odometry; (ii) VI-Sensor device (stereo camera + IMU) [21]; (iii) Velodyne VLP-16 3D LIDAR; (iv) a low cost U-blox RTK-GPS; (v) an U-blox Precise Point Positioning (PPP) GPS. We acquire the ground truth reference 3D location (, and ) by using a LEICA laser tracker. Both datasets have been acquired by using the Bosch BoniRob farm robot (Fig. 1, left) on a field in Eschikon, Switzerland (Fig. 1, right). In addition to these two datasets, we have created a third dataset, namely Dataset C, where we simulate a sudden RTK-GPS signal loss, e.g. due to a communication loss between the base and the rover stations. In particular, we simulated the accuracy losses by randomly switching for some time to the PPP-GPS readings.
In the following, we report the quantitative results by using the following statistics build upon the localization errors with respect to the ground truth reference: Root Mean Square Error ( in the tables), maximum and mean absolute error ( and ), and mean absolute error along each component (, and ).

4.1 Visual Odometry Evaluation

DatasetA DatasetB
ORBSLAM2 [22] 0.116 0.091 0.171 0.139
SPTAM [23] 0.085 0.061 0.168 0.152
DSO [24] (*) 0.071 0.068 0.124 0.109
SVO2 [25] (*) 0.129 0.104 0.229 0.205
Table 1: Error statistics in Dataset A and Dataset B by using different Open-Source VO systems. (*): monocular system.

We selected the VO subsystem on the basis of an exhaustive performance evaluation, over our datasets, of four open-source implementation of modern visual odometry front-ends ([22, 23, 24, 25]). In Tab. 1 we report the results of the evaluation, where we compensated the cumulative drift of monocular systems by estimating the scale from the wheel odometry front-end. Unlike the subsequent experiments, the and errors refer here to the relative translational error, computed for each meter of trajectory. The best performing system was DSO [24], the one we then used in all our experiments.

4.2 Dataset A and Dataset B

DatasetA DatasetB










PPP 0.349 0.582 1.577 2.959 1.710 0.306 0.501 1.484 2.875 1.621
0.311 0.520 1.537 2.954 1.630 0.246 0.416 1.424 2.829 1.504
0.242 0.421 0.682 1.558 0.978 0.228 0.368 1.321 2.453 1.289
0.236 0.421 0.651 1.401 0.865 0.225 0.362 1.289 2.569 1.240
0.244 0.413 0.523 1.399 0.728 0.224 0.365 1.022 2.380 1.120
0.223 0.414 0.558 1.369 0.719 0.196 0.342 0.870 2.033 0.943
0.228 0.381 0.528 1.276 0.724 0.232 0.395 0.678 1.649 0.818
0.235 0.353 0.517 1.284 0.749 0.228 0.365 0.642 1.432 0.759
0.224 0.369 0.454 1.138 0.616 0.218 0.358 0.621 1.602 0.732
0.235 0.359 0.441 1.113 0.566 0.198 0.361 0.478 1.153 0.657
0.235 0.348 0.308 0.935 0.424 0.196 0.353 0.467 1.198 0.617
0.205 0.328 0.298 0.913 0.417 0.181 0.342 0.363 1.158 0.476
0.199 0.330 0.296 0.865 0.405 0.172 0.337 0.313 1.149 0.455
0.257 0.413 0.345 0.987 0.551 0.292 0.427 0.457 1.301 0.622
RTK 0.059 0.051 0.121 0.431 0.128 0.054 0.062 0.091 0.322 0.122
0.053 0.042 0.105 0.431 0.125 0.049 0.058 0.086 0.321 0.119
0.053 0.042 0.055 0.281 0.087 0.047 0.048 0.063 0.195 0.092
0.048 0.048 0.061 0.298 0.091 0.045 0.046 0.064 0.203 0.091
0.046 0.047 0.061 0.271 0.090 0.045 0.045 0.064 0.200 0.090
0.046 0.047 0.061 0.284 0.089 0.045 0.045 0.061 0.190 0.089
0.046 0.049 0.056 0.251 0.088 0.045 0.046 0.039 0.162 0.075
0.046 0.049 0.034 0.253 0.075 0.045 0.046 0.034 0.154 0.073
0.051 0.050 0.069 0.313 0.098 0.048 0.047 0.065 0.223 0.097
0.047 0.048 0.034 0.258 0.075 0.045 0.046 0.033 0.154 0.072
0.052 0.055 0.041 0.272 0.084 0.050 0.051 0.037 0.169 0.083
Table 2: Error statistics in Dataset A and Dataset B by using different sensor setups and constraints for the global, off-line and the sliding-window (SW), on-line pose graph optimization procedures.

0.49 {subfigure}0.49 {subfigure}0.49 {subfigure}0.49

Figure 5: Dataset A, PPP-GPS: (Top) Qualitative top view comparison between the raw GPS trajectory (left) and the optimized trajectory (right); (Bottom): absolute (left) and (right) error plots for the same trajectories.
Figure 6: Dataset A, RTK-GPS: Absolute error plots for the raw GPS trajectory and the optimized trajectory obtained by using the best sensors configuration (see Tab. 2).

This set of experiments is designed to demonstrate the effectiveness of the proposed method, and the benefits introduced by each cue within the final estimation. We report in Tab. 2 the results obtained by using different sensor combinations and optimization procedures over Dataset A and Dataset B. The table is split according to the type of GPS sensor used, the best results are highlighted in bold. A first remarkable result is the decreasing error trend, almost monotonic: the more sensors we introduced in the optimization process, the smaller the resulting and errors are. This behavior occurs in both Dataset A and Dataset B, and proves how the proposed method properly handles all the available sources of information. Another important outcome is the relative improvement obtained between the worst and the best set of cues, which is around the 37% for RTK case, and 76% for the PPP case. A noteworthy decreasing also happens to the error statistic, respectively, 40% and 70%: this fact brings a considerable benefit to agricultural applications, where spikes in the location error might lead to crushing crops. For the best performing sensor setup, we also report the results obtained by using the sliding-window, on-line pose graph optimization procedure (Sec. 3.3): also in this case the relative improvement is remarkable (32% and 67%, respectively) enabling a safer and more accurate real-time UGV navigation.
Fig. 5 (Top) depicts a qualitative top view comparison between the raw PPP-GPS trajectory (Top-left) and the trajectory (Top-right) obtained after the pose graph optimization using the best sensors configuration (see Table 2) in Dataset A. The error plots (Bottom) show how the introduction of additional sensors and constraints allows to significantly improve the pose estimation. Similar qualitative results for Dataset B are reported in Fig. 1. Fig. 6 reports the error plot of the RTK-GPS trajectory versus the trajectory estimated by our system with the best sensors ensemble.
For both GPSs, the maximal error reduction happens when all the available cues are used within the optimization procedure except for the low cost RTK-GPS case, where the ELEV constraints worsens the error along the axis. This phenomenon happens since a RTK-GPS provides, for most of the time, an accurate estimate along all the translational axes: in this case, a DEM model does not provide any useful information to the optimization procedure. Another important aspect to highlight is how the addition of constraints that only act on a part of the state, also positively affect the remaining state components. This especially happens above all with PPP-GPS, that brings a large amount of uncertainty within the optimization. In this case, adding the attitude information (e.g. IMU measurements) involves also a small error reduction along the translational state components. Similarly, as shown in Tab. 2, the addition of the LIDAR and the ELEV constraints leads to an error reduction also along the and axis. As a further qualitative evaluation, in Fig. 7, we report the point-cloud obtained by rendering LIDAR scans at each estimated position with and without the IMU contribution within the optimization procedure: the attitude estimation greatly benefits from this contribution. The runtime of our system are reported in Tab. 3, for both the off-line and on-line, sliding-window cases.


.49 {subfigure}.49

Figure 7: Comparison between output point-clouds: (left) no IMU and (right) with IMU in the optimization.
Dataset A 786 8259 24 13.493
98 763 4 0.0989
Dataset B 754 8032 22 12.993
104 851 5 0.1131
Table 3: Runtime performance for the global, off-line and the sliding-window (SW), on-line pose graph optimization (Core-i7 2.7 GHz laptop).

4.3 Dataset C










RTK+PPP 1.313 0.647
1.291 0.613
1.268 0.567
1.180 0.435
0.892 0.359
0.556 0.227
0.649 0.201
0.505 0.200
0.537 0.182
0.411 0.166
Table 4: Error statistics in Dataset C by using different sensor setups and constraints in the optimization procedure.

0.47 {subfigure}0.51

Figure 8: (Left) Top view of the optimized trajectory obtained by using the best sensors configuration (see Tab. 4): In red we highlight the part of the trajectory affected by the signal loss (simulated by using the PPP-GPS output), in blue the part where the RTK-GPS is working regularly; (Right) Absolute error plots for the raw GPS trajectory and the optimized trajectory.

The second set of experiments is designed to prove the robustness of the proposed system against sudden losses in the GPS sensor accuracy. In Tab. 4 we report the quantitative results of our system over Dataset C by means of and errors. Even in presence of a RTK-GPS signal loss that lasts for more than one crop row (see Fig. 8 (Left)), the best sensors setup leads to a remarkable of 0.166 m and a relative improvement around the 72%. Moreover, also in Dataset C the and error statistics follow the same decreasing trend shown in Tab. 2.
Fig. 8 (Left) depicts a top view of the optimized trajectory obtained using the best sensors configuration (see Tab. 4), highlighting the part of the path where the accuracy loss occurs. In Fig. 8 (Right) we compare the absolute error trajectories for the best sensors configuration against the error trajectory obtained by using only the GPS measurements: the part where the signal loss occurs turns into a higher error. Another interesting observation regards the inconsistent effects related to the use of the ELEV constraint. As shown in Tab. 4, in some cases it allows to decrease the overall error, while in others it considerably worsens the estimate. The latter mostly happens when the pose estimation is fairly reliable (e.g. when most of the available constraints are already in use). Indeed, as explained in section 4.2, in such cases the ELEV constraint does not provide any additional information to the optimization procedure, while with a less accurate GPS (e.g., the PPP-GPS used in our experiments) its employment is certainly desirable.

5 Conclusion

In this paper, we present an effective global pose estimation system for agricultural applications that leverages in a reliable and efficient way an ensemble of cues. We take advantage from the specificity of the scenario by introducing new constraints exploited inside a pose-graph realization that aims to enhance the strengths of each integrated information. We report a comprehensive set of experiments that support our claims: the provided localization accuracy is remarkable, the accuracy improvement well scale with the number of integrated cues, the proposed system is able to work effectively with different types of GPS, even in presence of signal degradations. The open-source implementation of our system along with the acquired datasets are made publicly available with this paper.


We are grateful to Wolfram Burgard for providing us with the Bosch BoniRob, and to Raghav Khanna and Frank Liebisch to help us in acquiring the datasets.


  1. High-end RTK-GPSs typically provide 1-2 cm horizontal accuracy and 3-5 cm vertical accuracy.
  2. In this paper, we use GPS as a synonym of the more general acronym GNSS (Global Navigation Satellite System) since almost all GNSSs use at least the GPS system, included the two GNSSs used in our experiments.
  3. In visual odometry, open-loop systems, the cumulative drift is unavoidable.
  4. We use here the term adjacent to denote nodes that are temporally or spatially close.
  5. A factor graph is a bipartite graph where nodes encode either variables or measurements, namely the factors.
  6. We transform each global measurement (e.g., GPS measurements) in the reference frame .
  7. Publicly available at:


  1. U. Weiss and P. Biber, “Plant detection and mapping for agricultural robots using a 3D lidar sensor,” Robotics and autonomous systems, vol. 59, no. 5, pp. 265–273, 2011.
  2. M. Nørremark, H. W. Griepentrog, J. Nielsen, and H. T. Søgaard, “The development and assessment of the accuracy of an autonomous gps-based system for intra-row mechanical weed control in row crops,” Biosystems Engineering, vol. 101, no. 4, pp. 396–410, 2008.
  3. R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard, “g2o: A general framework for graph optimization,” in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA).   IEEE, 2011.
  4. J. Farrell, Aided Navigation: GPS with High Rate Sensors.   McGraw-Hill, Inc., 2008.
  5. D. Schleicher, L. M. Bergasa, M. Ocana, R. Barea, and M. E. Lopez, “Real-time hierarchical outdoor SLAM based on stereovision and gps fusion,” IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 3, pp. 440 – 452, 2009.
  6. I. Parra, M. Ãngel Sotelo, D. F. Llorca, C. Fernández, A. Llamazares, N. Hernández, and I. García, “Visual odometry and map fusion for GPS navigation assistance,” in Proc. of the IEEE International Symposium on Industrial Electronics (ISIE).   IEEE, 2011.
  7. J. Rehder, K. Gupta, S. T. Nuske, and S. Singh, “Global pose estimation with limited GPS and long range visual odometry,” in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2012.
  8. K. Vishal, C. V. Jawahar, and V. Chari, “Accurate localization by fusing images and gps signals,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015.
  9. M. Schreiber, H. Königshof, A.-M. Hellmund, and C. Stiller, “Vehicle localization with tightly coupled GNSS and visual odometry,” in Proc. of the IEEE Intelligent Vehicles Symposium (IV), 2016.
  10. R. Kümmerle, M. Ruhnke, B. Steder, C. Stachniss, and W. Burgard, “A navigation system for robots operating in crowded urban environments,” in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2013.
  11. C. Fouque and P. Bonnifait, “On the use of 2D navigable maps for enhancing ground vehicle localization,” in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2009.
  12. V. Drevelle and P. Bonnifait, “A set-membership approach for high integrity height-aided satellite positioning,” GPS Solutions, vol. 15, no. 4, pp. 357–368, October 2011.
  13. A. Stoll and H. D. Kutzbach, “Guidance of a forage harvester with GPS,” Precision Agriculture, vol. 2, no. 3, pp. 281–291, 2000.
  14. B. Thuilot, C. Cariou, L. Cordesses, and P. Martinet, “Automatic guidance of a farm tractor along curved paths, using a unique CP-DGPS,” in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS).   IEEE, 2001.
  15. A. English, P. Ross, D. Ball, and P. Corke, “Vision based guidance for robot navigation in agriculture,” in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2014.
  16. G. Grisetti, R. Kuemmerle, C. Stachniss, and W. Burgard, “A tutorial on graph-based SLAM,” Intelligent Transportation Systems Magazine, IEEE, vol. 2, no. 4, pp. 31–43, 2010.
  17. A. Blake, P. Kohli, and C. Rother, Markov Random Fields for Vision and Image Processing.   MIT Press, 2011.
  18. C. Hirt, Digital Terrain Models.   Springer International Publishing, 2014, pp. 1–6.
  19. A. Pretto, E. Menegatti, and E. Pagello, “Omnidirectional dense large-scale mapping and navigation based on meaningful triangulation,” in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2011.
  20. S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics (Intelligent Robotics and Autonomous Agents).   The MIT Press, 2005.
  21. J. Nikolic, J. Rehder, M. Burri, P. Gohl, S. Leutenegger, P. T. Furgale, and R. Siegwart, “A synchronized visual-inertial sensor system with fpga pre-processing for accurate real-time slam,” in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2014.
  22. R. Mur-Artal and J. D. Tardós, “ORB-SLAM2: an open-source SLAM system for monocular, stereo and RGB-D cameras,” arXiv preprint arXiv:1610.06475, 2016.
  23. T. Pire, T. Fischer, J. Civera, P. De Cristóforis, and J. Jacobo berlles, “Stereo Parallel Tracking and Mapping for robot localization,” in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2015.
  24. J. Engel, V. Koltun, and D. Cremers, “Direct sparse odometry,” in arXiv:1607.02565, July 2016.
  25. C. Forster, Z. Zhang, M. Gassner, M. Werlberger, and D. Scaramuzza, “Svo: Semidirect visual odometry for monocular and multicamera systems,” IEEE Transactions on Robotics, vol. 33, no. 2, pp. 249–265, 2017.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minumum 40 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description