Denoising IMU Gyroscopes with Deep Learning for Open-Loop Attitude Estimation
Abstract
This paper proposes a learning method for denoising gyroscopes of Inertial Measurement Units (IMUs) using ground truth data, to estimate in real time the orientation (attitude) of a robot in dead reckoning. The obtained algorithm outperforms the state-of-the-art on the (unseen) test sequences. The obtained performances are achieved thanks to a well-chosen model, a proper loss function for orientation increments, and through the identification of key points when training with high-frequency inertial data. Our approach builds upon a neural network based on dilated convolutions, without requiring any recurrent neural network. We demonstrate how efficient our strategy is for 3D attitude estimation on the EuRoC and TUM-VI datasets. Interestingly, we observe our dead reckoning algorithm manages to beat top-ranked visual-inertial odometry systems in terms of attitude estimation although it does not use vision sensors. We believe this paper offers new perspectives for visual-inertial localization and constitutes a step toward more efficient learning methods involving IMUs. Our open-source implementation is available at https://github.com/mbrossar/denoise-imu-gyro.
I Introduction
Inertial Measurement Units (IMUs) consist of gyroscopes that measure angular velocities, i.e., the rate of change of the sensor's orientation, and accelerometers that measure proper accelerations [kokUsing2017]. IMUs allow estimating a robot's trajectory relative to its starting position, a task called odometry [scaramuzzaVisualInertial2019].
Small and cheap IMUs are ubiquitous in smartphones and in industrial and robotics applications, but suffer from sources of error that are difficult to estimate, such as axis misalignment, scale factors and time-varying offsets [rehderExtending2016, rohacCalibration2015]. Hence, IMU signals are not only noisy, but also biased. In the present paper, we propose to leverage deep learning for denoising the gyroscopes (gyros) of an IMU, that is, to reduce noise and biases. As a by-product, we obtain accurate attitude (i.e., orientation) estimates simply by open-loop integration of the obtained noise-free gyro measurements.
I-A Links and Differences with Existing Literature
IMUs are generally coupled with complementary sensors to obtain robust pose estimates in sensor-fusion systems [forsterOnManifold2017], where the supplementary information is provided by cameras in Visual-Inertial Odometry (VIO) [genevaOpenVINS2019, qinVINSMono2018, svachaInertial2019], LiDAR, or GNSS, or may stem from side information about the model [brossardRINSW2019, brossardAIIMU2019, madgwickEstimation2011, solinInertial2018]. To obtain accurate pose estimates, a proper IMU calibration is required, see e.g. the widely used Kalibr library [furgaleUnified2013, rehderExtending2016], which computes offline the underlying IMU intrinsic and extrinsic calibration parameters. Our approach, which is recapped in Figure 1, is applicable to any system equipped with an IMU. It estimates the IMU calibration parameters offline and extends methods such as [furgaleUnified2013, rehderExtending2016] to time-varying and difficult-to-model signal corrections.
Machine learning, and more specifically deep learning, has recently been leveraged to perform LiDAR, visual-inertial, and purely inertial localization, where methods divide into supervised [chenIONet2018, clarkVINet2017, yanRIDI2018, esfahaniOriNet2020] and unsupervised [almaliogluSelfVIO2019] approaches. Most works extract relevant features from the sensors' signals and propagate them in time through recurrent neural networks, whereas [yanRoNIN2019] proposes convolutional neural networks for pedestrian inertial navigation. A related approach [nobreLearning2019] applies reinforcement learning to guide the user in properly calibrating visual-inertial rigs. Our method is supervised (we require ground truth poses for training), leverages convolutions rather than recurrent architectures, and outperforms the latter approach: we obtain significantly better results while requiring considerably less data and less time. Finally, the reference [esfahaniOriNet2020] estimates orientation with an IMU and recurrent neural networks, but our approach proves simpler.
I-B Contributions
Our main contributions are as follows:

a detailed model of the problem of learning orientation increments from low-cost IMUs;

a convolutional neural network which regresses gyro corrections and whose features are carefully selected;

a training procedure involving a tractable loss function for estimating relative orientation increments;

an evaluation of the approach on datasets acquired by a drone and a hand-held device [burriEuRoC2016, schubertTUM2018], where our method outperforms [esfahaniOriNet2020] and competes with VIO methods [genevaOpenVINS2019, qinVINSMono2018] although it does not use vision;

perspectives towards more efficient VIO and IMU-based learning methods;

publicly available open-source code, where training takes only 5 minutes per dataset.
II Kinematic & Low-Cost IMU Models
We detail in this section our model.
II-A Kinematic Model based on Orientation Increments
The 3D orientation of a rigid platform is obtained by integrating orientation increments, that is, gyro outputs of an IMU, through
(1)   R_{n+1} = R_n exp(ω_n dt)

where the rotation matrix R_n at timestamp n maps the IMU frame to the global frame, the angular velocity ω_n is averaged over the sampling interval dt, and exp(·) is the SO(3) exponential map. The model (1) successively integrates in open loop and thus propagates estimation errors: if R̂_n denotes an estimate of R_n, the error present in R̂_n is propagated to R̂_{n+1} through (1).
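To make (1) concrete, the following minimal Python sketch (the function names are ours) integrates gyro samples in open loop, with exp(·) implemented via Rodrigues' formula:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix such that skew(v) @ w = cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def so3_exp(phi):
    """SO(3) exponential map (Rodrigues' formula) for a rotation vector phi."""
    angle = np.linalg.norm(phi)
    if angle < 1e-10:
        return np.eye(3) + skew(phi)  # first-order approximation near zero
    K = skew(phi / angle)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * K @ K

def integrate_gyro(R0, gyro, dt):
    """Open-loop integration of eq. (1): R_{n+1} = R_n exp(omega_n dt)."""
    R = np.asarray(R0, dtype=float)
    trajectory = [R]
    for omega in gyro:
        R = R @ so3_exp(np.asarray(omega) * dt)
        trajectory.append(R)
    return trajectory
```

Any error in an increment ω_n dt is carried along by every subsequent product, which is precisely the open-loop error propagation discussed above.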
II-B Low-Cost Inertial Measurement Unit (IMU) Sensor Model
The IMU provides noisy and biased measurements of the angular rate ω_n and the specific acceleration a_n at high frequency (200 Hz in our experiments) as, see [rohacCalibration2015, rehderExtending2016],

(2)   [ω_n^IMU; a_n^IMU] = C [ω_n; a_n] + [b_n^ω; b_n^a] + [η_n^ω; η_n^a]

where b_n^ω, b_n^a are quasi-constant biases, η_n^ω, η_n^a are commonly assumed zero-mean white Gaussian noises, and

(3)   a_n = R_n^T (dv_n/dt − g)

is the acceleration in the IMU frame without the effects of gravity g, with v_n the IMU velocity expressed in the global frame. The intrinsic calibration matrix

(4)   C = [ S_ω M_ω    A
            0_{3×3}    S_a M_a ]

contains the information for correcting the signals: axis misalignments (matrices M_ω, M_a); scale factors (diagonal matrices S_ω, S_a); and the influence of linear accelerations on gyro measurements, a.k.a. g-sensitivity (matrix A). Remaining intrinsic parameters, e.g. the lever arm between gyro and accelerometer, can be found in [rohacCalibration2015, rehderExtending2016].
We now make the following remarks regarding (1)-(4):

equations (2)-(4) represent a model that approximates reality. Indeed, calibration parameters and biases should both depend on time, as they vary with temperature and stress [kokUsing2017, rohacCalibration2015], but are difficult to estimate in real time. Moreover, vibrations and platform excitations due to, e.g., rotors make the Gaussian noise colored in practice [luIMUBased2019], albeit it is commonly assumed white;

substituting the actual measurements for the true values in (1) generally leads to quick drift (within a few seconds) and poor orientation estimates.
III Learning Method for Denoising the IMU
We describe in this section our approach for regressing the noise-free gyro increments ω_n of (2), in order to obtain accurate orientation estimates by integrating ω̂_n in (1). Our goal thus boils down to estimating the bias b_n^ω and the noise η_n^ω, and correcting the poorly known calibration matrix C.
III-A Proposed Gyro Correction Model
Leveraging the analysis of Section II, we compute the noise-free increments as

(5)   ω̂_n = Ĉ ω_n^IMU + c̃_n

with Ĉ the intrinsic parameters that account for gyro axis misalignment and scale factors, and where the gyro bias is included in the gyro correction c̃_n. Explicitly considering the small influence of the accelerometer on the gyro (matrix A), see (2)-(4), does not affect the results, so it is ignored.
CNN layer #   | 1  | 2  | 3  | 4   | 5
kernel dim.   | 7  | 7  | 7  | 7   | 1
dilation gap  | 1  | 4  | 16 | 64  | 1
channel dim.  | 16 | 32 | 64 | 128 | 1
We now seek to compute Ĉ and c̃_n. The neural network described in Section III-B computes c̃_n by leveraging the information present in a local window of past measurements around timestamp n. In contrast, we let Ĉ be a set of static parameters, initialized at the identity I_3 and optimized during training, since each considered dataset uses a single IMU. The learning problem involving a time-varying Ĉ and/or multiple IMUs is left for future work.
The consequences of opting for the simple model (5) and the proposed network structure are as follows. First, the corrected gyro may be initialized on the original gyro, i.e., with c̃_n = 0 and Ĉ = I_3 before training. This way, the method improves the estimates as soon as the first training epoch. Then, our method is intrinsically robust to overfitting, as measurements outside the local window, see Figure 2, do not participate in inferring c̃_n. This allows us to train the method with 8 minutes of data or less, see Section IV-A.
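A minimal NumPy sketch of the correction model (5) follows; the helper name is ours, and in our actual implementation Ĉ and the corrections c̃_n are learned tensors. With the default initialization it reduces to the raw gyro, as described above.

```python
import numpy as np

def correct_gyro(gyro_imu, C_hat=None, c_tilde=None):
    """Apply the correction model of eq. (5): omega_hat = C_hat @ omega_imu + c_tilde.

    gyro_imu: (N, 3) raw gyro measurements.
    C_hat:    (3, 3) intrinsic matrix (misalignment + scale factors).
    c_tilde:  (N, 3) time-varying corrections (include the gyro bias).
    With C_hat = I_3 and c_tilde = 0 (the initialization before training),
    the output equals the raw gyro signal.
    """
    if C_hat is None:
        C_hat = np.eye(3)                    # identity before training
    if c_tilde is None:
        c_tilde = np.zeros_like(gyro_imu)    # zero correction before training
    return gyro_imu @ C_hat.T + c_tilde
```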
III-B Dilated Convolutional Neural Network Structure
We define here the neural network structure which infers the gyro correction as

(6)   c̃_n = f(u_{n−N}, …, u_n)

where f(·) is the function defined by the neural network, and u_i = (ω_i^IMU, a_i^IMU) denotes the IMU measurements in a past window of size N. The network should extract information at multiple temporal scales and compute smooth corrections. Note that the input of the network consists of all the IMU data, that is, the gyros naturally, but also the accelerometer signals. Indeed, from (3), if the velocity varies slowly between successive increments we have

(7)   a_n^IMU ≈ −S_a M_a R_n^T g + b_n^a + η_n^a

which also provides information about the angular velocity, since the evolution of the gravity direction R_n^T g reflects the orientation changes.
We leverage dilated convolutions that infer a correction based on a local window of previous measurements, representing a few seconds of information before timestamp n in our experiments. Dilated convolutions are convolutions applied to the input with a given dilation gap, see [yuMultiScale2016], which: i) support an exponential expansion of the receptive field without loss of resolution or coverage; ii) are computationally efficient with low memory consumption; and iii) maintain the temporal ordering of the data. We thus expect the network to detect and correct various features, such as rotor vibrations, that are not modeled in (2). Our configuration, given in Figure 2, requires learning remarkably few parameters, which contrasts with recent (visual-)inertial learning methods, see e.g. [almaliogluSelfVIO2019] Figure 2, where IMU processing alone requires orders of magnitude more parameters.
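As a sanity check of the receptive-field claim, the window covered by a stack of 1-D dilated convolutions can be computed directly, since each layer with kernel size k and dilation d enlarges it by (k − 1)·d samples. A small illustrative helper (not part of our implementation):

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field (in samples) of stacked 1-D dilated convolutions.

    Each layer with kernel size k and dilation gap d enlarges the
    receptive field by (k - 1) * d samples, so stacking layers with
    exponentially increasing dilations yields an exponentially large
    window at linear cost in depth.
    """
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf
```

With the configuration of Figure 2 (kernels 7, 7, 7, 7, 1 and dilation gaps 1, 4, 16, 64, 1), this gives a window of 511 samples, i.e., roughly 2.5 s of past data at 200 Hz.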
III-C Loss Function based on Integrated Gyro Increments
Defining a loss function directly based on the errors ω_n − ω̂_n requires a ground truth at the IMU frequency (200 Hz), which is not feasible in practice, as even the best tracking systems are accurate at much lower rates. Instead, we suggest defining a loss based on the following integrated increments

(8)   δR_{n,n+j} = R_n^T R_{n+j} = ∏_{i=n}^{n+j−1} exp(ω_i dt)

i.e., where the IMU frequency is reduced by a factor j. We then compute the loss for a given j as

(9)   L_j = Σ_n ρ( log( δR_{n,n+j} δR̂_{n,n+j}^T ) )

where log(·) is the SO(3) logarithm map, and ρ(·) is the Huber loss function. We set the Huber loss parameter to 0.005 in our experiments, and define our loss function as

(10)   L = L_16 + L_32.
The motivations for (9)-(10) are as follows:

the choice of Huber loss yields robustness to ground truth outliers;

the choice of (10) corresponds to error increments over j = 16 and j = 32 samples, which is barely slower than the ground truth rate. Setting j too high, or in the extreme case using a loss based on the overall orientation error R_n^T R̂_n, would make the algorithm prone to overfitting, and hence make the method too sensitive to specific trajectory patterns of the training data.
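A minimal NumPy sketch of the loss (9), with our own helper names and the stated Huber parameter 0.005; our actual implementation operates on batched PyTorch tensors:

```python
import numpy as np

def so3_log(R):
    """SO(3) logarithm map: rotation matrix -> rotation vector."""
    cos_angle = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.arccos(cos_angle)
    axis_raw = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if angle < 1e-10:
        return 0.5 * axis_raw  # first-order approximation near identity
    return angle / (2.0 * np.sin(angle)) * axis_raw

def huber(x, delta=0.005):
    """Elementwise Huber function: quadratic near zero, linear in the tails."""
    ax = np.abs(x)
    return np.where(ax <= delta, 0.5 * x ** 2, delta * (ax - 0.5 * delta))

def increment_loss(dR_gt, dR_hat, delta=0.005):
    """Loss (9): Huber penalty on log(dR_gt dR_hat^T), averaged over increments."""
    total = 0.0
    for Rg, Rh in zip(dR_gt, dR_hat):
        total += huber(so3_log(Rg @ Rh.T), delta).sum()
    return total / len(dR_gt)
```

The Huber function keeps the gradient bounded for large rotation errors, which is what provides the robustness to ground-truth outliers mentioned above.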
Table 1: Absolute Orientation Error (AOE) as 3D orientation/yaw, in degrees, on the test sequences.

dataset | sequence | VINS-Mono [qinVINSMono2018] | VINS-Mono (loop-closure) | OpenVINS [genevaOpenVINS2019] | OpenVINS (proposed) | zero motion | raw IMU | OriNet* [esfahaniOriNet2020] | calibrated IMU (proposed) | proposed IMU
EuRoC [burriEuRoC2016] | MH 02 easy | 1.34/1.32 | 0.57/0.50 | 1.11/1.05 | 1.21/1.12 | 44.4/43.7 | 146/130 | 5.12/– | 7.09/1.49 | 1.39/0.85
 | MH 04 difficult | 1.44/1.40 | 1.06/1.00 | 1.60/1.16 | 1.40/0.89 | 42.3/41.9 | 130/77.9 | 7.77/– | 5.64/2.53 | 1.40/0.25
 | V1 01 easy | 0.97/0.90 | 0.57/0.44 | 0.80/0.67 | 0.80/0.67 | 114/76 | 71.3/71.2 | 5.01/– | 6.65/3.95 | 1.13/0.49
 | V1 03 difficult | 4.72/4.68 | 4.06/4.00 | 2.32/2.27 | 2.25/2.20 | 81.4/80.5 | 119/84.9 | 13.2/– | 3.56/2.04 | 2.70/0.96
 | V2 02 medium | 2.58/2.41 | 1.83/1.61 | 1.85/1.61 | 1.81/1.57 | 93.9/93.5 | 117/86 | 9.59/– | 4.63/2.30 | 3.85/2.25
 | average | 2.21/2.14 | 1.62/1.52 | 1.55/1.37 | 1.50/1.30 | 66.1/66.1 | 125/89.0 | 7.70/– | 5.51/2.46 | 2.10/0.96
TUM-VI [schubertTUM2018] | room 2 | 0.60/0.45 | 0.69/0.50 | 2.47/2.36 | 1.95/1.84 | 91.8/90.4 | 118/88.1 | –/– | 10.6/10.5 | 1.31/1.18
 | room 4 | 0.76/0.63 | 0.66/0.51 | 0.97/0.88 | 0.93/0.83 | 107/103 | 74.1/48.2 | –/– | 2.43/2.30 | 1.48/0.85
 | room 6 | 0.58/0.38 | 0.54/0.33 | 0.63/0.51 | 0.60/0.51 | 138/131 | 94.0/76.1 | –/– | 4.39/4.31 | 1.04/0.57
 | average | 0.66/0.49 | 0.63/0.45 | 1.33/1.25 | 1.12/1.05 | 112/108 | 95.7/70.8 | –/– | 5.82/5.72 | 1.28/0.82
III-D Efficient Computation of (8)-(10)
First, note that thanks to parallelization, applying an operation such as exp(·) once, or applying it in parallel to many instances, takes a similar execution time on a GPU: the time per operation drops dramatically when operations are batched. We call an operation that is applied in parallel to many instances a batch operation. That said, an apparent drawback of (8) is that it requires j − 1 sequential matrix multiplications per increment. However, first, we may compute the ground truth increments only once, store them, and then only recompute the estimated increments δR̂_{n,n+j} at each epoch. Second, by viewing (8) as a tree of matrix multiplications, see Figure 3, we reduce the computation to log2(j) batch GPU operations only. We finally apply subsampling and take one increment every j timestamps to avoid counting the same increment multiple times. Training speed is thus substantially increased.
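The tree of multiplications can be sketched as follows (a hypothetical helper, assuming j is a power of two): each level multiplies neighboring pairs in one batched matrix product, so the j − 1 sequential multiplications collapse into log2(j) batched operations.

```python
import numpy as np

def chunk_products(increments, j):
    """Products of consecutive groups of j rotation increments, eq. (8).

    Instead of j - 1 sequential multiplications per group, each tree
    level multiplies neighboring pairs in a single batched matrix
    product, so only log2(j) batched operations are needed.
    Assumes j is a power of two and len(increments) is a multiple of j.
    """
    prods = np.asarray(increments)           # (N, 3, 3)
    for _ in range(int(np.log2(j))):
        prods = prods[0::2] @ prods[1::2]    # multiply each pair of neighbors
    return prods                             # (N // j, 3, 3)
```

On a GPU, each `@` above maps to one batch operation, which is where the claimed speed-up comes from.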
III-E Training with Data Augmentation
Data augmentation is a way to significantly increase the diversity of the data available for training without actually collecting new data, and thus to avoid overfitting. For the IMU model of Section II, this may be done by adding Gaussian noise, adding a static bias, uncalibrating the IMU, and shifting the orientation of the IMU in the accelerometer measurements. The first two points were noted in [esfahaniOriNet2020], whereas the latter two are, to the best of our knowledge, novel.
Although each point may increase the diversity of the data, we found they do not necessarily improve the results. We opted for the addition of Gaussian noise only, drawn anew during each training epoch, whose standard deviation is half the noise standard deviation that the dataset specification provides.
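The retained augmentation amounts to the following sketch (the helper name is ours); drawing fresh noise at each epoch means the network never sees exactly the same signal twice:

```python
import numpy as np

def augment_gyro(gyro, sigma, rng=None):
    """Data augmentation: add zero-mean Gaussian noise to the gyro signal.

    sigma is set to half the noise standard deviation reported in the
    dataset specification; fresh noise is drawn at every training epoch.
    """
    rng = np.random.default_rng() if rng is None else rng
    return gyro + rng.normal(0.0, sigma, size=gyro.shape)
```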
IV Experiments
We evaluate the method in terms of 3D orientation and yaw estimates, as the latter are more critical for long-term odometry estimation [delmericoBenchmark2018, scaramuzzaVisualInertial2019].
IV-A Dataset Descriptions
We divide the data into training, validation, and test sets, defined as follows, see Chapter I.5.3 of [goodfellowDeep2016]. We optimize the neural network and the calibration parameters on the training set. The validation set intervenes when training is over and provides a biased evaluation, as it is used during the training procedure (its data are seen, although never used for "learning"). The test set is the gold standard for an unbiased evaluation: it is used only once training (using the training and validation sets) is terminated. The datasets we use are as follows.
EuRoC
the dataset [burriEuRoC2016] contains image and inertial data at 200 Hz from a micro aerial vehicle, divided into 11 flight trajectories of 2-3 minutes in two environments. The ADIS16448 IMU is uncalibrated, and we note the ground truth from the laser tracker and motion capture system is accurately time-synchronized with the IMU, although dynamic motions deteriorate the measurement accuracy. As already noticed in [genevaOpenVINS2019], the ground truth for the sequence V1 01 easy needs to be recomputed.
We define the training set as the first part of the six sequences MH{01,03,05}, V1{02}, V2{01,03}, the validation set as the remaining ending parts of these sequences, and we constitute the test set from the five remaining sequences. We show in Section IV-E that using only 8 minutes of accurate data for training (the beginning and end of each trajectory are the most accurately measured) is sufficient to obtain relevant results.
TUM-VI
the recent dataset [schubertTUM2018] consists of visual-inertial sequences in different scenes acquired with a hand-held device. The cheap BMI160 IMU logs data at 200 Hz and was properly calibrated. Ground truth is accurately time-synchronized with the IMU, although each sequence contains short periodic intervals where ground truth is unavailable, as the acquisition platform was hidden from the motion capture system, see Figure 4. We take the 6 room sequences, which are the sequences with the longest ground truth (2-3 minutes each).
We define the training set as the first parts of the sequences room 1, room 3, and room 5, the validation set as the remaining ending parts of these sequences, and we set the test set as the 3 other room sequences. This split corresponds to a number of training data points on the same order as the number of optimized parameters, which calls for regularization techniques such as weight decay and dropout during training.
IV-B Method Implementation & Training
Our open-source method is implemented in PyTorch 1.5, where we configure the training hyperparameters as follows. We set weight decay with parameter 0.1, and dropout with 0.1 as the probability of an element being set to zero. Both techniques reduce overfitting.
We choose the ADAM optimizer [kingmaAdam2014] with a cosine warm restart scheduler [loshchilovSGDR2016], where the learning rate is initialized at 0.01. We train for a fixed number of epochs, which is very fast, as it takes less than 5 minutes per dataset on a GTX 1080 GPU.
IV-C Compared Methods
We compare a set of methods based on camera and/or IMU.
Methods Based on the IMU Only
we compare the following approaches:

raw IMU, that is, an uncalibrated IMU; it also corresponds to the proposed method once initialized but not trained;

OriNet [esfahaniOriNet2020], which is based on recurrent neural networks, and whose validation set corresponds to the test set (our training setting is thus more challenging);

calibrated IMU, that is, our method where the correction reduces to 12 constant, nonzero parameters (the intrinsics Ĉ and a static gyro bias) that are optimized;

proposed IMU, which is our learning-based method described in Section III.
Methods Based on Camera and the IMU
we run each of the following methods with the same settings, ten times each to then average the results, on a Dell Precision Tower 7910 workstation, i.e., without deterioration due to computational limitations [delmericoBenchmark2018]. We compare:

VINS-Mono [qinVINSMono2018], a monocular VIO framework with notable performance on the EuRoC benchmark [delmericoBenchmark2018];

VINS-Mono (loop closure), which is the original VINS-Mono [qinVINSMono2018] reinforced with loop-closure ability;

OpenVINS [genevaOpenVINS2019], a versatile filter-based visual-inertial estimator, for which we choose the stereo configuration, and which is top-ranked on the drone dataset of [delmericoAre2019];

OpenVINS (proposed), which is the OpenVINS of [genevaOpenVINS2019] where the gyro inputs are the proposed corrected gyro measurements (5) output by our method (trained on sequences that are, of course, different from those used for evaluation).
Remaining Methods
we finally add a basic zero motion baseline, which is considered the standard prior in visual odometry when an IMU is not available.
IV-D Evaluation Metrics
We evaluate the above methods using the following metrics that we compute with the toolbox of [genevaOpenVINS2019].
Absolute Orientation Error (AOE)
which computes the root-mean-square orientation error between the ground truth and the estimates over a given sequence as

(11)   AOE = sqrt( (1/M) Σ_{n=1}^{M} ‖ log( R_n^T R̂_n ) ‖² )

with M the sequence length, log(·) the SO(3) logarithm map, and where the estimated trajectory has been aligned on the ground truth at the first instant n = 0.
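A minimal NumPy sketch of (11) follows (the helper names are ours; errors come out in radians, whereas Table 1 reports degrees):

```python
import numpy as np

def so3_log(R):
    """SO(3) logarithm map: rotation matrix -> rotation vector (radians)."""
    cos_angle = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.arccos(cos_angle)
    axis_raw = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if angle < 1e-10:
        return 0.5 * axis_raw  # first-order approximation near identity
    return angle / (2.0 * np.sin(angle)) * axis_raw

def aoe(R_gt, R_hat):
    """Absolute Orientation Error, eq. (11): RMS of log(R_n^T R_hat_n),
    after aligning the estimate on the ground truth at the first instant."""
    R_align = R_gt[0] @ R_hat[0].T   # align the two trajectories at n = 0
    sq_sum = 0.0
    for Rg, Rh in zip(R_gt, R_hat):
        err = so3_log(Rg.T @ (R_align @ Rh))
        sq_sum += err @ err
    return np.sqrt(sq_sum / len(R_gt))
```

Note that a constant orientation offset between the two trajectories is removed by the initial alignment, which is why the AOE is sensitive to *when* an error occurs rather than to a fixed frame mismatch.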
Relative Orientation Error (ROE)
which is computed as [zhangTutorial2018]
(12)   ROE = ‖ log( δR_{n,n+j} δR̂_{n,n+j}^T ) ‖

for each pair of timestamps (n, n + j) representing an IMU displacement of 7, 21 or 35 meters. Collecting the error (12) for all such pairs of sub-trajectories generates a collection of errors on which informative statistics, such as the median and percentiles, are computed. Following [zhangTutorial2018, delmericoBenchmark2018, genevaOpenVINS2019], we strongly recommend the ROE for comparing odometry estimation methods, since the AOE is highly sensitive to the time at which the estimation error occurs. We finally consider slight variants of (11)-(12) for yaw-only errors, and note that the errors of visual methods generally scale with the distance travelled, whereas the errors of inertial-only methods scale with time. We provide in the present paper errors w.r.t. distance travelled to favor comparison with benchmarks such as [delmericoBenchmark2018]; the same conclusions hold when computing the ROE as a function of different time intervals.
IV-E Results
Results are given in terms of AOE and ROE in Table 1 and Figure 5, respectively. Figure 6 illustrates roll, pitch and yaw estimates for a test sequence of each dataset, and Figure 7 shows the orientation errors. We note the following:
Uncalibrated IMU is Unreliable
the raw IMU estimates deviate from the ground truth within a few seconds, see Figure 6, and are barely more reliable than the null rotation assumption.
Calibrated IMU Outperforms OriNet
merely calibrating the IMU (via our optimization method) leads to surprisingly accurate results, see e.g. Figure 6 (right), where it is difficult to distinguish it from the ground truth. This evidences that cheap sensors can provide very accurate information once correctly calibrated.
The Proposed Method Outperforms Inertial Methods
OriNet [esfahaniOriNet2020] is outperformed. Moreover, our method improves on the already accurate calibrated IMU by a factor of 2 to 4, and notably obtains low median errors on both the EuRoC and TUM-VI datasets.
The Proposed Method Competes with VIO
our IMU-only method is accurate even under the high motion dynamics present in both datasets, see Figure 6, and competes with VINS-Mono and OpenVINS, although it is trained with only a few minutes of data.
Finally, as the performance of each method depends on the dataset and the algorithm settings, see Figure 5, it is difficult to conclude which VIO algorithm is best.
IV-F Further Results and Comments
We provide a few more comments, supported by further experimental results.
Small Corrections Might Lead to Large Improvement
The Proposed Method is Well Suited to Yaw Estimation
according to Table 1 and Figure 5, we see the yaw estimates are particularly accurate. Indeed, VIO methods are able to recover roll and pitch at any time thanks to accelerometers, but their yaw estimates drift with time. In contrast, our dead-reckoning method never has access to information allowing it to recover roll and pitch during testing, nor does it use "future" information as VINS-Mono with loop-closure ability does. We finally note that accurate yaw estimates could be fruitful for yaw-independent VIO methods such as [svachaInertial2019].
Correcting Gyro Slightly Improves OpenVINS [genevaOpenVINS2019]
both methods based on OpenVINS perform similarly, which is not surprising, as the camera alone already provides accurate orientation estimates and the gyro merely assists the stereo cameras.
Our Method Requires Few Computational Resources
each VIO method performs here at its best while resorting to high computational requirements, and we expect our method, once trained, to be very attractive when running on-board with limited resources. Note that the proposed method performs, e.g., 3 times better in terms of yaw estimates than a slightly restricted VINS-Mono, see Figure 3 of [delmericoBenchmark2018].
V Discussion
We now provide the community with feedback regarding the method and its implementation. Notably, we emphasize a few points that seem key to a successful implementation when working with a low-cost, high-frequency IMU.
V-A Key Points Regarding the Dataset
One should be careful regarding the quality of the data, especially when the IMU is sampled at high frequency. This concerns:
IMU Signal
the IMU signal acquisition should be correct with constant sampling time.
Ground Truth Pose Accuracy
we note that the EuRoC ground truth accuracy is better at the beginning of the trajectories. As such, training with only this part of the data (the first part of the training sequences) is sufficient, and best, to succeed.
Ground Truth Time-Alignment
the time alignment between ground truth and IMU is critical for success; otherwise the method is prone to learning a time delay.
We admit that our approach requires a proper dataset, which constitutes its main limitation.
V-B Key Points Regarding the Neural Network
Our conclusions about the neural network are as follows.
Activation Function
GELU and other smooth activation functions [ramachandranSearching2018], such as ELU, perform well, whereas a ReLU-based network is more prone to overfitting. We believe the ReLU activation function favors sharp corrections, which does not make sense when dealing with physical signals.
Neural Network Hyperparameters
increasing the depth, channel and/or kernel sizes of the network, see Figure 2, does not systematically lead to better results. We tuned these hyperparameters with random search, although more sophisticated methods such as [liHyperband2017] exist.
Normalization Layer
a batch-norm layer improves both training speed and accuracy [ioffeBatch2015], and is highly recommended.
V-C Key Points Regarding Training
As in any machine learning application, the neural network architecture is a key component among others [goodfellowDeep2016]. Our comments regarding training are as follows:
Optimizer
the ADAM optimizer [kingmaAdam2014] performs well.
Learning Rate Scheduler
adopting a learning rate policy with cosine warm restarts [loshchilovSGDR2016] leads to substantial improvement and helps find a correct learning rate.
Regularization
dropout and weight decay hyperparameters are crucial to avoid overfitting. Each has a range of ideal values, which is quickly found with a basic grid search.
V-D Remaining Key Points
We finally outline two points that we consider useful to the practitioner:
Orientation Implementation
we did not find any difference between rotation matrix and quaternion loss function implementations once numerical issues are solved, e.g., by enforcing the quaternion unit norm. Both implementations result in similar accuracy and execution time.
Generalization and Transfer Learning
it may prove useful to assess to what extent a learning method generalizes. The extension of the method, trained on one dataset, to another device, or to the same device mounted on another platform, is considered challenging, though, and is left for future work.
VI Conclusion
This paper proposes a deep-learning method for denoising IMU gyroscopes that obtains remarkably accurate attitude estimates with only a low-cost IMU, outperforming the state of the art [esfahaniOriNet2020]. The core of the approach is the careful design and feature selection of a dilated convolutional network, and an appropriate loss function on orientation increments leveraged for training at the ground truth frequency. This leads to a method that is robust to overfitting, efficient, and fast to train, and which serves as an offline IMU calibration that it may enhance. As a remarkable by-product, the method competes with state-of-the-art visual-inertial methods in terms of attitude estimates on drone and hand-held-device datasets, where we simply integrate the noise-free gyro measurements.
We believe the present paper offers new perspectives for (visual-)inertial learning methods. Future work will address new challenges in three directions: learning from multiple IMUs (the current method is restricted to a single IMU, which serves for both training and testing); learning from moderately accurate ground truth, such as the output of visual-inertial localization systems; and denoising accelerometers based on relative increments from preintegration theory [forsterOnManifold2017, barrauMathematical2020].
Acknowledgements
The authors wish to thank Jeffrey Delmerico for sharing the results of the VIO benchmark [delmericoBenchmark2018].