Inverting Learned Dynamics Models for Aggressive Multirotor Control

Alexander Spitzer and Nathan Michael
The Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 15213, USA
{spitzer, nmichael}

We present a control strategy that applies inverse dynamics to a learned acceleration error model for accurate multirotor control input generation. This allows us to retain accurate trajectory and control input generation despite the presence of exogenous disturbances and modeling errors. Although accurate control input generation is traditionally possible when combined with parameter learning-based techniques, we propose a method that can do so while solving the relatively easier non-parametric model learning problem. We show that our technique is able to compensate for a larger class of model disturbances than traditional techniques can and we show reduced tracking error while following trajectories demanding accelerations of more than m/s in multirotor simulation and hardware experiments.

I Introduction

I-A Motivation

In the last several years, aerial robotics has seen a surge in popularity, largely due to the increasing viability of applications [29, 19, 7]. Multirotors have been particularly well represented, due to their agility and versatility, and have additionally been a fruitful testbed for nonlinear controllers and trajectory generation strategies [25, 21, 33, 20].

Computing precise control inputs for a dynamical system often requires accurate knowledge of its dynamics. Van Nieuwstadt and Murray [38] showed how the concept of differential flatness can be used to generate control inputs that follow a given trajectory for differentially flat systems.

For a multirotor, differential flatness can be used to compute the exact inputs required to follow a specified trajectory in x, y, z, and yaw (see Mellinger and Kumar [28]). The computed control inputs are only accurate if the fixed dynamic model and its associated parameters, e.g. mass and inertia, are correct. Often, this fixed dynamic model assumption fails and the estimated parameters are inaccurate, which results in suboptimal trajectory tracking performance.

One possible approach to alleviate this problem is to estimate the model parameters from vehicle trajectory data. This, however, can be difficult, and is still suboptimal when the chosen parameterization cannot realize the true vehicle model. On the other hand, non-parametric error models are commonly used and relatively easy to learn, but are not readily used in the differential flatness framework. In this work, we show how a non-parametric error model can be used to generate control inputs that follow a specified trajectory. We additionally provide an extension to the proposed approach that can deal with input-dependent model errors via numerical optimization. We validate the control input generation strategy both in simulation and through experiments on a quadrotor.

Fig. 1: Our experimental platform while executing an aggressive circle trajectory (top) and an aggressive line trajectory (middle) using the proposed control input generation strategy that is capable of compensating for dynamic and input-dependent acceleration disturbances (FF5). Our method substantially reduces tracking error along the aggressive line trajectory (bottom).

I-B Related Works

Accurate and aggressive multirotor flight has been explored in [37, 30, 10, 28] among others. As for many other robotic platforms, accurate modeling has been shown to improve flight performance [36, 4]. Traditional non-learning based modeling can be achieved via hand crafted experiments, calibration procedures, and computer-aided design [27]. Since this requires significant manual effort and engineering hours, there have been many works exploring automatic parameter estimation methods [6, 5] and non-parametric model learning methods [22, 8] for multirotor control. In this work, we focus on non-parametric model learning methods, since parameter learning methods can be limited in their accuracy by the choice of parameterization [31]. There has also been work on learning control input corrections for aggressive flight without learning a dynamical model [24]. These methods are not a focus of this paper since they can typically only be applied while executing the trained trajectories or reference quantities. A learned dynamical model can be applied to any trajectory or reference.

Non-parametric model learning methods for robot control have been employed in [13, 35, 1, 26]. Model learning performed in real-time incrementally has been studied in [16, 14, 3]. Florez et al. [13] use Locally Weighted Projection Regression (LWPR [39]) while Gijsberts and Metta [14] use Random Fourier Features [32], which was extended to Incremental Sparse Spectrum Gaussian Process Regression (ISSGPR), a Bayesian regression formulation, in Gijsberts and Metta [15]. Droniou et al. [9] evaluated LWPR and ISSGPR for the purposes of robot control and found ISSGPR to perform better. In this work, we use linear regression, but our approach can use any model learning strategy.

Once an accurate dynamical system model is known, a Model Predictive Control (MPC) strategy can be used to optimize a desired cost function, subject to the dynamics [8, 25, 23, 2]. These approaches often make approximations to ensure real time feasibility [8, 2]. Furthermore, Desaraju [8] does not perform full inverse dynamics on the disturbance, which can lead to suboptimal performance while tracking aggressive trajectories.

The differential flatness property of multirotors has been widely exploited for accurate trajectory tracking [11, 17, 28, 12, 34, 30]. Differential flatness of the multirotor subject to linear drag was shown in Faessler et al. [11]. This extends the applicability of the approach to a limited family of disturbances. Faessler et al. [11] do not address the issue of nonlinear disturbances as a function of state and/or control input in the flatness computations. Issues arising from singularities, commonly encountered during aggressive flight, were discussed and mitigated in Morrell et al. [30], increasing the robustness of the differential flatness approach.

Although control inputs computed using the differential flatness framework will automatically take into account dynamical model parameter changes, such as mass, inertia, etc., it is not clear how to incorporate non-parametric model corrections. In this work, we build on the differential flatness formulation by extending it to compensate for learned non-parametric dynamic model disturbances. Our approach can compensate for arbitrary disturbances that are a function of vehicle position and velocity, as well as control input dependent disturbances that are a function of vehicle orientation and thrust. This increases the applicability of the approach to a much wider range of realistic flight conditions.

I-C Notation

Lowercase letters such as and are scalars in . Boldface lowercase letters such as and are vectors. is the by identity matrix. denotes the total time derivative of . is mass, is the gravitational constant and is inertia. All functions in this paper are assumed to have continuous second derivatives everywhere and thus all second partial derivatives are symmetric, i.e. . Unless otherwise indicated, all vector quantities are expressed in a fixed reference frame.

II Method

In this section, we first introduce the problem statement in Part A. Part B details our approach for compensating for dynamic disturbances that can be a function of vehicle position, vehicle velocity, or other quantities that are independent of the applied control inputs. Part C extends the approach to compensate for disturbances that are input-dependent and can be a function of e.g. the applied vehicle thrust or vehicle orientation. Finally, Part D describes the model learning approach.

II-A Problem Statement

Assume we are given a desired position over time, , along with its first four time derivatives, the velocity, acceleration, jerk, and snap: , , , .

Equation (1) shows a typical acceleration model of a multirotor, where the commanded acceleration is aligned with the body -axis.


Here is the commanded body acceleration, is the body -axis (), is the gravity vector, and is an additive acceleration error model that can, in general, be a function of both vehicle state and control input .
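
For orientation, the additive structure described here can be written as below. This is a sketch of the form implied by the surrounding definitions, not a verbatim copy of (1); the symbol names ($\ddot{\mathbf{x}}$ for vehicle acceleration, $u$ for commanded body acceleration, $\mathbf{z}$ for the body $z$-axis, $\mathbf{g}$ for the gravity vector, $\mathbf{f}_e$ for the error model, $\boldsymbol{\xi}$ for state, $\boldsymbol{\mu}$ for control input) are assumptions made for illustration.

```latex
\ddot{\mathbf{x}} \;=\; u\,\mathbf{z} \;+\; \mathbf{g} \;+\; \mathbf{f}_e(\boldsymbol{\xi}, \boldsymbol{\mu}),
\qquad \lVert \mathbf{z} \rVert = 1 .
```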

The objective is to compute the body acceleration , body -axis , angular velocity , and angular acceleration such that integrating forwards in time twice results in an orientation with as the -axis and that the vehicle acceleration, which is a function of , equals the desired vehicle acceleration . This will ensure that the vehicle follows the specified trajectory . Note that while and are not true control inputs to the system, they are necessary as feedforward references to the attitude feedback controller. Once the body acceleration and angular acceleration are computed, they are multiplied by mass and inertia and used as the feedforward force and torque in the position and attitude feedback controllers respectively.

For simplicity, we will assume that the yaw of the vehicle is always zero, but all of the methods presented are applicable while following yaw trajectories as well.

II-B Input-independent error compensation

The simplest version of our control input generation strategy assumes that the disturbance model is a function of the vehicle position and velocity only: . In this case, the desired acceleration vector can be computed directly, as shown in (2).


Since must be of unit length, both and can be computed from by computing the magnitude and normalizing.
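
The magnitude-and-normalize step can be sketched as follows. This is a minimal illustration that assumes an additive model of the form "thrust vector + gravity + disturbance = acceleration"; the `disturbance` function, its coefficients, the gravity sign convention, and all variable names are hypothetical stand-ins rather than the learned model used in the experiments.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # assumed gravity vector, z axis pointing up


def disturbance(pos, vel):
    """Hypothetical learned error model f_e(x, x_dot): linear drag plus a constant push."""
    return -0.3 * vel + np.array([0.5, 0.0, 0.0])


def thrust_and_axis(acc_des, pos, vel):
    """Split the required thrust vector into its magnitude u and unit body z-axis."""
    thrust_vec = acc_des - GRAVITY - disturbance(pos, vel)
    u = np.linalg.norm(thrust_vec)
    if u < 1e-9:
        raise ValueError("thrust vector is near zero; body z-axis is undefined")
    return u, thrust_vec / u


pos, vel = np.zeros(3), np.array([1.0, 0.0, 0.0])
acc_des = np.array([2.0, 0.0, 1.0])
u, z = thrust_and_axis(acc_des, pos, vel)
```

Substituting `u * z` back into the assumed model reproduces `acc_des`, which is a quick sanity check on the sign conventions.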

The angular velocity and angular acceleration are found by first computing the first and second time derivatives of .

Differentiating (2) in time results in


Since is of unit length, and it must remain so, it must be perpendicular to . Thus taking a dot product of (3) with allows us to find .


Inserting into (3) gives us .


The body angular velocity can be extracted from by first defining the body and body -axes using a desired vehicle yaw, then projecting onto those axes. See [28] for the details.
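
The two steps above (dot product with the body axis, then back-substitution) and the projection onto the body axes can be sketched as below. The sketch assumes the thrust vector is written as `w = u * z`, so that `w_dot` stands for the desired jerk minus the time derivative of the disturbance; the function names and the yaw-based construction of the body axes are illustrative choices, not taken from the paper.

```python
import numpy as np


def thrust_rates(u, z, w_dot):
    """Given w = u * z and its time derivative w_dot, return (u_dot, z_dot)."""
    u_dot = float(z @ w_dot)          # z is unit length, so z . z_dot = 0
    z_dot = (w_dot - u_dot * z) / u   # back-substitute into w_dot = u_dot z + u z_dot
    return u_dot, z_dot


def body_rates_xy(z, z_dot, yaw=0.0):
    """Project z_dot onto body x and y axes constructed from a desired yaw.

    Undefined when z is parallel to the yaw heading vector.
    """
    x_c = np.array([np.cos(yaw), np.sin(yaw), 0.0])
    y_b = np.cross(z, x_c)
    y_b /= np.linalg.norm(y_b)
    x_b = np.cross(y_b, z)
    # z_dot = omega x z  implies  omega_x = -z_dot . y_b  and  omega_y = z_dot . x_b
    return -float(z_dot @ y_b), float(z_dot @ x_b)


# Level vehicle whose z-axis is tipping toward +x: a pure pitch rate about body y.
omega_x, omega_y = body_rates_xy(np.array([0.0, 0.0, 1.0]), np.array([0.1, 0.0, 0.0]))
```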

To find , we differentiate (3).


The second time derivative of the learned disturbance is shown in (7). Note that the second partial derivative of the error model with respect to its vector inputs is a 3rd order tensor.
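
The chain-rule structure behind this statement can be sketched numerically. In the snippet below, `jac` is the Jacobian of the model with respect to its stacked input vector and `hess` is the 3rd order tensor of second partials; the stacked input `xi`, the function name, and the toy model in the usage lines are assumptions for illustration. When the input stacks position and velocity, `xi_dot` contains velocity and acceleration and `xi_ddot` contains acceleration and jerk.

```python
import numpy as np


def model_time_derivatives(jac, hess, xi_dot, xi_ddot):
    """First and second time derivatives of f(xi(t)) via the chain rule.

    jac:  (m, n) Jacobian df/dxi
    hess: (m, n, n) tensor with hess[i, j, k] = d^2 f_i / (dxi_j dxi_k)
    """
    f_dot = jac @ xi_dot
    f_ddot = jac @ xi_ddot + np.einsum("ijk,j,k->i", hess, xi_dot, xi_dot)
    return f_dot, f_ddot


# Toy model f(xi) = [xi0^2, sin(xi1), xi0 * xi1] along xi(t) = [t^2, 3t], at t = 0.5.
t = 0.5
xi = np.array([t**2, 3.0 * t])
jac = np.array([[2.0 * xi[0], 0.0], [0.0, np.cos(xi[1])], [xi[1], xi[0]]])
hess = np.zeros((3, 2, 2))
hess[0, 0, 0] = 2.0
hess[1, 1, 1] = -np.sin(xi[1])
hess[2, 0, 1] = hess[2, 1, 0] = 1.0
f_dot, f_ddot = model_time_derivatives(jac, hess, np.array([2.0 * t, 3.0]), np.array([2.0, 0.0]))
```

For this toy model the closed form is f(t) = [t^4, sin(3t), 3t^3], which gives an independent check on both derivatives.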


Noting that differentiating implies and again taking a dot product with , we can compute .


Inserting into (6) gives us .


To compute the body angular acceleration from , we note that and proceed as before for the angular velocity, by projecting onto the body and -axes.
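
The second-derivative step can be sketched in the same style as the first. The snippet assumes the thrust vector is written as `w = u * z`, so `w_ddot` stands for the desired snap minus the second time derivative of the disturbance; names are illustrative. It relies on the identity z . z_ddot = -(z_dot . z_dot), obtained by differentiating z . z_dot = 0.

```python
import numpy as np


def thrust_second_rates(u, z, u_dot, z_dot, w_ddot):
    """Given w = u * z, return (u_ddot, z_ddot) from the second derivative w_ddot.

    Starts from w_ddot = u_ddot z + 2 u_dot z_dot + u z_ddot.
    """
    u_ddot = float(z @ w_ddot) + u * float(z_dot @ z_dot)
    z_ddot = (w_ddot - u_ddot * z - 2.0 * u_dot * z_dot) / u
    return u_ddot, z_ddot


# Example at one instant of the thrust vector w(t) = [sin t, 0.5 t, 9.81 + cos t].
t = 0.7
w = np.array([np.sin(t), 0.5 * t, 9.81 + np.cos(t)])
w_dot = np.array([np.cos(t), 0.5, -np.sin(t)])
w_ddot = np.array([-np.sin(t), 0.0, -np.cos(t)])
u = np.linalg.norm(w)
z = w / u
u_dot = float(z @ w_dot)
z_dot = (w_dot - u_dot * z) / u
u_ddot, z_ddot = thrust_second_rates(u, z, u_dot, z_dot, w_ddot)
```

Since z_dot = omega x z, a further differentiation gives z_ddot = alpha x z + omega x z_dot, so the angular acceleration components follow from projecting z_ddot - omega x z_dot onto the body x and y axes.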

Note that the above equations for and are similar to those derived in [28] with the difference that here, the first and second derivatives of the learned dynamics model are incorporated. In this way, the control inputs generated anticipate changes in the disturbance.

One practical issue that arises is that the vehicle acceleration, , and jerk, , are not readily available during operation. Computing them from odometry by taking finite-differences will introduce noise. To alleviate this in our experiments, we use the acceleration and jerk demanded by the trajectory, which are good approximations of the true vehicle acceleration and jerk when tracking error is low.

II-C Input-dependent error compensation

In many cases, additive dynamics model errors are a function of the applied control input and vehicle orientation, in addition to the vehicle position and velocity. For example, if the mass of the vehicle is not accurately known (or alternatively, the actuators are not properly modeled), the disturbance will be a linear function of the applied acceleration. The input-dependent acceleration model is shown in (10).


Here, contains the vehicle position and velocity and .

Without assuming a particular form for the additive error term , it is not possible to solve for the required acceleration and orientation analytically. We must resort to solving the problem numerically. Interestingly however, once a solution for the acceleration and orientation is found, the rest of the control inputs can be found analytically in a method similar to the input-independent case described above.

We first rewrite the acceleration model as the functional equation that is only a function of and time. We compute the time derivative of by taking a derivative of the above equation and solving the resulting linear system.


For our acceleration model, . The necessary derivatives are shown in (13) and (14).
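
The "differentiate the functional equation and solve the linear system" step is an instance of implicit differentiation, sketched below. The residual form F(mu, t) = mu + g + f_e(xi(t), mu) - a_des(t), the choice of mu as the thrust vector, and the example disturbance f_e = C_MASS * mu - K_DRAG * velocity are all assumptions made for illustration.

```python
import numpy as np

C_MASS, K_DRAG = 0.2, 0.3  # hypothetical disturbance coefficients


def implicit_rate(dF_dmu, dF_dt):
    """Solve dF_dmu @ mu_dot + dF_dt = 0 for mu_dot."""
    return np.linalg.solve(dF_dmu, -dF_dt)


acc = np.array([1.0, 0.0, 0.5])        # vehicle acceleration (time derivative of velocity)
jerk_des = np.array([0.2, -0.1, 0.0])  # desired jerk

# Partial derivatives of the assumed residual F for the example disturbance.
dF_dmu = (1.0 + C_MASS) * np.eye(3)    # identity plus d f_e / d mu
dF_dt = -K_DRAG * acc - jerk_des       # (d f_e / d xi) xi_dot minus desired jerk
mu_dot = implicit_rate(dF_dmu, dF_dt)
```

Because this example disturbance is linear, the result can be checked against the closed form (jerk_des + K_DRAG * acc) / (1 + C_MASS). The linear system is solvable only when dF_dmu is nonsingular.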


To find , we take a derivative of (11) and again solve the resulting linear system.


The necessary derivatives for our acceleration model are shown in (17), (18), and (19).


To compute from , we take a derivative of and proceed as before, by projecting onto and solving first for .


To compute from , and the angular velocity and angular acceleration, we follow the same approach as for the input-independent case.

It should be noted that this approach requires the existence of a solution to and the analogous equation for . Solutions will only fail to exist when the estimated disturbance model is strong enough to completely negate the acceleration imparted by . This may be a concern when learning a model from data, but in practice has not occurred in our experiments.

II-D Model Learning

To estimate from vehicle trajectory data, we fit a model to differences between the observed and the predicted acceleration at every time step. The observed acceleration is computed using finite-differences of the estimated vehicle velocity while the predicted acceleration is .

In principle, any regressive model whose derivatives are available can be used.
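
As a minimal sketch of this fitting procedure, assuming a linear model and a nominal prediction of the form "commanded body acceleration along the body z-axis plus gravity", the residual targets and a least-squares fit could look like the following. The array layouts, function names, and feature choice are hypothetical.

```python
import numpy as np


def acceleration_residuals(vel, dt, u, z, gravity):
    """Observed minus predicted acceleration for steps 0..T-2.

    vel: (T, 3) estimated velocities, u: (T,) commanded body accelerations,
    z: (T, 3) body z-axes, gravity: (3,) gravity vector.
    """
    observed = np.diff(vel, axis=0) / dt          # forward finite difference
    predicted = u[:-1, None] * z[:-1] + gravity   # nominal model without the error term
    return observed - predicted


def fit_linear_error_model(features, residuals):
    """Least-squares weights W such that features @ W approximates residuals."""
    weights, *_ = np.linalg.lstsq(features, residuals, rcond=None)
    return weights


# Synthetic check: a constant push in x plus an error proportional to thrust in z.
T, dt = 50, 0.01
gravity = np.array([0.0, 0.0, -9.81])
u = 9.81 + np.sin(np.arange(T))
z = np.tile([0.0, 0.0, 1.0], (T, 1))
features = np.column_stack([np.ones(T - 1), u[:-1]])
true_weights = np.array([[0.5, 0.0, 0.0], [0.0, 0.0, 0.1]])
acc = u[:-1, None] * z[:-1] + gravity + features @ true_weights
vel = np.vstack([np.zeros((1, 3)), np.cumsum(acc * dt, axis=0)])
weights = fit_linear_error_model(features, acceleration_residuals(vel, dt, u, z, gravity))
```

A linear model has a constant Jacobian and zero second derivatives with respect to its features, which keeps the derivative terms needed by the feedforward computation simple.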

III Experiments

We first evaluate the proposed approach on a simulated 2D multirotor that is subjected to a series of input-independent and input-dependent disturbances. We then evaluate how the approach reduces tracking error on a quadrotor executing aggressive trajectories.

III-A Simulation

The 2D planar multirotor captures many of the important dynamics present in the 3D multirotor. Namely, orientation and acceleration are coupled. In fact, the motion of a 3D multirotor moving in a vertical plane, e.g. in a straight line trajectory, can essentially be described with the 2D multirotor. As such, we believe a planar simulation is an appropriate testbed for our method.

The 2D multirotor force model is shown in Fig. 2. The dynamics are shown in (23) – (25), where is the applied body force and is the applied body acceleration. The mass, , was set to kg, gravity to m/s, and inertia to kg-m.

Fig. 2: Force diagram of the 2D multirotor used in the simulation experiment. and are the control inputs.

We subject the simulated multirotor to disturbances selected from Table I. Disturbance 1 is constant and emulates a fixed force field in the direction, e.g. due to wind. It is not input-dependent and is not dynamic since it does not change along with the vehicle state. Disturbance 2 is velocity dependent and emulates drag in the direction. Disturbance 3 depends on the vehicle angle and is thus input-dependent. Disturbance 4 is velocity dependent and emulates drag in the direction. Disturbance 5 is a mass perturbation that adds a disturbance linear in the applied acceleration, which makes it input-dependent.

No. Effect Input-dependent? Dynamic?
1 -= no no
2 -= no yes
3 += yes yes
4 -= no yes
5 += yes yes
TABLE I: Disturbances used in the 2D multirotor simulation experiment.

The vehicle is given a desired trajectory that takes it from , , to , in one second. The trajectories in and are both 7th-order polynomials that have the velocity, acceleration, and jerk equal to zero at each of their endpoints. This ensures that the trajectory starts and ends with the vehicle at rest, at an angle of zero, and with an angular velocity of zero. When Disturbance 1 is in effect, the vehicle’s angle is initialized such that maintaining zero acceleration in also maintains zero acceleration in . This ensures that the trajectory can be perfectly followed with correct control inputs despite the constant acceleration disturbance in . In all other cases, the vehicle state starts at 0.
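
A rest-to-rest 7th-order polynomial of this kind is fully determined by its eight boundary conditions (position, velocity, acceleration, and jerk at each endpoint). The sketch below solves for the coefficients on one axis; the unit displacement used in the example is a placeholder, not the endpoints used in the experiment.

```python
import numpy as np
from math import factorial


def rest_to_rest_poly7(p0, p1, duration):
    """Ascending-power coefficients c[0..7] of a 7th-order polynomial from p0 to p1
    with zero velocity, acceleration, and jerk at both endpoints."""

    def deriv_row(t, order):
        # Row of the linear system for the order-th derivative evaluated at time t.
        row = np.zeros(8)
        for k in range(order, 8):
            row[k] = factorial(k) / factorial(k - order) * t ** (k - order)
        return row

    rows = [deriv_row(0.0, d) for d in range(4)] + [deriv_row(duration, d) for d in range(4)]
    rhs = np.array([p0, 0.0, 0.0, 0.0, p1, 0.0, 0.0, 0.0])
    return np.linalg.solve(np.array(rows), rhs)


coeffs = rest_to_rest_poly7(0.0, 1.0, 1.0)  # placeholder: unit move in one second
```

For a unit move in unit time the solution is 35 t^4 - 84 t^5 + 70 t^6 - 20 t^7; other displacements and durations follow by scaling.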

We show and tracking error for the following feedforward input generation strategies with and without feedback.

  • FF1: No disturbance learning

  • FF2: Basic disturbance compensation (no disturbance dynamics)

  • FF3: Disturbance compensation w/ numerical optimization

  • FF4: Dist. comp. w/ disturbance dynamics (ours)

  • FF5: Dist. comp. w/ num. opt. and disturbance dynamics (ours)

FF1 uses the feedforward generation strategy as presented in [28] and does not do any regression for disturbance learning. FF2 and FF3 do not consider the dynamics of the disturbance; they compute the angular velocity and angular acceleration feedforward terms as in [28] while incorporating the learned disturbance in the acceleration model, (1). FF4 is the proposed approach that deals with input-independent disturbances while FF5 is the proposed approach that deals with input-dependent disturbances.

In this experiment, FF3 and FF5 solve (10) numerically using the modified Powell method root finder in SciPy [18]. The initial guess for the optimization is the solution from the previous timestep.
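
A warm-started root solve of this kind can be sketched with `scipy.optimize.root` and `method="hybr"`, which wraps MINPACK's modified Powell hybrid method. The planar residual, the example disturbance, and all names below are assumptions for illustration; they are not the simulation's actual model.

```python
import numpy as np
from scipy.optimize import root

GRAVITY = np.array([0.0, -9.81])  # assumed planar gravity vector


def disturbance(vel, mu):
    """Hypothetical input-dependent error model: drag plus a thrust-scaling error."""
    return -0.3 * vel + 0.2 * mu


def solve_thrust_vector(acc_des, vel, mu_guess):
    """Find mu such that mu + g + f_e(vel, mu) equals the desired acceleration."""

    def residual(mu):
        return mu + GRAVITY + disturbance(vel, mu) - acc_des

    sol = root(residual, mu_guess, method="hybr")
    if not sol.success:
        raise RuntimeError(sol.message)
    return sol.x


# Step through two timesteps, reusing the previous solution as the initial guess.
mu = np.array([0.0, 9.81])  # hover-like starting guess
steps = [
    (np.array([0.0, 0.0]), np.zeros(2)),
    (np.array([1.0, 0.5]), np.array([0.2, 0.0])),
]
for acc_des, vel in steps:
    mu = solve_thrust_vector(acc_des, vel, mu)
```

Warm starting keeps the initial guess close to the solution when the trajectory is smooth, which typically means few iterations per timestep.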

Position and angle feedback is provided by PD controllers with gains of 10 on position and velocity errors, 300 on angle errors, and 30 on angular velocity errors. The position PD controller output is added to the desired acceleration and the angle PD controller output is added to the desired angular acceleration.
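
The way feedback is combined with the feedforward terms can be sketched per axis as below, using the gains stated above. The function and argument names are illustrative.

```python
KP_POS, KD_POS = 10.0, 10.0    # gains on position and velocity errors
KP_ANG, KD_ANG = 300.0, 30.0   # gains on angle and angular velocity errors


def pd_acceleration(acc_des, pos_des, pos, vel_des, vel):
    """Desired acceleration plus the position PD correction (single axis)."""
    return acc_des + KP_POS * (pos_des - pos) + KD_POS * (vel_des - vel)


def pd_angular_acceleration(alpha_des, ang_des, ang, rate_des, rate):
    """Desired angular acceleration plus the angle PD correction."""
    return alpha_des + KP_ANG * (ang_des - ang) + KD_ANG * (rate_des - rate)


acc_cmd = pd_acceleration(1.0, 0.5, 0.4, 0.0, 0.1)
alpha_cmd = pd_angular_acceleration(0.0, 0.1, 0.0, 0.0, 0.0)
```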

In all simulation experiments, the feature vector used for linear regression of model errors is shown in (26). The features were hand selected to appropriately model the disturbances in Table I.


The learned model is thus


is the result of regressing the projected input data to the observed acceleration errors and minimizing least squared error. In this experiment, is recomputed after every trajectory execution using data from all past executions. Results reported are on the 3rd run, since we found that only two regression steps were needed to converge to an accurate enough model. This is not surprising, as in this simulation there is no noise and the features used can appropriately reproduce the applied disturbances.

Each control configuration is subjected to the following set of disturbance combinations.

  • Set A: Disturbance 1

  • Set B: Disturbances 1, 2, and 4

  • Set C: Disturbances 3 and 5

  • Set D: Disturbances 1, 2, 3, 4, and 5

Error plots for each of the four disturbance sets without feedback control are shown in Figs. 7(a) through 7(d). Under only a constant disturbance (Fig. 7(a)), all disturbance compensation strategies work well, since the disturbance is neither input-dependent nor dynamic. When we introduce drag, a dynamic disturbance, in disturbance set B (Fig. 7(b)), only the approaches that compensate for disturbance dynamics, FF4 and FF5, achieve low error. Although basic disturbance compensation as in FF2 helps considerably, accounting for disturbance dynamics improves performance further. Since the disturbances in set B are still input-independent, the use of numerical optimization to solve the acceleration model (1) has no effect.

Under input-dependent disturbances, we see that FF5 is the only approach that achieves low error. This is expected, as for both disturbance sets C and D, there are dynamic and input-dependent disturbances present.

(a) Errors for disturbance set A, containing only a constant disturbance. All strategies that compensate for the disturbance perform well.
(b) Errors for disturbance set B, containing dynamic, but input-independent disturbances. FF4 and FF5, which compensate for dynamic disturbances, perform the best.
(c) Errors for disturbance set C, containing dynamic and input-dependent disturbances. Only FF5 performs well.
(d) Errors for disturbance set D, which contains all considered disturbances. FF5 performs the best.
Fig. 7: Error plots for all five feedforward strategies without feedback control under each of the four disturbance sets.

Error plots for disturbance set D with feedback control are shown in Fig. 8. We see that although feedback can reduce the error, it is not enough to completely eliminate the error. FF5 still outperforms the other methods, achieving nearly zero error in all trials.

Fig. 8: Error plots for all five feedforward strategies with feedback control under disturbance set D. FF5 outperforms all others.

The maximum absolute position errors over the trajectory for all tested configurations are listed in Tables II and III.

  Dist. Set
TABLE II: Maximum absolute position errors, in meters, for each control strategy without feedback in simulation
  Dist. Set
TABLE III: Maximum absolute position errors, in meters, for each control strategy with feedback in simulation

III-B Hardware

Fig. 9: The g quadrotor used for the hardware experiments. Onboard computation is performed by an Odroid XU4 and the Pixhawk 1 Flight Controller.

III-B1 Platform & Setup

To validate the usefulness of dynamic disturbance compensation and input-dependent disturbance compensation, we compare the five aforementioned feedforward generation strategies, FF1 through FF5, on a g quadrotor while following aggressive trajectories. Figure 9 shows the hardware platform and Fig. 1 shows the robot while following aggressive circle and line trajectories.

Position, velocity, and yaw feedback is provided by a motion capture arena at Hz, while pitch, roll, and angular velocity feedback is provided by a Pixhawk PX4 at Hz. Feedback control is performed by a cascaded PD system following [28]. The feedforward terms are as computed by FF1 through FF5. FF3 and FF5 solve (10) numerically using the Newton-Raphson method. All control computation is performed onboard the vehicle’s Odroid XU4 computer. The position control loop runs at Hz and the attitude control loop runs at Hz.
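
A Newton-Raphson solve of the input-dependent acceleration equation can be sketched as follows. The residual form, the thrust-dependent example disturbance, its analytic Jacobian, and the stopping parameters are assumptions for illustration and are not the onboard implementation.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # assumed gravity vector, z axis pointing up
C_QUAD = 0.05                          # hypothetical disturbance coefficient


def newton_solve(residual, jacobian, mu0, tol=1e-10, max_iter=20):
    """Newton-Raphson iteration for residual(mu) = 0 using an analytic Jacobian."""
    mu = np.asarray(mu0, dtype=float)
    for _ in range(max_iter):
        r = residual(mu)
        if np.linalg.norm(r) < tol:
            return mu
        mu = mu - np.linalg.solve(jacobian(mu), r)
    raise RuntimeError("Newton-Raphson did not converge")


acc_des = np.array([1.0, 0.0, 0.5])


def residual(mu):
    # Example disturbance f_e(mu) = C_QUAD * |mu| * mu, growing with thrust magnitude.
    return mu + GRAVITY + C_QUAD * np.linalg.norm(mu) * mu - acc_des


def jacobian(mu):
    n = np.linalg.norm(mu)
    return np.eye(3) + C_QUAD * (n * np.eye(3) + np.outer(mu, mu) / n)


mu = newton_solve(residual, jacobian, np.array([0.0, 0.0, 9.81]))
```

With a good initial guess, such as the previous timestep's solution, a small fixed iteration budget is usually enough, which suits a fixed-rate control loop.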

For the hardware experiments, we use three test trajectories: a s straight line trajectory, a circle trajectory, and a figure 8 trajectory. The trajectories are designed to be near the limit of what the robot can feasibly track. Table IV lists the three trajectories and their maximum absolute derivatives.

Traj. (m) (m/s) (m/s) (m/s) (m/s)
Figure 8
TABLE IV: The aggressive trajectories used to evaluate the proposed approach and their maximum derivatives.

III-B2 Model Learning

We use linear regression as the model learning strategy in the hardware experiment. Input data to the regression is a 6 dimensional vector consisting of the vehicle velocity and the commanded acceleration vector . The system starts with an uninitialized model and uses a few test trajectories per trial to regress to the acceleration error. The error model is then held fixed during the remaining trajectories used for error evaluation. Although in principle, the model can be updated incrementally, keeping it fixed allows for a fair comparison between the control strategies.

III-B3 Results

For the line trajectory, each of FF1, FF2, FF4, and FF5 is evaluated four times. The first four trajectories, run using FF1, are used to train the acceleration error model. An overlay of the vehicle executing the m line trajectory can be seen in Fig. 1. Absolute errors along the trajectory and errors along the vertical axis for the line trajectory are shown in Fig. 10. FF1 performs the worst, especially along the vertical axis, indicating that the robot is underestimating the control input required to maintain hover. FF2 eliminates much of the error in the vertical axis, but still accumulates significant error along the trajectory, rising above cm consistently. FF4 and FF5 provide on average a 30% reduction in the average absolute tracking error along the trajectory when compared to FF2. This indicates that taking disturbance dynamics into account can significantly improve tracking performance. This trajectory does not provide sufficient clarity to determine the impact of FF5, input-dependent disturbance compensation.

Fig. 10: Average absolute errors during an aggressive straight line trajectory for four of the five control strategies. Shaded regions denote the minimum and maximum errors per timestep over four trials. Means ± std, in meters, over the four trials of the average error along the trajectory for FF1, FF2, FF4, and FF5 respectively are 0.120 ± 0.003, 0.067 ± 0.003, 0.047 ± 0.009, and 0.048 ± 0.10. Those for the average error along the vertical axis are 0.525 ± 0.023, 0.173 ± 0.034, 0.142 ± 0.025, and 0.173 ± 0.030.

For the circle trajectory, all of the feedforward strategies are evaluated once, with FF3 and FF5 receiving two and four more trajectories respectively. An overlay of the vehicle executing the circle trajectory can be seen in Fig. 1. Fig. 11 shows the resulting error. As expected, FF1, with no disturbance compensation, performs the worst. FF2, FF4, and FF5 all perform similarly well, with FF2 achieving slightly lower vertical error than the others. FF3 performs slightly worse here than FF2, suggesting that the numerical routine may be failing to converge or that the acceleration error's dependence on the input has not been properly modeled.

Fig. 11: Average absolute errors during an aggressive circle trajectory for the five control strategies. Shaded regions denote the minimum and maximum errors per timestep. Avg. errors (m) for FF1, FF2, FF3, FF4, and FF5 are 0.118, 0.071, 0.103, 0.069, and 0.076, respectively, while avg. errors (m) are 0.41, 0.084, 0.122, 0.069, and 0.066, respectively.

Fig. 12 shows the error of FF1, FF2, FF4, and FF5 along the figure 8 trajectory for one trial each. The improvement of FF4 over FF2 here is smaller than in the other trajectories, suggesting that dynamic disturbances have relatively less of an impact when following the figure 8, though more experimental trials are warranted to strengthen this claim.

Fig. 12: Errors during an aggressive figure 8 trajectory for four of the five control strategies. Avg. errors (m) for FF1, FF2, FF4, and FF5 are 0.105, 0.076, 0.063, and 0.059 respectively, while avg. errors (m) are 0.473, 0.064, 0.064, and 0.088, respectively.

IV Conclusion

We have presented a method that allows compensation of dynamic disturbances through evaluation of the derivatives of a learned model. We have shown in both simulation and hardware experiments that our dynamic disturbance compensation method improves performance over traditional disturbance compensation. We have also shown the usefulness of input-dependent disturbance compensation in simulation and preliminary results on hardware. The versatility of the approach in a realistic robotics application has been verified through evaluation on three distinct test trajectories.

Future work will evaluate nonlinear regression techniques, such as ISSGPR, on hardware platforms, as well as consider regression techniques that explicitly optimize model derivative accuracy. An interesting avenue of future study is to analyze theoretically how the error model accuracy affects the performance of each of the feedforward generation strategies. Lastly, we hope to apply this technique to the attitude dynamics of multirotors, in order to fully compensate for vehicle disturbances and modeling errors.

V Acknowledgments

The authors thank Xuning Yang for helpful feedback on this manuscript.

