Inverting Learned Dynamics Models for Aggressive Multirotor Control
Abstract
We present a control strategy that applies inverse dynamics to a learned acceleration error model for accurate multirotor control input generation. This allows us to retain accurate trajectory and control input generation despite the presence of exogenous disturbances and modeling errors. Although accurate control input generation is traditionally possible when combined with parameter learning-based techniques, we propose a method that can do so while solving the relatively easier nonparametric model learning problem. We show that our technique is able to compensate for a larger class of model disturbances than traditional techniques can, and we show reduced tracking error while following trajectories demanding accelerations of more than m/s in multirotor simulation and hardware experiments.
I Introduction
I-A Motivation
In the last several years, aerial robotics has seen a surge in popularity, largely due to the increasing viability of applications [29, 19, 7]. Multirotors have been particularly well represented, due to their agility and versatility, and have additionally been a fruitful testbed for nonlinear controllers and trajectory generation strategies [25, 21, 33, 20].
Computing precise control inputs for a dynamical system often requires accurate knowledge of its dynamics. Van Nieuwstadt and Murray [38] showed how the concept of differential flatness can be used to generate control inputs that follow a given trajectory for differentially flat systems.
For a multirotor, differential flatness can be used to compute the exact inputs required to follow a specified trajectory in x, y, z, and yaw (see Mellinger and Kumar [28]). The computed control inputs are only accurate if the fixed dynamic model and its associated parameters, e.g. mass and inertia, are correct. Often, this fixed dynamic model assumption fails and the estimated parameters are inaccurate, which results in suboptimal trajectory tracking performance.
One possible approach to alleviate this problem is to estimate the model parameters from vehicle trajectory data. This, however, can be difficult, and is still suboptimal when the chosen parameterization cannot realize the true vehicle model. On the other hand, nonparametric error models are commonly used and relatively easy to learn, but are not readily used in the differential flatness framework. In this work, we show how a nonparametric error model can be used to generate control inputs that follow a specified trajectory. We additionally provide an extension to the proposed approach that can deal with input-dependent model errors via numerical optimization. We validate the control input generation strategy both in simulation and through experiments on a quadrotor.
I-B Related Works
Accurate and aggressive multirotor flight has been explored in [37, 30, 10, 28] among others. As for many other robotic platforms, accurate modeling has been shown to improve flight performance [36, 4]. Traditional non-learning-based modeling can be achieved via hand-crafted experiments, calibration procedures, and computer-aided design [27]. Since this requires significant manual effort and engineering hours, there have been many works exploring automatic parameter estimation methods [6, 5] and nonparametric model learning methods [22, 8] for multirotor control. In this work, we focus on nonparametric model learning methods, since parameter learning methods can be limited in their accuracy by the choice of parameterization [31]. There has also been work on learning control input corrections for aggressive flight without learning a dynamical model [24]. These methods are not a focus of this paper since they can typically only be applied while executing the trained trajectories or reference quantities. A learned dynamical model can be applied to any trajectory or reference.
Nonparametric model learning methods for robot control have been employed in [13, 35, 1, 26]. Incremental model learning performed in real time has been studied in [16, 14, 3]. Florez et al. [13] use Locally Weighted Projection Regression (LWPR [39]) while Gijsberts and Metta [14] use Random Fourier Features [32], which was extended to Incremental Sparse Spectrum Gaussian Process Regression (ISSGPR), a Bayesian regression formulation, in Gijsberts and Metta [15]. Droniou et al. [9] evaluated LWPR and ISSGPR for the purposes of robot control and found ISSGPR to perform better. In this work, we use linear regression, but our approach can use any model learning strategy.
Once an accurate dynamical system model is known, a Model Predictive Control (MPC) strategy can be used to optimize a desired cost function, subject to the dynamics [8, 25, 23, 2]. These approaches often make approximations to ensure real time feasibility [8, 2]. Furthermore, Desaraju [8] does not perform full inverse dynamics on the disturbance, which can lead to suboptimal performance while tracking aggressive trajectories.
The differential flatness property of multirotors has been widely exploited for accurate trajectory tracking [11, 17, 28, 12, 34, 30]. Differential flatness of the multirotor subject to linear drag was shown in Faessler et al. [11]. This extends the applicability of the approach to a limited family of disturbances. Faessler et al. [11] do not address the issue of nonlinear disturbances as a function of state and/or control input in the flatness computations. Issues arising from singularities, commonly encountered during aggressive flight, were discussed and mitigated in Morrell et al. [30], increasing the robustness of the differential flatness approach.
Although control inputs computed using the differential flatness framework will automatically take into account dynamical model parameter changes, such as mass, inertia, etc., it is not clear how to incorporate nonparametric model corrections. In this work, we build on the differential flatness formulation by extending it to compensate for learned nonparametric dynamic model disturbances. Our approach can compensate for arbitrary disturbances that are a function of vehicle position and velocity, as well as control input dependent disturbances that are a function of vehicle orientation and thrust. This increases the applicability of the approach to a much wider range of realistic flight conditions.
I-C Notation
Lowercase letters such as and are scalars in . Boldface lowercase letters such as and are vectors. is the by identity matrix. denotes the total time derivative of . is mass, is the gravitational constant and is inertia. All functions in this paper are assumed to have continuous second derivatives everywhere and thus all second partial derivatives are symmetric, i.e. . Unless otherwise indicated, all vector quantities are expressed in a fixed reference frame.
II Method
In this section, we first introduce the problem statement in Part A. Part B details our approach for compensating for dynamic disturbances that can be a function of vehicle position, vehicle velocity, or other quantities that are independent of the applied control inputs. Part C extends the approach to compensate for disturbances that are input-dependent and can be a function of e.g. the applied vehicle thrust or vehicle orientation. Finally, Part D describes the model learning approach.
II-A Problem Statement
Assume we are given a desired position over time, , along with its first four time derivatives, the velocity, acceleration, jerk, and snap: , , , .
Equation (1) shows a typical acceleration model of a multirotor, where the commanded acceleration is aligned with the body axis.
(1) 
Here is the commanded body acceleration, is the body axis (), is the gravity vector, and is an additive acceleration error model that can, in general, be a function of both vehicle state and control input .
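For reference, an acceleration model of the kind described here can be written as follows. This is a sketch in assumed notation: u denotes the commanded body acceleration, z the unit body z axis, g the gravity vector, f_e the additive acceleration error model, ξ the vehicle state, and μ the control input. These symbol choices are labels chosen for illustration and may differ from the typesetting of (1).

```latex
\ddot{\mathbf{x}} = u\,\mathbf{z} + \mathbf{g} + \mathbf{f}_e(\boldsymbol{\xi}, \boldsymbol{\mu}),
\qquad \lVert \mathbf{z} \rVert = 1
```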
The objective is to compute the body acceleration , body axis , angular velocity , and angular acceleration such that integrating forwards in time twice results in an orientation with as the axis and that the vehicle acceleration, which is a function of , equals the desired vehicle acceleration . This will ensure that the vehicle follows the specified trajectory . Note that while and are not true control inputs to the system, they are necessary as feedforward references to the attitude feedback controller. Once the body acceleration and angular acceleration are computed, they are multiplied by mass and inertia and used as the feedforward force and torque in the position and attitude feedback controllers respectively.
For simplicity, we will assume that the yaw of the vehicle is always zero, but all of the methods presented are applicable while following yaw trajectories as well.
II-B Input-independent error compensation
The simplest version of our control input generation strategy assumes that the disturbance model is a function of the vehicle position and velocity only: . In this case, the desired acceleration vector can be computed directly, as shown in (2).
(2) 
Since must be of unit length, both and can be computed from by computing the magnitude and normalizing.
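The magnitude-and-normalize step can be sketched in a few lines of plain Python. The function name `thrust_and_axis`, the argument order, and the numerical values are assumptions made for illustration; the disturbance value stands in for the output of a learned error model evaluated at the current position and velocity.

```python
import math

def thrust_and_axis(acc_des, gravity, f_e):
    """Split (acc_des - gravity - f_e) into a magnitude u and a unit axis z."""
    a = [ad - g - fe for ad, g, fe in zip(acc_des, gravity, f_e)]
    u = math.sqrt(sum(c * c for c in a))
    if u < 1e-9:
        # The body axis is undefined when the required acceleration vanishes.
        raise ValueError("required body acceleration is near zero")
    z = [c / u for c in a]
    return u, z

# Hypothetical case: hover under a constant 1 m/s^2 disturbance along x.
GRAVITY = [0.0, 0.0, -9.81]
F_E = [1.0, 0.0, 0.0]
u, z = thrust_and_axis([0.0, 0.0, 0.0], GRAVITY, F_E)
```

In this example the body axis tilts against the disturbance (negative x component), so that the commanded acceleration cancels both gravity and the modeled error.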
The angular velocity and angular acceleration are found by first computing the first and second time derivatives of .
Differentiating (2) in time results in
(3) 
Since is of unit length, and it must remain so, it must be perpendicular to . Thus taking a dot product of (3) with allows us to find .
(4) 
Inserting into (3) gives us .
(5) 
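The chain of steps from (2) to (5) can be summarized as below. This is a sketch in assumed notation (u for the body acceleration magnitude, z for the unit body axis, g for gravity, f_e for the error model, x for position), written for the input-independent case where f_e depends on position and velocity only. The third line uses the orthogonality of z and its time derivative.

```latex
\begin{aligned}
u\,\mathbf{z} &= \ddot{\mathbf{x}} - \mathbf{g} - \mathbf{f}_e(\mathbf{x}, \dot{\mathbf{x}}) \\
\dot{u}\,\mathbf{z} + u\,\dot{\mathbf{z}} &= \dddot{\mathbf{x}} - \dot{\mathbf{f}}_e,
\qquad
\dot{\mathbf{f}}_e = \frac{\partial \mathbf{f}_e}{\partial \mathbf{x}}\,\dot{\mathbf{x}}
                   + \frac{\partial \mathbf{f}_e}{\partial \dot{\mathbf{x}}}\,\ddot{\mathbf{x}} \\
\dot{u} &= \mathbf{z}^{\top}\left(\dddot{\mathbf{x}} - \dot{\mathbf{f}}_e\right)
\qquad \text{since } \mathbf{z}^{\top}\dot{\mathbf{z}} = 0 \\
\dot{\mathbf{z}} &= \frac{1}{u}\left(\dddot{\mathbf{x}} - \dot{\mathbf{f}}_e - \dot{u}\,\mathbf{z}\right)
\end{aligned}
```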
The body angular velocity can be extracted from by first defining the body and body axes using a desired vehicle yaw, then projecting onto those axes. See [28] for the details.
To find , we differentiate (3).
(6) 
The second time derivative of the learned disturbance is shown in (7). Note that the second partial derivative of the error model with respect to its vector inputs is a 3rd order tensor.
(7) 
Noting that differentiating implies and again taking a dot product with , we can compute .
(8) 
Inserting into (6) gives us .
(9) 
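The second-derivative step of (6) through (9) follows the same pattern. The sketch below uses assumed notation (u, z, f_e, x as labels for the body acceleration magnitude, body axis, error model, and position) and writes the second time derivative of the error model in index form, where the symmetry of second partial derivatives merges the two mixed terms into one with a factor of two.

```latex
\begin{aligned}
\ddot{u}\,\mathbf{z} + 2\,\dot{u}\,\dot{\mathbf{z}} + u\,\ddot{\mathbf{z}}
  &= \mathbf{x}^{(4)} - \ddot{\mathbf{f}}_e \\
(\ddot{\mathbf{f}}_e)_i
  &= \sum_{j,k}\left(
       \frac{\partial^2 f_{e,i}}{\partial x_j\,\partial x_k}\,\dot{x}_j\dot{x}_k
     + 2\,\frac{\partial^2 f_{e,i}}{\partial x_j\,\partial \dot{x}_k}\,\dot{x}_j\ddot{x}_k
     + \frac{\partial^2 f_{e,i}}{\partial \dot{x}_j\,\partial \dot{x}_k}\,\ddot{x}_j\ddot{x}_k
     \right)
   + \sum_{j}\left(
       \frac{\partial f_{e,i}}{\partial x_j}\,\ddot{x}_j
     + \frac{\partial f_{e,i}}{\partial \dot{x}_j}\,\dddot{x}_j
     \right) \\
\mathbf{z}^{\top}\dot{\mathbf{z}} = 0
  &\;\Rightarrow\; \mathbf{z}^{\top}\ddot{\mathbf{z}} = -\,\dot{\mathbf{z}}^{\top}\dot{\mathbf{z}} \\
\ddot{u} &= \mathbf{z}^{\top}\left(\mathbf{x}^{(4)} - \ddot{\mathbf{f}}_e\right)
          + u\,\dot{\mathbf{z}}^{\top}\dot{\mathbf{z}} \\
\ddot{\mathbf{z}} &= \frac{1}{u}\left(\mathbf{x}^{(4)} - \ddot{\mathbf{f}}_e
          - \ddot{u}\,\mathbf{z} - 2\,\dot{u}\,\dot{\mathbf{z}}\right)
\end{aligned}
```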
To compute the body angular acceleration from , we note that and proceed as before for the angular velocity, by projecting onto the body and axes.
Note that the above equations for and are similar to those derived in [28] with the difference that here, the first and second derivatives of the learned dynamics model are incorporated. In this way, the control inputs generated anticipate changes in the disturbance.
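To illustrate how the disturbance derivative enters the feedforward terms, the sketch below computes the body acceleration, body axis, and their first time derivatives for a hypothetical linear drag model, then compares the analytic body-axis rate against a central finite difference. The trajectory, the drag coefficient `C_DRAG`, and all function names are assumptions chosen for illustration; the derivatives are evaluated along the desired trajectory.

```python
import math

GRAVITY = (0.0, 0.0, -9.81)  # gravity vector in the fixed frame
C_DRAG = 0.3                 # hypothetical linear drag coefficient (1/s)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def desired_derivatives(t):
    """Velocity, acceleration, jerk of a hypothetical path x(t) = (sin t, 0, 0.5 cos t)."""
    vel = (math.cos(t), 0.0, -0.5 * math.sin(t))
    acc = (-math.sin(t), 0.0, -0.5 * math.cos(t))
    jerk = (-math.cos(t), 0.0, 0.5 * math.sin(t))
    return vel, acc, jerk

def feedforward(t):
    """Return u, z, u_dot, z_dot for the error model f_e = -C_DRAG * velocity."""
    vel, acc, jerk = desired_derivatives(t)
    f_e = tuple(-C_DRAG * v for v in vel)
    f_e_dot = tuple(-C_DRAG * a for a in acc)  # chain rule: (df_e/dv) * dv/dt
    # u * z = acc - g - f_e
    rhs = tuple(a - g - f for a, g, f in zip(acc, GRAVITY, f_e))
    u = math.sqrt(dot(rhs, rhs))
    z = tuple(r / u for r in rhs)
    # u_dot * z + u * z_dot = jerk - f_e_dot
    rhs_dot = tuple(j - fd for j, fd in zip(jerk, f_e_dot))
    u_dot = dot(z, rhs_dot)  # uses z . z_dot = 0
    z_dot = tuple((rd - u_dot * zi) / u for rd, zi in zip(rhs_dot, z))
    return u, z, u_dot, z_dot

def z_dot_finite_difference(t, h=1e-5):
    """Central-difference estimate of the body-axis rate, for checking only."""
    z_plus = feedforward(t + h)[1]
    z_minus = feedforward(t - h)[1]
    return tuple((p - m) / (2.0 * h) for p, m in zip(z_plus, z_minus))
```

Dropping the `f_e_dot` term in this sketch would correspond to compensating for the disturbance without accounting for its dynamics.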
One practical issue that arises is that the vehicle acceleration, , and jerk, , are not readily available during operation. Computing them from odometry by taking finite differences will introduce noise. To alleviate this in our experiments, we use the acceleration and jerk demanded by the trajectory, which are good approximations of the true vehicle acceleration and jerk when tracking error is low.
II-C Input-dependent error compensation
In many cases, additive dynamics model errors are a function of the applied control input and vehicle orientation, in addition to the vehicle position and velocity. For example, if the mass of the vehicle is not accurately known (or alternatively, the actuators are not properly modeled), the disturbance will be a linear function of the applied acceleration. The input-dependent acceleration model is shown in (10).
(10) 
Here, contains the vehicle position and velocity and .
Without assuming a particular form for the additive error term , it is not possible to solve for the required acceleration and orientation analytically. We must resort to solving the problem numerically. Interestingly, however, once a solution for the acceleration and orientation is found, the rest of the control inputs can be found analytically in a manner similar to the input-independent case described above.
We first rewrite the acceleration model as the functional equation that is only a function of and time. We compute the time derivative of by taking a derivative of the above equation and solving the resulting linear system.
(11)  
(12) 
(13) 
(14) 
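The implicit differentiation step can be sketched as follows, in assumed notation: a = u z denotes the commanded acceleration vector, ξ the position and velocity, f_e the error model, g gravity, and x the desired position. The symbols are labels chosen for illustration. The last line is the linear system referred to in the text; it has a unique solution when the matrix in parentheses is invertible.

```latex
\begin{aligned}
\mathbf{F}(\mathbf{a}, t) &= \mathbf{a} + \mathbf{g}
   + \mathbf{f}_e(\boldsymbol{\xi}(t), \mathbf{a}) - \ddot{\mathbf{x}}(t) = \mathbf{0} \\
\frac{d\mathbf{F}}{dt} &= \left(\mathbf{I} + \frac{\partial \mathbf{f}_e}{\partial \mathbf{a}}\right)\dot{\mathbf{a}}
   + \frac{\partial \mathbf{f}_e}{\partial \boldsymbol{\xi}}\,\dot{\boldsymbol{\xi}}
   - \dddot{\mathbf{x}} = \mathbf{0} \\
\left(\mathbf{I} + \frac{\partial \mathbf{f}_e}{\partial \mathbf{a}}\right)\dot{\mathbf{a}}
   &= \dddot{\mathbf{x}} - \frac{\partial \mathbf{f}_e}{\partial \boldsymbol{\xi}}\,\dot{\boldsymbol{\xi}}
\end{aligned}
```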
To find , we take a derivative of (11) and again solve the resulting linear system.
(15)  
(16) 
(17) 
(18) 
(19) 
To compute from , we take a derivative of and proceed as before, by projecting onto and solving first for .
(20)  
(21)  
(22) 
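A compact sketch of the remaining steps, in assumed notation (a = u z for the commanded acceleration vector, ξ for position and velocity, f_e for the error model, x for the desired position): the first line is the linear system for the second derivative of a, obtained by differentiating the functional equation twice, and the last two lines recover the derivatives of u and z from those of a by projecting onto z.

```latex
\begin{aligned}
\left(\mathbf{I} + \frac{\partial \mathbf{f}_e}{\partial \mathbf{a}}\right)\ddot{\mathbf{a}}
  &= \mathbf{x}^{(4)}
   - \frac{d}{dt}\!\left(\frac{\partial \mathbf{f}_e}{\partial \boldsymbol{\xi}}\right)\dot{\boldsymbol{\xi}}
   - \frac{\partial \mathbf{f}_e}{\partial \boldsymbol{\xi}}\,\ddot{\boldsymbol{\xi}}
   - \frac{d}{dt}\!\left(\frac{\partial \mathbf{f}_e}{\partial \mathbf{a}}\right)\dot{\mathbf{a}} \\
\dot{\mathbf{a}} = \dot{u}\,\mathbf{z} + u\,\dot{\mathbf{z}}
  &\;\Rightarrow\;
  \dot{u} = \mathbf{z}^{\top}\dot{\mathbf{a}}, \qquad
  \dot{\mathbf{z}} = \frac{1}{u}\left(\dot{\mathbf{a}} - \dot{u}\,\mathbf{z}\right) \\
\ddot{\mathbf{a}} = \ddot{u}\,\mathbf{z} + 2\,\dot{u}\,\dot{\mathbf{z}} + u\,\ddot{\mathbf{z}}
  &\;\Rightarrow\;
  \ddot{u} = \mathbf{z}^{\top}\ddot{\mathbf{a}} + u\,\dot{\mathbf{z}}^{\top}\dot{\mathbf{z}}, \qquad
  \ddot{\mathbf{z}} = \frac{1}{u}\left(\ddot{\mathbf{a}} - \ddot{u}\,\mathbf{z} - 2\,\dot{u}\,\dot{\mathbf{z}}\right)
\end{aligned}
```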
To compute from , and the angular velocity and angular acceleration, we follow the same approach as for the input-independent case.
It should be noted that this approach requires the existence of a solution to and the analogous equation for . Solutions will only fail to exist when the estimated disturbance model is strong enough to completely negate the acceleration imparted by . This may be a concern when learning a model from data, but in practice has not occurred in our experiments.
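The numerical solve for the commanded acceleration can be illustrated with a small planar example. The sketch below is a plain Newton iteration with a forward-difference Jacobian, not the specific root finder used in the experiments; the error model (a mass-like term proportional to the commanded acceleration plus a constant offset), the constants `K_MASS` and `WIND`, and all function names are hypothetical.

```python
import math

GRAVITY = (0.0, -9.81)  # planar gravity vector (x, z)
K_MASS = 0.2            # hypothetical factor: error proportional to commanded acceleration
WIND = (0.5, 0.0)       # hypothetical constant disturbance

def error_model(a):
    """Hypothetical error model that depends on the commanded acceleration a."""
    return (K_MASS * a[0] + WIND[0], K_MASS * a[1] + WIND[1])

def residual(a, acc_des):
    """F(a) = a + g + f_e(a) - acc_des, which is zero at the solution."""
    f = error_model(a)
    return (a[0] + GRAVITY[0] + f[0] - acc_des[0],
            a[1] + GRAVITY[1] + f[1] - acc_des[1])

def solve_commanded_acc(acc_des, a_guess, tol=1e-10, max_iter=20, eps=1e-6):
    """Newton iteration on F(a) = 0, warm-started from a_guess."""
    a = list(a_guess)
    for _ in range(max_iter):
        r = residual(a, acc_des)
        if max(abs(r[0]), abs(r[1])) < tol:
            return tuple(a)
        # Forward-difference Jacobian J[i][j] = dF_i / da_j.
        J = [[0.0, 0.0], [0.0, 0.0]]
        for j in range(2):
            a_pert = list(a)
            a_pert[j] += eps
            r_pert = residual(a_pert, acc_des)
            for i in range(2):
                J[i][j] = (r_pert[i] - r[i]) / eps
        det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
        if abs(det) < 1e-12:
            raise RuntimeError("singular Jacobian: no unique solution nearby")
        # Solve J * da = -r with Cramer's rule.
        a[0] += (-r[0] * J[1][1] + r[1] * J[0][1]) / det
        a[1] += (-J[0][0] * r[1] + J[1][0] * r[0]) / det
    raise RuntimeError("Newton iteration did not converge")

a_sol = solve_commanded_acc((2.0, 1.0), a_guess=(0.0, 9.81))
u = math.hypot(a_sol[0], a_sol[1])
z = (a_sol[0] / u, a_sol[1] / u)
```

In a control loop, passing the previous timestep's solution as `a_guess` is a natural warm start. A singular Jacobian in this sketch corresponds to the case where the modeled disturbance cancels the effect of the commanded acceleration.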
II-D Model Learning
To estimate from vehicle trajectory data, we fit a model to differences between the observed and the predicted acceleration at every time step. The observed acceleration is computed using finite differences of the estimated vehicle velocity while the predicted acceleration is .
In principle, any regressive model whose derivatives are available can be used.
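As an illustration of this fitting step, the sketch below builds acceleration-error targets from a synthetic one-dimensional velocity log and fits a linear model by least squares. The data, the feature choice (a constant and the velocity), the time step, and the helper `fit_least_squares` are assumptions made for the example; any regression routine could take its place.

```python
import math

def fit_least_squares(features, targets):
    """Minimize sum_k (phi_k . theta - y_k)^2 by solving the normal equations."""
    n = len(features[0])
    A = [[sum(f[i] * f[j] for f in features) for j in range(n)] for i in range(n)]
    b = [sum(f[i] * y for f, y in zip(features, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, n):
            m = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= m * A[c][k]
            b[r] -= m * b[c]
    # Back substitution.
    theta = [0.0] * n
    for r in reversed(range(n)):
        s = sum(A[r][k] * theta[k] for k in range(r + 1, n))
        theta[r] = (b[r] - s) / A[r][r]
    return theta

DT = 0.01        # hypothetical sample period (s)
N_STEPS = 200

def true_error(v):
    """Hypothetical ground-truth acceleration error: constant offset plus drag."""
    return 0.5 - 0.3 * v

# Simulate a velocity log; predicted_acc stands in for the nominal model output.
vel = [0.0]
predicted_acc = []
for k in range(N_STEPS):
    cmd = math.sin(0.1 * k)
    predicted_acc.append(cmd)
    vel.append(vel[-1] + DT * (cmd + true_error(vel[-1])))

# Observed acceleration from finite differences of velocity.
observed_acc = [(vel[k + 1] - vel[k]) / DT for k in range(N_STEPS)]
errors = [o - p for o, p in zip(observed_acc, predicted_acc)]
features = [(1.0, vel[k]) for k in range(N_STEPS)]
theta = fit_least_squares(features, errors)
```

Because the synthetic data is noise-free and the features can represent the true error exactly, the fit recovers the offset and drag coefficients. With real odometry, the finite-difference targets would be noisy.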
III Experiments
We first evaluate the proposed approach on a simulated 2D multirotor that is subjected to a series of input-independent and input-dependent disturbances. We then evaluate how the approach reduces tracking error on a quadrotor executing aggressive trajectories.
III-A Simulation
The 2D planar multirotor captures many of the important dynamics present in the 3D multirotor. Namely, orientation and acceleration are coupled. In fact, the motion of a 3D multirotor moving in a vertical plane, e.g. in a straight line trajectory, can essentially be described with the 2D multirotor. As such, we believe a planar simulation is an appropriate testbed for our method.
The 2D multirotor force model is shown in Fig. 2. The dynamics are shown in (23) – (25), where is the applied body force and is the applied body acceleration. The mass, , was set to kg, gravity to m/s, and inertia to kgm.
(23)  
(24)  
(25) 
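A minimal sketch of planar multirotor dynamics of this kind is given below. The sign convention for the angle, the parameter values, and the function name are assumptions for illustration and are not taken from the experiments; the thrust acts along the body axis, gravity acts along the negative vertical axis, and the torque drives the angular acceleration.

```python
import math

MASS = 1.0      # kg, hypothetical value
INERTIA = 0.01  # kg m^2, hypothetical value
G = 9.81        # m/s^2

def planar_multirotor_accel(theta, force, torque):
    """Return (x_dd, z_dd, theta_dd) for body force `force` and body torque `torque`.

    Assumed convention: theta = 0 is level, and a positive theta tilts the
    thrust direction toward negative x.
    """
    u = force / MASS  # applied body acceleration
    x_dd = -u * math.sin(theta)
    z_dd = u * math.cos(theta) - G
    theta_dd = torque / INERTIA
    return x_dd, z_dd, theta_dd

hover = planar_multirotor_accel(0.0, MASS * G, 0.0)
```

The coupling between orientation and acceleration is visible here: lateral acceleration can only be produced by tilting, which also reduces the vertical component of thrust.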
We subject the simulated multirotor to disturbances selected from Table I. Disturbance 1 is constant and emulates a fixed force field in the direction, e.g. due to wind. It is not input-dependent and is not dynamic since it does not change along with the vehicle state. Disturbance 2 is velocity dependent and emulates drag in the direction. Disturbance 3 depends on the vehicle angle and is thus input-dependent. Disturbance 4 is velocity dependent and emulates drag in the direction. Disturbance 5 is a mass perturbation that adds a disturbance linear in the applied acceleration, which makes it input-dependent.
No.  Effect  Input-dependent?  Dynamic? 

1  =  no  no 
2  =  no  yes 
3  +=  yes  yes 
4  =  no  yes 
5  +=  yes  yes 
The vehicle is given a desired trajectory that takes it from , , to , in one second. The trajectories in and are both 7th-order polynomials that have the velocity, acceleration, and jerk equal to zero at each of their endpoints. This ensures that the trajectory starts and ends with the vehicle at rest, at an angle of zero, and with an angular velocity of zero. When Disturbance 1 is in effect, the vehicle’s angle is initialized such that maintaining zero acceleration in also maintains zero acceleration in . This ensures that the trajectory can be perfectly followed with correct control inputs despite the constant acceleration disturbance in . In all other cases, the vehicle state starts at 0.
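One way to construct such a polynomial is with the standard 7th-order rest-to-rest blend s(τ) = 35τ⁴ − 84τ⁵ + 70τ⁶ − 20τ⁷, whose first derivative is 140τ³(1 − τ)³, so velocity, acceleration, and jerk all vanish at both ends. The sketch below is an illustration only; the endpoints, duration, and function names are hypothetical and not the values used in the simulation.

```python
def septic_blend(tau):
    """s(tau) on [0, 1] with s(0) = 0, s(1) = 1 and zero 1st to 3rd derivatives at both ends."""
    return 35 * tau**4 - 84 * tau**5 + 70 * tau**6 - 20 * tau**7

def septic_blend_derivs(tau):
    """First, second and third derivatives of the blend with respect to tau."""
    d1 = 140 * tau**3 - 420 * tau**4 + 420 * tau**5 - 140 * tau**6
    d2 = 420 * tau**2 - 1680 * tau**3 + 2100 * tau**4 - 840 * tau**5
    d3 = 840 * tau - 5040 * tau**2 + 8400 * tau**3 - 4200 * tau**4
    return d1, d2, d3

def rest_to_rest(t, start, goal, duration):
    """Position, velocity, acceleration and jerk along one axis at time t."""
    tau = min(max(t / duration, 0.0), 1.0)
    span = goal - start
    d1, d2, d3 = septic_blend_derivs(tau)
    pos = start + span * septic_blend(tau)
    vel = span * d1 / duration
    acc = span * d2 / duration**2
    jerk = span * d3 / duration**3
    return pos, vel, acc, jerk

# Hypothetical endpoints: move 1 m along one axis in one second.
start_state = rest_to_rest(0.0, 0.0, 1.0, 1.0)
mid_state = rest_to_rest(0.5, 0.0, 1.0, 1.0)
end_state = rest_to_rest(1.0, 0.0, 1.0, 1.0)
```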
We show and tracking error for the following feedforward input generation strategies with and without feedback.

FF1: No disturbance learning
FF2: Basic disturbance compensation (no disturbance dynamics)
FF3: Disturbance compensation w/ numerical optimization
FF4: Dist. comp. w/ disturbance dynamics (ours)
FF5: Dist. comp. w/ num. opt. and disturbance dynamics (ours)
FF1 uses the feedforward generation strategy as presented in [28] and does not do any regression for disturbance learning. FF2 and FF3 do not consider the dynamics of the disturbance; they compute the angular velocity and angular acceleration feedforward terms as in [28] while incorporating the learned disturbance in the acceleration model, (1). FF4 is the proposed approach that deals with input-independent disturbances while FF5 is the proposed approach that deals with input-dependent disturbances.
In this experiment, FF3 and FF5 solve (10) numerically using the modified Powell method root finder in SciPy [18]. The initial guess for the optimization is the solution from the previous timestep.
Position and angle feedback is provided by PD controllers with gains of 10 on position and velocity errors, 300 on angle errors, and 30 on angular velocity errors. The position PD controller output is added to the desired acceleration and the angle PD controller output is added to the desired angular acceleration.
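The way feedback is combined with the feedforward terms can be sketched as below, using the gains stated above. The function names and the error sign convention (desired minus actual) are assumptions made for illustration.

```python
KP_POS, KD_POS = 10.0, 10.0   # gains on position and velocity errors
KP_ANG, KD_ANG = 300.0, 30.0  # gains on angle and angular velocity errors

def acceleration_command(acc_ff, pos_err, vel_err):
    """Desired acceleration plus position PD feedback (errors are desired minus actual)."""
    return acc_ff + KP_POS * pos_err + KD_POS * vel_err

def angular_acceleration_command(ang_acc_ff, ang_err, ang_vel_err):
    """Desired angular acceleration plus angle PD feedback."""
    return ang_acc_ff + KP_ANG * ang_err + KD_ANG * ang_vel_err

# With zero tracking error, the commands reduce to the feedforward terms.
acc_cmd = acceleration_command(1.0, 0.0, 0.0)
ang_cmd = angular_acceleration_command(-2.0, 0.01, 0.0)
```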
In all simulation experiments, the feature vector used for linear regression of model errors is shown in (26). The features were hand selected to appropriately model the disturbances in Table I.
(26) 
The learned model is thus
(27) 
is the result of regressing the projected input data to the observed acceleration errors and minimizing least squared error. In this experiment, is recomputed after every trajectory execution using data from all past executions. Results reported are on the 3rd run, since we found that only two regression steps were needed to converge to an accurate enough model. This is not surprising, as in this simulation there is no noise and the features used can appropriately reproduce the applied disturbances.
Each control configuration is subjected to the following set of disturbance combinations.

A: Disturbance 1
B: Disturbances 1, 2, and 4
C: Disturbances 3 and 5
D: Disturbances 1, 2, 3, 4, and 5
Error plots for each of the four disturbance sets without feedback control are shown in Figs. (a) – (d). Under only a constant disturbance (Fig. (a)), all disturbance compensation strategies work well, since the disturbance is neither input-dependent nor dynamic. When we introduce drag, a dynamic disturbance, in disturbance set B (Fig. (b)), only the approaches that compensate for disturbance dynamics, FF4 and FF5, achieve low error. Although basic disturbance compensation as in FF2 helps considerably, accounting for disturbance dynamics improves performance further. Since the disturbances in disturbance set B are still input-independent, the use of numerical optimization to solve the acceleration model (1) has no effect.
Under input-dependent disturbances, we see that FF5 is the only approach that achieves low error. This is expected, as for both disturbance sets C and D, there are dynamic and input-dependent disturbances present.
Error plots for disturbance set D with feedback control are shown in Fig. 8. We see that although feedback can reduce the error, it is not enough to completely eliminate the error. FF5 still outperforms the other methods, achieving nearly zero error in all trials.
The maximum absolute position errors over the trajectory for all tested configurations are listed in Tables II and III.
FF1  FF2  FF3  FF4  FF5  

A  
B  
C  
D 
FF1  FF2  FF3  FF4  FF5  

A  
B  
C  
D 
III-B Hardware
III-B1 Platform & Setup
To validate the usefulness of dynamic disturbance compensation and input-dependent disturbance compensation, we compare the five aforementioned feedforward generation strategies, FF1 through FF5, on a g quadrotor while following aggressive trajectories. Figure 9 shows the hardware platform and Fig. 1 shows the robot while following aggressive circle and line trajectories.
Position, velocity, and yaw feedback is provided by a motion capture arena at Hz, while pitch, roll, and angular velocity feedback is provided by a Pixhawk PX4 at Hz. Feedback control is performed by a cascaded PD system following [28]. The feedforward terms are as computed by FF1 through FF5. FF3 and FF5 solve (10) numerically using the Newton-Raphson method. All control computation is performed onboard the vehicle’s Odroid XU4 computer. The position control loop runs at Hz and the attitude control loop runs at Hz.
For the hardware experiments, we use three test trajectories: a s straight line trajectory, a circle trajectory, and a figure 8 trajectory. The trajectories are designed to be near the limit of what the robot can feasibly track. Table IV lists the three trajectories and their maximum absolute derivatives.
Traj.  (m)  (m/s)  (m/s^2)  (m/s^3)  (m/s^4) 

Line  
Circle  
Figure 8 
III-B2 Model Learning
We use linear regression as the model learning strategy in the hardware experiment. Input data to the regression is a 6-dimensional vector consisting of the vehicle velocity and the commanded acceleration vector . The system starts with an uninitialized model and uses a few test trajectories per trial to regress to the acceleration error. The error model is then held fixed during the remaining trajectories used for error evaluation. Although the model can in principle be updated incrementally, keeping it fixed allows for a fair comparison between the control strategies.
III-B3 Results
For the line trajectory, each of FF1, FF2, FF4, and FF5 is evaluated four times. The first four trajectories, run using FF1, are used to train the acceleration error model. An overlay of the vehicle executing the m line trajectory can be seen in Fig. 1. Absolute errors along the trajectory and errors along the vertical axis for the line trajectory are shown in Fig. 10. FF1 performs the worst, especially along the vertical axis, indicating that the robot is underestimating the control input required to maintain hover. FF2 eliminates much of the error in the vertical axis, but still accumulates significant error along the trajectory, rising above cm consistently. FF4 and FF5 provide on average a 30% reduction in the average absolute tracking error along the trajectory when compared to FF2. This indicates that taking disturbance dynamics into account can significantly improve tracking performance. The results on this trajectory are not sufficient to determine the impact of the input-dependent disturbance compensation in FF5.
For the circle trajectory, all of the feedforward strategies are evaluated once, with FF3 and FF5 receiving two and four more trajectories respectively. An overlay of the vehicle executing the circle trajectory can be seen in Fig. 1. Fig. 11 shows the resulting error. As expected, FF1, with no disturbance compensation, performs the worst. FF2, FF4, and FF5 all perform similarly well, with FF2 achieving slightly lower vertical error than the others. FF3 performs slightly worse here than FF2, suggesting that the numerical routine may be failing to converge or that the dependence of the acceleration error on the input has not been properly modeled.
Fig. 12 shows the error of FF1, FF2, FF4, and FF5 along the figure 8 trajectory for one trial each. The improvement of FF4 over FF2 here is smaller than in the other trajectories, suggesting that dynamic disturbances have relatively less of an impact when following the figure 8, though more experimental trials are warranted to strengthen this claim.
IV Conclusion
We have presented a method that allows compensation of dynamic disturbances through evaluation of the derivatives of a learned model. We have shown in both simulation and hardware experiments that our dynamic disturbance compensation method improves performance over traditional disturbance compensation. We have also shown the usefulness of input-dependent disturbance compensation in simulation and preliminary results on hardware. The versatility of the approach in a realistic robotics application has been verified through evaluation on three distinct test trajectories.
Future work will evaluate nonlinear regression techniques, such as ISSGPR, on hardware platforms, as well as consider regression techniques that explicitly optimize model derivative accuracy. An interesting avenue of future study is to analyze theoretically how the error model accuracy affects the performance of each of the feedforward generation strategies. Lastly, we hope to apply this technique to the attitude dynamics of multirotors, in order to fully compensate for vehicle disturbances and modeling errors.
V Acknowledgments
The authors thank Xuning Yang for helpful feedback on this manuscript.
References
 Abbeel et al. [2005] P. Abbeel, V. Ganapathi, and A. Ng. Learning vehicular dynamics, with application to modeling helicopters. In Proceedings of the 18th International Conference on Neural Information Processing Systems, NIPS’05, Cambridge, MA, USA, 2005. MIT Press. URL https://dl.acm.org/citation.cfm?id=2976248.2976249.
 Aswani et al. [2013] A. Aswani, H. Gonzalez, S. Sastry, and C. Tomlin. Provably safe and robust learning-based model predictive control. Automatica, 49(5):1216–1226, 2013. URL https://www.sciencedirect.com/science/article/pii/S0005109813000678.
 Balakrishnan [2003] S. V. Balakrishnan. Fast incremental adaptation using maximum likelihood regression and stochastic gradient descent. In INTERSPEECH, 2003. URL https://pdfs.semanticscholar.org/ffc2/422d108b11cb8f0e947cf0f8b92c6f7607b5.pdf.
 Bangura and Mahony [2012] M. Bangura and R. Mahony. Nonlinear dynamic modeling for high performance control of a quadrotor. In Australasian conference on robotics and automation, pages 1–10, 2012. URL https://www.araa.asn.au/acra/acra2012/papers/pap121.pdf.
 Burri et al. [2016a] M. Burri, M. Bloesch, D. Schindler, I. Gilitschenski, Z. Taylor, and R. Siegwart. Generalized information filtering for MAV parameter estimation. In Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on, pages 3124–3130. IEEE, 2016a. URL https://ieeexplore.ieee.org/document/7759483/.
 Burri et al. [2016b] M. Burri, J. Nikolic, H. Oleynikova, M. W. Achtelik, and R. Siegwart. Maximum likelihood parameter identification for MAVs. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 4297–4303. IEEE, 2016b. URL https://ieeexplore.ieee.org/document/7487627.
 Cappo et al. [2018] E. Cappo, A. Desai, M. Collins, and N. Michael. Online planning for human–multi-robot interactive theatrical performance. Autonomous Robots, 42(8):1771–1786, December 2018. ISSN 09295593, 15737527. doi: 10.1007/s1051401897550. URL https://link.springer.com/10.1007/s1051401897550.
 Desaraju [2017] V. Desaraju. Safe, efficient, and robust predictive control of constrained nonlinear systems. Ph.D. Thesis, Carnegie Mellon University., 4 2017. doi: 10.1184/R1/6721379.v1. URL https://kilthub.cmu.edu/articles/Safe_Efficient_and_Robust_Predictive_Control_of_Constrained_Nonlinear_Systems/6721379.
 Droniou et al. [2012] A. Droniou, S. Ivaldi, V. Padois, and O. Sigaud. Autonomous online learning of velocity kinematics on the icub: A comparative study. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pages 3577–3582. IEEE, 2012. URL https://ieeexplore.ieee.org/document/6385674.
 Faessler et al. [2017] M. Faessler, D. Falanga, and D. Scaramuzza. Thrust mixing, saturation, and body-rate control for accurate aggressive quadrotor flight. IEEE Robotics and Automation Letters, 2(2):476–482, 2017. doi: 10.1109/LRA.2016.2640362. URL https://ieeexplore.ieee.org/document/7784809.
 Faessler et al. [2018] M. Faessler, A. Franchi, and D. Scaramuzza. Differential flatness of quadrotor dynamics subject to rotor drag for accurate tracking of high-speed trajectories. IEEE Robotics and Automation Letters, 3:620–626, 2018. doi: 10.1109/LRA.2017.2776353. URL https://ieeexplore.ieee.org/document/8118153.
 Ferrin et al. [2011] J. Ferrin, R. Leishman, R. Beard, and T. McLain. Differential flatness based control of a rotorcraft for aggressive maneuvers. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2688–2693, September 2011. doi: 10.1109/IROS.2011.6095098. URL https://ieeexplore.ieee.org/document/6095098.
 Florez et al. [2011] J. Florez, D. Bellot, and G. Morel. LWPR-model based predictive force control for serial comanipulation in beating heart surgery. In 2011 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), pages 320–326, July 2011. doi: 10.1109/AIM.2011.6027055. URL https://ieeexplore.ieee.org/document/6027055.
 Gijsberts and Metta [2011] A. Gijsberts and G. Metta. Incremental learning of robot dynamics using random features. In 2011 IEEE International Conference on Robotics and Automation, pages 951–956. IEEE, May 2011. ISBN 9781612843865. doi: 10.1109/ICRA.2011.5980191. URL https://ieeexplore.ieee.org/document/5980191/.
 Gijsberts and Metta [2013] A. Gijsberts and G. Metta. Real-time model learning using Incremental Sparse Spectrum Gaussian Process Regression. Neural Networks, 41:59–69, 2013. ISSN 08936080. doi: https://doi.org/10.1016/j.neunet.2012.08.011. URL https://www.sciencedirect.com/science/article/pii/S0893608012002249.
 Grollman and Jenkins [2008] D. Grollman and O. C. Jenkins. Sparse incremental learning for interactive robot control policy estimation. In 2008 IEEE International Conference on Robotics and Automation, pages 3315–3320, Pasadena, CA, USA, May 2008. IEEE. ISBN 9781424416462. doi: 10.1109/ROBOT.2008.4543716. URL https://ieeexplore.ieee.org/document/4543716/.
 Hehn and D’Andrea [2015] M. Hehn and R. D’Andrea. Real-time trajectory generation for quadrocopters. IEEE Transactions on Robotics, 31(4):877–892, August 2015. ISSN 15523098. doi: 10.1109/TRO.2015.2432611. URL https://ieeexplore.ieee.org/document/7128399.
 Jones et al. [2001–] E. Jones, T. Oliphant, P. Peterson, et al. SciPy: Open source scientific tools for Python, 2001–. URL https://www.scipy.org/.
 Kim et al. [2013] S. Kim, S. Choi, and H. J. Kim. Aerial manipulation using a quadrotor with a two DOF robotic arm. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4990–4995, November 2013. doi: 10.1109/IROS.2013.6697077. URL https://ieeexplore.ieee.org/document/6697077.
 Lee et al. [2009] D. Lee, H. Jin Kim, and S. Sastry. Feedback linearization vs. adaptive sliding mode control for a quadrotor helicopter. International Journal of Control, Automation and Systems, 7(3):419–428, June 2009. ISSN 15986446. doi: 10.1007/s1255500903118. URL https://link.springer.com/10.1007/s1255500903118.
 Lee et al. [2010] T. Lee, M. Leok, and N. H. McClamroch. Geometric tracking control of a quadrotor UAV on SE(3). In 49th IEEE Conference on Decision and Control (CDC), pages 5420–5425, December 2010. doi: 10.1109/CDC.2010.5717652. URL https://ieeexplore.ieee.org/document/5717652.
 Li et al. [2017] Q. Li, J. Qian, Z. Zhu, X. Bao, M. Helwa, and A. Schoellig. Deep neural networks for improved, impromptu trajectory tracking of quadrotors. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 5183–5189, May 2017. doi: 10.1109/ICRA.2017.7989607. URL https://ieeexplore.ieee.org/document/7989607.
 Li and Todorov [2004] W. Li and E. Todorov. Iterative linear quadratic regulator design for nonlinear biological movement systems. In ICINCO (1), pages 222–229, 2004. URL https://homes.cs.washington.edu/~todorov/papers/LiICINCO04.pdf.
 Lupashin et al. [2010] S. Lupashin, A. Schoellig, M. Sherback, and R. D’Andrea. A simple learning strategy for high-speed quadrocopter multi-flips. In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages 1642–1648. IEEE, 2010. URL https://ieeexplore.ieee.org/document/5509452/.
 Manchester and Kuindersma [2017] Z. Manchester and S. Kuindersma. DIRTREL: Robust trajectory optimization with ellipsoidal disturbances and lqr feedback. In Robotics: Science and Systems (RSS), 2017. URL https://doi.org/10.15607/RSS.2017.XIII.057.
 McKinnon and Schoellig [2017] C. McKinnon and A. Schoellig. Learning multimodal models for robot dynamics online with a mixture of gaussian process experts. 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 322–328, 2017. URL https://ieeexplore.ieee.org/document/7989041.
 Mellinger [2012] D. Mellinger. Trajectory generation and control for quadrotors. Publicly Accessible Penn Dissertations, 547, 2012. URL https://repository.upenn.edu/edissertations/547.
 Mellinger and Kumar [2011] D. Mellinger and V. Kumar. Minimum snap trajectory generation and control for quadrotors. In 2011 IEEE International Conference on Robotics and Automation (ICRA), pages 2520–2525, 2011. URL https://ieeexplore.ieee.org/document/5980409/.
 Michael et al. [2012] N. Michael, S. Shen, K. Mohta, Y. Mulgaonkar, V. Kumar, K. Nagatani, Y. Okada, S. Kiribayashi, K. Otake, K. Yoshida, K. Ohno, E. Takeuchi, and S. Tadokoro. Collaborative mapping of an earthquake-damaged building via ground and aerial robots. Journal of Field Robotics, 29(5):832–841, September 2012. ISSN 1556-4959. doi: 10.1002/rob.21436. URL https://doi.wiley.com/10.1002/rob.21436.
 Morrell et al. [2018] B. Morrell, M. Rigter, G. Merewether, R. Reid, R. Thakker, T. Tzanetos, V. Rajur, and G. Chamitoff. Differential flatness transformations for aggressive quadrotor flight. In 2018 IEEE International Conference on Robotics and Automation (ICRA), May 2018. doi: 10.1109/ICRA.2018.8460838. URL https://ieeexplore.ieee.org/document/8460838.
 Nguyen-Tuong and Peters [2011] D. Nguyen-Tuong and J. Peters. Model learning for robot control: a survey. Cognitive Processing, 12(4):319–340, November 2011. ISSN 1612-4782, 1612-4790. doi: 10.1007/s10339-011-0404-1. URL https://link.springer.com/10.1007/s10339-011-0404-1.
 Rahimi and Recht [2008] A. Rahimi and B. Recht. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems, pages 1177–1184, 2008. URL https://papers.nips.cc/paper/3182-random-features-for-large-scale-kernel-machines.pdf.
 Richter et al. [2016] C. Richter, A. Bry, and N. Roy. Polynomial trajectory planning for aggressive quadrotor flight in dense indoor environments. In Robotics Research: The 16th International Symposium ISRR, volume 114, pages 649–666. Springer International Publishing, Cham, 2016. ISBN 978-3-319-28872-7. doi: 10.1007/978-3-319-28872-7_37. URL https://doi.org/10.1007/978-3-319-28872-7_37.
 Rivera and Sawodny [2010] G. Rivera and O. Sawodny. Flatness-based tracking control and nonlinear observer for a micro aerial quadcopter. AIP Conference Proceedings, 1281(1):386, September 2010. ISSN 0094-243X. doi: 10.1063/1.3498483. URL https://aip.scitation.org/doi/10.1063/1.3498483.
 Schaal et al. [2002] S. Schaal, C. Atkeson, and S. Vijayakumar. Scalable techniques from nonparametric statistics for real time robot learning. Applied Intelligence, 17(1):49–60, July 2002. ISSN 1573-7497. doi: 10.1023/A:1015727715131. URL https://doi.org/10.1023/A:1015727715131.
 Svacha et al. [2017] J. Svacha, K. Mohta, and V. Kumar. Improving quadrotor trajectory tracking by compensating for aerodynamic effects. In 2017 International Conference on Unmanned Aircraft Systems (ICUAS), pages 860–866, June 2017. doi: 10.1109/ICUAS.2017.7991501. URL https://ieeexplore.ieee.org/document/7991501.
 Tal and Karaman [2018] E. Tal and S. Karaman. Accurate tracking of aggressive quadrotor trajectories using incremental nonlinear dynamic inversion and differential flatness. In 2018 IEEE Conference on Decision and Control (CDC), pages 4282–4288, December 2018. doi: 10.1109/CDC.2018.8619621. URL https://ieeexplore.ieee.org/document/8619621.
 Van Nieuwstadt and Murray [1998] M. Van Nieuwstadt and R. Murray. Real-time trajectory generation for differentially flat systems. Int. J. Robust Nonlinear Control, 8(11):995–1020, September 1998. ISSN 1099-1239. doi: 10.1002/(SICI)1099-1239(199809)8:11<995::AID-RNC373>3.0.CO;2-W. URL https://onlinelibrary.wiley.com/doi/10.1002/(SICI)1099-1239(199809)8:11<995::AID-RNC373>3.0.CO;2-W/abstract.
 Vijayakumar et al. [2005] S. Vijayakumar, A. D’Souza, and S. Schaal. Incremental online learning in high dimensions. Neural Computation, 17(12):2602–2634, 2005. doi: 10.1162/089976605774320557. URL https://doi.org/10.1162/089976605774320557.