Cascaded Gaussian Processes for
Data-efficient Robot Dynamics Learning
Motivated by the recursive Newton-Euler formulation, we propose a novel cascaded Gaussian process learning framework for the inverse dynamics of robot manipulators. This approach leads to a significant dimensionality reduction which in turn results in better learning and data efficiency. We explore two formulations for the cascading: the inward and outward, both along the manipulator chain topology. The learned modeling is tested in conjunction with the classical inverse dynamics model (semi-parametric) and on its own (non-parametric) in the context of feed-forward control of the arm. Experimental results are obtained with Jaco 2 six-DOF and SARCOS seven-DOF manipulators for randomly defined sinusoidal motions of the joints in order to evaluate the performance of cascading against the standard GP learning. In addition, experiments are conducted using Jaco 2 on a task emulating a pouring maneuver. Results indicate a consistent improvement in learning speed with the inward cascaded GP model and an overall improvement in data efficiency and generalization.
Accurate inverse dynamics enables accurate control and thus several of robotics researchers have considered this problem. When the robot’s rigid-body parameters are known precisely, such as through System Identification [wu2010overview], the manipulator dynamics equations can be formed, allowing us to express the inverse dynamics as a linear algebraic problem [featherstone2016dynamics]. Recursive Newton-Euler (NE) algorithms have been shown to accurately and efficiently solve the inverse dynamics problem in this case. However, robots today are only imperfectly calibrated, and tend to change dynamics frequently as they interact with the world (e.g., carrying loads, using new end effectors, interacting with humans).
Learning approaches, in principle, allow the quick adaptation of inverse dynamics to observed conditions, but present many obstacles such as the need to capture sufficient data and computational expense. Gaussian Processes (GP) have frequently been used in this context, as their non-parametric nature provides strong performance across a variety of robots with little human knowledge required [nguyen2009model, nguyen2011model, vinogradska2017stability]. It is feasible to learn during live operation of robots and adapt to changing situations [deisenroth2015gaussian]. The primary downside of GPs for use in robotics is their computational scaling as a function of training data size. Although many methods exist to reduce this complexity, such as sparse pseudo-inputs [snelson2006sparse], sparse spectrum GPs [quia2010sparse] and Bayesian Neural Networks [galMCDropout], each of these approaches still scales with the underlying task complexity. A more intrinsically efficient representation of the inverse dynamics learning problem allows for a simpler learning and requires less data.
This paper adapts GPs to learn the inverse dynamics of a robotic arm from data in a fashion that respects our knowledge of the underlying topology by cascading the learning along the kinematic chain with the same factorization used in a NE solver, as illustrated in Figure 1. Specifically, the torque output by the GP at each joint is used as an input for the subsequent joint, capturing the dynamic coupling from all previous joints. This cascaded approach utilizes many fewer input dimensions, reducing computation, but more importantly, it re-uses the results and requires learning simpler functions at each joint. Following the forward and backwards recursion of NE, we attempt both inwards and outwards cascaded versions of our GP learner.
Rather than discarding the classical inverse dynamics solution, we follow many previous approaches and combine parametric models with non-parametric models, resulting in a semi-parametric modeling framework [nguyen2010using, um2014independent, romeres2016online, reinhart2017hybrid, dallasemi, wu2012semi]. Specifically, we utilize the classical inverse dynamics solution as our learner’s mean function. This approach tends to achieve better generalization, data-efficiency, and faster learning due to its utilization of prior knowledge. Finally, we employ the recently-proposed derivative-free features [romeres2019derivative] as alternative to the standard derivative-based variables (velocity and acceleration) as inputs to the inverse dynamics model. We evaluate our proposed approach with extensive experiments carried out on a Kinova Jaco 2 six-DOF arm and using the public SARCOS dataset [vijayakumar2002statistical].
We continue with Section II, which provides a background on model-based torque controllers and inverse dynamics learning. Section III describes the cascaded Gaussian process formulation while Section IV evaluates this method by presenting the details behind the experiments and the obtained results. Finally, section V summarizes our work.
Ii-a Model-based Torque Control
Model-free control schemes (e.g. independent-joint control and PID control) are only capable of controlling a robotic arm through set-point regulation. When used for trajectory tracking applications, these methods require intermediate set points, resulting in considerable delays and jerky motions. On the other hand, model-based controllers can make use of the inverse dynamics model to cancel out non-linearities and to achieve zero-error tracking performance [chung2016motion].
The dynamics model of a robotic manipulator with DOF is described by Lagrange’s equation of motion [featherstone2016dynamics]:
where is the column vector of joint positions, is the inertia matrix, represents Coriolis and centrifugal forces, denotes gravity forces, is the column vector of joint control inputs to be computed, and is the additional torque components due to friction and disturbances.
Feedforward torque control aims to cancel the nonlinear terms and decouple the dynamics of each link based on the theory of feedback linearization. The control signal, Equation (2), consists of a feedforward term computed using the inverse dynamics model and a feedback term typically on joint positions and velocities [khosla1988experimental]:
where and are the desired joint positions and velocities.
A major drawback of this parametric algorithm is the considerable degradation of controller performance in the case of an imprecise inverse dynamics model. The limitations of parametric approaches to capture friction, nonlinearities, input disturbances and robot interactions strongly motivate the need for a data-driven learning scheme.
Ii-B Gaussian Processes for Model Learning
Following the notation used in [rasmussen2004gaussian], a non-parametric model represented by Gaussian process regression is written as , where , and are respectively the input, mean and covariance.
The kernel is responsible for the general shape of the sampled functions from the GP distribution and it has a large impact on the performance of the model learning. The Matern kernel is often recommended in robotics applications for two reasons. First, unlike the Squared Exponential kernel, it does not suffer from concentration of measure for high dimensional inputs and second, it is not infinitely differentiable [rasmussen2004gaussian] which makes it a more realistic approach in robotics applications. The hyperparameters of the GP, lengthscale (), variance () and the noise variance are obtained by maximizing the log marginal likelihood using gradient based optimization algorithms.
While a non-parametric inverse dynamics model is a direct map from joint states () to joint torques, a semi-parametric model can be thought of as a combination of a rigid body dynamics (RBD) and an error term:
Here, the parametric component, left side of Equation (1), provides a baseline for the predicted torque while the non-parametric term is responsible for unmodeled dynamics of the system and the interactions of the robot.
Assuming the robot has degrees of freedom and predicts the torque for joint , the standard non-parametric model developed by previous authors uses as the input and as the output. On the other hand, the non-parametric part of a semi-parametric model keeps the same input but employs as the output.
Proposed in [nguyen2010using], a straightforward method to integrate the RBD function into the GP framework is by setting it as the mean function, thus making the GP biased towards the prior knowledge available from the RBD model. It is often recommended to use a zero mean function for the GP, since addition of a non-zero constant to the GP posterior would negatively affect the refined uncertainty predictions [rasmussen2004gaussian]. Nevertheless, we believe that the semi-parametric modeling framework has other advantages in the context of robot learning that justifies the use of non-zero mean functions. Using this approach, the torque of joint for query point can be predicted by:
Equation (4) indicates the fact that the GP is mainly responsible for modeling the error between the RBD model and actual torque measurements, because if this error is close to zero, the second term would become negligible and the predicted torque is based on the RBD function. Furthermore, according to Equation (4), if the query point is far from the training data , the resulting covariance matrix will become almost zero and the torque prediction will be a function of the RBD model only. This is in fact the reason for better generalization of semi-parametric models compared to non-parametric ones [nguyen2010using].
Iii Cascaded Gaussian Processes for Inverse Dynamics Learning
Iii-a Cascaded Formulation
We propose cascaded Gaussian processes as a novel data-efficient approach for inverse dynamics learning. The basic idea is to reduce the dimensions of the regression problem recursively by using the output of a GP trained on one joint as a substitute for all previous joint states, thus reducing the dimensions of the feature space. The formulation of our cascaded GP is inspired by the recursive NE algorithm for computing the inverse dynamics of robotic manipulators. The recursive NE method is comprised of two main steps; first, an outward recursion from the base of the robot to compute link velocities and accelerations based on robot kinematics and joint states. Total forces on each link are also calculated in this step using the NE equations of motion. Second, during an inward recursion, values for joint torques are computed by balancing the forces acting on each link [featherstone2016dynamics].
Figure 2 shows the free body diagram of link ; is the angular velocity of link , is the axis of joint , and and are respectively the force and moment applied from link to link . As is calculated during the outward recursion and and are available from the inward recursion step, one can solve for and using the Euler equations of motion:
Standard GP models (Figure 3c), discussed in Section II-B, are trained with the states of all joints, thus resulting in a high dimensional feature space. In contrast, the proposed cascaded approach uses joint torques as a proxy for joint states. The cascading GP formulation can be implemented in either non-parametric or semi-parametric modeling paradigm.
We investigate two embodiments of the cascaded GP formulation, referred to as inward and outward cascaded GP. Our inward cascaded GP (Figure 3a) is based on Equation (5) in which during an inward recursion, the torque of joint predicted by its corresponding GP can be utilized instead of the states of the outward joints to , allowing for a reduced feature space .
The outward cascaded GP (Figure 3b), on the other hand, starts from the base of the robot and reduces the dimensions of the feature space during an outward recursion procedure. Thus, the reduced feature space for joint becomes .
The cascaded GP formulation has two main advantages over the standard inverse dynamics learning schemes. First, our method is able to generalize more effectively. According to Equation (5), the force and torque can summarize the impact of all the outward joints; however, measuring these values requires installing force/torque sensors on each joint of a robot. Therefore, we hypothesize that a single joint torque value, which is readily available on most of the robots, can summarize the impact of all the outward joints to some extent; giving our model access to the estimate of this torque avoids the need to re-discover this pattern from training data. Compared to the regression in joint states used by standard methods, which requires observing a complete set of states covering most of the workspace, the cascaded approach better matches the topology of the arm; hence, the learning is more data-efficient and the learned model can generalize better.
Second, as a result of the reduced dimensionality of the GP inputs in the cascaded approach, several computational tasks are more efficient. During prediction, the computation of the GP scales with the number of state dimensions so our cascaded approach requires less computation. Further, optimization of the hyperparamaters becomes a significantly simpler task in lower dimensions and when the input states are more uniformly correlated to the outputs. Specifically, computing the GP’s marginal likelihood and its derivative scales as , with the number of training points, so the data efficiency gained by our method translates into faster learning.
Iii-B Derivative-Free Features
Utilizing derivative-free features in the context of inverse dynamics has been proposed in [romeres2019derivative] as a means to address the noisy nature of numerical differentiation, an inevitable step to obtain joint velocities and accelerations. Assuming we have access to the previous joint positions, there are numerous ways to incorporate the state history as a feature. In a study reported in [romeres2019derivative], the authors concluded that derivative-free features with reduced rank provide well-rounded performance. In this method, a smaller number of features is chosen to compress the information within the position history. Physics suggests that 3 elements (position, velocity and acceleration) suffice to define a state; in addition, according to the empirical results in [romeres2019derivative], is the optimal choice for inverse dynamics learning. Therefore, we have set in this paper for the implementation of derivative-free features:
where is the reduced rank derivative-free feature of joint , is the history of joint positions, and is a fully parameterised matrix. Guidelines for designing can be found in [romeres2019derivative], but in the present implementation, we set and .
|\clineB2-203||Model||Joint 1||Joint 2||Joint 3||Joint 4||Joint 5||Joint 6|
The values report an average of error over a 1-second interval (e.g. 1-2 second, 4-5 second, 9-10 second) of training data. The yellow highlighted regions present the lowest error among the models of the same category (non-parametric and semi-parametric), and the light yellow highlighted regions indicate an almost similar performance (up to 5 percent difference) among the models.
In this section, we evaluate the performance of the proposed cascaded Gaussian process approach in terms of data-efficiency, generalization and learning efficiency. Two evaluation scenarios are considered here: first, a quantitative evaluation with a standard training-testing procedure on subsets of a dataset. Second, a qualitative evaluation using task-based training episodes in which the robot attempts to learn the dynamics of various tasks while the data is transferred from one to another.
The experiments are carried out on Kinova Jaco 2 robot, a lightweight six-DOF robotic manipulator with a maximum payload of 1.6 kg and a reach of 90 cm. The robot is equipped with joint encoders and joint torque sensors, and it can be monitored and commanded through a ROS (Robot Operating System) [quigley2009ros] package. The proposed algorithms as well as the feedforward torque controller have been implemented in Python as a ROS package on top of Kinova API. Due to the connectivity limitations of the robot, the torque controller runs at 40 Hz. Prior to applying the controller on the actual robot, it has been thoroughly tested in Gazebo, an open-source simulation environment [koenig2004design].
For implementation of our cascaded Gaussian process framework, we used GPy111https://github.com/SheffieldML/GPy, an open-source Gaussian process library in Python. In addition, PyBullet222https://github.com/bulletphysics/bullet3, a Python implementation of Bullet Physics Engine, has been utilized for parametric inverse dynamics modeling of Kinova Jaco 2 arm. Our model learning framework learns offline but predicts the torques online.
Iv-B General Evaluation
The aim of this set of experiments is to provide a quantitative evaluation of data-efficiency and accuracy of our proposed cascaded method. Two datasets have been used:
Our own dataset collected on Kinova Jaco 2 arm and referred to as the Jaco dataset.
The publicly available SARCOS dataset 333SARCOS dataset is available at http://www.gaussianprocess.org/gpml/data/ [vijayakumar2002statistical].
Every model variant has been evaluated on the Jaco dataset. However, due to the discrepancies found in some of the results, SARCOS dataset was also used to evaluate the non-parametric models.
In the Jaco dataset, the training set consists of 600 seconds of velocity controlled robot motion subsampled at 10 Hz and divided into 10 subsets. The motion for each subset has been generated by commanding each joint rate as a summation of multiple sinusoidal functions with random amplitudes, frequencies and phase angles. Therefore, the training set ensures a wide variety of joint states and robot configurations. Moreover, the test dataset consists of three one-minute motions, distinct from the training set, each collected with randomly sampled amplitudes, frequencies and phase angles.
The SARCOS dataset consists of 50,000 data points collected at 50 Hz from various movement patterns on the seven-DOF anthropomorphic SARCOS robot. The test set includes 10 percent of the data. For consistency with the Jaco dataset, we have subsampled the SARCOS dataset at 10 Hz and divided the training set into 9 subsets.
Since our main goal is to improve the data-efficiency of dynamics learning, we train our model on a series of ever-increasing datasets, evaluating each resulting learned model on the test set. This allows evaluation of how much data each method requires before achieving low error.
Table I shows the evaluated normalized root-mean-square error (nRMSE = RMSE/range) of torque predictions on the Jaco dataset of all models for three time stamps (averaged on ranges 1-2s, 4-5s, and 9-10s) of training time, whereas Figures 4 and 5 present the nRMSE of torque prediction as a function of data acquisition duration for the first 50 seconds of the data acquisition for non-parametric and semi-parametric models, evaluated on the Jaco dataset. Finally, Figure 6 reports the results obtained from non-parametric models on the SARCOS dataset.
The results show that in the case of non-parametric models (Figures 4 and 6), cascaded models in general outperform standard models in terms of data efficiency, even though these models have access to a far smaller set of robot states. Overall, the inward cascaded model, which is consistent with the direction of the recursive NE method solution for the inverse dynamics torques, provides more accurate predictions compared to the outward cascaded.
Nonetheless, outward cascaded model outperforms its inward counterpart on joint 4 in the Jaco dataset and the last joint in both of the datasets. The discrepancy of joint 4 can be explained by the intrinsic properties of the Jaco 2 arm. As Figure 7 suggests, due to the strong coupling between joints 4 and 5, the torque of each joint is more correlated to the other rather that its own states; therefore, not having access to states of joint 5, the inward cascaded model yields less accurate predictions for joint 4. The fact that this discrepancy is not present in the results obtained from the SARCOS dataset (Figure 6), confirms the data correlation analysis.
The discrepancy of the last joint can be explained by the fact that the torque on the wrist of the robot is only a function of its own states, as it is the last joint of the open-chain robot arm topology. Hence, the outward cascaded model, which has access only to the states of this joint and torque of the previous one, learns the torque in a much smaller feature space, resulting in more accurate and data-efficient predictions.
The same general trends can be observed in the case of semi-parametric models (Figure 5). However, the improvement provided by cascaded modeling is less significant as the rigid body dynamics model used as the mean function already reduces learning complexity to some extent. The comments made earlier regarding joints 4 and 6 on the Jaco dataset and joint 7 on the SARCOS dataset apply to semi-parametric models as well.
Finally, as already addressed in [romeres2019derivative], derivative-free features (Table I) help with the transient response of the GP models (i.e., the prediction accuracy of the model within the first seconds of training data), as they do not rely on the noisy and delayed nature of numerical differentiation. However, this improvement is limited to the first few seconds of data acquisition and as the number of data points increases, derivative-free features lack the generalization of derivative-based features.
Iv-C Learning Efficiency
The hyperparameters of a Gaussian process are most often obtained by maximizing the log marginal likelihood. Gradient based methods are usually used to solve this non-convex optimization problem. In high-dimensional systems the solution may converge to a local optimum that is far from the ideal solution. In order to prevent this, multiple restarts of the optimization procedure are required. Furthermore, since computing both the log marginal likelihood and its gradient has the computational complexity of , solving for the optimal hyperparameters requires a significant amount of time for large datasets.
First, the data-efficiency of the cascaded approach helps to decrease the size of the dataset required to learn a reliable model and thus decreases the training time significantly. Second, as a result of the reduced dimensionality of the cascaded GP, the total number of hyperparameters is decreased, leading to a simpler optimization problem. Consequently, log marginal likelihood is maximized with significantly fewer iterations. Figure 8 compares the average number of iterations to train cascaded and standard GP models with 500 data points. As expected, cascaded models converge quicker on the joints with smaller feature state: joints closer to the base for the inward cascaded and joints closer to the end-effector for the outward cascaded model.
Iv-D Task Based Evaluation
As a qualitative evaluation of our proposed cascaded modeling framework, we have applied it to a series of “pouring tasks” in which the torque controlled robot is required to move a bottle and tip it to allow marbles to roll out of the bottle into a jar. Each task attempts to explore a different part of the workspace while the data and the learned model is transferred between the tasks. As the parametric model is not aware of the unknown load carried by the robot, the baseline performance is limited. However, cascaded GP modeling proposed here is capable of learning the dynamics of each task within the first few attempts and it can generalize well to the tasks it has already observed. The results can be seen in the supplementary video 444https://youtu.be/il-wWuyRZIY.
In this paper, we introduced the cascaded Gaussian process as a novel approach for inverse dynamics learning. The proposed method actively reduces the number of dimensions of the feature space by substituting joint states with joint torques. We explored two formulations of the cascaded Gaussian process, namely inward and outward models. In general, the inward models outperform the outward models as suggested by their consistency with the recursive Newton-Euler method. Overall, cascaded models provide a more data-efficient and faster training compared to the standard models; this improvement is more evident in the case of non-parametric models. Moreover, experimental results for a pouring task confirm the data-efficiency of the cascaded modeling.
As a final wrap-up of the study on the variations of inverse dynamic model learning schemes, we would suggest semi-parametric cascaded GPs with derivative-based features. The semi-parametric variant and cascaded approach both help with the generalization and data-efficiency of the model. Although derivative-free features improve the consistency of the results by omitting the inherent noise present in the differentiation procedure, this improvement is limited to the first few seconds of data acquisition and comes with the cost of reducing the generalization of the learned model.
This work was supported by the National Sciences and Engineering Research Council (NSERC) Canadian Robotics Network (NCRN).