A Learning from Demonstration Approach fusing Torque Controllers
Abstract
Torque controllers have become commonplace in the new generation of robots, allowing for complex robot motions involving physical contact with the surroundings in addition to task constraints at Cartesian and joint levels. When learning such skills from demonstrations, one is often required to think in advance about the appropriate task representation (usually either operational or configuration space). We here propose a probabilistic approach for simultaneously learning and synthesizing control commands which take into account task, joint space and force constraints. We treat the problem by considering different torque controllers acting on the robot, whose relevance is learned from demonstrations. This information is used to combine the controllers by exploiting the properties of Gaussian distributions, generating torque commands that satisfy the important features of the task. We validate the approach in two experimental scenarios using 7-DoF torque-controlled manipulators, with tasks requiring the fusion of multiple controllers to be properly executed.
I Introduction
The field of Learning from Demonstration (LfD) [1] aims for a user-friendly and intuitive human-robot skill transfer. Generally, when modeling demonstrated skills, one must think in advance about the relevant variables to encode. The selection of these variables strongly depends on the task requirements, with the representation of the skills usually being in either operational or configuration spaces. The a priori definition of the relevant space may require considerable reasoning or trial-and-error to achieve successful movement synthesis, which contradicts the LfD concept. This process becomes even more cumbersome when the robot is required to physically interact with the environment, thus introducing additional task constraints such as contact forces. Consider the example shown in Fig. 1, where a robot is first required to learn how to apply a contact force with the end-effector, and then must perform a configuration space movement. In this case, encoding demonstrations in either operational or configuration spaces alone will not result in proper execution.
We here propose an approach for simultaneously learning different types of task constraints and generating torque control commands that encapsulate the important features of the task. Figure 2 gives an overview of the approach. We treat the problem by considering different torque controllers acting on the robot, with each one being responsible for the fulfillment of a particular type of constraint (e.g. desired contact forces, Cartesian/joint positions and/or velocities). We discuss such controllers in Section III. From demonstrations of a task, we propose to learn the importance of each controller using probabilistic representations of the collected data (Section V). We then exploit a set of linear operators, defined for each individual controller, that take into account the state of the robot and contact with the environment to transform the control references into torque commands, with associated importance. Finally, we combine all the constraints, represented as independent Gaussian-distributed torque references, through a fusion of controllers, carried out by a product of Gaussians (Section IV). We hence obtain a final torque reference that is used to control the robot.
Our contribution with respect to the state of the art is threefold:

A probabilistic formulation for jointly learning torque controllers from demonstrations, by exploiting the properties of Gaussian distributions.

The learning of force-based tasks in operational space, in addition to Cartesian and joint constraints.

An approach that is compatible with various probabilistic learning algorithms that generate Gaussian-distributed references or trajectories.
The proposed approach is evaluated in two scenarios with 7-DoF torque-controlled robots (Section VI). In the first case, we use a cocktail shaking task, employing force control, to demonstrate that the approach can accommodate both force and position/velocity-based skills. The second scenario shows that the approach can also be used to combine partial demonstrations, allowing for demonstrating each controller subtask independently, in different regions of the workspace.
II Related Work
The problem of combining controllers can be broadly divided into two types of approaches. In [2, 3, 4], the authors use a weighted combination of individual torque controllers in humanoid robots, with each controller responsible for a particular subtask (e.g. balance, manipulation, joint limit avoidance). Other works frame the problem as a multilevel prioritization [5, 6], where lower importance tasks are executed without compromising more important ones, typically in a hierarchical manner with a null space formulation. As a result, tasks with low importance are only executed if they do not affect high priority ones, potentially requiring platforms with a high number of degrees of freedom. Both kinds of approaches have their own merits, with the former allowing for a more flexible organization of tasks as well as smooth transitions between them (according to their weight profiles) and the latter ensuring that high priority tasks are always executed.
In contrast to manually setting weights [2], in this paper we are interested in learning them from human demonstrations. Learning controller importance has been addressed in different manners, from reinforcement learning (RL) [3, 4, 7] to LfD [8, 9]. The main differences between these two branches lie in the type of prior knowledge, with RL requiring a priori information in the form of reward or cost functions (which can be hard to formulate in some cases) and LfD approaches demanding task demonstrations. The present work shares connections with [8, 9], where the problem of combining constraints in task and joint space is addressed. The first important difference is that such approaches exploit velocity controllers, which take into account only kinematic constraints. In this work, however, we consider torque controllers, allowing for: (i) a straightforward consideration of contact forces at the end-effector, which facilitates the transfer of skills that involve physical human-robot interaction, and (ii) the exploitation of the compliance capabilities of modern robots. The second relevant difference is that [8, 9] model data using Gaussian Mixture Models (GMM), while here, although GMM are used as an example, we generalize the problem to a wider range of probabilistic modeling approaches, by exploiting the particularities of each approach. In particular, we show that the probabilistic combination of controllers can be generalized to any trajectory modeling technique that generates Gaussian-distributed outputs (e.g. Gaussian Process Regression (GPR) [10], Probabilistic Movement Primitives (ProMP) [11]).
One relevant and recent work in learning torque controllers is that of Deniša et al. [12], who introduced the concept of Compliant Movement Primitives. Such primitives consist of a tuple comprised of Dynamic Movement Primitives (DMP) [13] associated with the task (operational or configuration space trajectories) and with the task dynamics torques (e.g. related to object mass). Here we consider torque controllers that track trajectories in either joint or task space (both positions and forces) and intentionally overlook the task dynamics. Moreover, the probabilistic nature of our method provides essential information for controller fusion, in the form of covariance matrices, which is unavailable in DMP.
III Torque Controllers – configuration and operational space
Inspired by works in which a combination of torque controllers results in a flexible importance assignment and smooth transitions between different tasks [2, 3, 4], we propose a strategy where the controller combination is learned from demonstrations. In this section we define the individual controllers that we exploit for configuration and operational space control. Formally, we follow a model-based approach to control the robot using torques, by assuming a rigid-body system with joints whose dynamics are given by , where denote joint angles, velocities and accelerations, and , correspond to the inertia matrix, Coriolis and gravity terms, respectively. The total torques acting on each joint are given by .
Robot control is achieved using a torque command , formed from a task-related term and a term that compensates for the dynamics of the robot , i.e.,
(1) 
In this work we are interested in fusing controllers that fulfill different task requirements, thus we focus on the term . Here, when referring to tasks, we are concerned with the tracking of reference trajectories (e.g. positions, forces).
The definition of depends on the space where tasks are represented. For instance, when a task requires the manipulation of an object (e.g. pick and place), must be defined such that position and orientation constraints at the end-effector are fulfilled with respect to the object or other landmarks in the robot’s workspace. If, additionally, manipulation requires physical contact (e.g. object insertion, cooperative handling of objects), must also accommodate desired interaction forces. In other applications, where gestures or specific configurations of the kinematic chain are required, is more adequately formulated as a configuration space controller. We now describe the controllers that we exploit for the different types of tasks, denoting simply by .
III-A Configuration Space Controller
Configuration space controllers are used to track joint positions and velocities. Here we exploit proportional-derivative (PD) controllers of the form
(2) 
where are joint stiffness and damping gain matrices, and are the current and desired joint positions and velocities. An additional feedforward term is often added to (2), for improved tracking performance, as in [14]. As we shall see, it is straightforward to accommodate this term in our approach, if required.
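As a minimal sketch of the configuration space controller described above, the snippet below assumes the standard PD form: stiffness times position error plus damping times velocity error. The function name, argument names and gain values are hypothetical and chosen only for illustration; dynamics compensation is not included.

```python
import numpy as np

def joint_pd_torque(q, dq, q_des, dq_des, Kp, Kd):
    """Task torque of a joint-space PD controller (assumed standard form)."""
    q, dq, q_des, dq_des = (np.asarray(v, dtype=float) for v in (q, dq, q_des, dq_des))
    # Stiffness acts on the position error, damping on the velocity error.
    return Kp @ (q_des - q) + Kd @ (dq_des - dq)

# Hypothetical 2-joint example with diagonal gain matrices.
Kp = np.diag([100.0, 50.0])
Kd = np.diag([10.0, 5.0])
tau_q = joint_pd_torque(q=[0.0, 0.1], dq=[0.0, 0.0],
                        q_des=[0.1, 0.1], dq_des=[0.0, 0.0],
                        Kp=Kp, Kd=Kd)
print(tau_q)
```

Only the first joint has a position error here, so only that joint receives a non-zero task torque.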
III-B Position Controller in Operational Space
Operational space controllers are aimed at tracking Cartesian poses with the end-effector of the robot. Here, we consider the case of tracking position references, but the approach remains valid for the consideration of orientations. We assume that the end-effector of the robot is driven by a force that is proportional to the output of a PD controller,
(3) 
where is the Cartesian inertia matrix of the end-effector, whose positions and velocities (current and desired) are, respectively, denoted by (with as the dimension of the operational space). The Jacobian matrix , gives the differential kinematics of the robot’s end-effector and are Cartesian stiffness and damping gain matrices. The end-effector force is converted to joint torques as in [14],
(4) 
III-C Force Controller
In this case we consider a proportional controller that tracks a desired force at the end-effector (see [15], Ch. 11):
(5) 
where are current and desired contact forces (measured using a F/T sensor at the end-effector), and (4) is used to map the force command at the end-effector to joint torques.
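The two operational space controllers of this section can be sketched as follows. The sketch assumes that the position controller scales a Cartesian PD output by the Cartesian inertia matrix, that the force controller applies a proportional gain to the force error, and that both map the resulting end-effector force to joint torques through the Jacobian transpose. All names (`cartesian_pd_torque`, `force_p_torque`, `Lambda`, `Kf`) and numerical values are hypothetical, and the sign convention of the force error may need to be adapted to the sensor setup.

```python
import numpy as np

def cartesian_pd_torque(J, Lambda, x, dx, x_des, dx_des, Kp, Kd):
    """Joint torques from a Cartesian PD controller (assumed form)."""
    # End-effector force: Cartesian inertia times the PD output.
    f = Lambda @ (Kp @ (x_des - x) + Kd @ (dx_des - dx))
    # Map the end-effector force to joint torques with the Jacobian transpose.
    return J.T @ f

def force_p_torque(J, f_meas, f_des, Kf):
    """Joint torques from a proportional force controller (assumed form)."""
    f_cmd = Kf @ (f_des - f_meas)
    return J.T @ f_cmd

# Hypothetical planar example: 3 joints, 2-dimensional operational space.
J = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
Lambda = np.eye(2)
tau_x = cartesian_pd_torque(J, Lambda,
                            x=np.array([0.0, 0.0]), dx=np.zeros(2),
                            x_des=np.array([0.1, 0.0]), dx_des=np.zeros(2),
                            Kp=100.0 * np.eye(2), Kd=10.0 * np.eye(2))
tau_f = force_p_torque(J, f_meas=np.array([0.0, -3.0]),
                       f_des=np.array([0.0, -5.0]), Kf=2.0 * np.eye(2))
print(tau_x, tau_f)
```

Both controllers output vectors in the same joint torque space, which is what later allows them to be fused with the configuration space controller.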
IV Probabilistic Torque Controllers
In this section, we formalize the fusion of torque controllers as an optimization problem and lay out the probabilistic treatment of control commands. Let us consider a robot employing controllers, such as those defined in Section III, at any given moment, corresponding to different subtasks that can be executed in series or in parallel. Each controller generates a torque command . Also, let us assume we have access to a precision matrix (which will be explained in Section IV-B), denoted by , providing information about the respective importance of the different controllers. We formalize the problem of fusing control commands as the optimization
(6) 
whose objective function corresponds to a weighted sum of quadratic error terms, with the weight of each command given by full matrices . The solution and error residuals of (6) can be computed analytically, and correspond to the mean and covariance matrix of a Gaussian distribution given by the product of Gaussians, with means and covariance matrices ,
(7) 
where . The connection between the solution of (6) and the product of Gaussians (7) allows us to exploit the structure of the controllers defined in Section III to fuse torque control commands, given Gaussian-distributed references. In particular, this is achieved by taking advantage of the linearity of the controllers (Section IV-A) in combination with the linear properties of Gaussians (Section IV-B).
IV-A Linear controller structure
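The fusion step can be illustrated with a short sketch. It assumes the standard product-of-Gaussians result: the fused precision is the sum of the individual precisions, and the fused mean is the precision-weighted average of the individual torque commands, which is also the minimizer of a precision-weighted sum of quadratic errors. The function name `fuse_torques` and the numbers below are hypothetical.

```python
import numpy as np

def fuse_torques(means, covariances):
    """Fuse Gaussian torque commands with a product of Gaussians."""
    precisions = [np.linalg.inv(S) for S in covariances]
    # Fused covariance: inverse of the summed precision matrices.
    fused_cov = np.linalg.inv(sum(precisions))
    # Fused mean: precision-weighted combination of the individual commands.
    fused_mean = fused_cov @ sum(P @ m for P, m in zip(precisions, means))
    return fused_mean, fused_cov

# Two hypothetical controllers acting on a 2-joint robot.
# Controller 1 is confident about joint 1, controller 2 about joint 2.
tau_1, Sigma_1 = np.array([1.0, 0.0]), np.diag([0.01, 1.0])
tau_2, Sigma_2 = np.array([3.0, 2.0]), np.diag([1.0, 0.01])
tau_hat, Sigma_hat = fuse_torques([tau_1, tau_2], [Sigma_1, Sigma_2])
print(tau_hat)  # close to [1, 2]: each joint follows its most precise controller

# The gradient of the weighted quadratic cost vanishes at the fused mean.
grad = sum(np.linalg.inv(S) @ (tau_hat - m)
           for S, m in zip([Sigma_1, Sigma_2], [tau_1, tau_2]))
```

In this example each joint ends up tracking the controller with the smallest variance for that joint, which is the behavior exploited in the experiments of Section VI.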
Control commands (2)–(5) are linear with respect to the reference trajectories. The controller equations can thus be rewritten in a way that highlights this linear structure. For the joint space torque controller (2) we obtain
(8) 
where and . Similarly, the Cartesian position and force controllers (4)–(5) can be formulated as , with , , and , with and . Note that linearity also applies if feedforward terms are included in the controllers, e.g. , in which case these terms simply need to be included in the reference vector and can be extended with the identity matrix, e.g. and , for a configuration space controller.
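The linear structure can be checked numerically. The sketch below assumes one common way of writing the joint space controller as a linear operator applied to the stacked reference (desired positions and velocities) plus an offset that depends on the current state; the names `A`, `b` and `joint_space_operators` are illustrative assumptions rather than the notation of the equations above.

```python
import numpy as np

def joint_space_operators(q, dq, Kp, Kd):
    """Linear operators (A, b) such that tau = A @ reference + b (assumed form)."""
    A = np.hstack([Kp, Kd])             # acts on the stacked reference [q_des; dq_des]
    b = -A @ np.concatenate([q, dq])    # offset from the current robot state
    return A, b

# Hypothetical 2-joint state, reference and gains.
Kp, Kd = np.diag([100.0, 50.0]), np.diag([10.0, 5.0])
q, dq = np.array([0.0, 0.1]), np.array([0.2, 0.0])
q_des, dq_des = np.array([0.1, 0.1]), np.array([0.0, 0.0])

A, b = joint_space_operators(q, dq, Kp, Kd)
tau_linear = A @ np.concatenate([q_des, dq_des]) + b
tau_pd = Kp @ (q_des - q) + Kd @ (dq_des - dq)   # direct PD evaluation
print(tau_linear, tau_pd)
```

Both expressions give the same torque. Since `b` depends on the current state, it has to be recomputed at every control step, while `A` only changes if the gains (or, for operational space controllers, the Jacobian and inertia) change.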
IV-B From probabilistic references to probabilistic torques
Gaussian distributions are popular in robot learning and control due to their properties of product, conditioning and linear transformation. Here, we consider Gaussian-distributed control references and exploit the previously defined linear operators to formulate probabilistic torque controllers. Let us first consider a configuration space controller, with desired joint state , where and are the mean and covariance matrix of a Gaussian, modeling the probability distribution of joint positions and velocities. Per the linear properties of Gaussian distributions, the configuration space controller (8) yields a new Gaussian with mean and covariance given by
(9) 
Similarly, for and , we obtain
(10) 
and
(11) 
respectively. This type of controller has a probabilistic nature as the torque commands are generated from Gaussian distributions and result in new Gaussians. We therefore refer to them as probabilistic torque controllers (PTC).
A generic PTC, , is thus fully specified by
(12)  
where denotes a generic control reference. Note that the set of linear parameters is permanently updated, for each controller, during execution, as it depends on the state of the robot and its interaction with the environment through , and .
A probabilistic representation of trajectories using Gaussian distributions (12) has the advantage of modeling the second moment of the data in the form of covariance matrices. This is exploited here to express the importance of each controller as a function of the covariance matrix of the corresponding reference trajectory , through
(13) 
Note that is typically non-square. This operator maps constraints from spaces with different dimensions (e.g. configuration and operational spaces) into a common space, that of torque commands.
With the definition of in (13), torque commands can be combined using (7). The problem of learning control commands and their respective importance is thus framed as the learning of reference trajectories as Gaussian distributions , and generating Gaussian-distributed torque commands , which encapsulate the control reference and its importance with respect to other controllers. In previous work, controller weights are either set empirically [2] or optimized through reinforcement learning [3, 4]. In contrast to these works, we employ probabilistic regression algorithms to learn , and consequently , from human demonstrations.
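A minimal sketch of a probabilistic torque controller follows. It uses the linear transformation property of Gaussians (an affine map of a Gaussian has the mapped mean and the covariance pre- and post-multiplied by the linear operator) and takes the precision as the inverse of the resulting torque covariance. Because the operator is non-square, that covariance may be rank-deficient, so the sketch adds a small diagonal regularizer before inversion; this regularizer, the function name and all numbers are assumptions made for illustration.

```python
import numpy as np

def probabilistic_torque(A, b, mu_ref, Sigma_ref, reg=1e-6):
    """Gaussian torque command obtained from a Gaussian reference."""
    mean = A @ mu_ref + b                # mean of the torque command
    cov = A @ Sigma_ref @ A.T            # covariance of the torque command
    # Assumed regularization: keeps the inversion well-posed if cov is singular.
    precision = np.linalg.inv(cov + reg * np.eye(cov.shape[0]))
    return mean, cov, precision

# Hypothetical single-joint controller with stiffness 100 and damping 10,
# evaluated at zero position and velocity (so the offset b is zero).
A = np.array([[100.0, 10.0]])
b = np.zeros(1)
mu_ref = np.array([0.1, 0.0])

# Consistent demonstrations (small covariance) versus variable ones (large covariance).
_, cov_low, prec_low = probabilistic_torque(A, b, mu_ref, np.diag([0.01, 0.04]))
mean, cov_high, prec_high = probabilistic_torque(A, b, mu_ref, np.diag([1.0, 4.0]))
print(mean, cov_low, prec_low, prec_high)
```

The reference with the larger covariance yields the smaller precision, so the corresponding controller contributes less to the fused torque.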
V Learning control references from demonstrations
In Section IV, we formalized our approach for combining controllers. Here we show how the Gaussian modeling of trajectories can be learned from demonstrations. Several regression methods exist for this purpose, each offering different advantages; see [16] for a review. Two popular approaches are GMM, combined with Gaussian Mixture Regression [17], and GPR [10]. We now review these two techniques, and expand on their use in the context of PTC.
V-A Gaussian Mixture Model/Gaussian Mixture Regression (GMM/GMR)
We consider demonstration datasets comprised of datapoints organized in a matrix . Each datapoint is represented with input/output dimensions indexed by , , so that with . It can for example represent a concatenation of time stamps with end-effector poses, joint angles or measured forces. A GMM, encoding the joint probability distribution with states and parameters (respectively the prior, mean and covariance matrix of each state ), can be estimated from such a dataset through Expectation-Maximization (EM) [17]. After a GMM is fitted to a given dataset, GMR can subsequently be used to synthesize new behaviors, for new inputs , by means of the conditional probability , yielding a normally-distributed output ; see [17] for details.
We exploit GMM/GMR to estimate desired trajectories for each controller through the mean , as well as their importance through the covariance matrix . In GMM/GMR, covariance matrices model the variability in the data, in addition to the correlation between the variables. Figure 2(a) illustrates this aspect, where we see that the variance regressed by GMR (shown as an envelope around the mean) reflects the datapoint distribution in the original dataset. In the context of PTCs, high variability in the demonstrations of the th controller results in large covariance matrices . From (13), it follows that the corresponding controller precision matrix will be small and, thus, the control reference will be tracked less accurately. GMM/GMR is, hence, an appropriate technique to select relevant controllers based on the regularities observed in each part of the task throughout the different demonstrations.
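The retrieval step can be sketched as below for an already fitted mixture. The sketch uses one common formulation of GMR, in which each state is conditioned on the input and the resulting conditional Gaussians are merged by moment matching; the function names and the two-state model are hypothetical, and the details in [17] may differ.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian evaluated at x."""
    d = x - mean
    norm = np.sqrt((2.0 * np.pi) ** len(mean) * np.linalg.det(cov))
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm)

def gmr(x_in, priors, means, covs, n_in):
    """Condition a GMM on the first n_in dimensions; return output mean and covariance."""
    i, o = slice(0, n_in), slice(n_in, None)
    # Responsibility of each state for the query input.
    h = np.array([p * gaussian_pdf(x_in, mu[i], S[i, i])
                  for p, mu, S in zip(priors, means, covs)])
    h /= h.sum()
    cond_means, cond_covs = [], []
    for mu, S in zip(means, covs):
        gain = S[o, i] @ np.linalg.inv(S[i, i])
        cond_means.append(mu[o] + gain @ (x_in - mu[i]))
        cond_covs.append(S[o, o] - gain @ S[i, o])
    # Moment matching of the weighted conditional Gaussians.
    mean = sum(hk * m for hk, m in zip(h, cond_means))
    cov = sum(hk * (C + np.outer(m, m))
              for hk, C, m in zip(h, cond_covs, cond_means)) - np.outer(mean, mean)
    return mean, cov

# Hypothetical 2-state model over (time, one output): consistent data early, variable data late.
priors = [0.5, 0.5]
means = [np.array([0.0, 1.0]), np.array([1.0, -1.0])]
covs = [np.diag([0.05, 0.01]), np.diag([0.05, 0.5])]
mean_a, cov_a = gmr(np.array([0.0]), priors, means, covs, n_in=1)
mean_b, cov_b = gmr(np.array([1.0]), priors, means, covs, n_in=1)
print(mean_a, cov_a, mean_b, cov_b)
```

The retrieved covariance is small where the modeled data are consistent and large where they vary, which is the property used to weigh controllers.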
V-B Gaussian Process Regression (GPR)
A Gaussian Process (GP) is a distribution over functions, with a Gaussian prior on observations given by , where is a vector-valued function yielding the mean of the process, denotes its covariance matrix and is a concatenation of observed inputs. The covariance matrix is computed from a kernel function evaluated at the inputs, with elements . Several types of kernel functions exist; see e.g., [10].
Standard GPR allows the prediction of a scalar function . In robotics, one typically requires multidimensional outputs, thus GPR is often employed separately for each output of a given problem. Here we follow this approach to probabilistically model multidimensional reference trajectories, such as those of joint angles or Cartesian positions. For each input point , the prediction of each output dimension is thus given by
(14)  
(15) 
where is the observed th output dimension, , , , , , and is an additional hyperparameter modeling noise in the observations (which acts as a regularization term). We concatenate the predictions into one single multivariate Gaussian with mean and covariance matrix given by
(16) 
Since output dimensions are modeled separately, GPR predictions are, in the standard case, uncorrelated, which is evident from the structure of in (16). In contrast to GMR, the estimated variance in GPR is a measure of prediction uncertainty. Figure 2(b) illustrates this aspect, with the variance increasing with the absence of training data (). This provides a way of assigning importance to predictions, when different observations of a task occur. We propose to exploit GPR if demonstration data is incomplete or scarce and, in particular, for partially demonstrating a task to each controller as separate subtasks.
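The behavior described above can be reproduced with a small sketch of standard GPR with a constant prior mean, one independent process per output dimension, and the per-dimension variances stacked into a diagonal covariance matrix. For brevity the sketch assumes a squared-exponential kernel with fixed, hand-picked hyperparameters and a scalar input; the kernel choice, function names and data are illustrative assumptions.

```python
import numpy as np

def sq_exp_kernel(Xa, Xb, length=0.2, signal_var=1.0):
    """Squared-exponential kernel between two sets of scalar inputs."""
    d2 = (Xa[:, None] - Xb[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / length ** 2)

def gpr_predict(X, y, x_star, prior_mean=0.0, noise_var=1e-4):
    """Predictive mean and variance of one output dimension at x_star."""
    xs = np.atleast_1d(x_star)
    K = sq_exp_kernel(X, X) + noise_var * np.eye(len(X))
    k_star = sq_exp_kernel(X, xs)[:, 0]
    mean = prior_mean + k_star @ np.linalg.solve(K, y - prior_mean)
    var = sq_exp_kernel(xs, xs)[0, 0] - k_star @ np.linalg.solve(K, k_star)
    return mean, var

def gpr_predict_multi(X, Y, x_star, prior_means, noise_var=1e-4):
    """Stack independent per-dimension predictions into one Gaussian."""
    preds = [gpr_predict(X, Y[:, d], x_star, prior_means[d], noise_var)
             for d in range(Y.shape[1])]
    mean = np.array([m for m, _ in preds])
    cov = np.diag([v for _, v in preds])   # uncorrelated outputs
    return mean, cov

# Hypothetical training data on [0, 1] with two output dimensions.
X = np.linspace(0.0, 1.0, 11)
Y = np.column_stack([np.sin(2 * np.pi * X), np.cos(2 * np.pi * X)])
prior_means = np.array([0.3, -0.2])

mean_near, cov_near = gpr_predict_multi(X, Y, 0.5, prior_means)  # inside the data
mean_far, cov_far = gpr_predict_multi(X, Y, 3.0, prior_means)    # far from the data
print(np.diag(cov_near), np.diag(cov_far), mean_far)
```

Far from the training inputs the variance returns to the prior variance and the mean returns to the prior mean, so a controller trained only in one region becomes unimportant elsewhere and its reference falls back to the prior.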
The overall approach is summarized in Algorithm 1 for GMM or GP as trajectory modeling techniques.
VI Evaluation
We assess the performance of the proposed framework in two different tasks. In one case, we exploit the variability in the demonstrations, while, in the other, we consider the prediction uncertainty. The experiments are conducted on two different 7-DoF manipulators with torque control. Videos of both experiments can be found at
https://youtu.be/bfxegGiqQ9s.
VI-A Learning cocktail shaking skills with force constraints
We start our evaluation with a cocktail shaking task where force and configuration space control are employed. For this task we use the torque-controlled KUKA lightweight robot. The task is comprised of two subtasks (Fig. 1): a force-based subtask, where a contact force (measured with a F/T sensor mounted on the end-effector) must be tracked in order to successfully close a cocktail shaker, and a configuration space subtask, through which the robot performs a shake using rhythmic joint movements. We aim to extract the activation of each subtask from the variability in the demonstrations, thus both force and joint demonstrations are encoded in GMMs.
We collected 4 demonstrations of this task by kinesthetically guiding the robot arm (gravity-compensated) to first close the shaker and, second, to perform the shake with a rhythmic motion of its 6th joint. For , the force controller, we have , with datapoints encoding time and sensed forces (force directions as indicated in Fig. 1). In the case of the joint space controller, , we have with datapoints , where and denote the position and velocity of joint at time step . The recorded trajectories were filtered and subsampled to 200 points each, yielding a dataset with datapoints for each controller. Additionally, the joint space trajectories were aligned using Dynamic Time Warping, in order to capture the consistent shaking patterns in all demonstrations. Finally, GMMs were fitted to the dataset of each controller, with and states, respectively, chosen empirically.
Figures 4 and 5 show the force and joint space datapoints, together with the corresponding GMM states, for (force along the end-effector axis) and joint . For illustration purposes, the GMM states are depicted as ellipses with a width of one standard deviation. The negative sign in the force measurements indicates that the applied force is in opposite direction to the positive axis, which is expected due to the closing of the shaker occurring along that direction. From these plots we conclude that both the collected contact forces and joint angles have periods of high and low variability. The periods of low variability mark the regions where each subtask should be predominant. In the case of , this happens at the beginning, where the force is zero, and between and , where the contact force is applied to close the shaker. On the other hand, the consistent rhythmic patterns after in Fig. 5 mark the shaking subtask. Notably, in both cases, the GMM encoding is able to capture this consistency, in the form of narrow Gaussians. Figure 6 shows the retrieved control references using GMR. Here, the contour around the thick lines corresponds to the predicted variance at each input point. In both cases, the combination of GMM/GMR allows for a proper encoding and retrieval of both mean control reference and variance.
The torque commands that were generated by each controller during one reproduction of the task, as well as the optimal torque, are shown in Fig. 7. The latter is obtained from the former two using (6), as described in Section IV. We focus our analysis on joint 6, the one which performs the shake. For each subtask, we used diagonal control gain matrices, chosen empirically based on the desired controller performance. In particular, we used , and . The linear operators were constructed according to Section IV-B as and , for the contact force controller, and and , for the configuration space controller. Notice the sign change in the force operators, compared to those in Section IV-B. This is due to the encoded forces having an opposite sign to the desired direction of end-effector movement. Figure 7 shows that the commanded torque closely matches the torque from each of the individual controllers, in the corresponding regions of low variance. This is evident in the beginning of the task, where the torques generated by the force controller strongly influence the torques sent to the robot, and from , where the shaking torques are favored. This results in a reproduction where the complete task is properly executed by, first, applying the desired contact force and, second, performing the shaking movement. The accompanying video shows the demonstration and reproduction of the task.
VI-B Learning painting skills from separate demonstrations
In a second experiment, we consider the scenario where a robot assists a user to perform a painting task. We divide the complete task into two subtasks: 1) a handover, where the user gives the paint roller to the robot (Fig. 8, left), and 2) painting, where, in a different region of the workspace, the robot helps the user paint a wooden board by applying painting strokes (Fig. 8, right). This task is an instance of general human-robot interaction scenarios where a robot needs to perform different subtasks (potentially employing different controllers), depending on the user’s needs. Here, we consider an operational space controller (4) for the handover and a configuration space controller (2) for the painting.
Teaching controllers separately implies a trajectory modeling technique that yields high variances when far from each controller training region, thus we exploit GPR. The 3-dimensional position of the user's right hand is, in this case, used as an input to GPR, as opposed to time. Training datapoints have the form for the handover subtask and for the painting subtask. Here, are the human and robot hand positions at time and is the joint space configuration of the manipulator. The reference trajectories of each subtask are thus 3- and 7-dimensional, respectively. In this experiment we consider zero velocity references for both controllers, , , and thus we used linear operators , and and . One demonstration was collected for each subtask, as shown in Fig. 8. Notice that the right hand position of the human collaborator (tracked with an optical marker) never overlaps between the demonstrations of the two subtasks. For each output, we used a Gaussian Process with a Matérn kernel with (see e.g., Chapter 4 in [10]), as it yielded smooth predictions, a convenient feature for our setup where the person may move in an unpredictable manner. Hyperparameters were optimized by minimizing the negative log marginal likelihood of the observations [10]. Moreover, we exploit the process mean to define a prior on the robot’s behavior, in particular to have the robot keep a safe posture outside of the region where demonstrations are provided. We define this neutral pose manually as a joint space configuration but it could alternatively be demonstrated. Each element , …, defines the mean of each of the 7 joint space GPs. The means of the task space GPs , which are also constant, are given by the end-effector position yielded by the configuration .
After hyperparameter estimation, we exploit GPR predictions to fuse the torques from each controller and reproduce the complete task. Notice that, during movement synthesis, the system will observe different input data than that used for training, as the user may move in regions where demonstrations were not provided. One expects the robot to stay in the predefined safe posture in those regions and execute the demonstrated subtasks where they were shown. Moreover, this should occur with smooth transitions between torque commands when tasks change. Figure 9 shows one reproduction of the complete task. The user starts by filing a wooden board, in a region of the workspace with no demonstration data (top, left). One can see that the robot remains in the preselected neutral pose. As the user hands the paint roller to the robot, the end-effector moves to grasp it (top, right). Finally, the user grasps the board and moves to a spacious region to perform the painting. As his right hand moves up and down, the robot applies painting strokes in the opposite direction. The robot is therefore capable of identifying which controller should be active at any moment, by exploiting the information contained in the data.
Figure 10 provides a quantitative analysis of the performance of our method in this scenario, by showing the torques involved in one reproduction. We focus the analysis on the second joint of the robot (see Fig. 9, bottom left) since it is highly important for this task. Even though we did not consider a time-driven regression, we plot torques against time, in order to have a clear and continuous view of how the task evolved. The plot in Fig. 10 shows a clear separation between different moments of the task. Time intervals , , , correspond to regions of the workspace where no training data was provided and, thus, the variance of both controllers is high and roughly constant, as predictions are simultaneously uncertain. The interval (first highlighted region) corresponds to the execution of the handover subtask. Notice the decrease in the variance of the torques for this task (green envelope) and how these torques are matched by the optimal torque. Finally, the second highlighted time frame coincides with the execution of the painting task. Here one can see a decrease in the variance of the joint space controller (red envelope), which is closely matched by the optimal torque, in particular during the two strokes (two oscillations around and ). All other joints yielded equivalent observations.
For visualization purposes, in Fig. 11 we zoom in on the torques that are used for each subtask. In the leftmost plot we see that the torques that are generated by the task space controller (green line) are closely matched by the optimal torque. Here, positive torques lower the end-effector to a below posture for the handover (until ), while negative torques raise it to an above posture after the handover (). We observe an analogous result in the rightmost plot, where the joint space controller torques coincide. Here, positive torques apply vertical strokes from top to bottom, and negative torques move the paint roller back to the initial configuration. The accompanying video shows the demonstrations that we used for this task as well as one complete reproduction.
VII Conclusions and Future Work
We presented a novel probabilistic framework for fusing torque controllers based on human demonstrations. The main contributions are the consideration of force-based tasks, in addition to joint and task space ones, and the possibility to exploit different probabilistic trajectory modeling techniques. The experimental validation showed that the approach allows robots to successfully reproduce tasks that require the fulfillment of different types of constraints, enforced by controllers acting on different spaces. The probabilistic encoding of demonstrations proved to be crucial, by providing information about the importance of each constraint, through the second moment of the data. This aspect is not present in deterministic trajectory modeling approaches, which thus fall short in application scenarios where multiple constraints need to be fulfilled.
The results presented here open up several future research challenges. One, connected to Section V, concerns the formulation of a probabilistic modeling technique that can simultaneously encode and synthesize uncertainty and variability in the observed data. Works like [18] are a potential step in this direction. Another promising research direction pertains to the design of the individual controllers. While in this paper we fixed the control gains of each individual controller, works like [19, 20] estimate these gains from demonstrations by formulating the tracking problem as a LQR. Combining the proposed approach with that technique could potentially enhance compliance and safety capabilities during the execution of the demonstrated constraints, while alleviating the need for control gain design.
Footnotes
 In the remainder of the paper we drop dependencies on , e.g. , etc.
References
[1] A. G. Billard, S. Calinon, and R. Dillmann, “Learning from humans,” in Handbook of Robotics, B. Siciliano and O. Khatib, Eds. Secaucus, NJ, USA: Springer, 2016, ch. 74, pp. 1995–2014, 2nd Edition.
[2] F. L. Moro, M. Gienger, A. Goswami, and N. G. Tsagarakis, “An attractor-based whole-body motion control (WBMC) system for humanoid robots,” in Proc. IEEE-RAS Intl Conf. on Humanoid Robots (Humanoids), Atlanta, GA, USA, October 2013, pp. 42–49.
[3] V. Modugno, G. Neumann, E. Rueckert, G. Oriolo, J. Peters, and S. Ivaldi, “Learning soft task priorities for control of redundant robots,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), Stockholm, Sweden, May 2016, pp. 221–226.
[4] N. Dehio, R. F. Reinhart, and J. J. Steil, “Multiple task optimization with a mixture of controllers for motion generation,” in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), Hamburg, Germany, 2015, pp. 6416–6421.
[5] O. Khatib, “A unified approach for motion and force control of robot manipulators: The operational space formulation,” IEEE Journal on Robotics and Automation, vol. 3, no. 1, pp. 43–53, 1987.
[6] L. Sentis and O. Khatib, “Control of Free-Floating Humanoid Robots Through Task Prioritization,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), Barcelona, Spain, April 2005, pp. 1718–1723.
[7] R. Lober, V. Padois, and O. Sigaud, “Variance modulated task prioritization in whole-body control,” in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), Sep. 2015, pp. 3944–3949.
[8] S. Calinon and A. G. Billard, “Statistical learning by imitation of competing constraints in joint space and task space,” Advanced Robotics, vol. 23, no. 15, pp. 2059–2076, 2009.
[9] J. Silvério, S. Calinon, L. Rozo, and D. G. Caldwell, “Learning Competing Constraints and Task Priorities from Demonstrations of Bimanual Skills,” arXiv:1707.06791 [cs.RO], July 2017.
[10] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning. Cambridge, MA, USA: MIT Press, 2006.
[11] A. Paraschos, C. Daniel, J. Peters, and G. Neumann, “Probabilistic movement primitives,” in Advances in Neural Information Processing Systems (NIPS). Curran Associates, Inc., 2013, pp. 2616–2624.
[12] M. Deniša, A. Gams, A. Ude, and T. Petrič, “Learning Compliant Movement Primitives Through Demonstration and Statistical Generalization,” IEEE/ASME Transactions on Mechatronics, vol. 21, no. 5, pp. 2581–2594, 2016.
[13] A. Ijspeert, J. Nakanishi, P. Pastor, H. Hoffmann, and S. Schaal, “Dynamical movement primitives: Learning attractor models for motor behaviors,” Neural Computation, vol. 25, no. 2, pp. 328–373, 2013.
[14] J. Nakanishi, R. Cory, M. Mistry, J. Peters, and S. Schaal, “Operational space control: A theoretical and empirical comparison,” International Journal of Robotics Research, vol. 27, no. 6, pp. 737–757, 2008.
[15] K. Lynch and F. Park, Modern Robotics: Mechanics, Planning, and Control. Cambridge University Press, 2017.
[16] F. Stulp and O. Sigaud, “Many regression algorithms, one unified model – a review,” Neural Networks, vol. 69, pp. 60–79, Sept. 2015.
[17] S. Calinon, “A tutorial on task-parameterized movement learning and retrieval,” Intelligent Service Robotics, vol. 9, no. 1, pp. 1–29, January 2016.
[18] J. Umlauft, Y. Fanger, and S. Hirche, “Bayesian uncertainty modeling for Programming by Demonstration,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), May 2017.
[19] J. R. Medina, D. Lee, and S. Hirche, “Risk-sensitive optimal feedback control for haptic assistance,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), May 2012, pp. 1025–1031.
[20] L. Rozo, D. Bruno, S. Calinon, and D. G. Caldwell, “Learning optimal controllers in human-robot cooperative transportation tasks with position and force constraints,” in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), Hamburg, Germany, Sept.–Oct. 2015, pp. 1024–1030.