A Learning from Demonstration Approach fusing Torque Controllers

Torque controllers have become commonplace in the new generation of robots, allowing for complex robot motions involving physical contact with the surroundings in addition to task constraints at Cartesian and joint levels. When learning such skills from demonstrations, one is often required to think in advance about the appropriate task representation (usually either operational or configuration space). We here propose a probabilistic approach for simultaneously learning and synthesizing control commands which take into account task, joint space and force constraints. We treat the problem by considering different torque controllers acting on the robot, whose relevance is learned from demonstrations. This information is used to combine the controllers by exploiting the properties of Gaussian distributions, generating torque commands that satisfy the important features of the task. We validate the approach in two experimental scenarios using 7-DoF torque-controlled manipulators, with tasks requiring the fusion of multiple controllers to be properly executed.

I Introduction

The field of Learning from Demonstration (LfD) [1] aims for a user-friendly and intuitive human-robot skill transfer. Generally, when modeling demonstrated skills, one must think in advance about the relevant variables to encode. The selection of these variables strongly depends on the task requirements, with the skills usually being represented in either operational or configuration space. The a priori definition of the relevant space may require considerable reasoning or trial-and-error to achieve successful movement synthesis, which contradicts the LfD concept. This process becomes even more cumbersome when the robot is required to physically interact with the environment, thus introducing additional task constraints such as contact forces. Consider the example shown in Fig. 1, where a robot is first required to learn how to apply a contact force with the end-effector, and then must perform a configuration space movement. In this case, encoding demonstrations in either operational or configuration space alone will not result in proper execution.

Fig. 1: Example of a task that demands the utilization of two different controllers. First, the robot should insert the cap on a shaker (left) by applying a contact force, a skill that requires force control. Subsequently, the robot must shake the bottle with its wrist joint (right), for which a configuration space controller is desirable.

We here propose an approach for simultaneously learning different types of task constraints and generating torque control commands that encapsulate the important features of the task. Figure 2 gives an overview of the approach. We treat the problem by considering different torque controllers acting on the robot, with each one being responsible for the fulfillment of a particular type of constraint (e.g. desired contact forces, Cartesian/joint positions and/or velocities). We discuss such controllers in Section III. From demonstrations of a task, we propose to learn the importance of each controller using probabilistic representations of the collected data (Section V). We then exploit a set of linear operators, defined for each individual controller, that take into account the state of the robot and contact with the environment to transform the control references into torque commands, with associated importance. Finally, we combine all the constraints, represented as independent Gaussian-distributed torque references, through a fusion of controllers, carried out by a product of Gaussians (Section IV). We hence obtain a final torque reference that is used to control the robot.

Fig. 2: Diagram of the proposed approach. Demonstrations of a task are given to the robot, while recording different types of data, such as positions, velocities and interaction forces. To each type of data, an individual controller is assigned, and the corresponding references are modeled as Gaussian distributions, encapsulating each controller’s importance. During task execution, linear operators $\bm{A}_p$ and $\bm{b}_p$, which depend on the chosen controllers as well as the robot’s state and the interaction forces, transform the references into probabilistic torque commands. These torques are combined by taking into account their variance, through the product of Gaussians, whose result is then fed to the robot as a torque that satisfies the important task features.

Our contribution with respect to the state of the art is three-fold:

  1. A probabilistic formulation for jointly learning torque controllers from demonstrations, by exploiting the properties of Gaussian distributions.

  2. The learning of force-based tasks in operational space, in addition to Cartesian and joint constraints.

  3. An approach that is compatible with various probabilistic learning algorithms that generate Gaussian distributed references or trajectories.

The proposed approach is evaluated in two scenarios with 7-DoF torque-controlled robots (Section VI). In the first case, we use a cocktail shaking task, employing force control, to demonstrate that the approach can accommodate both force- and position/velocity-based skills. The second scenario shows that the approach can also be used to combine partial demonstrations, allowing for demonstrating each controller sub-task independently, in different regions of the workspace.

II Related Work

The problem of combining controllers can be broadly divided into two types of approaches. In [2, 3, 4], the authors use a weighted combination of individual torque controllers in humanoid robots, with each controller responsible for a particular sub-task (e.g. balance, manipulation, joint limit avoidance). Other works frame the problem as a multi-level prioritization [5, 6], where lower importance tasks are executed without compromising more important ones, typically in a hierarchical manner with a null space formulation. As a result, tasks with low importance are only executed if they do not affect high priority ones, potentially requiring platforms with a high number of degrees of freedom. Both kinds of approaches have their own merits, with the former allowing for a more flexible organization of tasks as well as smooth transitions between them (according to their weight profiles) and the latter ensuring that high priority tasks are always executed.

In contrast to manually setting weights [2], in this paper we are interested in learning them from human demonstrations. Learning controller importance has been addressed in different manners, from reinforcement learning (RL) [3, 4, 7] to LfD [8, 9]. The main differences between these two branches lie in the type of prior knowledge, with RL requiring a priori information in the form of reward or cost functions – which can be hard to formulate in some cases – and LfD approaches demanding task demonstrations. The present work shares connections with [8, 9], where the problem of combining constraints in task and joint space is addressed. The first important difference is that such approaches exploit velocity controllers, which take into account only kinematic constraints. In this work, however, we consider torque controllers, allowing for: (i) a straightforward consideration of contact forces at the end-effector, which facilitates the transfer of skills that involve physical human-robot interaction, and (ii) the exploitation of the compliance capabilities of modern robots. The second relevant difference is that [8, 9] model data using Gaussian Mixture Models (GMM), while here, although GMM are used as an example, we generalize the problem to a wider range of probabilistic modeling approaches, by exploiting the particularities of each approach. In particular, we show that the probabilistic combination of controllers can be generalized to any trajectory modeling technique that generates Gaussian-distributed outputs (e.g. Gaussian Process Regression (GPR) [10], Probabilistic Movement Primitives (ProMP) [11]).

One relevant and recent work in learning torque controllers is that of Deniša et al. [12], who introduced the concept of Compliant Movement Primitives. Such primitives consist of a tuple comprised of Dynamic Movement Primitives (DMP) [13] associated with the task (operational or configuration space trajectories) and with the task dynamics torques (e.g. related to object mass). Here we consider torque controllers that track trajectories in either joint or task space (both positions and forces) and intentionally overlook the task dynamics. Moreover, the probabilistic nature of our method provides essential information for controller fusion, in the form of covariance matrices, which is unavailable in DMP.

III Torque Controllers – Configuration and Operational Space

Inspired by works in which a combination of torque controllers results in a flexible importance assignment and smooth transitions between different tasks [2, 3, 4], we propose a strategy where the controller combination is learned from demonstrations. In this section we define the individual controllers that we exploit for configuration and operational space control. Formally, we follow a model-based approach to control the robot using torques, by assuming a rigid-body system with $n$ joints whose dynamics are given by $\bm{M}(\bm{q})\ddot{\bm{q}} + \bm{c}(\bm{q},\dot{\bm{q}}) + \bm{g}(\bm{q}) = \bm{\tau}_u$, where $\bm{q}, \dot{\bm{q}}, \ddot{\bm{q}} \in \mathbb{R}^n$ denote joint angles, velocities and accelerations, and $\bm{M}(\bm{q})$, $\bm{c}(\bm{q},\dot{\bm{q}})$, $\bm{g}(\bm{q})$ correspond to the inertia matrix, Coriolis and gravity terms, respectively. The total torques acting on the joints are given by $\bm{\tau}_u$.

Robot control is achieved using a torque command $\bm{u}$, formed from a task-related term $\bm{\tau}$ and a term that compensates for the dynamics of the robot, i.e.,

$$\bm{u} = \bm{\tau} + \bm{c}(\bm{q},\dot{\bm{q}}) + \bm{g}(\bm{q}). \tag{1}$$
In this work we are interested in fusing controllers that fulfill different task requirements, thus we focus on the term $\bm{\tau}$. Here, when referring to tasks, we are concerned with the tracking of reference trajectories (e.g. positions, forces).

The definition of $\bm{\tau}$ depends on the space where tasks are represented. For instance, when a task requires the manipulation of an object (e.g. pick and place), $\bm{\tau}$ must be defined such that position and orientation constraints at the end-effector are fulfilled with respect to the object or other landmarks in the robot’s workspace. If, additionally, manipulation requires physical contact (e.g. object insertion, cooperative handling of objects), $\bm{\tau}$ must also accommodate desired interaction forces. In other applications, where gestures or specific configurations of the kinematic chain are required, $\bm{\tau}$ is more adequately formulated as a configuration space controller. We now describe the controllers that we exploit for the different types of tasks.

III-A Configuration Space Controller

Configuration space controllers are used to track joint positions and velocities. Here we exploit proportional-derivative (PD) controllers of the form

$$\bm{\tau} = \bm{K}^{\mathcal{P}}\left(\hat{\bm{q}} - \bm{q}\right) + \bm{K}^{\mathcal{V}}\left(\dot{\hat{\bm{q}}} - \dot{\bm{q}}\right), \tag{2}$$

where $\bm{K}^{\mathcal{P}}, \bm{K}^{\mathcal{V}}$ are joint stiffness and damping gain matrices, $\bm{q}, \dot{\bm{q}}$ are the current joint positions and velocities, and $\hat{\bm{q}}, \dot{\hat{\bm{q}}}$ the desired ones. An additional feed-forward term is often added to (2) for improved tracking performance, as in [14]. As we shall see, it is straightforward to accommodate this term in our approach, if required.

III-B Position Controller in Operational Space

Operational space controllers are aimed at tracking Cartesian poses with the end-effector of the robot. Here, we consider the case of tracking position references, but the approach remains valid for the consideration of orientations. We assume that the end-effector of the robot is driven by a force that is proportional to the output of a PD controller,

$$\bm{f} = \bm{\Lambda}(\bm{q})\left(\bm{K}^{\mathcal{P}}_x\left(\hat{\bm{x}} - \bm{x}\right) + \bm{K}^{\mathcal{V}}_x\left(\dot{\hat{\bm{x}}} - \dot{\bm{x}}\right)\right), \tag{3}$$

where $\bm{\Lambda}(\bm{q})$ is the Cartesian inertia matrix of the end-effector, whose current and desired positions and velocities are denoted by $\bm{x}, \hat{\bm{x}}, \dot{\bm{x}}, \dot{\hat{\bm{x}}} \in \mathbb{R}^m$ (with $m$ as the dimension of the operational space). The Jacobian matrix $\bm{J}(\bm{q})$ gives the differential kinematics of the robot’s end-effector, and $\bm{K}^{\mathcal{P}}_x, \bm{K}^{\mathcal{V}}_x$ are Cartesian stiffness and damping gain matrices. The end-effector force is converted to joint torques as in [14],

$$\bm{\tau} = \bm{J}(\bm{q})^{\top}\bm{f}. \tag{4}$$
III-C Force Controller

In this case we consider a proportional controller that tracks a desired force at the end-effector (see [15], Ch. 11):

$$\bm{f} = \hat{\bm{f}} + \bm{K}^{\mathcal{F}}\left(\hat{\bm{f}} - \bm{f}_s\right), \tag{5}$$

where $\bm{f}_s, \hat{\bm{f}}$ are the current and desired contact forces (the former measured using a F/T sensor at the end-effector), and (4) is used to map the force command at the end-effector to joint torques. Finally, $\bm{K}^{\mathcal{F}}$ is a proportional gain matrix.

IV Probabilistic Torque Controllers

In this section, we formalize the fusion of torque controllers as an optimization problem and lay out the probabilistic treatment of control commands. Let us consider a robot employing $P$ controllers – as those defined in Section III – at any given moment, corresponding to different sub-tasks that can be executed in series or in parallel. Each controller generates a torque command $\bm{\tau}_p$. Also, let us assume we have access to a precision matrix (which will be explained in Section IV-B), denoted by $\bm{\Lambda}_p$, providing information about the respective importance of the different controllers. We formalize the problem of fusing control commands as the optimization

$$\hat{\bm{\tau}} = \arg\min_{\bm{\tau}} \sum_{p=1}^{P}\left(\bm{\tau} - \bm{\tau}_p\right)^{\top}\bm{\Lambda}_p\left(\bm{\tau} - \bm{\tau}_p\right), \tag{6}$$

whose objective function corresponds to a weighted sum of quadratic error terms, with the weight of each command given by the full matrices $\bm{\Lambda}_p$. The solution and error residuals of (6) can be computed analytically, and correspond to the mean and covariance matrix of a Gaussian distribution given by the product of Gaussians with means $\bm{\tau}_p$ and covariance matrices $\bm{\Sigma}_p$,

$$\hat{\bm{\tau}} = \left(\sum_{p=1}^{P}\bm{\Lambda}_p\right)^{-1}\sum_{p=1}^{P}\bm{\Lambda}_p\bm{\tau}_p, \qquad \hat{\bm{\Sigma}} = \left(\sum_{p=1}^{P}\bm{\Lambda}_p\right)^{-1}, \tag{7}$$

where $\bm{\Sigma}_p = \bm{\Lambda}_p^{-1}$. The connection between the solution of (6) and the product of Gaussians (7) allows us to exploit the structure of the controllers defined in Section III to fuse torque control commands, given Gaussian-distributed references. In particular, this is achieved by taking advantage of the linearity of the controllers (Section IV-A) in combination with the linear properties of Gaussians (Section IV-B).
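As a concrete illustration of the fusion in (6)–(7), the following minimal numpy sketch computes the precision-weighted product of Gaussian torque commands. All names (`fuse_controllers`, the toy precisions) are illustrative, not from the paper.

```python
# Sketch of the controller fusion (6)-(7): the fused torque is the
# precision-weighted combination of the individual torque commands.
import numpy as np

def fuse_controllers(mus, lambdas):
    """Product of Gaussians N(mu_p, Lambda_p^-1): fused mean and covariance."""
    lambda_sum = np.sum(lambdas, axis=0)                # sum of precisions
    sigma_hat = np.linalg.inv(lambda_sum)               # fused covariance
    tau_hat = sigma_hat @ np.sum(
        [L @ m for L, m in zip(lambdas, mus)], axis=0)  # fused mean
    return tau_hat, sigma_hat

# Two toy 2-DoF "controllers": one confident (high precision), one uncertain.
mu1, lam1 = np.array([1.0, 0.0]), np.diag([100.0, 100.0])
mu2, lam2 = np.array([0.0, 1.0]), np.diag([1.0, 1.0])
tau, sigma = fuse_controllers([mu1, mu2], [lam1, lam2])
# The fused command stays close to the confident controller's mean.
```

Note how the result follows the controller whose precision dominates, which is exactly the mechanism the paper uses to arbitrate between sub-tasks.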

IV-A Linear controller structure

Control commands (2)–(5) are linear with respect to the reference trajectories. The controller equations can thus be re-written in a way that highlights this linear structure. For the joint space torque controller (2) we obtain

$$\bm{\tau} = \bm{A}^{\mathcal{J}}\hat{\bm{\zeta}} + \bm{b}^{\mathcal{J}}, \tag{8}$$

where $\hat{\bm{\zeta}} = \left[\hat{\bm{q}}^{\top}\; \dot{\hat{\bm{q}}}^{\top}\right]^{\top}$, $\bm{A}^{\mathcal{J}} = \left[\bm{K}^{\mathcal{P}}\; \bm{K}^{\mathcal{V}}\right]$ and $\bm{b}^{\mathcal{J}} = -\bm{K}^{\mathcal{P}}\bm{q} - \bm{K}^{\mathcal{V}}\dot{\bm{q}}$. Similarly, the Cartesian position controller (3)–(4) can be formulated as $\bm{\tau} = \bm{A}^{\mathcal{C}}\hat{\bm{\zeta}}_x + \bm{b}^{\mathcal{C}}$, with $\hat{\bm{\zeta}}_x = \left[\hat{\bm{x}}^{\top}\; \dot{\hat{\bm{x}}}^{\top}\right]^{\top}$, $\bm{A}^{\mathcal{C}} = \bm{J}^{\top}\bm{\Lambda}\left[\bm{K}^{\mathcal{P}}_x\; \bm{K}^{\mathcal{V}}_x\right]$ and $\bm{b}^{\mathcal{C}} = -\bm{J}^{\top}\bm{\Lambda}\left(\bm{K}^{\mathcal{P}}_x\bm{x} + \bm{K}^{\mathcal{V}}_x\dot{\bm{x}}\right)$, and the force controller (5) as $\bm{\tau} = \bm{A}^{\mathcal{F}}\hat{\bm{f}} + \bm{b}^{\mathcal{F}}$, with $\bm{A}^{\mathcal{F}} = \bm{J}^{\top}\left(\bm{I} + \bm{K}^{\mathcal{F}}\right)$ and $\bm{b}^{\mathcal{F}} = -\bm{J}^{\top}\bm{K}^{\mathcal{F}}\bm{f}_s$. Note that linearity also applies if feed-forward terms are included in the controllers, e.g. $\bm{\tau} = \bm{K}^{\mathcal{P}}(\hat{\bm{q}} - \bm{q}) + \bm{K}^{\mathcal{V}}(\dot{\hat{\bm{q}}} - \dot{\bm{q}}) + \hat{\bm{\tau}}_{f\!f}$, in which case these terms simply need to be included in the reference vector and the linear operator extended with the identity matrix, e.g. $\hat{\bm{\zeta}} = \left[\hat{\bm{q}}^{\top}\; \dot{\hat{\bm{q}}}^{\top}\; \hat{\bm{\tau}}_{f\!f}^{\top}\right]^{\top}$ and $\bm{A}^{\mathcal{J}} = \left[\bm{K}^{\mathcal{P}}\; \bm{K}^{\mathcal{V}}\; \bm{I}\right]$, for a configuration space controller.
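The linear re-writing of the joint space PD controller (8) can be checked numerically with a short sketch. The gain values and variable names below are illustrative, not from the paper.

```python
# Sketch of the linear form (8) of the joint space PD controller:
# tau = Kp (q_hat - q) + Kv (dq_hat - dq) = A @ zeta_hat + b,
# with zeta_hat = [q_hat; dq_hat], A = [Kp  Kv], b = -Kp q - Kv dq.
import numpy as np

n = 7                                               # number of joints
Kp, Kv = 50.0 * np.eye(n), 5.0 * np.eye(n)          # toy PD gains
rng = np.random.default_rng(0)
q, dq = rng.standard_normal(n), rng.standard_normal(n)  # current joint state

A = np.hstack([Kp, Kv])                             # n x 2n linear operator
b = -Kp @ q - Kv @ dq                               # offset from current state

q_hat, dq_hat = rng.standard_normal(n), rng.standard_normal(n)  # reference
zeta_hat = np.concatenate([q_hat, dq_hat])

tau_linear = A @ zeta_hat + b
tau_pd = Kp @ (q_hat - q) + Kv @ (dq_hat - dq)
# Both expressions agree: the controller is linear in the reference.
```

The same factorization applies to the Cartesian and force controllers once the Jacobian-dependent terms are folded into `A` and `b`.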

IV-B From probabilistic references to probabilistic torques

Gaussian distributions are popular in robot learning and control due to their properties of product, conditioning and linear transformation. Here, we consider Gaussian-distributed control references and exploit the previously defined linear operators to formulate probabilistic torque controllers. Let us first consider a configuration space controller, with desired joint state $\hat{\bm{\zeta}} \sim \mathcal{N}\!\left(\bm{\mu}^{\mathcal{J}}, \bm{\Sigma}^{\mathcal{J}}\right)$, where $\bm{\mu}^{\mathcal{J}}$ and $\bm{\Sigma}^{\mathcal{J}}$ are the mean and covariance matrix of a Gaussian, modeling the probability distribution of joint positions and velocities. Per the linear properties of Gaussian distributions, the configuration space controller (8) yields a new Gaussian with mean and covariance given by

$$\bm{\tau}^{\mathcal{J}} \sim \mathcal{N}\!\left(\bm{A}^{\mathcal{J}}\bm{\mu}^{\mathcal{J}} + \bm{b}^{\mathcal{J}},\; \bm{A}^{\mathcal{J}}\bm{\Sigma}^{\mathcal{J}}\bm{A}^{\mathcal{J}\top}\right). \tag{9}$$

Similarly, for Cartesian references $\hat{\bm{\zeta}}_x \sim \mathcal{N}\!\left(\bm{\mu}^{\mathcal{C}}, \bm{\Sigma}^{\mathcal{C}}\right)$ and force references $\hat{\bm{f}} \sim \mathcal{N}\!\left(\bm{\mu}^{\mathcal{F}}, \bm{\Sigma}^{\mathcal{F}}\right)$, we obtain

$$\bm{\tau}^{\mathcal{C}} \sim \mathcal{N}\!\left(\bm{A}^{\mathcal{C}}\bm{\mu}^{\mathcal{C}} + \bm{b}^{\mathcal{C}},\; \bm{A}^{\mathcal{C}}\bm{\Sigma}^{\mathcal{C}}\bm{A}^{\mathcal{C}\top}\right), \tag{10}$$

$$\bm{\tau}^{\mathcal{F}} \sim \mathcal{N}\!\left(\bm{A}^{\mathcal{F}}\bm{\mu}^{\mathcal{F}} + \bm{b}^{\mathcal{F}},\; \bm{A}^{\mathcal{F}}\bm{\Sigma}^{\mathcal{F}}\bm{A}^{\mathcal{F}\top}\right), \tag{11}$$

respectively. This type of controller has a probabilistic nature, as the torque commands are generated from Gaussian distributions and result in new Gaussians. We therefore refer to them as probabilistic torque controllers (PTC).

A generic PTC is thus fully specified by

$$\bm{\tau} \sim \mathcal{N}\!\left(\bm{A}\bm{\mu} + \bm{b},\; \bm{A}\bm{\Sigma}\bm{A}^{\top}\right), \tag{12}$$

where $\hat{\bm{\xi}} \sim \mathcal{N}(\bm{\mu}, \bm{\Sigma})$ denotes a generic control reference. Note that the set of linear parameters $\{\bm{A}, \bm{b}\}$ is permanently updated, for each controller, during execution, as it depends on the state of the robot and its interaction with the environment through $\bm{q}$, $\dot{\bm{q}}$, $\bm{x}$, $\dot{\bm{x}}$ and $\bm{f}_s$.

A probabilistic representation of trajectories using Gaussian distributions (12) has the advantage of modeling the second moment of the data in the form of covariance matrices. This is exploited here to express the importance of each controller as a function of the covariance matrix of the corresponding reference trajectory, through the precision matrix

$$\bm{\Lambda} = \left(\bm{A}\bm{\Sigma}\bm{A}^{\top}\right)^{\dagger}, \tag{13}$$

where $\dagger$ denotes the pseudo-inverse, used because $\bm{A}$ is typically non-squared and $\bm{A}\bm{\Sigma}\bm{A}^{\top}$ may thus be rank-deficient. This operator maps constraints from spaces with different dimensions (e.g. configuration and operational spaces) into a common space, that of torque commands.

With the definition of $\bm{\Lambda}_p$ in (13), torque commands can be combined using (7). The problem of learning control commands and their respective importance is thus framed as the learning of reference trajectories as Gaussian distributions $\mathcal{N}(\bm{\mu}_p, \bm{\Sigma}_p)$, and generating Gaussian-distributed torque commands $\mathcal{N}\!\left(\bm{A}_p\bm{\mu}_p + \bm{b}_p,\; \bm{A}_p\bm{\Sigma}_p\bm{A}_p^{\top}\right)$, which encapsulate the control reference and its importance with respect to other controllers. In previous work, controller weights are either set empirically [2] or optimized through reinforcement learning [3, 4]. In contrast to these works, we employ probabilistic regression algorithms to learn $\bm{\mu}_p$ and $\bm{\Sigma}_p$, and consequently $\bm{\Lambda}_p$, from human demonstrations.
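The mapping from a Gaussian reference to a torque distribution (12), and from its covariance to a precision (13), can be sketched in a few lines. The toy operator and reference below are illustrative assumptions, not quantities from the experiments.

```python
# Sketch of (12)-(13): a Gaussian reference N(mu, Sigma) is mapped through a
# controller's linear operators (A, b) into a torque distribution, whose
# precision encodes that controller's importance.
import numpy as np

def probabilistic_torque(A, b, mu, sigma):
    """Return the torque mean, covariance and precision of one PTC."""
    tau_mu = A @ mu + b
    tau_sigma = A @ sigma @ A.T
    # A is typically non-square, so A Sigma A^T can be rank-deficient:
    # the pseudo-inverse is used to obtain the precision matrix.
    tau_lambda = np.linalg.pinv(tau_sigma)
    return tau_mu, tau_sigma, tau_lambda

# Toy force-like controller: a 3-dim reference mapped to 7 joint torques.
rng = np.random.default_rng(0)
A = rng.standard_normal((7, 3))          # stand-in for e.g. J^T (I + K^F)
b = rng.standard_normal(7)
mu = rng.standard_normal(3)
sigma_low_var = 1e-2 * np.eye(3)         # consistent demonstrations
sigma_high_var = 1e2 * np.eye(3)         # variable demonstrations

_, _, lam_low = probabilistic_torque(A, b, mu, sigma_low_var)
_, _, lam_high = probabilistic_torque(A, b, mu, sigma_high_var)
# Low variability in the reference yields high precision (high importance).
```

This is the mechanism by which consistent demonstrations make a controller dominate the fusion, while variable demonstrations relax its tracking.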

V Learning control references from demonstrations

In Section IV, we formalized our approach for combining controllers. Here we show how the Gaussian modeling of trajectories can be learned from demonstrations. Several regression methods exist for this purpose, each offering different advantages; see [16] for a review. Two popular approaches are GMM, combined with Gaussian Mixture Regression [17], and GPR [10]. We now review these two techniques, and expand on their use in the context of PTC.

V-A Gaussian Mixture Model/Gaussian Mixture Regression (GMM/GMR)

We consider demonstration datasets comprised of $N$ datapoints organized in a matrix $\bm{\xi} = \left[\bm{\xi}_1 \ldots \bm{\xi}_N\right]$. Each datapoint concatenates input and output dimensions, $\bm{\xi}_n = \left[\bm{\xi}_n^{\mathcal{I}\top}\; \bm{\xi}_n^{\mathcal{O}\top}\right]^{\top}$. It can for example represent a concatenation of time stamps with end-effector poses, joint angles or measured forces. A GMM, encoding the joint probability distribution $p\!\left(\bm{\xi}^{\mathcal{I}}, \bm{\xi}^{\mathcal{O}}\right)$ with $K$ states and parameters $\left\{\pi_k, \bm{\mu}_k, \bm{\Sigma}_k\right\}_{k=1}^{K}$ (respectively the prior, mean and covariance matrix of each state $k$), can be estimated from such a dataset through Expectation-Maximization (EM) [17]. After a GMM is fitted to a given dataset, GMR can subsequently be used to synthesize new behaviors, for new inputs $\bm{\xi}^{\mathcal{I}*}$, by means of the conditional probability $p\!\left(\bm{\xi}^{\mathcal{O}} \,\middle|\, \bm{\xi}^{\mathcal{I}*}\right)$, yielding a normally-distributed output $\mathcal{N}(\hat{\bm{\mu}}, \hat{\bm{\Sigma}})$; see [17] for details.
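A minimal GMR implementation, assuming a GMM fitted with scikit-learn, conditions the joint distribution on a query input and moment-matches the resulting mixture of conditionals into one Gaussian. The function and variable names, and the toy sine dataset, are illustrative.

```python
# Minimal GMR sketch on top of scikit-learn's GaussianMixture: condition the
# joint p(input, output) on a query input to obtain a Gaussian over the output.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def gmr(gmm, x_in, in_idx, out_idx):
    """Condition a fitted GaussianMixture on input dims; return (mean, cov)."""
    K = gmm.n_components
    mu, sig, pri = gmm.means_, gmm.covariances_, gmm.weights_
    ii, oo = np.ix_(in_idx, in_idx), np.ix_(out_idx, out_idx)
    oi = np.ix_(out_idx, in_idx)
    # Responsibilities of each state for the query input
    h = np.array([pri[k] * multivariate_normal.pdf(
        x_in, mu[k][in_idx], sig[k][ii]) for k in range(K)])
    h /= h.sum()
    # Per-state conditional means and covariances
    mus_c, sigs_c = [], []
    for k in range(K):
        gain = sig[k][oi] @ np.linalg.inv(sig[k][ii])
        mus_c.append(mu[k][out_idx] + gain @ (x_in - mu[k][in_idx]))
        sigs_c.append(sig[k][oo] - gain @ sig[k][oi].T)
    # Moment matching of the mixture of conditionals
    mean = sum(h[k] * mus_c[k] for k in range(K))
    cov = sum(h[k] * (sigs_c[k] + np.outer(mus_c[k], mus_c[k]))
              for k in range(K)) - np.outer(mean, mean)
    return mean, cov

# Toy dataset: time-stamped 1-D reference y = sin(t) with noise.
rng = np.random.default_rng(1)
t = np.linspace(0, 2 * np.pi, 400)
y = np.sin(t) + 0.05 * rng.standard_normal(400)
gmm = GaussianMixture(n_components=5, random_state=0).fit(np.column_stack([t, y]))
mean, cov = gmr(gmm, np.array([np.pi / 2]), [0], [1])
# mean approximates sin(pi/2) = 1; cov reflects the local variability.
```

The returned covariance is exactly the quantity that feeds the precision matrix (13) in the PTC formulation.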

We exploit GMM/GMR to estimate desired trajectories for each controller through the mean $\hat{\bm{\mu}}$, as well as their importance through the covariance matrix $\hat{\bm{\Sigma}}$. In GMM/GMR, covariance matrices model the variability in the data, in addition to the correlation between the variables. Figure 2(a) illustrates this aspect, where we see that the variance regressed by GMR (shown as an envelope around the mean) reflects the datapoint distribution in the original dataset. In the context of PTCs, high variability in the demonstrations given to the $p$-th controller results in large covariance matrices $\bm{\Sigma}_p$. From (13), it follows that the corresponding controller precision matrix $\bm{\Lambda}_p$ will be small and, thus, the control reference will be tracked less accurately. GMM/GMR is, hence, an appropriate technique to select relevant controllers based on the regularities observed in each part of the task throughout the different demonstrations.

(a) GMR: The variance models the variability in the dataset.
(b) GPR: The variance models the uncertainty of the estimate (depending on the presence/absence of training datapoints in the neighborhood).
Fig. 3: For a given set of datapoints (black dots), GMR and GPR compute different and complementary notions of variance. The red line is the regressed function, while the light red contour represents the computed variance around the prediction.
1. Initialization
1:  Select relevant controllers (Section III) based on the sub-tasks
2:  Select an appropriate regression algorithm (GMR, GPR)
3:  Collect demonstrations for each controller
2. Model training
1:  for each controller p = 1, ..., P do
2:      if regression algorithm is GMR then
3:          Choose the number of GMM states and estimate the model parameters by EM
4:      else if regression algorithm is GPR then
5:          Choose a kernel function and optimize its hyperparameters
6:      end if
7:  end for
3. Movement synthesis
1:  for each time step do
2:      for each controller p = 1, ..., P do
3:          Compute the reference distribution N(mu_p, Sigma_p) through GMR or GPR
4:          Update A_p, b_p based on the type of controller
5:          Compute the torque distribution N(A_p mu_p + b_p, A_p Sigma_p A_p^T)
6:      end for
7:      Compute the fused torque from (7) and the command from (1)
8:  end for
Algorithm 1 Fusion of probabilistic torque controllers

V-B Gaussian Process Regression (GPR)

A Gaussian Process (GP) is a distribution over functions, with a Gaussian prior on observations given by $\bm{y} \sim \mathcal{N}\!\left(\bm{m}(\bm{X}), \bm{K}\right)$, where $\bm{m}(\cdot)$ is a vector-valued function yielding the mean of the process, $\bm{K}$ denotes its covariance matrix and $\bm{X}$ is a concatenation of observed inputs. The covariance matrix is computed from a kernel function evaluated at the inputs, with elements $K_{ij} = k(\bm{x}_i, \bm{x}_j)$. Several types of kernel functions exist; see e.g. [10].

Standard GPR allows the prediction of a scalar function. In robotics, one typically requires multi-dimensional outputs, thus GPR is often employed separately for each output of a given problem. Here we follow this approach to probabilistically model multi-dimensional reference trajectories, such as those of joint angles or Cartesian positions. For each input point $\bm{x}^*$, the prediction of each output dimension $d$ is thus given by

$$\mu_d = m_d(\bm{x}^*) + \bm{k}_*^{\top}\left(\bm{K} + \sigma_n^2\bm{I}\right)^{-1}\left(\bm{y}_d - \bm{m}_d(\bm{X})\right), \tag{14}$$

$$\sigma_d^2 = k(\bm{x}^*, \bm{x}^*) - \bm{k}_*^{\top}\left(\bm{K} + \sigma_n^2\bm{I}\right)^{-1}\bm{k}_*, \tag{15}$$

where $\bm{y}_d$ is the observed $d$-th output dimension, $\bm{k}_*$ is the vector of kernel evaluations between $\bm{x}^*$ and the observed inputs $\bm{X}$, and $\sigma_n^2$ is an additional hyperparameter modeling noise in the observations (which acts as a regularization term). We concatenate the predictions into one single multivariate Gaussian with mean and covariance matrix given by

$$\bm{\mu} = \left[\mu_1 \ldots \mu_D\right]^{\top}, \qquad \bm{\Sigma} = \mathrm{diag}\!\left(\sigma_1^2, \ldots, \sigma_D^2\right). \tag{16}$$
Since output dimensions are modeled separately, GPR predictions are, in the standard case, uncorrelated, which is evident from the diagonal structure of $\bm{\Sigma}$ in (16). In contrast to GMR, the estimated variance in GPR is a measure of prediction uncertainty. Figure 2(b) illustrates this aspect, with the variance increasing in the absence of training data. This provides a way of assigning importance to predictions, when different observations of a task occur. We propose to exploit GPR if demonstration data is incomplete or scarce and, in particular, for partially demonstrating a task to each controller as separate sub-tasks.
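The per-dimension GPR scheme of (14)–(16) can be sketched with scikit-learn by fitting one GP per output and stacking the predictions into a diagonal-covariance Gaussian. The kernel choice, dataset and function names are illustrative assumptions.

```python
# Sketch of per-dimension GPR (14)-(16): each output is modeled by an
# independent GP; predictions are stacked into one Gaussian with diagonal
# covariance, whose variance grows away from the training data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

def fit_gps(X, Y):
    """One GP per output dimension of Y."""
    kernel = Matern(nu=2.5) + WhiteKernel(noise_level=1e-4)
    return [GaussianProcessRegressor(kernel=kernel).fit(X, Y[:, d])
            for d in range(Y.shape[1])]

def predict(gps, x_star):
    """Stack per-dimension predictions into a mean and diagonal covariance."""
    out = [gp.predict(x_star.reshape(1, -1), return_std=True) for gp in gps]
    mean = np.array([m[0] for m, _ in out])
    cov = np.diag([s[0] ** 2 for _, s in out])
    return mean, cov

# Toy 2-D reference trajectory observed only for inputs in [0, 1].
X = np.linspace(0, 1, 30).reshape(-1, 1)
Y = np.column_stack([np.sin(3 * X[:, 0]), np.cos(3 * X[:, 0])])
gps = fit_gps(X, Y)

_, cov_near = predict(gps, np.array([0.5]))   # inside the training region
_, cov_far = predict(gps, np.array([5.0]))    # far from any demonstration
# Uncertainty, and hence the controller covariance, grows away from the data.
```

Large predicted variance far from the demonstrations translates, via (13), into a near-zero precision, which is what deactivates a controller outside its training region in the painting experiment.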

The overall approach is summarized in Algorithm 1 for GMM or GP as trajectory modeling techniques.

VI Evaluation

We assess the performance of the proposed framework in two different tasks. In one case, we exploit the variability in the demonstrations, while, in the other, we consider the prediction uncertainty. The experiments are conducted in two different 7-DoF manipulators, enabled with torque control. Videos of both experiments can be found at

VI-A Learning cocktail shaking skills with force constraints

We start our evaluation with a cocktail shaking task where force and configuration space control are employed. For this task we use the torque-controlled KUKA light-weight robot. The task is comprised of two sub-tasks (Fig. 1): a force-based sub-task, where a contact force (measured with a F/T sensor mounted on the end-effector) must be tracked in order to successfully close a cocktail shaker, and a configuration space sub-task, through which the robot performs a shake using rhythmic joint movements. We aim to extract the activation of each sub-task from the variability in the demonstrations, thus both force and joint demonstrations are encoded in GMMs.

We collected 4 demonstrations of this task by kinesthetically guiding the robot arm (gravity-compensated) to first close the shaker and, second, to perform the shake with a rhythmic motion of its 6th joint. For the force controller, the datapoints concatenate the time stamp with the sensed forces (force directions as indicated in Fig. 1). In the case of the joint space controller, the datapoints concatenate the time stamp with the position and velocity of joint 6. The recorded trajectories were filtered and sub-sampled to 200 points each, yielding a dataset with 800 datapoints for each controller. Additionally, the joint space trajectories were aligned using Dynamic Time Warping, in order to capture the consistent shaking patterns in all demonstrations. Finally, a GMM was fitted to the dataset of each controller, with the number of states chosen empirically.

Fig. 4: Dataset of demonstrated contact forces (lines) and estimated GMM states (blue ellipses).
Fig. 5: Dataset from joint 6 of the 7-DoF manipulator as a function of time (lines). Red ellipses are the GMM states, which model the joint probability distribution between joint angles and time.

Figures 4 and 5 show the force and joint space datapoints, together with the corresponding GMM states, for the contact force along the closing direction of the end-effector and for joint 6. For illustration purposes, the GMM states are depicted as ellipses with a width of one standard deviation. The negative sign in the force measurements indicates that the applied force is in the direction opposite to the positive axis, which is expected since the closing of the shaker occurs along that direction. From these plots we conclude that both the collected contact forces and joint angles have periods of high and low variability. The periods of low variability mark the regions where each sub-task should be predominant. In the case of the forces, this happens at the beginning, where the force is zero, and during the interval in which the contact force is applied to close the shaker. On the other hand, the consistent rhythmic patterns in the later part of Fig. 5 mark the shaking sub-task. Notably, in both cases, the GMM encoding is able to capture this consistency, in the form of narrow Gaussians. Figure 6 shows the retrieved control references using GMR. Here, the contour around the thick lines corresponds to the predicted variance at each input point. In both cases, the combination of GMM/GMR allows for a proper encoding and retrieval of both the mean control reference and the variance.

Fig. 6: GMR performed on the mixture models depicted in Figs. 4 and 5, with solid lines representing the retrieved profiles and the semi-transparent contours depicting the prediction variance. Top: Retrieved contact force profile. Bottom: Predicted reference for joint 6.
Fig. 7: Generated torque commands for joint 6 during one reproduction of the task. Red and blue curves show the torques generated by each individual controller, with corresponding variance, obtained from the probabilistic controller formulation in Section IV. The optimal torque, used by the robot, is depicted in black.

The torque commands that were generated by each controller during one reproduction of the task, as well as the optimal torque, are shown in Fig. 7. The latter is obtained from the former two through (6), as described in Section IV. We focus our analysis on joint 6, the one which performs the shake. For each sub-task, we used diagonal control gain matrices, chosen empirically based on the desired controller performance. The linear operators of each controller were constructed according to Section IV-B, with the sign of the force operators flipped compared to those in Section IV-B. This is due to the encoded forces having an opposite sign to the desired direction of end-effector movement. Figure 7 shows that the commanded torque closely matches the torque from each of the individual controllers in the corresponding regions of low variance. This is evident at the beginning of the task, where the torques generated by the force controller strongly influence the torques sent to the robot, and later in the task, where the shaking torques are favored. This results in a reproduction where the complete task is properly executed by, first, applying the desired contact force and, second, performing the shaking movement. The accompanying video shows the demonstration and reproduction of the task.

VI-B Learning painting skills from separate demonstrations

In a second experiment, we consider the scenario where a robot assists a user to perform a painting task. We divide the complete task into two sub-tasks: 1) a handover, where the user gives the paint roller to the robot (Fig. 8-left), and 2) painting, where, in a different region of the workspace, the robot helps the user paint a wooden board by applying painting strokes (Fig. 8-right). This task is an instance of general human-robot interaction scenarios where a robot needs to perform different sub-tasks (potentially employing different controllers), depending on the user’s needs. Here, we consider an operational space controller (4) for the handover and a configuration space controller (2) for the painting.

Fig. 8: Two persons demonstrate the painting task to the robot. Left: The robot is shown how to receive the paint roller. Right: One person drives the robot to demonstrate the painting strokes, while the other holds the board.

Teaching controllers separately requires a trajectory modeling technique that yields high variances far from each controller’s training region, thus we exploit GPR. The 3-dimensional position of the user’s right hand is, in this case, used as input to GPR, as opposed to time. Training datapoints concatenate the human hand position with the robot end-effector position for the handover sub-task, and with the joint space configuration of the manipulator for the painting sub-task. The reference trajectories of each sub-task are thus 3- and 7-dimensional, respectively. In this experiment we consider zero velocity references for both controllers, so the linear operators of Section IV-B simplify accordingly. One demonstration was collected for each sub-task, as shown in Fig. 8. Notice that the right hand position of the human collaborator (tracked with an optical marker) never overlaps between the demonstrations of the two sub-tasks. For each output, we used a Gaussian process with a Matérn kernel (see e.g. Chapter 4 in [10]), as it yielded smooth predictions, a convenient feature for our setup where the person may move in an unpredictable manner. Hyperparameters were optimized by minimizing the negative log marginal likelihood of the observations [10]. Moreover, we exploit the process mean to define a prior on the robot’s behavior, in particular to have the robot keep a safe posture outside of the region where demonstrations are provided. We define this neutral pose manually as a joint space configuration, but it could alternatively be demonstrated. Each element of this configuration defines the mean of one of the 7 joint space GPs. The means of the task space GPs, which are also constant, are given by the end-effector position yielded by this configuration.

Fig. 9: Reproduction of the painting task. Top: The user works on a wooden board, while the robot keeps a safe posture (left). The paint roller is handed over to the robot (right). Bottom: The robot applies painting strokes, as the user’s right hand moves up and down with the board.

After hyperparameter estimation, we exploit GPR predictions to fuse the torques from each controller and reproduce the complete task. Notice that, during movement synthesis, the system will observe different input data than that used for training, as the user may move in regions where demonstrations were not provided. One expects the robot to stay in the pre-defined safe posture in those regions and execute the demonstrated sub-tasks where they were shown. Moreover, this should occur with smooth transitions between torque commands when tasks change. Figure 9 shows one reproduction of the complete task. The user starts by filing a wooden board, in a region of the workspace with no demonstration data (top, left). One can see that the robot remains in the pre-selected neutral pose. As the user hands the paint roller to the robot, the end-effector moves to grasp it (top, right). Finally, the user grasps the board and moves to a spacious region to perform the painting. As his right hand moves up and down, the robot applies painting strokes in the opposite direction. The robot is therefore capable of identifying which controller should be active at any moment, by exploiting the information contained in the data.

Fig. 10: Torques from the 2nd joint during the painting task and their variance. The first shaded area highlights the handover part of the movement, where the optimal torques match those computed by the end-effector position controller. The second shaded area highlights the task torques during two painting strokes.

Figure 10 provides a quantitative analysis of the performance of our method in this scenario, by showing the torques involved in one reproduction. We focus the analysis on the second joint of the robot (see Fig. 9, bottom left) since it is highly important for this task. Even though we did not consider time-driven regression, we plot torques against time in order to have a clear and continuous view of how the task evolved. The plot in Fig. 10 shows a clear separation between the different moments of the task. The time intervals outside the two highlighted regions correspond to regions of the workspace where no training data was provided; there, the variance of both controllers is high and roughly constant, as both predictions are simultaneously uncertain. The first highlighted region corresponds to the execution of the handover sub-task. Notice the decrease in the variance of the torques for this task (green envelope) and how these torques are matched by the optimal torque. Finally, the second highlighted time frame coincides with the execution of the painting task. Here one can see a decrease in the variance of the joint space controller (red envelope), which is closely matched by the optimal torque, in particular during the two strokes (the two oscillations in the torque profile). All other joints yielded equivalent observations.

For visualization purposes, in Fig. 11 we zoom in on the torques that are used for each sub-task. In the leftmost plot we see that the torques generated by the task space controller (green line) are closely matched by the optimal torque. Here, positive torques lower the end-effector to a posture below the hand for the handover, while negative torques raise it back up afterwards. We observe an analogous result in the rightmost plot, where the joint space controller torques coincide with the optimal ones. Here, positive torques apply vertical strokes from top to bottom, and negative torques move the paint roller back to the initial configuration. The accompanying video shows the demonstrations that we used for this task as well as one complete reproduction.

Fig. 11: Close up view of the handover and painting torques. Left: Optimal torque (black) and operational space controller torque (green). Right: Optimal and joint space controller torques (black and red).

VII Conclusions and Future Work

We presented a novel probabilistic framework for fusing torque controllers based on human demonstrations. The main contributions are the consideration of force-based tasks, in addition to joint and task space ones, and the possibility to exploit different probabilistic trajectory modeling techniques. The experimental validation showed that the approach allows robots to successfully reproduce tasks that require the fulfillment of different types of constraints, enforced by controllers acting on different spaces. The probabilistic encoding of demonstrations proved to be crucial, by providing information about the importance of each constraint, through the second moment of the data. This aspect is not present in deterministic trajectory modeling approaches, which thus fall short in application scenarios where multiple constraints need to be fulfilled.

The results presented here open up several future research challenges. One, connected to Section V, concerns the formulation of a probabilistic modeling technique that can simultaneously encode and synthesize uncertainty and variability in the observed data. Works like [18] are a potential step in this direction. Another promising research direction pertains to the design of the individual controllers. While in this paper we fixed the control gains of each individual controller, works like [19, 20] estimate these gains from demonstrations by formulating the tracking problem as an LQR. Combining the proposed approach with that technique could potentially enhance compliance and safety capabilities during the execution of the demonstrated constraints, while alleviating the need for control gain design.
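To make the LQR direction concrete, the gain of a tracking controller can be obtained by iterating the discrete-time Riccati equation. The following scalar sketch is ours, not the formulation of [19, 20], which additionally estimate the cost weights from the demonstration variability:

```python
def dlqr_scalar(a, b, q, r, iters=500):
    """Infinite-horizon discrete LQR gain for the scalar system
    x[k+1] = a*x[k] + b*u[k], with cost sum(q*x^2 + r*u^2),
    computed by fixed-point iteration on the Riccati equation."""
    p = q
    for _ in range(iters):
        k = (b * p * a) / (r + b * p * b)   # current gain estimate
        p = q + a * p * (a - b * k)         # Riccati update
    return (b * p * a) / (r + b * p * b)
```

A large control weight r yields a small gain (compliant tracking), while a small r yields a stiff controller; modulating q and r from the demonstration variance is precisely what would replace the manual gain design.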


  1. In the remainder of the paper we drop explicit dependencies for notational brevity.


  1. A. G. Billard, S. Calinon, and R. Dillmann, “Learning from humans,” in Handbook of Robotics, B. Siciliano and O. Khatib, Eds.   Secaucus, NJ, USA: Springer, 2016, ch. 74, pp. 1995–2014, 2nd Edition.
  2. F. L. Moro, M. Gienger, A. Goswami, and N. G. Tsagarakis, “An attractor-based whole-body motion control (WBMC) system for humanoid robots,” in Proc. IEEE-RAS Intl Conf. on Humanoid Robots (Humanoids), Atlanta, GA, USA, October 2013, pp. 42–49.
  3. V. Modugno, G. Neumann, E. Rueckert, G. Oriolo, J. Peters, and S. Ivaldi, “Learning soft task priorities for control of redundant robots,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), Stockholm, Sweden, May 2016, pp. 221–226.
  4. N. Dehio, R. F. Reinhart, and J. J. Steil, “Multiple task optimization with a mixture of controllers for motion generation,” in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), Hamburg, Germany, 2015, pp. 6416–6421.
  5. O. Khatib, “A unified approach for motion and force control of robot manipulators: The operational space formulation,” IEEE Journal on Robotics and Automation, vol. 3, no. 1, pp. 43–53, 1987.
  6. L. Sentis and O. Khatib, “Control of Free-Floating Humanoid Robots Through Task Prioritization,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), Barcelona, Spain, April 2005, pp. 1718–1723.
  7. R. Lober, V. Padois, and O. Sigaud, “Variance modulated task prioritization in whole-body control,” in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), Sep. 2015, pp. 3944–3949.
  8. S. Calinon and A. G. Billard, “Statistical learning by imitation of competing constraints in joint space and task space,” Advanced Robotics, vol. 23, no. 15, pp. 2059–2076, 2009.
  9. J. Silvério, S. Calinon, L. Rozo, and D. G. Caldwell, “Learning Competing Constraints and Task Priorities from Demonstrations of Bimanual Skills,” arXiv:1707.06791 [cs.RO], July 2017.
  10. C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning.   Cambridge, MA, USA: MIT Press, 2006.
  11. A. Paraschos, C. Daniel, J. Peters, and G. Neumann, “Probabilistic movement primitives,” in Advances in Neural Information Processing Systems (NIPS).   Curran Associates, Inc., 2013, pp. 2616–2624.
  12. M. Deniša, A. Gams, A. Ude, and T. Petrič, “Learning Compliant Movement Primitives Through Demonstration and Statistical Generalization,” IEEE/ASME Transactions on Mechatronics, vol. 21, no. 5, pp. 2581–2594, 2016.
  13. A. Ijspeert, J. Nakanishi, P. Pastor, H. Hoffmann, and S. Schaal, “Dynamical movement primitives: Learning attractor models for motor behaviors,” Neural Computation, no. 25, pp. 328–373, 2013.
  14. J. Nakanishi, R. Cory, M. Mistry, J. Peters, and S. Schaal, “Operational space control: A theoretical and empirical comparison,” International Journal of Robotics Research, vol. 27, no. 6, pp. 737–757, 2008.
  15. K. Lynch and F. Park, Modern Robotics: Mechanics, Planning, and Control.   Cambridge University Press, 2017.
  16. F. Stulp and O. Sigaud, “Many regression algorithms, one unified model - a review,” Neural Networks, vol. 69, pp. 60–79, Sept. 2015.
  17. S. Calinon, “A tutorial on task-parameterized movement learning and retrieval,” Intelligent Service Robotics, vol. 9, no. 1, pp. 1–29, January 2016.
  18. J. Umlauft, Y. Fanger, and S. Hirche, “Bayesian uncertainty modeling for Programming by Demonstration,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), May 2017.
  19. J. R. Medina, D. Lee, and S. Hirche, “Risk-sensitive optimal feedback control for haptic assistance,” in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), May 2012, pp. 1025–1031.
  20. L. Rozo, D. Bruno, S. Calinon, and D. G. Caldwell, “Learning optimal controllers in human-robot cooperative transportation tasks with position and force constraints,” in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), Hamburg, Germany, Sept.-Oct. 2015, pp. 1024–1030.