Learning Task Constraints from Demonstration for Hybrid Force/Position Control
We present a novel method for learning hybrid force/position control from demonstration for multi-phase tasks. We learn a dynamic constraint frame aligned to the direction of desired force using Cartesian Dynamic Movement Primitives. Our approach allows tracking of desired forces while activating only one dimension of the constraint frame for force control. We find that controlling with respect to our learned constraint frame provides compensation for frictional forces while sliding without any explicit modeling of friction. We additionally propose extensions to the Dynamic Movement Primitive (DMP) formulation in order to robustly transition from free-space motion to in-contact motion in spite of environment uncertainty. We incorporate force feedback and a dynamically shifting goal into the DMP to reduce forces applied to the environment and retain stable contact when enabling force control. Our methods exhibit low impact forces on contact and low steady-state tracking error.
Many tasks, such as buffing the hood of a car, scrubbing a floor, and sliding an object across a table require motion along a surface while maintaining a desired force. In order to automate such constrained-motion tasks, robots must be able to control force and position simultaneously. Though forces can be applied to an object using only position control, such an approach will result in poor tracking of desired forces in the absence of force feedback. Additionally, excessively large forces can be imposed on the object in the presence of estimation errors. Controlling forces relative to desired motion is essential for ensuring completion of a constrained-motion task without risking damage to the environment or the robot.
Hybrid force/position control is a popular control scheme for constrained-motion tasks since it allows for both position and force control objectives to be tracked without conflict. Control is performed with respect to a (possibly time-varying) Cartesian coordinate system, denoted the constraint frame, that may be arbitrarily located relative to the robot. Common choices for the constraint frame include the world frame, the robot’s tool frame, or a frame attached to an object of interest. Task constraints in the context of this paper determine which dimensions of the constraint frame are controlled for position and which dimensions are controlled for force. They are typically defined by a diagonal binary selection matrix, where a diagonal value of 1 activates position control for the corresponding Cartesian dimension, while a value of 0 enables force control.
Specifying an appropriate constraint frame and task constraints is difficult and prone to error for complex or multi-phase tasks. Improper constraint specification is especially problematic for transitioning from free-space motion to being in contact with a surface, as large forces may be applied to the surface if constraints are enabled too soon or too late [3, 4]. Even when constraints are properly specified, small perturbations in the environment configuration or perceptual estimation errors can interfere with the timing of the constraints. It is therefore desirable for a robot to learn the constraints of a task and to adapt them to environment uncertainty online.
While learning from demonstration has proven successful for learning task constraints [5, 6, 7, 8], existing approaches focus on learning axis-aligned constraints with respect to a chosen fixed frame. This is a limitation when desired forces are time-varying and span multiple dimensions of the constraint frame, as the robot loses a degree of freedom for motion with each degree of freedom devoted to force control. Further, the problem of robustly transitioning from free-space motion to constrained motion in the presence of environment uncertainty has not been adequately addressed.
We contribute a solution to fill these apparent gaps in the literature for learning task constraints for hybrid force/position control from demonstration. We consider the scenario where a robot needs to move through free-space, make contact with a surface, and perform a motion along the surface while applying a desired force. We highlight two challenges that arise in this task setting and specify our contributions that overcome them:
1) Proper specification of a constraint frame and task constraints is difficult for tasks with rapidly changing force constraints. Improper specification impedes the robot’s ability to meet force/position control objectives. We present a method using Cartesian Dynamic Movement Primitives (CDMPs) for learning a dynamic constraint frame aligned to desired forces for which force constraints are trivially specified. Aligning the constraint frame to the direction of desired force minimizes the degrees of freedom used for force control, thereby maximizing freedom of motion orthogonal to the applied force. We embed the learned constraint frame in an operational space hybrid force/position controller to simultaneously meet position and force control objectives without the burden of manually defining the constraint frame and task constraints.
2) The surface may not be exactly at the anticipated position due to perceptual error or the surface being perturbed from its initial position. We provide novel extensions to the Dynamic Movement Primitive (DMP) framework that allow for robust transition from free-space motion to constrained motion. Our extensions incorporate force feedback and contact awareness to reduce contact forces and gradually transition into tracking desired forces. We also define a dynamically changing goal that transitions as a function of the robot’s contact with the environment.
We structure the remainder of the paper as follows. We provide a description of related work and explain the novelty of our method over the prior art in Section II. In Section III we present the base methods from the prior art we utilize in our framework. The details of our novel methods are provided in Section IV. We describe our experimental setup for testing multi-phase tasks requiring simultaneous tracking of force and position in Section V. We present the associated results in Section VI. Section VII concludes with a brief discussion of our methods and directions for future work.
II Related Work
We review two general areas of related research. The first covers learning simultaneous control of force and position. The second area includes methods that incorporate force feedback into Dynamic Movement Primitives.
II-A Learning Force/Position Control
The literature in LfD for simultaneous control of position and force has focused on 1) learning which dimensions of the constraint frame should be selected for position or force control [5, 8, 9, 10] and, to a lesser extent, 2) learning the best constraint frame to control with respect to [6, 8]. A key insight that has motivated constraint selection methods is that dimensions of the constraint frame that consistently exhibit high variance over time in force and low variance over time in position should favor force control, and position control otherwise . In  a criterion based on trajectory variance is defined that modulates a stiffness parameter of a Cartesian impedance controller, allowing force tracking when stiffness is low. Impedance stiffness is set to zero in  for compliant dimensions orthogonal to the dimension of highest variance in motion. A series of boolean checks in  over variance in force and position variables determines which axes of the robot’s tool frame are enabled for PI force control or Cartesian impedance control. In , binary constraint selection for a hybrid force/position controller is made by enabling position control when the computed position variance is found to be greater than the force variance.
Constraint frames are often chosen manually based on the requirements of the task. Common choices include the world frame [10, 11], surface normals , the robot’s tool frame [5, 7, 12, 13], or frames attached to objects of interest in the environment . The robot selects an appropriate constraint frame from a collection of pre-defined candidate frames in  based on the observed trajectory variance over multiple demonstrations. In  candidate frames include the start and end frames of a human-robot collaboration task, and a Gaussian Mixture Model selects the appropriate frame over the course of a trajectory. However, methods that use a fixed constraint frame cannot be used for a task in which desired forces span all three dimensions of the constraint frame, since it would require all dimensions to be enabled for force control, thereby preventing simultaneous motion. A careful choice of constraint frame can mitigate this problem, but for tasks in which desired forces vary in a complex manner, fixed frame selection is not feasible. Our learned dynamic constraint frame overcomes this problem.
Estimating task constraints and null space projections thereof can be used for generalizing a task to different environment configurations [12, 14]. The task constraint matrix and null space projection are estimated from motion data in  and incorporated into an operational space controller.  does not, however, consider task constraints for the purpose of force control. In , the robot estimates task constraints to command a policy learned from demonstration in the task null space. While  uses a force/torque sensor to align the robot end-effector to the normal of a curved surface for generalizing a learned planar task, it does not consider explicit task constraints for force control and assumes the robot is already in contact with the surface before initiating the task.
II-B Force Feedback for Dynamic Movement Primitives
Dynamic Movement Primitives (DMPs) are a widely used policy representation for learning robot motion that afford real-time obstacle avoidance , dynamic goal changing , and can be learned from demonstration using standard regression techniques . Various features of DMPs have been used to augment motion trajectories with force information. Position trajectories and force profiles can be synchronized with the DMP canonical system, a monotonically decreasing phase variable that decouples the system from an explicit time dependence, even when forces are demonstrated separately through a haptic input device . In , force error is incorporated into the phase variable to aid in assembly tasks learned from demonstration. Temporal coupling terms in  provide pose disturbance detection when executing tasks that repeatedly make and break contact with a surface. Compliant Movement Primitives  encode both motion and task-specific joint torques as a DMP which allows for low feedback gains that reduce contact forces during unexpected collisions. Velocity in periodic DMPs is modulated based on a passivity criterion in  to efficiently perform wiping tasks in a stable manner. Having both motion trajectories and force profiles encoded as DMPs allows standard reinforcement learning methods such as to be readily applied in order to learn the optimal forces needed for completing a task [11, 22].
Kober et al.  present a method similar to ours, in which a robot learns DMPs for individual segments of a multi-phase task. The robot achieves force and position tracking with the use of a hybrid force/position controller. However,  selects a fixed constraint frame based on convergence metrics of the DMPs, whereas our method uses a dynamic constraint frame learned from forces observed during demonstration. Several complementary works to ours use force information to guide transitions between primitives [24, 25, 26], but they do not address the problem of robustly transitioning between free-space motion and in-contact task phases. Steinmetz et al.  handle the case of ensuring contact when an expected contact is not satisfied, but require switching between multiple controllers, which is known to suffer stability issues . Additionally,  cannot adapt to contacts made sooner than expected. Our extensions to the DMP framework enable robust transitions from free-space motion to constrained motion using a single unified controller.
In this section, we present the base methods we employ in our framework. We first define the hybrid force/position control law we use in Section III-A, and then present a standard formulation of DMPs in Section III-B. Our novel contributions will be presented in Section IV.
III-A Hybrid Force/Position Controller
We utilize the operational space hybrid force/position controller defined in  which we present here for clarity. The controller we use has the form
where and are joint torques corresponding to position and force control laws, respectively, is an arbitrary joint space control law to be commanded in the null space of hybrid force/position control, and is gravity compensation in joint space. is the -dimensional identity matrix where is the number of robot joints, is the analytic Jacobian, and is the generalized Jacobian pseudo-inverse derived in  as
where is the joint space inertia matrix and is the inertia matrix reflected into task space defined by
We use the null space projection to command a low-gain PD controller in joint space that tracks a desired posture keeping the robot away from joint limits when possible.
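For reference, the generalized Jacobian pseudo-inverse and the task-space inertia matrix referred to above are commonly written as follows in the standard operational space formulation. The symbols $M$ (joint space inertia matrix), $\Lambda$ (inertia matrix reflected into task space), $J$ (Jacobian), and $\bar{J}$ (generalized pseudo-inverse) are notational assumptions made here for illustration.

```latex
\bar{J} = M^{-1} J^{T} \Lambda ,
\qquad
\Lambda = \left( J M^{-1} J^{T} \right)^{-1}
```

With this choice, a joint space command $\tau_0$ projected through $\left( I - J^{T} \bar{J}^{T} \right)$ produces no acceleration at the end-effector, which is what allows the posture controller to run without disturbing the task-space objectives.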
We use a Cartesian inverse dynamics controller defined as
where are desired Cartesian poses, velocities, and accelerations, are actual poses and velocities, are joint velocities, and are positive semi-definite gain matrices. is a block tensor transformation that performs selection for position control in the constraint frame defined as
for the rotation matrix from the base frame to the constraint frame and the diagonal selection matrix defined in Section I. Note that and are generally time-varying but we drop the subscripts here to be consistent with the controller definition.
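The effect of selecting in the constraint frame can be illustrated for a single 3-vector (for example, a translational error). The sketch below assumes that the rotation matrix maps base-frame vectors into the constraint frame, so that selection amounts to rotating into the frame, zeroing the force-controlled axes, and rotating back. The function names and the 3-vector restriction are illustrative assumptions, not the controller's actual implementation.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix given as a list of rows."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def select_in_frame(rotation, selection, vector):
    """Apply a diagonal selection in the constraint frame.

    rotation:  3x3 matrix mapping base-frame vectors into the constraint
               frame (assumption: its rows are the constraint-frame axes
               expressed in the base frame).
    selection: three diagonal entries, 1 for position control, 0 for force.
    vector:    3-vector expressed in the base frame.
    """
    in_frame = mat_vec(rotation, vector)
    selected = [selection[i] * in_frame[i] for i in range(3)]
    return mat_vec(transpose(rotation), selected)
```

For example, with the third constraint-frame axis pointing along the base x-axis and a selection of (1, 1, 0), the x component of a base-frame vector is removed from position control while the other two components pass through unchanged.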
We control forces using the following PI control law
where are positive semi-definite gain matrices, are desired and actual forces, and is the window of error accumulation. is the force control selection matrix where . Force tracking occurs for each dimension of the constraint frame that has . We can achieve pure free-space motion by setting .
In Section VI-A1 we experimentally compare against PI force control with Integral Error Scaling (IES) presented in . This technique attenuates the integral error when it opposes the desired direction of force in order to mitigate the chance of the end-effector breaking contact with the surface. For IES, the integral error term in Equation (6) switches to for when .
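A minimal one-dimensional sketch of a PI force law with a finite window of error accumulation, with optional Integral Error Scaling, is given below. The class name, the sign test used to decide when the integral opposes the desired force, and the scalar treatment are all assumptions for illustration; the actual control law operates on vectors with gain matrices and the selection matrix.

```python
from collections import deque


class WindowedPIForce:
    """Scalar PI force law with a sliding window of accumulated error."""

    def __init__(self, kp, ki, window, ies_scale=None):
        self.kp = kp
        self.ki = ki
        # Only the most recent `window` errors contribute to the integral.
        self.errors = deque(maxlen=window)
        # Assumed IES factor in (0, 1); None disables IES.
        self.ies_scale = ies_scale

    def update(self, f_desired, f_actual, dt):
        error = f_desired - f_actual
        self.errors.append(error)
        integral = sum(self.errors) * dt
        # Assumption: the integral "opposes" the desired force when the
        # two have opposite signs; in that case IES attenuates it.
        if self.ies_scale is not None and integral * f_desired < 0.0:
            integral *= self.ies_scale
        return self.kp * error + self.ki * integral
```

Attenuating the integral term only when it pushes against the desired force direction limits how hard the controller retreats after an overshoot, which is the mechanism that reduces the chance of breaking contact.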
III-B Dynamic Movement Primitives
We learn DMPs for position trajectories and force profiles following the formulation of  characterized by the following set of equations:
Equations (7) and (8) define a critically damped second-order dynamical system, written as a pair of first-order equations, for an appropriate choice of where is the state variable being tracked, is the initial state, is the goal, and a forcing function. Equation (9) specifies the evolution of a phase variable that decouples the system from explicit time. Equation (10) defines the forcing function as a normalized linear combination of basis functions. We use Gaussian basis functions as is common in the literature , where Equation (11) parameterizes them with centers and widths . Each degree of freedom receives its own DMP, and all DMPs are synchronized by the common phase variable .
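The structure described above can be sketched for a single degree of freedom as follows. This assumes one widely used DMP form (a spring-damper transformation system with the damping chosen for critical damping, an exponentially decaying phase, and a forcing term gated by the phase); the exact equations, gain names, and basis placement used in the paper may differ, and all identifiers here are illustrative.

```python
import math


def make_basis(n_basis, alpha_s):
    """Gaussian centers spread along the phase decay, with matching widths."""
    centers = [math.exp(-alpha_s * i / (n_basis - 1)) for i in range(n_basis)]
    widths = []
    for i in range(n_basis):
        j = i + 1 if i + 1 < n_basis else i - 1  # nearest neighbouring center
        widths.append(1.0 / (centers[i] - centers[j]) ** 2)
    return centers, widths


def forcing(s, weights, centers, widths):
    """Normalized linear combination of Gaussian basis functions, gated by s."""
    psi = [math.exp(-h * (s - c) ** 2) for c, h in zip(centers, widths)]
    total = sum(psi)
    if total < 1e-12:
        return 0.0
    return s * sum(w * p for w, p in zip(weights, psi)) / total


def rollout(x0, g, weights, tau=1.0, dt=0.001, n_steps=2000,
            alpha=25.0, alpha_s=4.0):
    """Euler-integrate a one-DOF DMP from x0 toward goal g."""
    beta = alpha / 4.0  # critical damping for this spring-damper form
    centers, widths = make_basis(len(weights), alpha_s)
    x, v, s = x0, 0.0, 1.0
    trajectory = [x]
    for _ in range(n_steps):
        f = forcing(s, weights, centers, widths)
        v_dot = (alpha * (beta * (g - x) - v) + (g - x0) * f) / tau
        x_dot = v / tau
        s_dot = -alpha_s * s / tau
        v += v_dot * dt
        x += x_dot * dt
        s += s_dot * dt
        trajectory.append(x)
    return trajectory
```

Because the forcing term is multiplied by the decaying phase, its influence vanishes toward the end of the motion and the state converges to the goal regardless of the learned weights.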
We now present the details of our novel contributions. We describe our novel approach to learning task constraints with a dynamic constraint frame in Section IV-A, and our novel extensions to the DMP framework in Section IV-B that allow for robust transition from free-space to in-contact motion.
IV-A Learning Time-Varying Task Constraints
For the in-contact task phase, instead of learning the selection matrix for a fixed constraint frame (as in, e.g. [5, 8], [9, 10]), we propose learning a dynamic constraint frame for which can be specified in a canonical way. A key insight is that we can align a principal axis of the constraint frame to the direction of desired force, thereby requiring only one degree of freedom for force control. We set the -axis to be the axis aligned to desired forces observed during demonstration, resulting in selection matrix values of and otherwise. This corresponds to force control along the -axis of the constraint frame and position control on all other axes. The choice of is arbitrary.
For every timestep of the in-contact task phase, we learn a Cartesian coordinate frame with its -axis aligned to the direction of desired force . We create the input to the learning from the forces observed during demonstration by defining the -axis at each time step to be the force vector read from the force sensor normalized to unit length. (We employ an online low-pass filter on the force sensor with a cutoff frequency of 1.5Hz. This is a lower cutoff frequency than typically used, but the added noise reduction is beneficial for learning from the sensor readings. The filter adds a small time delay on the order of 20 milliseconds, which is insignificant given the 1000Hz sampling rate of the sensor.) We construct the other axes by selecting the end effector -axis as a candidate orthogonal axis and use cross products to create a valid right-handed coordinate system. We learn a single CDMP (described in Section III-B) from the constructed input data using ridge regression. The output is a smoothly varying trajectory for the constraint frame with a -axis that tracks the direction of desired force. We obtain a smoothly varying estimate of the magnitude of desired forces to be applied along the -axis of the learned constraint frame by learning a DMP from . Our method inherits the generalization benefits of DMPs well known in the literature . Thus, any modulations applied to the robot’s motion DMPs (e.g. temporal modulation of the trajectory) can also be applied to the learned constraint frame and desired forces, ensuring motion and force objectives remain in sync as specified by the user demonstration.
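The per-timestep frame construction described above can be sketched as follows. The function names are hypothetical, and placing the force-aligned axis last in the returned triple is an assumption made only so that the right-handedness check is concrete. The candidate axis stands in for the chosen end-effector axis.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def normalize(v):
    """Scale a 3-vector to unit length; reject near-zero vectors."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm < 1e-9:
        raise ValueError("cannot normalize a near-zero vector")
    return [c / norm for c in v]


def frame_from_force(force, candidate_axis):
    """Build a right-handed orthonormal frame with one axis along the force.

    Returns (first, second, aligned), where `aligned` is the unit force
    direction and first x second = aligned. Raises ValueError if the
    candidate axis is parallel to the force, since the frame is then
    underdetermined.
    """
    aligned = normalize(force)
    second = normalize(cross(aligned, candidate_axis))
    first = cross(second, aligned)
    return first, second, aligned
```

A sequence of such frames, one per demonstration timestep, would then form the orientation trajectory from which the CDMP is learned.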
We show in our experiments in Section VI-C that controlling with respect to our learned constraint frame allows desired forces to be tracked using one degree of freedom for force control, even when desired forces span multiple dimensions of fixed reference frames such as the world or tool frames. We also show in Section VI-B2 that we achieve compensation of frictional forces while sliding without explicitly modeling frictional properties of the robot or the environment. This improves upon the typical hybrid force/position control paradigm that makes the simplifying assumption of frictionless contact . Previous approaches for learning hybrid force/position control from demonstration (e.g. [7, 9], ) do not discover these forces and rely on low-friction environments to demonstrate their methods.
IV-B Extended DMPs for Making Stable Contact
We propose to augment the DMP framework for the purpose of robustly transitioning between position control and force control when making contact with a surface.
IV-B1 Halt DMP at Surface Contact
To bring the system to a halt when the robot detects contact, we modify Equation (8):
where is the sensed force in the same task space dimension as and determines how sensitive the system is to contact forces. We define the contact classifier as
where is the mean value of over a sliding window of size and is the mean value of with no external force applied to the sensor, i.e. the noise inherent to the force sensor. We use a sliding window approach in favor of an instantaneous threshold (e.g. ) to provide robustness to small transient disturbances that might be caused by estimation errors in the gravity compensation of the end-effector. We demonstrate in Section VI-A1 that our method affords lower impact forces when contacting a surface earlier than anticipated.
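A sliding-window contact classifier of the kind described above might look like the following sketch. The comparison against a fixed threshold on the deviation from the no-load baseline is an assumption, as are the class and parameter names; the precise decision rule of Equation (15) is not reproduced here.

```python
from collections import deque


class ContactClassifier:
    """Binary contact classifier based on a sliding-window mean of force."""

    def __init__(self, window_size, baseline, threshold):
        self.samples = deque(maxlen=window_size)
        self.baseline = baseline    # mean sensor reading with no external force
        self.threshold = threshold  # assumed margin above the baseline

    def update(self, force):
        """Add one force sample and return 1 if in contact, else 0."""
        self.samples.append(force)
        if len(self.samples) < self.samples.maxlen:
            return 0  # not enough data yet to decide
        mean = sum(self.samples) / len(self.samples)
        return 1 if abs(mean - self.baseline) > self.threshold else 0
```

With a window of five samples and a threshold of 1N, a single 4N spike raises the window mean to only 0.8N and is ignored, whereas a sustained 4N reading is classified as contact once it dominates the window.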
The right-hand side of Equation (14) has a similar form to a term proposed in  for halting a DMP system when pose error accumulates and in  when force error accumulates. However, in  and  the terms are applied to the phase variable and not the transformation system. We apply our term directly to the transformation system velocity as it allows us to selectively decouple the halting behavior of different dimensions. We show in our experiments (Section VI-A2) that the robot can halt motion in a dimension with an expected contact, while the remaining unconstrained dimensions continue to converge to their desired goal states. This could not be achieved if the term were used in the shared phase variable, as it synchronizes control across all dimensions.
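The per-dimension decoupling can be illustrated with a small sketch. The halting term below divides the state velocity by one plus a sensitivity-scaled force magnitude, gated by the contact classifier. This form is an assumption chosen by analogy with the phase-stopping terms mentioned above and is not necessarily the exact form of Equation (14). All names are hypothetical.

```python
def state_velocity(v, tau, sensed_force, in_contact, sensitivity):
    """State rate for one dimension, slowed when contact force is sensed."""
    return (v / tau) / (1.0 + sensitivity * in_contact * abs(sensed_force))


def step_positions(x, v, forces, contacts, feedback_on, tau, sensitivity, dt):
    """Advance each dimension independently for one time step.

    feedback_on[i] selects whether dimension i uses the halting term, so an
    expected-contact dimension can halt while the others keep moving.
    """
    new_x = []
    for i in range(len(x)):
        gate = contacts[i] if feedback_on[i] else 0
        rate = state_velocity(v[i], tau, forces[i], gate, sensitivity)
        new_x.append(x[i] + rate * dt)
    return new_x
```

Because each dimension has its own transformation system, the halting term can be enabled only where contact is expected, which a term placed on the shared phase variable could not do.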
IV-B2 Change in Goal Based on Contact Conditions
If we assume the robot made the intended contact, but at an earlier time (see Section VII for a further discussion of this assumption), then the modification in Equation (14) alone does not suffice for completing the task. The DMP will remain in a halted state until the force disappears. This will not happen when the sensed forces come from an intended contact and not a transient disturbance. We instead desire the free-space DMP system to gracefully terminate its execution and transition into the in-contact phase of the task. We achieve this by allowing the goal to dynamically change determined by
where is the original goal, is the current goal, the current DMP state, and the contact classifier in Equation (15).
Equation (16) smoothly moves the current goal to coincide with the robot’s current state when the robot detects stable contact. Once the goal and state coincide, the robot ends the free-space task phase and transitions to the in-contact phase. If a disturbance caused the sensed force and it disappears before the transition occurs, Equation (16) affords a smooth transition back to the original goal and the phase proceeds from that point as it would if no contact had been made. Parameters control the rate of goal transition.
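The goal behaviour described above can be sketched with a first-order update in which the goal is attracted to the current state while contact is detected and back to the original goal otherwise. The two separate rate parameters and the first-order form are assumptions for illustration rather than a reproduction of Equation (16).

```python
def goal_step(goal, state, original_goal, in_contact,
              rate_to_state, rate_to_goal, dt):
    """Advance the current goal by one Euler step.

    in_contact is the binary output of the contact classifier.
    """
    if in_contact:
        goal_dot = rate_to_state * (state - goal)
    else:
        goal_dot = rate_to_goal * (original_goal - goal)
    return goal + goal_dot * dt
```

Under sustained contact the goal converges to the state, which can serve as the trigger for ending the free-space phase. If the contact signal disappears first, the goal relaxes back to its original value.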
When the surface is farther than expected, the pose DMPs will converge to their respective goals before making contact with the surface. At goal convergence, each term in Equation (7) approaches zero, but we can still incite movement toward the desired contact by moving the goal in the direction of the desired contact by a small amount . This moves the end-effector at a constant velocity towards the desired contact, achieving similar behavior to  and . Our method is advantageous over these methods as we do not require controller switching , and we only require a single demonstration as opposed to hundreds of real-robot trials .
IV-B3 Incremental Force Control on Contact
When the robot first makes contact with the surface, an initial impact force will be applied to the surface that depends on the velocity at impact; a higher approach velocity results in a higher impact force. Though we mitigate impact forces with the DMP feedback in Equation (14), we still desire to enable force control when in contact in order to avoid sustained application of high impact forces and to gracefully transition into the constrained motion phase of the task. However, when the force error is large, enabling force control instantaneously can make retaining stable contact with the surface difficult, particularly for a stiff environment [3, 4].
We propose to overcome this difficulty by incrementally enabling force control for the desired dimension by leveraging the gradual goal transition of Equation (16). Instead of a strictly binary selection matrix for the hybrid force/position controller, we allow the Cartesian dimension transitioning to force control to continuously vary from 1 to 0 determined by
where is the system state at the time of contact, is the current system state, is the DMP goal at the time of contact, and is the current DMP goal. This term initializes to 1 when the robot initially makes contact, and converges to 0 as the goal converges to the current system state . This allows the controller to smoothly transition from position control to force control as runs through convex combinations of the two control laws. We show in Section VI-A1 that this technique affords stable contact and steady-state tracking when making contact at different approach velocities.
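One way to realize a selection value with the stated properties (equal to 1 at the moment of contact, converging to 0 as the goal reaches the state) is the ratio of the current goal-to-state gap to the gap at contact. The sketch below uses that ratio as an assumption; the exact expression in Equation (17) may differ, and the function names are hypothetical.

```python
def position_selection(goal, state, goal_at_contact, state_at_contact):
    """Continuous selection value in [0, 1] for the transitioning dimension."""
    initial_gap = abs(goal_at_contact - state_at_contact)
    if initial_gap < 1e-9:
        return 0.0  # goal already coincided with the state at contact
    ratio = abs(goal - state) / initial_gap
    return min(1.0, max(0.0, ratio))


def blended_command(selection, u_position, u_force):
    """Convex combination of the position and force control outputs."""
    return selection * u_position + (1.0 - selection) * u_force
```

As the goal moves toward the state, the command shifts smoothly from pure position control to pure force control rather than switching at a single instant.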
V Experimental Setup
We validate our methods on a Baxter robot equipped with a 6-axis Optoforce HEX-E-200N force-torque sensor at the wrist. Both the robot state and the force-torque sensor state are sampled at 1000Hz. Robot controllers operate at a rate of 1000Hz. The end-effector is a hard plastic sphere threaded to the tip of a steel shaft which affords a point contact that can vary easily over the course of the trajectory. Experiments were performed using an Intel Core i7-4770 CPU @ 3.40GHz x 8 computer with 8GB of RAM running Ubuntu 14.04 and ROS Indigo. Software is available at https://goo.gl/8WEoxH and data is available at https://goo.gl/QjyDG7.
We provide kinesthetic demonstrations by manually moving the robot arm in gravity-compensation mode. Once recorded, the system autonomously segments the demonstrations using the contact classifier in Equation (15) into three phase types: making-contact, in-contact, and breaking-contact. The system presents the segmentation points in a graphical interface for operator review and adjustment if desired. We found that user modification was rarely needed. Desired goal forces for making contact are equal to the initial desired forces for the sliding phase. A DMP is learned for each DOF in each task phase as described in Sections III-B and IV-A. DMP parameters were set according to guidance in the prior art . All DMP and controller parameters are kept the same in all experiments unless otherwise stated in the text. We now lay out an overview of our experimental protocol; we present the associated results in Section VI.
V-A Making contact with a surface
We begin by showing that our approach can stably make contact with a surface. We first examine making contact when the end-effector moves straight down from its initial position above a table to a desired contact point on the table. We perform tests for making stable contact at varying table heights and approach velocities, when given a single demonstration for the table at a nominal height. We compare to using the standard DMP formulation with no force feedback (“open-loop” below) and to using PI force control with and without Integral Error Scaling. We next examine making contact when the end-effector moves at an angled approach from above the table to a desired contact point on the table. These experiments illustrate the advantage of putting our DMP feedback on the transformation system instead of the canonical system.
V-B Sliding on a flat surface.
In the second set of experiments, the robot must first lower itself to the table to make stable contact, and then slide the end-effector along the surface to a desired position while maintaining a desired force. The desired force profile and pose trajectory are determined by the provided demonstration, which in our experiments was to apply a small force into the table while keeping the end-effector approximately perpendicular to the table surface. We conduct experiments across both low and high-friction surfaces. We compare force tracking using our learned dynamic constraint frame against controlling for desired forces in the fixed world frame. We demonstrate that our method actively applies the forces that the user implicitly applied during demonstration to account for the effect of friction in sliding while keeping the end-effector perpendicular to the table.
V-C Sliding on a curved surface.
Our final experiments focus on the end-effector sliding along the curved, inside surface of a large mixing bowl. These experiments demonstrate that our learned constraint frame can track desired forces that sweep through all three dimensions of a fixed constraint frame using only one degree of freedom in task space for force control.
We now present our experimental results following the same structure outlined in Section V.
VI-A Making contact with a surface
We first isolate the case of making contact with a table for which the height may be higher or lower compared to the height observed in demonstration. This is the first phase of the multi-phase sliding task we consider in Section VI-B below.
VI-A1 Straight down approach
We initialized the robot end-effector to hover 20cm in the world -axis above a table at a nominal height of 77cm measured from the ground to the table surface. We recorded a demonstration that moved the end effector from its initial position to a desired contact point on the table. The start and end poses of the trajectory can be seen in Figure 1(a). We then vary the height of the table to 74cm and 80cm in order to illustrate the efficacy of our DMP extensions for making stable contact. These heights were chosen to be large enough to clearly illustrate the benefits of our methods while still allowing for open-loop position trajectories to be executed for reference without applying unsafe forces.
For the lower height of 74cm, open-loop position control leaves the end-effector hovering approximately 3cm above the desired contact point. We use our DMP extension described in Section IV-B2 to slowly change the goal in the direction of the desired contact. We chose a value of to move the goal, as this value generates a slow enough speed to easily make stable contact. Once the contact classifier detects contact, the robot enables force control and tracks the desired initial sliding-phase force of approximately 2N.
For making contact at the higher height of 80cm, we compare our method of DMP force feedback with incremental force control against PI force control with and without Integral Error Scaling (IES) described in Section III-A. We test three different execution speeds by varying the DMP temporal scaling parameter which approximately corresponds to trajectory duration in seconds. We chose values for . For each method we use the same control gains which were empirically found to exhibit good steady-state tracking once already in contact.
We found that PI force control alone could not make stable contact at any speed using these control gains; control immediately went unstable and had to be terminated for safety. By introducing IES with a value of , stable contact was retained at each speed. However, as seen in the top of Figure 1(b), there is steady-state tracking error of approximately 1.5N for the case of . The results for our method are shown in the bottom of Figure 1(b). We achieve stable contact, steady-state tracking, and reduce impact forces in all cases.
VI-A2 Angled approach
Results for this case are pictured in Figure 1(a). The end-effector was initialized to be approximately 25cm above a table height of 74cm. A demonstration was recorded that moved the end-effector at an angled approach to the table along a straight line trajectory to a desired contact point on the table. We compare two different behaviors possible with our DMP feedback term defined in Equation (14) on a table height of 86cm to make the difference apparent. When Equation (14) is activated for all task space dimensions (equivalent to applying the change on the canonical system as previously proposed) using , the end-effector halts as soon as contact is detected and moves no further. The end-effector reaches the goal in the direction, but cannot reach the goal even though those directions are unconstrained. (Based on our change in goal technique, the goal converges to the current pose in the direction when stable contact has been retained sufficiently long.) To achieve full goal convergence, we activate Equation (14) only for , the dimension in which contact is expected. In this case the end-effector makes contact and halts in the direction, but continues to converge to the position goal in other directions.
VI-B Sliding on a flat surface.
We now demonstrate how our learned dynamic constraint frame is used to perform hybrid force/position control using only one task space degree of freedom for force control. We test two surfaces with drastically different friction properties and demonstrate that controlling with respect to our learned constraint frame results in compensation of frictional forces despite having no explicit model of friction.
VI-B1 Low-friction surface
A demonstration for making contact similar to that in Section VI-A1 was recorded, but once in contact, the end-effector was slid across the table while applying a small force, as pictured in Figure 2(a). The trajectory was then executed using DMP playback. We compare our method for controlling with respect to a learned dynamic constraint frame, described in Section IV-A, against controlling desired forces in the -axis of the fixed world frame, which is orthogonal to the table surface. Figure 4 compares the pose error and force profiles referenced to the world frame. The force profiles for both methods are similar and adhere closely to the desired forces. However, the L2-norm of the pose error is noticeably higher when controlling with respect to the world frame. This is because unmodeled friction between the end-effector and the table impedes the sliding motion. Our learned constraint frame is less influenced by this effect since it is aligned to the forces observed in demonstration, including compensation forces due to friction.
VI-B2 High-friction surface
We performed the same sliding experiment on a wooden board covered in 150-grit sandpaper. As seen in Figure 2(a), the -axis of the learned constraint frame points primarily into the table where desired forces dominate, but it also points slightly in the direction of motion. This is because the learned constraint frame aligns not only to the forces explicitly imposed by the user during demonstration, but also to the compensation forces the user was implicitly applying to overcome friction while maintaining a desired pose during sliding.
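The geometric idea can be sketched as follows. This is an illustration under stated assumptions rather than the learning procedure itself: it builds a single right-handed frame whose third axis (an arbitrary choice for this example) is aligned with a given force vector, whereas the method described here learns a time-varying frame from demonstration. The function name `frame_aligned_to_force` and the force values are hypothetical.

```python
import numpy as np

def frame_aligned_to_force(f_world):
    """Return a right-handed rotation matrix whose third column is the
    unit direction of f_world (assumed convention for this sketch)."""
    f = np.asarray(f_world, dtype=float)
    norm = np.linalg.norm(f)
    if norm < 1e-9:
        raise ValueError("force too small to define a direction")
    z = f / norm
    # Pick the world axis least aligned with z to avoid a degenerate cross product.
    helper = np.eye(3)[np.argmin(np.abs(z))]
    x = np.cross(helper, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return np.column_stack((x, y, z))

# Hypothetical demonstrated force: pressing into the table (-z in the world
# frame) plus a smaller component along the sliding direction (+x) that
# compensates for friction.
f_world = np.array([1.5, 0.0, -5.0])
R = frame_aligned_to_force(f_world)
f_in_frame = R.T @ f_world
print(f_in_frame)
```

Expressed in the aligned frame, the force has a single nonzero component equal to its magnitude, so one force-controlled dimension suffices to command both the pressing force and the friction compensation.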
The pose error for controlling with respect to the world frame is exacerbated, as seen in Figures 2(a) and 4 (and our supplementary video). The pose error for controlling with respect to our learned constraint frame remains low, and is in fact lower than for the smooth table because lower forces overall are applied to the sandpaper surface. Figure 4 shows the force profiles observed in each case. Both methods show good tracking in the dimension. Our method also exhibits good tracking of the compensation forces for friction in the and dimensions. We highlight that we achieve this without modeling friction, and by using only one dimension of the constraint frame for force control. Interestingly, the and forces for controlling with respect to the world frame reach a similar magnitude, but at a delayed time. We suggest this is because the frictional forces are passively reacted to, rather than actively commanded as they are in our method.
VI-C Sliding on a curved surface.
A more complex force profile is achieved by sliding the end-effector along the inside of a mixing bowl as pictured in Figure 2(a). In this case desired forces over the course of the trajectory vary in a non-trivial manner over all axes of the world and tool frames, both commonly chosen constraint frames. We compare the desired and actual force profiles for the trajectory in the world, tool, and learned constraint frames in Figure 2(b) when the trajectory is controlled with respect to the learned constraint frame. Desired forces in each frame are tracked in all three dimensions despite using only one degree of freedom for force control in the learned constraint frame.
We attempted to compare against controlling with respect to a fixed constraint frame. However, we were unable to perform the experiments safely. Enabling force control for only one dimension, for example in the world frame, would work as long as motion was primarily orthogonal to that direction. However, as soon as the end-effector started moving along that axis, control became unpredictable and had to be terminated. Using a fixed frame for this task requires very precise timing of constraint specification. In future work, we will seek a reasoned criterion for determining this specification.
VII Discussion and Future Work
We presented a novel solution to learning hybrid force/position control for multi-phase tasks. Our experimental results demonstrate that using a dynamic constraint frame aligned to the direction of desired force allows three-dimensional forces to be controlled accurately using only one degree of freedom in the constraint frame. We additionally found that controlling with respect to our learned constraint frame compensates for frictional forces without any explicit modeling of friction, thereby reducing pose deviation over controlling with respect to a fixed frame. An interesting avenue for future work is to learn to adapt to surfaces with higher or lower friction than was observed in demonstration. Reinforcement learning may be one promising approach to achieve this sort of generalization.
Our novel extensions to the DMP framework were shown to provide robust transition from free-space motion to surface-constrained motion in spite of environment uncertainty. Our method affords reduced impact forces and better steady-state tracking on higher velocity impacts than other comparable methods. As indicated in Section IV-B2, we assume an early contact is the intended contact, as opposed to an undesired collision. We make this assumption since the robot only uses a wrist force/torque sensor to classify contacts. In most cases the robot could avoid observed obstacles using collision avoidance techniques for DMPs. When unintended contact cannot be avoided, other perceptual modalities such as visual and tactile feedback can allow for more robust classification of intended and unintended contacts. We leave multi-sensory, robust contact classification as a direction for future work.
-  V. Ortenzi, R. Stolkin, J. Kuo, and M. Mistry, “Hybrid motion/force control: a review,” Advanced Robotics, vol. 31, pp. 1102–1113, 2017.
-  B. Siciliano, L. Sciavicco, L. Villani, and G. Oriolo, “Robotics: modelling, planning and control,” Springer, 2009.
-  N. Mandal and S. Payandeh, “Experimental evaluation of the importance of compliance for robotic impact control,” in Control Applications, Second IEEE Conference on. IEEE, 1993, pp. 511–516.
-  L. S. Wilfinger, “A comparison of force control algorithms for robots in contact with flexible environments,” 1992.
-  L. Peternel, L. Rozo, D. Caldwell, and A. Ajoudani, “A method for derivation of robot task-frame control authority from repeated sensory observations,” RAL, vol. 2, pp. 719–726, 2017.
-  L. Rozo, S. Calinon, and D. G. Caldwell, “Learning force and position constraints in human-robot cooperative transportation,” in RO-MAN. IEEE, 2014, pp. 619–624.
-  F. Steinmetz, A. Montebelli, and V. Kyrki, “Simultaneous kinesthetic teaching of positional and force requirements for sequential in-contact tasks,” in Humanoids. IEEE, 2015, pp. 202–209.
-  A. L. P. Ureche, K. Umezawa, Y. Nakamura, and A. Billard, “Task parameterization using continuous constraints extracted from human demonstrations,” Transactions on Robotics, vol. 31, 2015.
-  Z. Deng, J. Mi, Z. Chen, L. Einig, C. Zou, and J. Zhang, “Learning human compliant behavior from demonstration for force-based robot manipulation,” in ROBIO. IEEE, 2016, pp. 319–324.
-  M. Suomalainen and V. Kyrki, “Learning compliant assembly motions from demonstration,” in IROS. IEEE, 2016, pp. 871–876.
-  M. Hazara and V. Kyrki, “Reinforcement learning for improving imitated in-contact skills,” in Humanoids. IEEE, 2016, pp. 194–201.
-  L. Armesto, J. Moura, V. Ivan, A. Salas, and S. Vijayakumar, “Learning constrained generalizable policies by demonstration,” in RSS, 2017.
-  M. Racca, J. Pajarinen, A. Montebelli, and V. Kyrki, “Learning in-contact control strategies from demonstration,” in IROS, 2016.
-  H.-C. Lin, P. Ray, and M. Howard, “Learning task constraints in operational space formulation,” in ICRA. IEEE, 2017, pp. 309–315.
-  D.-H. Park, H. Hoffmann, P. Pastor, and S. Schaal, “Movement reproduction and obstacle avoidance with dynamic movement primitives and potential fields,” in Humanoids. IEEE, 2008, pp. 91–98.
-  P. Pastor, H. Hoffmann, T. Asfour, and S. Schaal, “Learning and generalization of motor skills by learning from demonstration,” in ICRA. IEEE, 2009, pp. 763–768.
-  A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal, “Dynamical movement primitives: learning attractor models for motor behaviors,” Neural Computation, vol. 25, no. 2, pp. 328–373, 2013.
-  P. Kormushev, S. Calinon, and D. G. Caldwell, “Imitation learning of positional and force skills demonstrated via kinesthetic teaching and haptic input,” Advanced Robotics, vol. 25, no. 5, pp. 581–603, 2011.
-  F. J. Abu-Dakka, B. Nemec, J. A. Jørgensen, T. R. Savarimuthu, N. Krüger, and A. Ude, “Adaptation of manipulation skills in physical contact with the environment to reference force profiles,” Autonomous Robots, vol. 39, no. 2, pp. 199–217, 2015.
-  M. Deniša, T. Petrič, A. Gams, and A. Ude, “A review of compliant movement primitives,” in Robot Control. InTech, 2016.
-  E. Shahriari, A. Kramberger, A. Gams, A. Ude, and S. Haddadin, “Adapting to contacts: Energy tanks and task energy for passivity-based dynamic movement primitives,” in Humanoids, Nov 2017, pp. 136–142.
-  M. Kalakrishnan, L. Righetti, P. Pastor, and S. Schaal, “Learning force control policies for compliant manipulation,” in IROS, 2011.
-  J. Kober, M. Gienger, and J. J. Steil, “Learning movement primitives for force interaction tasks,” in ICRA. IEEE, 2015, pp. 3192–3199.
-  D. Kappler, P. Pastor, M. Kalakrishnan, M. Wuthrich, and S. Schaal, “Data-driven online decision making for autonomous manipulation,” in RSS, Rome, Italy, July 2015.
-  O. Kroemer, C. Daniel, G. Neumann, H. Van Hoof, and J. Peters, “Towards learning hierarchical skills for multi-phase manipulation tasks,” in ICRA. IEEE, 2015, pp. 1503–1510.
-  P. Pastor, M. Kalakrishnan, L. Righetti, and S. Schaal, “Towards associative skill memories,” in Humanoids. IEEE, 2012, pp. 309–315.
-  D. Drieß, P. Englert, and M. Toussaint, “Constrained bayesian optimization of combined interaction force/task space controllers for manipulations,” in ICRA. IEEE, 2017, pp. 902–907.
-  O. Khatib, “A unified approach for motion and force control of robot manipulators: The operational space formulation,” IEEE Journal on Robotics and Automation, vol. 3, no. 1, pp. 43–53, 1987.
-  P. Pastor, L. Righetti, M. Kalakrishnan, and S. Schaal, “Online movement adaptation based on previous sensor experiences,” in IROS. IEEE, 2011, pp. 365–371.
-  A. Ude, B. Nemec, T. Petrić, and J. Morimoto, “Orientation in cartesian space dynamic movement primitives,” in ICRA, 2014, pp. 2997–3004.