A Framework for Robot Manipulation:
Skill Formalism, Meta Learning and Adaptive Control
Abstract
In this paper we introduce a novel framework for expressing and learning force-sensitive robot manipulation skills. It is based on a formalism that extends our previous work on adaptive impedance control with meta parameter learning and compatible skill specifications. This way the system is also able to make use of abstract expert knowledge by incorporating process descriptions and quality evaluation metrics. We evaluate various state-of-the-art schemes for the meta parameter learning and experimentally compare selected ones. Our results clearly indicate that the combination of our adaptive impedance controller with a carefully defined skill formalism significantly reduces the complexity of manipulation tasks even for learning peg-in-hole with sub-millimeter industrial tolerances. Overall, the considered system is able to learn variations of this skill in under 20 minutes. In fact, experimentally the system was able to perform the learned tasks faster than humans, leading to the first learning-based solution of complex assembly at such real-world performance.
I Introduction
Typically, robot manipulation skills are introduced as more or less formal representations of certain sets of predefined actions or movements. Several approaches to programming with skills already exist, e.g. [1, 2, 3]. A common drawback is, however, the need for laborious and complex parameterization, resulting in a manual tuning phase to find satisfactory parameters for a specific skill. Depending on the particular situation, various parameters need to be adapted in order to account for different environment properties such as rougher surfaces or different masses of involved objects. Within given boundaries of certainty they could be chosen such that the skill is fulfilled optimally with respect to a specific cost function. This cost function and additional constraints are usually defined by human experts optimizing e.g. for low contact forces, short execution time or low power consumption. Since manual parameter tuning is very laborious, autonomous parameter selection without complex prior knowledge about the task, other than the task specification and the robot's prior abilities, is highly desirable. However, such automatic tuning of control and other task parameters in order to find feasible, ideally even optimal, parameters in the sense of a cost function is still a significant open problem in robot skill design.
So far, several approaches have been proposed to tackle this problem. For example, [4] describes learning motor skills by demonstration. In [5] a Reinforcement Learning (RL) based approach for acquiring new motor skills from demonstration is introduced. [6, 7] employ RL methods to learn motor primitives that represent a skill. In [8] supervised learning by demonstration is used in combination with dynamic movement primitives (DMP) to learn bipedal walking in simulation. An early approach utilizing a stochastic real-valued RL algorithm in combination with a nonlinear multilayer artificial neural network for learning robotic skills can be found in [9]. In [10], guided policy search was applied to learn various manipulation tasks based on a neural network policy. The drawbacks of many existing approaches are their high demand for computational power and memory, e.g. in the form of GPUs and computing clusters, and, with a few exceptions, the large amount of learning time they require already in simulation.
| Reference | Principle | Motion strategy | Underlying controller | Adaptation / Learning principle | Learning / Insertion speed | Difficulty |
| --- | --- | --- | --- | --- | --- | --- |
| [11, 12, 13, 14, 15] | Geometric approaches, analysis of peg-in-hole and force/moment guidance | Reference trajectory | Impedance, admittance, force control | none | Insertion times vary between  s | Round pegs with tolerances between  mm. |
| [16] | Combination of visual perception and compliant manipulation strategies to approach a difficult peg-in-hole problem | Trajectory generator | Impedance control | none | Insertion time was about  s | Multiple pieces of different forms were used with tolerances of about mm. |
| [17, 9] | Peg-in-hole skill is learned via reinforcement learning. | Policy in terms of trajectory is represented by neural networks | Admittance, force control | Reinforcement learning | Learning took about  trials, insertion time is between  s after learning | 2D problem with tolerance of mm in simulation and a round peg with a tolerance of mm in experiments. |
| [18, 19] | DMPs are adapted based on measurements of the external wrench to adapt demonstrated solutions of a peg-in-hole task | Trajectory from DMP | Impedance, admittance, force control | Trajectory adaptation | Adaptation requires about cycles. Insertion requires about  s based on human demonstration. | Several basic shapes and varying tolerances of  mm. |
| [20] | Several controllers (with and without adaptation) were compared for a peg-in-hole task. The initial strategy is learned from human demonstration | (Initial) reference trajectory | Impedance control | Adaptation of orientation via random sampling and Gaussian mixture models | Average insertion times of  s were achieved, trials were used for sampling. Initial data acquisition by human demonstration is necessary. | Round peg and hole made of steel and a wooden peg with a rubber hole were used. |
| [10, 21, 22] | Policies represented by convolutional neural networks are learned via reinforcement learning based on joint states and visual feedback as an end-to-end solution. Among other tasks peg-in-hole has been learned mostly in simulation but also in real-world experiments. | Policies directly generate joint torques | Linear-Gaussian controllers / neural network control policies | Reinforcement learning | Depending on the task, learning took  trials in simulation and real-world experiments. Additional computational requirements of about minutes are mentioned | Tasks such as bottle opening and simpler variations of peg-in-hole were learned. |
In order to execute complex manipulation tasks such as inserting a peg into a hole, soft-robotics controlled systems like Franka Emika’s Panda [23] or the LWR III [24] are the system class of choice. Early work in learning impedance control, the most well-known soft-robotics control concept, is for example introduced in [25]. More recent work can be found in [26, 27]. Here, the first one focuses on human-robot interaction, while the second one makes use of RL to learn controller parameters for simple real-world tasks. In [28] basic concepts from human motor control were transferred to impedance adaptation.
Contribution
Our approach builds on several concepts and connects them in a novel way. Soft robotics [29] is considered the basis in terms of hardware and fundamental capabilities, i.e. the embodiment, enabling us in conjunction with impedance control [30] to apply the idea of learning robot manipulation to rather complex problems. We further extend this by making use of the adaptive impedance controller introduced in [31]. Both Cartesian stiffness and feedforward wrench are adapted during execution, depending on the adaptive motion error and based on four interpretable meta parameters per task coordinate. This raises the question of how to choose these meta parameters with respect to the environment and the concrete problem at hand. To unify all these aspects we introduce a novel robot manipulation skill formalism that acts as a meaningful, interpretable connection between problem definition and real world. Its purpose is to reduce the complexity of the solution space of a given manipulation problem by applying a well-designed, yet still highly flexible structure, see Fig. 1. When applied to a real-world task this structure is supported by an expert skill design and a given quality metric for evaluating the skill’s execution. The reduction of complexity is followed by the application of machine learning methods such as CMA-ES [32], Particle Swarm Optimization [33] and Bayesian optimization [34] to solve the problem not directly on motor control level but in the meta parameter space.
In summary, we put forward the hypothesis that learning manipulation is much more efficient and versatile if we can make use of local intelligence of system components such as the adaptive impedance controller and a well structured skill formalism, essentially encoding abstract expert knowledge. These are the physics grounded computational elements that receive more abstract commands such as goal poses and translate them into basic joint torque behaviors. In this, we draw inspiration from the way humans manipulate their surroundings, i.e. not by consciously determining the muscle force at every time step but rather making use of more complex computational elements [35].
As a particular real-world example to support our conceptual work, we address the well-known and well-researched, yet in general still unsolved [20], peg-in-hole problem. Especially speed requirements and accuracy still pose significant challenges even when programmed by experts. Different approaches to this problem class were devised. Table I depicts a representative selection of works across the literature aiming to solve peg-in-hole and categorizes them. As can be seen, insertion times greatly depend on the problem difficulty, although modern control methodologies are clearly beneficial compared to older approaches. Learning performance has significantly increased over time; however, part of this improvement may have been bought with the need for large computational power, e.g. GPUs and computing clusters, which might be disadvantageous in more autonomous settings. The difficulty of the considered problem settings in terms of geometry and material varies from industrial standards to much simpler everyday objects.
In summary, our contributions are as follows.

Extension of the adaptive impedance controller from [31] to Cartesian space and full feedforward tracking.

A novel meta parameter design for the adaptive controller from [31] based on suitable real-world constraints of impedance-controlled systems.

A novel graph-based skill formalism to describe robot manipulation skills and bridge the gap between high-level specification and low-level adaptive interaction control. Many existing approaches have a more practical and high-level focus on the problem and lack a rigid formulation that is able to directly connect planning with manipulation [1, 2, 3].

The performance of the proposed framework is showcased for three different (industrially relevant) peg-in-hole problems. We show that the used system can learn tasks complying with industrial speed and accuracy requirements (footnote 1: in the accompanying video it is demonstrated that on average the used robot is even able to perform the task faster than humans).

The proposed system is able to learn complex manipulation tasks in a short amount of time of 5-20 minutes, depending on the problem, while being extremely efficient in terms of raw computational power and memory consumption. In particular, our entire framework can run on a small computer such as the Intel NUC while maintaining a real-time interface to the robot and at the same time running a learning algorithm.
The remainder of the paper is organized as follows. Section II describes the adaptive impedance controller, whose behavior can be changed fundamentally by adapting its meta parameters. Section III introduces our skill definition and defines the formal problem at hand. In Sec. IV the learning algorithms applied to the problem definition are investigated. In Section V we apply our approach to the well-known peg-in-hole problem. Finally, Sec. VI concludes the paper.
II Adaptive Impedance Controller
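Consider the standard rigid robot dynamics
$$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + g(q) = \tau_u + \tau_{ext} \qquad (1)$$
where $M(q)$ is the symmetric, positive definite mass matrix, $C(q,\dot{q})\dot{q}$ the Coriolis and centrifugal torques, $g(q)$ the gravity vector, $\tau_u$ the commanded joint torque and $\tau_{ext}$ the vector of external link-side joint torques. The adaptive impedance control law is defined as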
(2) 
where denotes the adaptive feedforward wrench. is an optional time dependent feedforward wrench trajectory, the stiffness matrix, the damping matrix and the Jacobian. The position and velocity error are and , respectively. ”” denotes the desired motion command. The dynamics compensator can be defined in multiple ways, see for example [31]. The adaptive tracking error [36] is
(3) 
with . The adaptive feedforward wrench and stiffness are
(4) 
where and denote the initial values. The controller adapts feedforward wrench and stiffness by
(5)  
(6) 
The positive definite matrices , , , are the learning rates for feedforward commands, stiffness and the respective forgetting factors. The learning rates and determine stiffness and feedforward adaptation speed. and are responsible for slowing down the adaptation process, which is the main dynamical process for low errors. Cartesian damping is designed according to [37] and denotes the sample time of the controller. Reordering and inserting (5) and (6) into (2) leads to the overall control policy
(7) 
denotes the percept vector containing the current pose, velocity, forces etc.
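The adaptation mechanism of this section can be illustrated with a minimal scalar sketch: stiffness and feedforward wrench grow with the adaptive tracking error and shrink through the forgetting factors. The update equations themselves are not reproduced in this text, so the discrete form used below (a learning term minus a forgetting term, with the sample time absorbed into the rates), the class and attribute names, and the limit `k_max` are assumptions for illustration only and not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AdaptiveImpedance1D:
    """Scalar adaptive impedance adaptation (assumed form, illustration only)."""
    alpha: float            # learning rate of the feedforward wrench
    beta: float             # learning rate of the stiffness
    gamma_alpha: float      # forgetting factor of the feedforward wrench
    gamma_beta: float       # forgetting factor of the stiffness
    kappa: float            # weight of the velocity error in the tracking error
    k: float = 0.0          # current stiffness, initialised with its start value
    f_ff: float = 0.0       # current adaptive feedforward wrench
    k_max: float = 2000.0   # hypothetical upper stiffness limit

    def step(self, e: float, e_dot: float) -> Tuple[float, float]:
        """Advance the adaptation by one control cycle."""
        eps = e + self.kappa * e_dot  # adaptive tracking error
        # Assumed update: the error drives learning, the forgetting
        # factor pulls the adapted quantity back towards zero.
        self.f_ff += self.alpha * (eps - self.gamma_alpha * self.f_ff)
        self.k += self.beta * (eps * e - self.gamma_beta * self.k)
        # Keep the stiffness inside the physically admissible range.
        self.k = min(max(self.k, 0.0), self.k_max)
        return self.k, self.f_ff


if __name__ == "__main__":
    ctrl = AdaptiveImpedance1D(alpha=10.0, beta=1e5,
                               gamma_alpha=0.01, gamma_beta=1e-6, kappa=0.1)
    for _ in range(200):              # persistent 1 cm position error
        stiffness, wrench = ctrl.step(0.01, 0.0)
    print(stiffness, wrench)
```

With a persistent error the stiffness settles where learning and forgetting balance; once the error vanishes, the forgetting term dominates and the stiffness decays again, which matches the role the text assigns to the forgetting factors for low errors.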
II-A Meta Parameter Constraints
In order to constrain the subsequent meta learning problem we make use of the following reasonable constraints that characterize essentially every physical real-world system. For better readability we discuss the scalar case, which generalizes trivially to the multi-dimensional one. The first constraint of an adaptive impedance controller is the upper bound of stiffness adaptation speed
(8) 
If we now assume that and we may define as the error at which holds. Then, the maximum value for can be written as
(9) 
Furthermore, the maximum decrease of stiffness occurs for and , where denotes the maximum stiffness, also being an important constraint for any real-world impedance-controlled robot. Thus, we may calculate an upper bound for as
(10) 
Finding the constraints of the adaptive feedforward wrench may be done analogously. In conclusion, we relate the upper limits for , , and to the inherent system constraints , , and .
III Manipulation Skill
In this section we introduce a mathematical formalism to describe robot manipulation skills. A manipulation skill is defined as the directed graph consisting of nodes and edges . A node is also called a manipulation primitive (MP), while an edge is also called transition. The transitions are activated by conditions that mark the success of the preceding MP. A single MP consists of a parameterized twist and feed forward wrench trajectory
where is the desired twist and the feed forward wrench skill command. These commands are described with respect to a task frame . The percept vector is required since information about the current pose or external forces may in general be needed by the MPs. is the set of parameters used by node to generate twist commands while is used to generate the wrench commands. Moreover, and , where is the set of all parameters. Furthermore, we divide into two different subsets and . The parameters in are entirely determined by the context of the task e.g. geometric properties of involved objects. is the subset of all parameters that are to be learned. They are chosen from a domain which is also determined by the task context or system capabilities.
Figure 2 shows a schematic depiction of the graph with an additional initial and terminal node. The transition coming from the initial node is triggered by the precondition while the transition leading to the terminal node is activated by the success condition. Furthermore, every MP has an additional transition that is activated by an error condition and leads to a recovery node. When a transition is activated and the active node switches from one MP to the next, the switching behavior should be regulated.
Note that, although it is possible, we do not consider backwards directed transitions in this work since this would introduce another layer of complexity to the subsequent learning problem that is out of the scope of this paper. Clearly, this would rather become a high-level planning problem that requires more sophisticated and abstract knowledge.
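The graph structure described above can be sketched as a forward-directed chain of manipulation primitives whose transitions fire on success conditions, with a precondition at the start and an error condition that diverts to recovery. This is a minimal sketch under stated assumptions: the names `ManipulationPrimitive`, `Skill`, `command` and `max_steps` are hypothetical, the percept is a plain dictionary, and the `command` callable stands in for both the twist/wrench generation and the robot's response.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Percept = Dict[str, float]              # hypothetical percept representation
Condition = Callable[[Percept], bool]


@dataclass
class ManipulationPrimitive:
    name: str
    command: Callable[[Percept], Percept]  # stand-in for command generation + robot
    success: Condition                     # activates the transition to the next MP


@dataclass
class Skill:
    primitives: List[ManipulationPrimitive]  # forward-directed chain of MPs
    precondition: Condition
    error: Condition
    max_steps: int = 1000                    # hypothetical per-MP step budget

    def run(self, percept: Percept) -> Tuple[str, List[str]]:
        """Execute the skill and return (outcome, names of activated MPs)."""
        trace: List[str] = []
        if not self.precondition(percept):
            return "precondition_failed", trace
        for mp in self.primitives:
            trace.append(mp.name)
            for _ in range(self.max_steps):
                if self.error(percept):       # error transition -> recovery node
                    return "recovery", trace
                if mp.success(percept):       # success transition -> next MP
                    break
                percept = mp.command(percept)
            else:                             # step budget exhausted
                return "recovery", trace
        return "success", trace               # last success reaches the terminal node


if __name__ == "__main__":
    approach = ManipulationPrimitive(
        "approach", lambda p: {**p, "z": p["z"] - 0.01}, lambda p: p["z"] <= 0.0)
    insert = ManipulationPrimitive(
        "insert", lambda p: {**p, "depth": p["depth"] + 0.005},
        lambda p: p["depth"] >= 0.02)
    skill = Skill([approach, insert],
                  precondition=lambda p: p["z"] <= 0.1,
                  error=lambda p: p["force"] > 10.0)
    print(skill.run({"z": 0.05, "depth": 0.0, "force": 0.0}))
```

Because only forward transitions exist, the execution order is fixed by the list, which mirrors the restriction to graphs without backwards directed transitions.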
Conditions
There exist three condition types involved in the execution of skills: preconditions, error conditions and success conditions. They all share the same basic definition, yet their application is substantially different. In particular, their purpose is to define the limits of the skill from start to end. The precondition states the conditions under which the skill can be initialized. The error condition stops the execution when fulfilled and returns a negative result. The success condition also stops the skill and returns a positive result. Note that we also make use of the success condition definition to describe the transitions between MPs.
Definition 1 (Condition)
Let be a closed set and a function where . A condition holds iff . Note that the mapping itself obviously depends on the specific definition type.
Definition 2 (Precondition)
denotes the chosen set for which the precondition defined by holds. The condition holds, i.e. , iff . denotes the time at start of the skill execution.
Definition 3 (Error Condition)
denotes the chosen set for which the error condition holds, i.e. . This follows from .
Definition 4 (Success Condition)
denotes the chosen set for which the success condition defined by holds, i.e. iff .
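The definitions above describe each condition as membership of the percept in a chosen closed set. A minimal sketch of that idea, assuming the closed set is an axis-aligned box (the class `BoxCondition`, the two-component percept and all numeric bounds are hypothetical choices for illustration):

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BoxCondition:
    """Holds iff every percept component lies in the closed interval [lower, upper]."""
    lower: Sequence[float]
    upper: Sequence[float]

    def holds(self, x: Sequence[float]) -> bool:
        return all(lo <= xi <= hi
                   for lo, xi, hi in zip(self.lower, x, self.upper))


if __name__ == "__main__":
    # Hypothetical percept: (insertion depth in m, contact force in N).
    precondition = BoxCondition(lower=[0.0, 0.0], upper=[0.0, 1.0])
    success = BoxCondition(lower=[0.019, 0.0], upper=[0.021, 15.0])
    print(precondition.holds([0.0, 0.2]), success.holds([0.020, 4.0]))
```

The precondition would be evaluated once at the start of the skill execution, whereas the success condition is checked continuously while the skill runs.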
Evaluation
Lastly, a learning metric is used to evaluate the skill execution in terms of a success indicator and a predefined cost function.
Definition 5 (Learning Metric)
denotes the set of all tuples with and the result indicator where denotes failure and success. Let be the cost function of the skill.
Note that the learning metric is designed according to the task context and potential process requirements. Examples for the peginhole skill would be the insertion time or the average contact forces during insertion.
III-A Peg-in-Hole
In the following, the well-known, yet still challenging, peg-in-hole skill is described with the help of the above formalism. Figure 3 shows the graph of the skill including the manipulation primitives. The parameters are the estimated hole pose, the region of interest (ROI) around the hole, the depth of the hole, the maximum allowed velocity and acceleration for translation and rotation, the initial tilt of the object relative to the hole, the force, the speed factor, the amplitudes of translational and rotational oscillations, and their frequencies.
Manipulation Primitives
The MPs for peg-in-hole are defined as follows:

: The robot moves to the approach pose.
generates a trapezoidal velocity profile to move from the current pose to a goal pose while considering a given velocity and acceleration.

: The robot moves towards the surface with the hole and establishes contact.

: The object is moved laterally in direction until it is constrained.

: The object is rotated into the estimated orientation of the hole while keeping contact with the hole.

: The object is inserted into the hole.
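The approach primitive above relies on a trapezoidal velocity profile between the current pose and a goal pose. A minimal one-dimensional, rest-to-rest sketch of such a profile is given below; the function names and the restriction to a scalar distance are assumptions made for illustration, and the paper's generator for full poses may differ.

```python
import math
from typing import Tuple


def _phases(distance: float, v_max: float, a_max: float) -> Tuple[float, float, float]:
    """Return (peak velocity, acceleration time, cruise time)."""
    # If the distance is too short to reach v_max the profile becomes triangular.
    v_peak = min(v_max, math.sqrt(distance * a_max))
    t_acc = v_peak / a_max
    t_cruise = max(0.0, distance / v_peak - t_acc)
    return v_peak, t_acc, t_cruise


def profile_duration(distance: float, v_max: float, a_max: float) -> float:
    """Total duration of the rest-to-rest motion over `distance` (> 0)."""
    if distance <= 0.0:
        return 0.0
    _, t_acc, t_cruise = _phases(distance, v_max, a_max)
    return 2.0 * t_acc + t_cruise


def trapezoidal_velocity(t: float, distance: float, v_max: float, a_max: float) -> float:
    """Commanded velocity at time t for a move over `distance`."""
    if distance <= 0.0:
        return 0.0
    v_peak, t_acc, t_cruise = _phases(distance, v_max, a_max)
    t_total = 2.0 * t_acc + t_cruise
    if t <= 0.0 or t >= t_total:
        return 0.0
    if t < t_acc:                       # acceleration phase
        return a_max * t
    if t < t_acc + t_cruise:            # cruise phase
        return v_peak
    return a_max * (t_total - t)        # deceleration phase


if __name__ == "__main__":
    for t in (0.1, 1.0, 2.1):
        print(t, trapezoidal_velocity(t, distance=0.2, v_max=0.1, a_max=0.5))
```

Integrating the returned velocity over the full duration recovers the requested distance, which is a convenient sanity check when the velocity and acceleration limits are themselves learned parameters.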
IV Parameter Learning
Figure 4 shows how the controller and the skill formalism introduced in Sec. II and III are connected to a given learning method to approach the problem of meta learning, i.e., finding the right (optimal) parameters for solving a given task. The execution of a particular learning algorithm until it terminates is named an experiment throughout the paper. A single evaluation of parameters is called a trial.
IV-A Requirements
A potential learning algorithm will be applied to the system defined in Sections II and III. In particular, it has to be able to evaluate sets of parameters per trial in a continuous experiment. Since we apply it to a real-world physical manipulation problem the algorithm will face various challenges that result in specific requirements.

Generally, no feasible analytic solution

Gradients are usually not available

Real world problems are inherently stochastic

No assumptions possible on minima or cost function convexity

Violation of safety, task or quality constraints

Significant process noise and many repetitions

Low desired total learning time
Thus, suitable learning algorithms have to fulfill the following requirements. They must provide numerical black-box optimization and cannot rely on gradients. Stochasticity must be taken into account and the method has to be able to optimize globally. Furthermore, it should handle unknown and noisy constraints and must provide fast convergence rates.
IV-B Comparison
Table II lists several groups of state-of-the-art optimization methods and compares them with respect to the above requirements. In this, we follow and extend the reasoning introduced in [42]. It also shows that most algorithms do not fulfill the necessary requirements. Note that for all algorithms there exist extensions for the stochastic case. However, comparing all of them is certainly out of the scope of the paper. Therefore, we focus on the most classical representatives of the mentioned classes.
Method  NG  SA  GO  UC 

Grid Search  
Pure Random Search  
Gradient-descent family  
Evolutionary Algorithms  
Particle Swarm  
Bayesian Optimization 
Generally, gradient-descent based algorithms require a gradient to be available, which obviously makes this class unsuitable. Grid search, pure random search and evolutionary algorithms typically do not assume stochasticity and cannot handle unknown constraints very well without extensive knowledge about the problem they optimize, i.e. without making use of well-informed barrier functions. The latter aspect also applies to particle swarm algorithms. Only Bayesian optimization (BO) in accordance with [43] is capable of explicitly handling unknown noisy constraints during optimization.
Although Bayesian optimization seems to be the method best suited to our problem definition, some of the other algorithm classes might also be capable of finding solutions, maybe even faster. Thus, in addition, we select Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) [32] and Particle Swarm Optimization (PSO) [33] for comparison. Furthermore, we utilize Latin Hypercube Sampling (LHS) [44] (an uninformed sampler) to gain insights into the difficulty of the problems at hand.
IV-B1 Bayesian Optimization
For Bayesian optimization we made use of the spearmint software package [43, 45, 46, 47]. In general, BO finds the minimum of an unknown objective function on some bounded set by developing a statistical model of . Apart from the cost function, it has two major components, which are the prior and the acquisition function.

Prior: We use a Gaussian process as prior to derive assumptions about the function being optimized. The Gaussian process has a mean function and a covariance function . As a kernel we use the automatic relevance determination (ARD) Matérn kernel
with
This kernel results in twice-differentiable sample functions, which makes it suitable for practical optimization problems as stated in [45]. It has $D+3$ hyperparameters in $D$ dimensions, i.e. one characteristic length scale per dimension, the covariance amplitude, the observation noise and a constant mean. These kernel hyperparameters are integrated out by applying Markov chain Monte Carlo (MCMC) via slice sampling [48].

Acquisition function: We use predictive entropy search with constraints (PESC) as a means to select the next parameters to explore, as described in [49].
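The ARD Matérn kernel named under the prior can be sketched as follows. The kernel formula is not reproduced in this text, so the sketch assumes the Matérn 5/2 form, which is the Matérn variant that yields twice-differentiable sample functions; the function name and argument names are hypothetical, and the observation noise and constant mean are deliberately left out.

```python
import math
from typing import Sequence


def matern52_ard(x: Sequence[float], y: Sequence[float],
                 length_scales: Sequence[float], amplitude: float = 1.0) -> float:
    """Matérn 5/2 covariance with one length scale per input dimension (ARD)."""
    # Squared distance after scaling every dimension by its own length scale.
    r2 = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, length_scales))
    s = math.sqrt(5.0 * r2)
    return amplitude * (1.0 + s + 5.0 * r2 / 3.0) * math.exp(-s)


if __name__ == "__main__":
    scales = [1.0, 10.0]
    # A unit step along the short-length-scale dimension decorrelates
    # the two points far more than the same step along the long one.
    print(matern52_ard([0.0, 0.0], [1.0, 0.0], scales))
    print(matern52_ard([0.0, 0.0], [0.0, 1.0], scales))
```

A large learned length scale marks a parameter to which the cost is insensitive, which is what makes the automatic relevance determination interpretable for meta parameter learning.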
IV-B2 Latin Hypercube Sampling
Latin hypercube sampling (LHS) [44] is a method to sample a given parameter space in a nearly random way. In contrast to pure random sampling, LHS generates evenly distributed random points in the parameter space. If feasible solutions can be found by sampling alone, this indicates that the complexity reduction of the manipulation task was successful.
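A minimal sketch of Latin hypercube sampling over a box-shaped parameter domain: every dimension is split into as many equal strata as there are samples, one point is drawn per stratum, and the strata are shuffled independently per dimension. The function name and the example bounds are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def latin_hypercube(n_samples: int, bounds: Sequence[Tuple[float, float]],
                    rng: Optional[random.Random] = None) -> List[List[float]]:
    """Draw n_samples points so that each dimension has one point per stratum."""
    rng = rng or random.Random()
    columns = []
    for low, high in bounds:
        # One uniformly placed point inside each of the n_samples strata of [0, 1).
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(unit)  # decouple the stratum order between dimensions
        columns.append([low + u * (high - low) for u in unit])
    # Transpose the per-dimension columns into a list of sample points.
    return [list(point) for point in zip(*columns)]


if __name__ == "__main__":
    # Hypothetical two-parameter domain, e.g. an amplitude and a frequency.
    for point in latin_hypercube(5, [(0.0, 0.005), (0.0, 3.0)], random.Random(0)):
        print(point)
```

Compared with pure random sampling, no region of any single parameter range is left unsampled, which is why the method is useful as an uninformed baseline for judging problem difficulty.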
IV-B3 Covariance Matrix Adaptation
The Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) is an optimization algorithm from the class of evolutionary algorithms for continuous, non-linear, non-convex black-box optimization problems [32, 50].
The algorithm starts with an initial centroid , a population size , an initial stepsize , an initial covariance matrix and isotropic and anisotropic evolution paths and . , and are chosen by the user. Then the following steps are executed until the algorithm terminates.

Evaluation of individuals sampled from a normal distribution with mean and covariance matrix .

Update of centroid , evolution paths and , covariance matrix and stepsize based on the evaluated fitness.
IV-B4 Particle Swarm Optimization
Particle swarm optimization usually starts by initializing all particles' positions and velocities with uniformly distributed random vectors. The particles are evaluated at their initial positions and their personal best and the global best are set. Then, until a termination criterion is met, the following steps are executed:

Update particle velocity:
where and are diagonal matrices with random numbers generated from a uniform distribution in and are acceleration constants usually in the range of .

Update the particle position:
(11) 
Evaluate the fitness of the particle .

Update each and global best if necessary.
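The steps above can be sketched as a compact global-best particle swarm optimizer. The velocity update formula is not reproduced in this text, so the sketch assumes the common form with personal-best and global-best attraction terms; the inertia weight `w`, the default constants, the clipping of positions to the bounds and the function name are assumptions for illustration and not the configuration used in the experiments.

```python
import random
from typing import Callable, List, Sequence, Tuple


def particle_swarm(cost: Callable[[List[float]], float],
                   bounds: Sequence[Tuple[float, float]],
                   n_particles: int = 20, n_iters: int = 100,
                   c1: float = 1.5, c2: float = 1.5, w: float = 0.7,
                   seed: int = 0) -> Tuple[List[float], float]:
    """Minimise `cost` inside `bounds`; returns (best position, best cost)."""
    rng = random.Random(seed)
    # Uniformly distributed initial positions and velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[rng.uniform(lo - hi, hi - lo) for lo, hi in bounds]
           for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal bests
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]  # global best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to personal and global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Position update, clipped to the parameter domain.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])                    # evaluate the particle
            if c < best_cost[i]:                # update personal best
                best_cost[i], best_pos[i] = c, pos[i][:]
                if c < g_cost:                  # update global best
                    g_cost, g_pos = c, pos[i][:]
    return g_pos, g_cost


if __name__ == "__main__":
    sphere = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
    print(particle_swarm(sphere, [(-5.0, 5.0), (-5.0, 5.0)]))
```

On the robot, each call to `cost` would correspond to one trial, so the product of particle count and iterations directly sets the total learning time.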
V Experiments
In our experiments we investigate the learning methods selected in Sec. IV and compare them for three different peg-in-hole variations for the introduced skill formalism and controller, see Sections II and III. The experimental setup consists of a Franka Emika Panda robot [23] that executes the following routine:

The robot grasps the object to be inserted.

A human supervisor teaches the hole position which is fixed with respect to the robot. The teaching accuracy was below mm.

A learning algorithm is selected and executed until it terminates after a specified number of trials.

For every trial the robot evaluates the chosen parameters with four slightly different (depending on the actual problem) initial positions in order to achieve a more robust result.
We investigated three variations of peg-in-hole as shown in Fig. 5: a key, a puzzle piece and an aluminum peg. The meta parameters and skill parameters were learned with the four methods introduced in Sec. IV.
The parameter domain (see Tab. III) is the same for most of the parameters that are learned in the experiments and can be derived from system and safety limits which are the maximum stiffness , the maximum stiffness adaptation speed , the maximum allowed feed forward wrench and wrench adaptation of the controller and , the maximum error and the maximum velocity .
The domain of the learned parameters is derived from these limits and shown in Tab. III.
Parameter  Min  Max 

N  N  
m  m  
Hz  Hz  
Hz  Hz  
rad  rad 
In the following the specifics of the three tasks shown in Fig. 5 are explained.

Puzzle: The puzzle piece is an equilateral triangle with a side length of m. The maximum rotational amplitude of the oscillatory motion is given by rad. The hole has a depth of m. The tolerances between puzzle piece and hole are mm and there are no chamfers.

Key: The key has a depth of m. Since the hole and the key itself have chamfers to make the initial insertion very easy we omit the learning of the initial alignment and set it to rad. The maximum rotational amplitude of the oscillatory motion is given by rad. Due to the very small hole no deliberate initial pose deviation was applied.

Peg: The aluminum peg has a diameter of m and the hole has a depth of m. The tolerances are mm and there is a mm chamfer on the peg. The maximum rotational amplitude of the oscillatory motion is given by rad. The hole has no walls which results in a higher chance of getting stuck during insertion further increasing the difficulty.
The learning algorithms are configured as follows.

LHS: The parameter space is sampled at points.

CMA-ES: The algorithm ran for generations with a population of individuals and . The initial centroid was set in the middle of the parameter space.

PSO: We used particles and let the algorithm run for episodes. The acceleration constants were set to and .

BO: The algorithm is initialized with equally distributed random samples from the parameter space.
As cost function for all three problems we used the execution time measured from first contact to full insertion. A maximum execution time of s was part of the skill’s error condition. In case the error condition was triggered we added the achieved distance towards the goal pose in the hole to the maximum time multiplied by a factor.
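The cost described above, combined with the averaging over several initial positions per trial from the experimental routine, can be sketched as follows. The wording of the failure penalty admits more than one reading; this sketch assumes that a failed run costs the maximum execution time plus a factor times the distance still remaining to the goal pose, so that every failure costs more than any success. The names `RunResult`, `run_cost`, `trial_cost` and all numeric values are hypothetical.

```python
from statistics import mean
from typing import Iterable, NamedTuple


class RunResult(NamedTuple):
    success: bool
    execution_time: float      # seconds from first contact to full insertion
    remaining_distance: float  # metres left to the goal pose when the run stopped


def run_cost(run: RunResult, t_max: float, penalty_factor: float) -> float:
    """Cost of a single execution (assumed penalty form, see lead-in)."""
    if run.success:
        return run.execution_time
    # Error condition triggered: penalise with the time limit plus a
    # distance-dependent term so that near misses rank above early failures.
    return t_max + penalty_factor * run.remaining_distance


def trial_cost(runs: Iterable[RunResult], t_max: float, penalty_factor: float) -> float:
    """Average cost over the executions of one trial, e.g. four start poses."""
    return mean(run_cost(r, t_max, penalty_factor) for r in runs)


if __name__ == "__main__":
    runs = [RunResult(True, 2.0, 0.0), RunResult(True, 4.0, 0.0),
            RunResult(False, 15.0, 0.01), RunResult(True, 3.0, 0.0)]
    print(trial_cost(runs, t_max=15.0, penalty_factor=100.0))
```

Averaging over the start poses trades trial duration for robustness: a parameter set that succeeds from only one pose is penalised by the failures from the others.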
V-A Results
The results can be seen in Fig. 6. The blue line is the mean of the execution time per trial averaged over all experiments. The grey area indicates the % confidence interval.
The results indicate that all four algorithms are suited to a certain degree to learn easier variations of the peg-in-hole task such as the puzzle and the key. However, it must be noted that Bayesian optimization on average finds solutions that are not as good as those of the other methods. Furthermore, its confidence interval is notably larger. It also terminates early in the experiment since the model was at some point not able to find further suitable candidates. This might indicate a solution space with high noise and discontinuities that is difficult to model.
The comparison with LHS also indicates that the complexity reduction of our formal approach to manipulation skills makes it possible to find solutions with practically relevant execution times by sampling rather than explicit learning.
At the bottom of Fig. 6 the results for the most difficult peg-in-hole variation are shown. We do not include the results of BO since it was not able to find a solution in the given time frame. The plot showing the LHS method indicates a very hard problem. Random sampling leads to feasible solutions; however, the confidence interval is too large to draw any reliable conclusion. PSO achieves better solutions yet also has a very large confidence interval. CMA-ES outperforms both methods and is able to find a solution that is better in terms of absolute cost and confidence.
Considering the best performing algorithm, CMA-ES, a feasible solution for any of the tasks was already found after minutes and optimized after minutes depending on the task, significantly outperforming existing approaches for learning peg-in-hole, see Tab. I. Note also that with the exception of BO no noteworthy computation time was necessary.
VI Conclusion
In this paper we introduced an approach to learning robotic manipulation that is based on a novel skill formalism and meta learning of adaptive controls. Overall, the proposed framework is able to solve highly complex problems such as learning peg-in-hole with sub-millimeter tolerances in a short amount of time, making it even applicable for industrial use and outperforming existing approaches in terms of learning speed and resource demands. Remarkably, the used robot was even able to outperform humans in execution time.
Summarizing the results we conclude that the application of complexity reduction via adaptive impedance control and simple manipulation strategies (skills) in combination with the right learning method to complex manipulation problems is feasible. The results might further indicate that methods supported by a certain degree of random sampling are possibly better suited for learning this type of manipulation skills than those more relying on learning a model of the cost function. These conclusions will be investigated more thoroughly in future work.
Overall, our results show that, in contrast to purely data-driven learning/control approaches that make use of neither task structures nor the concept of (adaptive) impedance, our approach of combining sophisticated adaptive control techniques with state-of-the-art machine learning makes even such complex problems tractable and applicable to the real physical world and its highly demanding requirements. Typically, other existing manipulation learning schemes require orders of magnitude more iterations already in simulation, while it is known that today's simulations cannot realistically (if at all) capture real-world contacts.
Clearly, the next steps we intend to take are the application of our framework to other manipulation problem classes and a thorough analysis of the generalization capabilities of the system to similar but as yet unknown problems.
References
 [1] M. R. Pedersen, L. Nalpantidis, R. S. Andersen, C. Schou, S. Bøgh, V. Krüger, and O. Madsen, “Robot skills for manufacturing: From concept to industrial deployment,” Robotics and Computer-Integrated Manufacturing, 2015.
 [2] U. Thomas, G. Hirzinger, B. Rumpe, C. Schulze, and A. Wortmann, “A new skill based robot programming language using UML/P statecharts,” in Robotics and Automation (ICRA), 2013 IEEE International Conference on. IEEE, 2013, pp. 461–466.
 [3] R. H. Andersen, T. Solund, and J. Hallam, “Definition and initial case-based evaluation of hardware-independent robot skills for industrial robotic co-workers,” in ISR/Robotik 2014; 41st International Symposium on Robotics; Proceedings of. VDE, 2014, pp. 1–7.
 [4] P. Pastor, H. Hoffmann, T. Asfour, and S. Schaal, “Learning and generalization of motor skills by learning from demonstration,” in Robotics and Automation, 2009. ICRA’09. IEEE International Conference on. IEEE, 2009, pp. 763–768.
 [5] P. Pastor, M. Kalakrishnan, S. Chitta, E. Theodorou, and S. Schaal, “Skill learning and task outcome prediction for manipulation,” in Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 2011, pp. 3828–3834.
 [6] J. Kober and J. Peters, “Learning motor primitives for robotics,” in Robotics and Automation, 2009. ICRA’09. IEEE International Conference on. IEEE, 2009, pp. 2112–2118.
 [7] J. Kober and J. R. Peters, “Policy search for motor primitives in robotics,” in Advances in neural information processing systems, 2009, pp. 849–856.
 [8] S. Schaal, J. Peters, J. Nakanishi, and A. Ijspeert, “Learning movement primitives,” in Robotics Research. The Eleventh International Symposium. Springer, 2005, pp. 561–572.
 [9] V. Gullapalli, J. A. Franklin, and H. Benbrahim, “Acquiring robot skills via reinforcement learning,” IEEE Control Systems, vol. 14, no. 1, pp. 13–24, 1994.
 [10] S. Levine, N. Wagener, and P. Abbeel, “Learning contact-rich manipulation skills with guided policy search,” in Robotics and Automation (ICRA), 2015 IEEE International Conference on. IEEE, 2015, pp. 156–163.
 [11] S. R. Chhatpar and M. S. Branicky, “Search strategies for peg-in-hole assemblies with position uncertainty,” in Intelligent Robots and Systems, 2001. Proceedings. 2001 IEEE/RSJ International Conference on, vol. 3. IEEE, 2001, pp. 1465–1470.
 [12] J. F. Broenink and M. L. Tiernego, “Peg-in-hole assembly using impedance control with a 6 dof robot,” 1996.
 [13] T. Lozano-Perez, M. T. Mason, and R. H. Taylor, “Automatic synthesis of fine-motion strategies for robots,” The International Journal of Robotics Research, vol. 3, no. 1, pp. 3–24, 1984.
 [14] H. Bruyninckx, S. Dutre, and J. De Schutter, “Peg-on-hole: a model based solution to peg and hole alignment,” in Robotics and Automation, 1995. Proceedings., 1995 IEEE International Conference on, vol. 2. IEEE, 1995, pp. 1919–1924.
 [15] W. S. Newman, Y. Zhao, and Y.-H. Pao, “Interpretation of force and moment signals for compliant peg-in-hole assembly,” in Robotics and Automation, 2001. Proceedings 2001 ICRA. IEEE International Conference on, vol. 1. IEEE, 2001, pp. 571–576.
 [16] A. Stemmer, G. Schreiber, K. Arbter, and A. Albu-Schäffer, “Robust assembly of complex shaped planar parts using vision and force,” in Multisensor Fusion and Integration for Intelligent Systems, 2006 IEEE International Conference on. IEEE, 2006, pp. 493–500.
 [17] V. Gullapalli, R. A. Grupen, and A. G. Barto, “Learning reactive admittance control,” in Robotics and Automation, 1992. Proceedings., 1992 IEEE International Conference on. IEEE, 1992, pp. 1475–1480.
 [18] B. Nemec, F. J. Abu-Dakka, B. Ridge, A. Ude, J. A. Jorgensen, T. R. Savarimuthu, J. Jouffroy, H. G. Petersen, and N. Kruger, “Transfer of assembly operations to new workpiece poses by adaptation to the desired force profile,” in Advanced Robotics (ICAR), 2013 16th International Conference on. IEEE, 2013, pp. 1–7.
 [19] A. Kramberger, A. Gams, B. Nemec, C. Schou, D. Chrysostomou, O. Madsen, and A. Ude, “Transfer of contact skills to new environmental conditions,” in Humanoid Robots (Humanoids), 2016 IEEE-RAS 16th International Conference on. IEEE, 2016, pp. 668–675.
 [20] K. Kronander, E. Burdet, and A. Billard, “Task transfer via collaborative manipulation for insertion assembly,” in Workshop on Human-Robot Interaction for Industrial Manufacturing, Robotics, Science and Systems, 2014.
 [21] S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of deep visuomotor policies,” Journal of Machine Learning Research, vol. 17, no. 39, pp. 1–40, 2016.
 [22] C. Devin, A. Gupta, T. Darrell, P. Abbeel, and S. Levine, “Learning modular neural network policies for multi-task and multi-robot transfer,” in Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 2169–2176.
 [23] S. Haddadin, S. Haddadin, and S. Parusel. (2017) Franka emika gmbh. [Online]. Available: www.franka.de
 [24] G. Hirzinger, N. Sporer, A. Albu-Schäffer, M. Hahnle, R. Krenn, A. Pascucci, and M. Schedl, “DLR’s torque-controlled light weight robot III - are we reaching the technological limits now?” in Robotics and Automation, 2002. Proceedings. ICRA’02. IEEE International Conference on, vol. 2. IEEE, 2002, pp. 1710–1716.
 [25] C.-C. Cheah and D. Wang, “Learning impedance control for robotic manipulators,” IEEE Transactions on Robotics and Automation, vol. 14, no. 3, pp. 452–465, Jun 1998.
 [26] E. Gribovskaya, A. Kheddar, and A. Billard, “Motion learning and adaptive impedance for robot control during physical interaction with humans,” in 2011 IEEE International Conference on Robotics and Automation, May 2011, pp. 4326–4332.
 [27] J. Buchli, F. Stulp, E. Theodorou, and S. Schaal, “Learning variable impedance control,” The International Journal of Robotics Research, vol. 30, no. 7, pp. 820–833, 2011.
 [28] G. Ganesh, A. Albu-Schäffer, M. Haruno, M. Kawato, and E. Burdet, “Biomimetic motor behavior for simultaneous adaptation of force, impedance and trajectory in interaction tasks,” in Robotics and Automation (ICRA), 2010 IEEE International Conference on. IEEE, 2010, pp. 2705–2711.
 [29] A. Albu-Schäffer, O. Eiberger, M. Grebenstein, S. Haddadin, C. Ott, T. Wimbock, S. Wolf, and G. Hirzinger, “Soft robotics,” IEEE Robotics & Automation Magazine, vol. 15, no. 3, 2008.
 [30] N. Hogan, “Impedance control: An approach to manipulation,” in American Control Conference, 1984. IEEE, 1984, pp. 304–313.
 [31] C. Yang, G. Ganesh, S. Haddadin, S. Parusel, A. Albu-Schäffer, and E. Burdet, “Human-like adaptation of force and impedance in stable and unstable interactions,” Robotics, IEEE Transactions on, vol. 27, no. 5, pp. 918–930, 2011.
 [32] N. Hansen and A. Ostermeier, “Completely derandomized self-adaptation in evolution strategies,” Evolutionary computation, vol. 9, no. 2, pp. 159–195, 2001.
 [33] R. Poli, J. Kennedy, and T. Blackwell, “Particle swarm optimization,” Swarm intelligence, vol. 1, no. 1, pp. 33–57, 2007.
 [34] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas, “Taking the human out of the loop: A review of bayesian optimization,” Proceedings of the IEEE, vol. 104, no. 1, pp. 148–175, 2016.
 [35] R. S. Johansson and J. R. Flanagan, “Coding and use of tactile signals from the fingertips in object manipulation tasks,” Nature Reviews Neuroscience, vol. 10, no. 5, p. 345, 2009.
 [36] J.-J. E. Slotine, W. Li, et al., Applied nonlinear control. Prentice-Hall Englewood Cliffs, NJ, 1991, vol. 199, no. 1.
 [37] A. Albu-Schäffer, C. Ott, U. Frese, and G. Hirzinger, “Cartesian impedance control of redundant robots: Recent results with the DLR-light-weight-arms,” in IEEE Int. Conf. on Robotics and Automation, vol. 3, 2003, pp. 3704–3709.
 [38] B. Finkemeyer, T. Kröger, and F. M. Wahl, “Executing assembly tasks specified by manipulation primitive nets,” Advanced Robotics, vol. 19, no. 5, pp. 591–611, 2005.
 [39] T. Kröger, B. Finkemeyer, and F. M. Wahl, “Manipulation primitives - a universal interface between sensor-based motion control and robot programming,” in Robotic Systems for Handling and Assembly. Springer, 2010, pp. 293–313.
 [40] I. Weidauer, D. Kubus, and F. M. Wahl, “A hierarchical extension of manipulation primitives and its integration into a robot control architecture,” in Robotics and Automation (ICRA), 2014 IEEE International Conference on. IEEE, 2014, pp. 5401–5407.
 [41] C. Pek, A. Muxfeldt, and D. Kubus, “Simplifying synchronization in cooperative robot tasks - an enhancement of the manipulation primitive paradigm,” in Emerging Technologies and Factory Automation (ETFA), 2016 IEEE 21st International Conference on. IEEE, 2016, pp. 1–8.
 [42] R. Calandra, A. Seyfarth, J. Peters, and M. P. Deisenroth, “An experimental comparison of bayesian optimization for bipedal locomotion,” in Robotics and Automation (ICRA), 2014 IEEE International Conference on. IEEE, 2014, pp. 1951–1958.
 [43] J. Snoek, “Bayesian optimization and semiparametric models with applications to assistive technology,” Ph.D. dissertation, University of Toronto, 2013.
 [44] M. D. McKay, R. J. Beckman, and W. J. Conover, “Comparison of three methods for selecting values of input variables in the analysis of output from a computer code,” Technometrics, vol. 21, no. 2, pp. 239–245, 1979.
 [45] J. Snoek, H. Larochelle, and R. P. Adams, “Practical bayesian optimization of machine learning algorithms,” in Advances in neural information processing systems, 2012, pp. 2951–2959.
 [46] E. Brochu, V. M. Cora, and N. De Freitas, “A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning,” arXiv preprint arXiv:1012.2599, 2010.
 [47] K. Swersky, J. Snoek, and R. P. Adams, “Multi-task bayesian optimization,” in Advances in neural information processing systems, 2013, pp. 2004–2012.
 [48] R. M. Neal, “Slice sampling,” Annals of statistics, pp. 705–741, 2003.
 [49] J. M. Hernández-Lobato, M. A. Gelbart, M. W. Hoffman, R. P. Adams, and Z. Ghahramani, “Predictive entropy search for bayesian optimization with unknown constraints,” in ICML, 2015, pp. 1699–1707.
 [50] N. Hansen, “The CMA evolution strategy: a comparing review,” Towards a new evolutionary computation, pp. 75–102, 2006.