# Efficient Model Identification for Tensegrity Locomotion

###### Abstract

This paper aims to identify, in a practical manner, unknown physical parameters, such as mechanical models of actuated robot links, which are critical in dynamical robotic tasks. Key features include the use of an off-the-shelf physics engine and the Bayesian optimization framework. The task considered is locomotion with a high-dimensional, compliant Tensegrity robot. A key insight, in this case, is the need to project the model identification challenge into an appropriate lower-dimensional space for efficiency. Comparisons with alternatives indicate that the proposed method identifies the parameters more accurately within the given time budget, which also results in more precise locomotion control.

## I Introduction

This paper presents an approach for model identification that exploits the availability of off-the-shelf physics engines used for simulating the dynamics of robots and the objects they interact with. There are many examples of popular physics engines that are becoming increasingly efficient [1, 2, 3, 4, 5, 6]. These physics engines receive as input mechanical and mesh models of the robots in a particular scene, in addition to the controls (force, torque, velocity, etc.) applied to them, and return a prediction of the robot's dynamical response.

The accuracy of the prediction depends on several factors. The first one is the limitation of the mathematical model used by the engine (e.g., the Coulomb approximation). The second factor is the accuracy of the numerical algorithm used for solving the equations of motion. Finally, the prediction depends heavily on the accuracy of the physical parameters of the robots, such as mass, friction, and elasticity. In this work, we focus on the last factor and propose a method to improve the accuracy of the physical parameters used in the physics engine.

In the context of compliant locomotion systems, the Tensegrity robot of Figure 1 is a structurally compliant platform that can distribute forces into linear elements as pure compression or tension [9]. This robot’s tensile elements can be actuated, enabling it to effectively adapt to complex contact dynamics in unstructured terrains. A policy for a rolling locomotive gait of the platform has been learned from simulated data [10].

Tensegrity robots are inherently high-dimensional, highly-dynamic systems, and providing a predictive model requires a physics-based simulator [11]. The accuracy of such a solution critically depends upon physical parameters of the robot, such as the density of its rigid elements and the elasticity of the tensile elements. While a manual process can be followed to tune a simulation to match the behavior of a real prototype [12], it is highly desirable to conduct this calibration using as few observed trajectories as possible.

In this work, trajectories generated by a simulation manually tuned to a prototypical robotic platform are used to identify the parameters of a physics engine for tensegrity modeling. Given the high dimensionality of the parameter space, this is a challenging problem. This work proposes mapping the model identification process to a lower-dimensional space of parameters. Methods used for dimensionality reduction include Random EMbedding Bayesian Optimization (REMBO) [13] as well as a Variational Auto-Encoder (VAE) [14].

Furthermore, this work proposes to tie the dimensionality reduction process with the task performance by first learning a simplified dynamics model, then utilizing it to train an auto-encoder in the parameter space. Bayesian optimization is then conducted in the encoded space, avoiding much of the burden of high dimensionality. The proposed method is able to efficiently identify the parameters that produce a simulation that most closely matches the observed ground-truth trajectories of this exciting locomotive platform.

## II Foundations and Contributions

Two high-level approaches exist for learning robotic tasks with unknown dynamical models: model-free and model-based ones. Model-free methods search for a policy that best solves the task without explicitly learning the system dynamics [15, 16, 17, 18]. Model-free methods are accredited with the recent success stories of reinforcement learning in video games [19]. For robot learning, a relative entropy policy search has been used [20] to successfully train a robot to play table tennis. The PoWER algorithm [21] is another model-free policy search approach widely used in robotics.

Model-free methods, however, do not easily generalize to unseen regions of the state-action space. To learn an effective policy, features of state-actions in learning and testing should be sampled from distributions that share the same support. This is rather dangerous in robotics, as poor performance in testing could lead to irreversible damage.

Model-based approaches explicitly learn the dynamics of the system and search for an optimal policy using standard simulation, planning, and actuation control loops for the learned parameters. There are many examples of model-based approaches for robotic manipulation [22, 23, 24, 25, 26], some of which have used physics-based simulation to predict the effects of pushing flat objects on a smooth surface [22]. A nonparametric approach was employed for learning the outcome of pushing large objects (furniture) [24]. A Markov Decision Process (MDP) has been applied to model interactions between objects; however, only simulation results on pushing were reported [25]. For general-purpose model-based reinforcement learning, the PILCO algorithm has been proven efficient in utilizing a small amount of data to learn dynamical models and optimal policies [27].

Bayesian Optimization is a popular framework for data-efficient black-box optimization [28]. In robotics, recent applications include learning controllers for bipedal locomotion [29], gait optimization [30], and transferring policies from simulation to the real world [31].

Traditional system identification builds a dynamics model by minimizing prediction error (e.g., using least squares) [32, 33]. There have been attempts to combine parametric rigid body dynamics models with nonparametric model learning for approximating the inverse dynamics [34]. In contrast to such methods, this work uses a physics engine and concentrates on identifying mechanical properties instead of learning the models from scratch. Recent work, performed in simulation only, proposed model identification for predicting low-dimensional physical parameters, such as either mass or friction [35], before searching for an optimal policy.

This work is based on a model-based approach which utilizes a physics engine and concentrates on identifying only the mechanical properties of the objects instead of recreating the dynamics from scratch. Furthermore, it utilizes Bayesian optimization and identifies a dimensionality reduction process for dealing with high-dimensional model identification challenges efficiently.

## III Model Identification

This work proposes an online approach for robots to learn the physical parameters of their dynamics through minimal physical interaction. Because of the high dimensionality of the parameter space of the tensegrity robot, even very efficient methods such as Bayesian optimization (BO) struggle to identify all parameters with sufficient accuracy.

This section introduces the overall framework of the model identification process. Dimensionality reduction methods, which decrease the search space of BO in order to achieve efficient optimization, are then covered in detail in the next section.

For the tensegrity robot, the physical properties of interest correspond to the density, length, radius, stiffness, damping factor, pre-tension, motor radius, motor friction, and motor inertia of the various rigid and tensile elements and actuators, which are modeled in NASA's Tensegrity Robotics Toolkit (NTRT) [11]. In total, 15 different parameters are considered.

These physical properties are represented as a $D$-dimensional vector $\theta \in \Theta$, where $\Theta$ is the space of all possible values of the physical properties. $\Theta$ is discretized with a regular grid resolution. The proposed approach returns a distribution on the discretized $\Theta$ instead of a single point $\theta$. This is appropriate because model identification is generally an ill-posed problem, where multiple models can explain an observed trajectory with equal accuracy. The objective is to preserve all possible explanations for the purposes of robust planning.

The online model identification algorithm (given in Algorithm 1) takes as input a prior distribution $P_t$, for time-step $t$, on the discretized space $\Theta$ of physical properties. $P_t$ is calculated based on the initial distribution $P_0$ and a sequence of observations $(x_0, u_0, \dots, u_{t-1}, x_t)$. For the Tensegrity robot, $x_t$ is a state vector concatenating the 3D centers of all rigid elements, i.e., the rods shown in Figure 1, and $u_t$ is a vector of motor torques.

The process consists of simulating the effects of the controls $u_t$ on the robot in states $x_t$ under various values of the parameters $\theta$ and observing the resulting states $\hat{x}_{t+1}$, for $t \in \{0, \dots, T-1\}$. The goal is to identify the model parameters $\theta^{*}$ that make the outcomes of the simulation as close as possible to the real observed outcomes $x_{t+1}$. In other terms, the following black-box optimization problem is solved:

$$\theta^{*} = \arg\min_{\theta \in \Theta} E(\theta), \qquad E(\theta) = \sum_{t=0}^{T-1} \big\| f(x_t, u_t, \theta) - x_{t+1} \big\| \quad (1)$$

wherein $x_t$ and $x_{t+1}$ are the observed states of the robot at times $t$ and $t+1$, $u_t$ is the control applied at time $t$, and $f(x_t, u_t, \theta)$ is the predicted state at time $t+1$ after simulating control $u_t$ at state $x_t$ using physical parameters $\theta$.
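As a concrete sketch, the accumulated trajectory error of Eq. 1 can be computed as follows. Here `simulate_step` is a hypothetical stand-in for one step of the physics engine, and the Euclidean norm is an assumption.

```python
import numpy as np

def trajectory_error(theta, states, controls, simulate_step):
    """Accumulated distance between observed next states and the states
    predicted by simulating each control under candidate parameters theta."""
    err = 0.0
    for t in range(len(controls)):
        x_pred = simulate_step(states[t], controls[t], theta)
        err += np.linalg.norm(x_pred - states[t + 1])
    return err
```

In practice each call to `simulate_step` is an expensive physics-engine rollout, so this function is precisely the black box that Bayesian optimization is meant to query sparingly.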

The proposed approach consists of learning the error function $E$ from a sequence of simulations with different parameters $\theta$. To choose these parameters efficiently, in a way that quickly leads to accurate parameter estimation, a belief about the actual error function is maintained. This belief is a probability measure over the space of all functions $E : \Theta \to \mathbb{R}$, and is represented by a Gaussian Process (GP) [36] with mean vector $\mu$ and covariance matrix $K$. The mean and covariance of the GP are learned from data points $\{(\theta_i, E(\theta_i))\}$, where $\theta_i$ is a vector of physical properties of the object, and $E(\theta_i)$ is the accumulated distance between the actual observed states and the states obtained from simulation using $\theta_i$. High-fidelity simulations are computationally expensive. It is therefore important to minimize the number of simulations, i.e., evaluations of the function $E$, while searching for the optimal parameters $\theta^{*}$ that solve Eq. 1. BO decides the location of the next sample by optimizing an acquisition function; in our experiments, the expected improvement (EI) acquisition function [37] is used.
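A minimal, self-contained sketch of this loop is given below, with a toy GP surrogate (an RBF kernel with fixed hyperparameters, an assumption for brevity) and the EI acquisition for minimization over a discretized parameter grid. All function names are illustrative, not the paper's implementation.

```python
import numpy as np
from scipy.stats import norm

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel between two sets of points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_posterior(X, y, Xs, ls=1.0, noise=1e-6):
    """GP posterior mean and std at test points Xs given data (X, y)."""
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    Ks = rbf(X, Xs, ls)
    mu = Ks.T @ np.linalg.solve(K, y)
    v = np.linalg.solve(K, Ks)
    var = np.clip(np.diag(rbf(Xs, Xs, ls) - Ks.T @ v), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    """EI for minimization: expected reduction below the current best."""
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(error_fn, grid, n_init=3, n_iter=10, seed=0):
    """BO loop: sample the grid point maximizing EI at each iteration."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(grid), n_init, replace=False)
    X, y = grid[idx], np.array([error_fn(x) for x in grid[idx]])
    for _ in range(n_iter):
        mu, sigma = gp_posterior(X, y, grid)
        x_next = grid[int(np.argmax(expected_improvement(mu, sigma, y.min())))]
        X = np.vstack([X, x_next])
        y = np.append(y, error_fn(x_next))
    return X[int(np.argmin(y))], y.min()
```

On a smooth one-dimensional error surface, a handful of EI-guided evaluations typically locates the minimizing grid cell, which is the data efficiency the paper relies on.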

## IV Dimensionality Reduction

### IV-A Random Embedding for Model Identification

For problems where the space of physical properties $\Theta$ has a high dimension $D$, the method presented in Algorithm 1 is not practical, because the number of elements in the discretized $\Theta$ is exponential in the dimension $D$. This is a common problem in global search methods [13]. In fact, it has been shown that Bayesian optimization techniques do not perform better than random search when the dimension of the search space is too large (10 dimensions in the experiment in [38]). Therefore, Algorithm 1 cannot be directly used for robotic platforms with a large number of joints and parameters, such as the Tensegrity robot or compliant dexterous hands.

Random embedding is an efficient and effective dimensionality reduction technique [13]. Given a space of parameters $\Theta \subseteq \mathbb{R}^{D}$ with dimension $D$, we generate a random matrix $A \in \mathbb{R}^{D \times d}$ that relates points in $\Theta$ to a lower-dimensional space of parameters $\mathcal{Y} \subseteq \mathbb{R}^{d}$, where $d \ll D$. Instead of discretizing $\Theta$, we discretize $\mathcal{Y}$ into a regular grid and map each point $y \in \mathcal{Y}$ to a point in the original high-dimensional space by using $A$, i.e., $\theta = Ay$. One can show [13] that, with probability one, $\min_{y \in \mathcal{Y}} E(Ay) = \min_{\theta \in \Theta} E(\theta)$, where $E$ is the error function in Equation 1. Consequently, we run Algorithm 1 using the discretized $\mathcal{Y}$ as input instead of $\Theta$. We project the low-dimensional vectors back to the original parameter space using $\theta = Ay$ whenever the physical simulation must be run to obtain the trajectory under a sampled value of $y$.

However, for a randomly generated matrix $A$ and a point $y \in \mathcal{Y}$, the corresponding high-dimensional vector $Ay$ is not guaranteed to belong to $\Theta$, but could instead lie anywhere within $\mathbb{R}^{D}$. The simulator may consider $Ay$ invalid if it is outside of $\Theta$, as shown in Fig. 2. Moreover, simply using rejection sampling does not always work, since in some cases most of the sampled points will be invalid. Random EMbedding Bayesian Optimization (REMBO) [13] addresses this issue simply by projecting a point outside $\Theta$ to the nearest boundary point of $\Theta$.
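A sketch of this REMBO-style projection under stated assumptions (a box-shaped valid region `[lo, hi]` and Gaussian entries for $A$). Note that coordinate-wise clipping to a box is exactly the Euclidean projection onto its nearest point, which is how the boundary projection is realized here.

```python
import numpy as np

def make_rembo_projection(D, d, lo, hi, seed=0):
    """Build a random embedding from a d-dim space to a D-dim box.
    Points that land outside the box are clipped to its boundary."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(D, d))            # random projection matrix
    def project(y):
        theta = A @ y                      # back to high-dimensional space
        return np.clip(theta, lo, hi)     # REMBO boundary projection
    return project
```

The simulator then only ever sees parameter vectors inside the valid box, while BO searches the low-dimensional `y` grid.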

### Iv-B Variational Auto Encoder for Model Identification

An auto-encoder is a neural network that learns to reconstruct its input by going through a latent space of lower dimension than the original input space [39]. It has been shown to be very useful for unsupervised learning of low-dimensional representations. A variational auto-encoder (VAE) adds the constraint that the latent space follows a prior distribution, usually assumed to be Gaussian [14]. This additional constraint makes the model more useful as a generative model, as it also learns to generate output from the prior distribution in addition to reconstruction.

We adapt the VAE and combine it with the Bayesian optimization process, as shown in Fig. 3. First, the VAE is trained with randomly sampled physical parameter data to learn a low-dimensional embedding $z$. Once the VAE is optimized, the decoder component is used to project a low-dimensional $z$ back to a value in the original physical parameter space. Thus, the Bayesian optimization process detailed in Algorithm 1 can be conducted efficiently in the low-dimensional space. The decoder can be regarded as a learned, non-linear version of the projection matrix $A$ in REMBO.
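An illustrative forward-pass sketch of such a VAE follows, with sizes matching the paper's setting of 15 parameters and a 5-dimensional latent space, and the 400-unit hidden layer used later in the experiments. Gaussian weight initialization is an assumption, and training by backpropagation is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    return rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out)

class TinyVAE:
    """Minimal VAE forward pass over 15-dim parameter vectors."""
    def __init__(self, dim_x=15, dim_h=400, dim_z=5):
        self.We, self.be = init_layer(dim_x, dim_h)
        self.Wmu, self.bmu = init_layer(dim_h, dim_z)
        self.Wlv, self.blv = init_layer(dim_h, dim_z)
        self.Wd1, self.bd1 = init_layer(dim_z, dim_h)
        self.Wd2, self.bd2 = init_layer(dim_h, dim_x)

    def encode(self, x):
        h = np.maximum(0, x @ self.We + self.be)        # ReLU hidden layer
        return h @ self.Wmu + self.bmu, h @ self.Wlv + self.blv

    def decode(self, z):
        h = np.maximum(0, z @ self.Wd1 + self.bd1)
        return h @ self.Wd2 + self.bd2

    def loss(self, x):
        mu, logvar = self.encode(x)
        eps = rng.normal(size=mu.shape)                  # reparameterization
        z = mu + np.exp(0.5 * logvar) * eps
        rec_err = ((self.decode(z) - x) ** 2).sum()      # reconstruction term
        kl = 0.5 * (np.exp(logvar) + mu ** 2 - 1 - logvar).sum()  # KL to N(0,I)
        return rec_err + kl
```

After training, only `decode` is needed: BO samples $z$ in the 5-dimensional latent space and decodes it into a full 15-dimensional parameter vector for the simulator.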

### IV-C Auto-Encoder with Learned Dynamics

The use of a VAE for reconstructing parameters from a low-dimensional space has some limitations. Specifically, we are more interested in the accuracy of the predicted trajectory than in identifying the true underlying physical parameters. Mechanical models of motion can tie together several parameters of a model. Thus, connecting the dimensionality reduction process directly with the task performance may further improve performance when using the identified model on the task. This idea is similar to learning a locally linear dynamics model while aiming to maximize the performance of the controller [40].

To provide intuition, we begin with an illustrative toy example of pushing a point-mass along a single dimension. We then show how the same approach can be applied to a much more complex system such as the Tensegrity robot.

#### IV-C1 Toy example: Pushing

Consider a cube of mass $m$ resting on a surface, as in Fig. 5, which can be represented by a point that can only move along one axis. Assuming a uniform and constant coefficient of kinetic friction $\mu$ between the cube and its resting surface, an impulse $p$ is applied to the cube to cause a displacement across some distance $d$. Applying Newton's laws and the kinematic equations, we have the following equations:

$$p = m v_0, \qquad \mu m g = m \frac{v_0}{T} \quad (2)$$

$$d = v_0 T - \frac{1}{2} \mu g T^{2} \quad (3)$$

where $v_0$ is the initial velocity of the cube after the impulse and $T$ is the duration of the cube's movement. Solving the above equations, we have:

$$d = \frac{p^{2}}{2 \mu g m^{2}} \quad (4)$$

We use this equation solely to generate training and testing data, in the same manner as a black-box simulator. This parallels the later use of a physics simulator for the Tensegrity robot, without direct exposure of the differential equations of motion. Our goal is to identify $\mu$ and $m$, given only the initial impulse $p$ and the displacement $d$, without mathematical analysis of the system dynamics equations.

This problem, however, is ill-posed, due to the fact that different values of $\mu$ and $m$ can result in the same $d$ for a given value of $p$. The displacement depends on $\mu$ and $m$ only through the product $\mu m^{2}$: for example, both $(\mu = 0.2, m = 1)$ and $(\mu = 0.05, m = 2)$ result in the same $d$. In other words, as long as the value of $\mu m^{2}$ is uniquely identified, so is the displacement $d$. Thus, if the task is to predict $d$, it is not necessary to individually identify $\mu$ and $m$; the scalar value $\mu m^{2}$ can still uniquely determine the system. The goal then is to identify this one-dimensional representation automatically.
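This ill-posedness is easy to check numerically, assuming the standard sliding-friction solution $d = p^{2} / (2 \mu g m^{2})$ with $g = 9.8$:

```python
def displacement(p, mu, m, g=9.8):
    """Closed-form displacement of a pushed point mass under kinetic
    friction; it depends on mu and m only through the product mu * m**2."""
    return p ** 2 / (2.0 * mu * g * m ** 2)
```

Any two parameter pairs with equal $\mu m^{2}$, such as $(0.2, 1)$ and $(0.05, 2)$, yield identical displacements for the same impulse, so no amount of pushing data can distinguish them.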

First, we train a dynamics network to predict the displacement $d$ given the inputs $\mu$, $m$, and $p$, as shown in Fig. 4a. In this case, the parameters are $\theta = (\mu, m)$ and the control is $u = p$. The input state is omitted, as we assume the point is always at the origin before being pushed. Second, we use the resulting dynamics network to train the auto-encoder to reconstruct $\mu$ and $m$, as shown in Fig. 4b. During this step, the weights in the dynamics network are fixed. The encoder module receives the input $(\mu, m)$ and outputs a one-dimensional code $z$. A unique aspect of the auto-encoder is that, instead of using the reconstruction error of $\mu$ and $m$ as the loss function, it passes the reconstructed parameters to the dynamics network and uses the displacement error. The goal of the auto-encoder is to reconstruct parameters resulting in similar dynamics, as predicting the dynamics is the primary concern.
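The dynamics-based loss can be sketched as follows. Here `encoder`, `decoder`, and `dynamics_net` are hypothetical callables standing in for the trained networks; the dynamics network is frozen simply by never updating it with this loss.

```python
def dynamics_reconstruction_loss(encoder, decoder, dynamics_net, theta, u):
    """Auto-encoder loss for the pushing example: rather than penalizing
    reconstruction error on the parameters themselves, penalize the
    difference between the (frozen) dynamics network's predictions under
    the original and the reconstructed parameters."""
    z = encoder(theta)               # one-dimensional latent code
    theta_rec = decoder(z)           # reconstructed (mu, m)
    d_true = dynamics_net(theta, u)  # prediction under original parameters
    d_rec = dynamics_net(theta_rec, u)
    return (d_rec - d_true) ** 2
```

With analytic stand-ins that encode exactly $\mu m^{2}$, the loss is zero even when the reconstructed $(\mu, m)$ differ from the originals, which is precisely the behavior the paper wants: equal dynamics, not equal parameters.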

#### IV-C2 Tensegrity Robot

This procedure is next applied to the much more complex Tensegrity robot system. Here, we assume the existence of a low-dimensional representation of both the physical parameter space and the state-action space that, once identified, can determine the system dynamics, similar to $\mu m^{2}$ in Sec. IV-C1.

One challenge in adapting the procedure is that the Tensegrity robot is inherently a high-dimensional, highly-dynamic system, which makes learning a dynamics model extremely difficult. Instead, a simplified dynamics model can be learned where, instead of using the full state, which is 126-dimensional, only the height of the center of mass of the robot is used as the state. (The selection of the state representation is not a focus of the paper; the search for an optimal state representation is left for future work.) In experiments, using the full state as the input state and the height of the center of mass as the output results in better accuracy than using only the height of the center of mass as both input and output.

Thus, the simplified dynamics model takes the parameters, the full state, and the action as input and predicts the height of the center of mass at the next time step, as shown in Fig. 4a. In this case, $\theta$ is the 15-dimensional parameter vector, $x$ is the 126-dimensional full state, $u$ is the 24-dimensional action, and $h$ is the 1-dimensional height of the center of mass of the robot. Using the simplified dynamics model, we train an auto-encoder on top of it to learn the low-dimensional representation of the parameters.
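A shape-level sketch of this simplified dynamics model is shown below, with untrained random weights; the hidden sizes follow the 128-64-32-1 architecture reported in the experiments section, and the helper names are illustrative.

```python
import numpy as np

def init_mlp(sizes, seed=0):
    """Random weights for a fully connected network (untrained sketch)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def simplified_dynamics(theta, x, u, weights):
    """Map 15 parameters + 126-dim full state + 24-dim action to the
    scalar next-step height of the robot's center of mass."""
    h = np.concatenate([theta, x, u])     # 165-dimensional input
    for W, b in weights[:-1]:
        h = np.maximum(0, h @ W + b)      # ReLU hidden layers
    W, b = weights[-1]
    return float((h @ W + b)[0])          # scalar output: height
```

Collapsing the 126-dimensional output to a single height is what makes the supervised learning problem tractable enough to sit inside the auto-encoder's loss.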

Similar to the 1D pushing example, the auto-encoder in Fig. 4b is trained using, as its loss function, the error between the new state simulated using the original parameters $\theta$ and the new state predicted by the dynamics model using the reconstructed parameters $\hat{\theta}$. Afterward, Bayesian optimization is performed using the decoder, as shown in Fig. 4c.

## V Experimental Results

### V-A Toy example: 1-D Point Pushing

In this toy example, we use Equation 4 as the "simulator" to generate data. 20,000 training data points are generated by randomly sampling $\mu$, $m$, and $p$; 2,000 additional data points are generated for testing. As a proof of concept, we only compare model identification using Bayesian optimization in the original 2D parameter space and in the 1D latent space of the auto-encoder.

The dynamics network has three hidden layers with 64, 128, and 64 hidden units and ReLU activation functions. The encoder network has two layers with 32 and 1 units, and the decoder has two layers with 1 and 32 units. Both the encoder and the decoder have a ReLU activation only on the first hidden layer.

The result is shown in Fig. 6. After training, the decoder is able to reconstruct values of $\mu$ and $m$ that result in close-to-zero error in the final displacement. Compared to optimizing in the original 2D parameter space, using the decoder to optimize in the 1D space is both more efficient and achieves lower error.

### V-B Tensegrity Robot

Setup: This experiment aims to identify the 15 parameters of the model of the Tensegrity SuperBall robot in NASA's Tensegrity Robotics Toolkit [11]. The complex dynamics and high dimensionality of the robot make this problem very hard. Fig. 7 shows an example of the different results of applying the same control to the robot with a 1% difference in the rod length (one of the 15 parameters). In the absence of access to the real robot, the default values of the T6 model in NTRT are used as ground truth.

The applied control law conducts an optimization procedure on the system's geometric configuration alone, without accounting for dynamics [41]. Under the assumption that the base triangle remains in full contact with the ground, this law commands a change in cable lengths that corresponds to a desired shift in the system's center of mass. By displacing the center of mass relative to the supporting base triangle, the system can be made dynamically unstable, causing a forward flop.

Using the controller above, 1,200 trajectories are generated as training data with the NTRT simulator by sampling the 15 parameters within a range around the ground-truth values. The assumption is that errors of this magnitude can appear during the robot modeling process, and the proposed approach should be able to minimize them. Examples of the trajectories can be found at https://sites.google.com/view/tensegrity/.

Random search, where random values of the parameters are selected within the same range, is compared against as a baseline. Nevertheless, it is well-known that Bayesian optimization in high dimensions is difficult due to the exponential growth of the search space. To deal with this issue, the two dimensionality reduction methods, REMBO and the VAE, are used to reduce the dimensionality of the parameter space from 15 to 5. (The selection of the optimal low dimension is left for future work.)

The encoder and decoder of the VAE used in the experiment are both two-layer neural networks. The input dimension of the encoder and the output dimension of the decoder are 15, the dimension of the parameter space; the latent space is 5-dimensional. Between them is one layer of 400 units. This dimension is chosen through cross-validation, balancing accuracy against network complexity. The prior distribution of the latent space in the VAE is assumed to be $\mathcal{N}(0, I)$. Based on the three-sigma rule, when sampling each latent dimension within $[-3, 3]$, this interval should cover approximately $99.7\%$ of the latent space when the VAE is optimized. For REMBO, a random projection matrix $A$ is generated each time to project the 15-dimensional parameters into the 5-dimensional space.
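The three-sigma coverage figure can be checked directly from the standard normal CDF, here via the error function from the standard library:

```python
from math import erf, sqrt

# Fraction of N(0, 1) probability mass within [-3, 3] (three-sigma rule):
# P(|z| <= 3) = 2 * Phi(3) - 1 = erf(3 / sqrt(2))
coverage = erf(3 / sqrt(2))
print(round(coverage, 4))  # 0.9973
```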

The dynamics network is a four-layer neural network with layer dimensions 128, 64, 32, and 1. The encoder and decoder used with the dynamics network are also two-layer neural networks, but much narrower than those of the VAE, with only 10 and 5 units in their respective layers.

Results: Fig. 8 shows the average error between the trajectories obtained using the model parameters identified by the different methods and the trajectories generated by the ground-truth simulator. When optimizing in the original 15-dimensional space, Bayesian optimization, as a data-efficient global optimization method, outperformed random search. Further improvements are achieved by dimensionality reduction, which makes the search more efficient. Bayesian optimization with the auto-encoder and the learned dynamics network (BO with AE and Dynamics Net) in the 5-dimensional space achieves the lowest trajectory error, outperforming the methods using REMBO or the VAE. This shows that a better learned latent embedding enables a more efficient parameter search in the Bayesian optimization process.

Table I provides the errors for each of the final identified parameters. Interestingly, although it achieved the lowest trajectory error, BO with AE and Dynamics Net did not identify all parameters with the lowest error. Specifically, it turns out that parameters such as rod_length, rod_space, rod_length_mp, motor_radius, motor_friction, and motor_inertia are not actually used in the current model of the SuperBall simulation. Thus, even though methods like the VAE may achieve lower reconstruction error on the parameters themselves, BO with AE and Dynamics Net achieves lower trajectory error because it ties the model identification process to the dynamics. Additionally, some parameters may have a stronger influence on the robot dynamics than others. An intelligent way of identifying these parameters would help reduce the dimensionality of the parameter space and could be more informative than random embeddings. This is a direction for future work.

Table I: Errors of the final identified parameters.

| | Random Search in 15-D | BO in 15-D | REMBO in 5-D | BO with VAE in 5-D | BO with AE and Dynamics Net |
|---|---|---|---|---|---|
| density | 5.01±2.86 | 1.85±1.88 | 1.30 | | |
| radius | 2.49±1.94 | 1.86±1.84 | 1.43 | 0.30 | |
| density_mp | 5.40±2.96 | 1.89±1.86 | 2.38 | 1.00 | |
| radius_mp | 4.78±2.78 | 1.94±1.94 | 2.00 | 0.69 | |
| stiffnessActive | 4.49±2.68 | 1.84±1.90 | 1.68±0.46 | 1.71 | |
| damping | 4.62±2.75 | 1.81±1.89 | 2.02±0.44 | 2.26 | |
| rod_length | 5.05±2.75 | 1.90±1.88 | 2.04±0.31 | 6.25±0.59 | |
| rod_space | 4.96±2.81 | 1.88±1.84 | 1.68±0.36 | 5.20 | |
| rod_length_mp | 4.89±2.81 | 1.88±1.96 | 1.70±0.58 | 4.06 | |
| pretension | 5.10±2.83 | 1.93±1.89 | 1.58±0.50 | 1.38 | |
| maxTens | 4.99±2.87 | 1.86±1.83 | 1.85±0.42 | 5.48 | |
| targetVelocity | 4.85±2.62 | 1.84±1.90 | 2.06±0.62 | 0.49 | |
| motor_radius | 5.11±2.90 | 1.90±1.91 | 1.79±0.66 | 4.9 | |
| motor_friction | 5.10±2.71 | 1.89±1.82 | 2.19±0.27 | 0.65 | |
| motor_inertia | 4.78±2.80 | 1.83±1.88 | 2.00±0.45 | 1.86 | |

## VI Conclusion

This work proposes an information- and data-efficient framework for identifying physical parameters critical for robotic tasks, such as compliant robot locomotion. The framework aims to minimize the error between trajectories observed in experiments and those generated by a physics engine. To address the high-dimensional challenges, this work integrates Bayesian optimization with a projection to a lower-dimensional space, through either a random embedding or a latent embedding learned by an auto-encoder. The evaluation of the proposed method against alternatives is favorable, both in terms of identifying parameters more efficiently and in terms of resulting in more accurate locomotion trajectories.

An interesting extension of this work would involve the identification of controls during the learning process that help quickly minimize the error. This can be a robust control process that takes advantage of Bayesian optimization's output, a belief distribution over the identified parameters, so as to minimize entropy and maximize the safety of the experimentation process. Furthermore, it would be interesting to compare the generality of the learned models, and of the resulting control schemes that utilize them, against completely model-free and end-to-end approaches for reinforcement learning and control.

## References

- [1] T. Erez, Y. Tassa, and E. Todorov, “Simulation tools for model-based robotics: Comparison of bullet, havok, mujoco, ODE and physx,” in IEEE International Conference on Robotics and Automation, ICRA, 2015, pp. 4397–4404.
- [2] “Bullet physics engine,” [Online]. Available: www.bulletphysics.org.
- [3] “MuJoCo physics engine,” [Online]. Available: www.mujoco.org.
- [4] “DART physics engine,” [Online]. Available: http://dartsim.github.io.
- [5] “PhysX physics engine,” [Online]. Available: www.geforce.com/hardware/technology/physx.
- [6] “Havok physics engine,” [Online]. Available: www.havok.com.
- [7] A. P. Sabelhaus, J. Bruce, K. Caluwaerts, P. Manovi, R. F. Firoozi, S. Dobi, A. M. Agogino, and V. SunSpiral, “System design and locomotion of superball, an untethered tensegrity robot,” in Robotics and Automation (ICRA), 2015 IEEE International Conference on. IEEE, 2015, pp. 2867–2873.
- [8] J. Friesen, A. Pogue, T. Bewley, M. de Oliveira, R. Skelton, and V. Sunspiral, “Ductt: A tensegrity robot for exploring duct systems,” in Robotics and Automation (ICRA), 2014 IEEE International Conference on. IEEE, 2014, pp. 4222–4228.
- [9] K. Caluwaerts, J. Despraz, A. Iscen, A. Sabelhaus, J. Bruce, B. Schrauwen, and V. SunSpiral, “Design and control of compliant tensegrity robots through simulation and hardware validation,” Journal of The Royal Society Interface, vol. 11, no. 98, 2014.
- [10] X. Geng, M. Zhang, J. Bruce, K. Caluwaerts, M. Vespignani, V. SunSpiral, P. Abbeel, and S. Levine, “Deep reinforcement learning for tensegrity robot locomotion,” CoRR, vol. abs/1609.09049, 2016.
- [11] NTRT, “NASA tensegrity robotics toolkit (NTRT),” https://ti.arc.nasa.gov/tech/asr/intelligent-robotics/tensegrity/NTRT/.
- [12] B. T. Mirletz, I.-W. Park, R. D. Quinn, and V. SunSpiral, “Towards bridging the reality gap between tensegrity simulation and robotic hardware,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 2015.
- [13] Z. Wang, F. Hutter, M. Zoghi, D. Matheson, and N. de Freitas, “Bayesian optimization in a billion dimensions via random embeddings,” Journal of Artificial Intelligence Research, vol. 55, pp. 361–387, 2016.
- [14] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” in ICLR, 2014.
- [15] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, 1st ed. Cambridge, MA, USA: MIT Press, 1998.
- [16] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, 1st ed. Athena Scientific, 1996.
- [17] J. Kober, J. A. D. Bagnell, and J. Peters, “Reinforcement learning in robotics: A survey,” International Journal of Robotics Research, July 2013.
- [18] S. Levine and P. Abbeel, “Learning neural network policies with guided policy search under unknown dynamics,” in Advances in Neural Information Processing Systems (NIPS), 2014.
- [19] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 02 2015. [Online]. Available: http://dx.doi.org/10.1038/nature14236
- [20] J. Peters, K. Mülling, and Y. Altün, “Relative entropy policy search,” in Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2010), 2010, pp. 1607–1612.
- [21] J. Kober and J. R. Peters, “Policy search for motor primitives in robotics,” in Advances in neural information processing systems, 2009, pp. 849–856.
- [22] M. Dogar, K. Hsiao, M. Ciocarlie, and S. Srinivasa, “Physics-Based Grasp Planning Through Clutter,” in Robotics: Science and Systems VIII, July 2012.
- [23] K. M. Lynch and M. T. Mason, “Stable pushing: Mechanics, controllability, and planning,” IJRR, vol. 18, 1996.
- [24] T. Meriçli, M. Veloso, and H. Akin, “Push-manipulation of Complex Passive Mobile Objects Using Experimentally Acquired Motion Models,” Autonomous Robots, pp. 1–13, 2014.
- [25] J. Scholz, M. Levihn, C. L. Isbell, and D. Wingate, “A Physics-Based Model Prior for Object-Oriented MDPs,” in Proceedings of the 31st International Conference on Machine Learning (ICML), 2014.
- [26] J. Zhou, R. Paolini, J. A. Bagnell, and M. T. Mason, “A convex polynomial force-motion model for planar sliding: Identification and application,” in 2016 IEEE International Conference on Robotics and Automation, ICRA 2016, Stockholm, Sweden, May 16-21, 2016, 2016, pp. 372–377.
- [27] M. Deisenroth, C. Rasmussen, and D. Fox, “Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning,” in Robotics: Science and Systems (RSS), 2011.
- [28] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas, “Taking the human out of the loop: A review of bayesian optimization,” Proceedings of the IEEE, vol. 104, no. 1, pp. 148–175, 2016.
- [29] R. Antonova, A. Rai, and C. G. Atkeson, “Sample efficient optimization for learning controllers for bipedal locomotion,” in Humanoid Robots (Humanoids), 2016 IEEE-RAS 16th International Conference on. IEEE, 2016, pp. 22–28.
- [30] R. Calandra, A. Seyfarth, J. Peters, and M. P. Deisenroth, “Bayesian optimization for learning gaits under uncertainty,” Annals of Mathematics and Artificial Intelligence (AMAI), vol. 76, no. 1, pp. 5–23, 2016.
- [31] A. Marco, F. Berkenkamp, P. Hennig, A. P. Schoellig, A. Krause, S. Schaal, and S. Trimpe, “Virtual vs. real: Trading off simulations and physical experiments in reinforcement learning with bayesian optimization,” in 2017 IEEE International Conference on Robotics and Automation, ICRA 2017, Singapore, Singapore, May 29 - June 3, 2017, 2017, pp. 1557–1563. [Online]. Available: https://doi.org/10.1109/ICRA.2017.7989186
- [32] J. Swevers, C. Ganseman, D. B. Tukel, J. De Schutter, and H. Van Brussel, “Optimal robot excitation and identification,” IEEE transactions on robotics and automation, vol. 13, no. 5, pp. 730–740, 1997.
- [33] L. Ljung, Ed., System Identification (2nd Ed.): Theory for the User. Upper Saddle River, NJ, USA: Prentice Hall PTR, 1999.
- [34] D. Nguyen-Tuong and J. Peters, “Using model knowledge for learning inverse dynamics,” in ICRA. IEEE, 2010, pp. 2677–2682.
- [35] W. Yu, J. Tan, C. K. Liu, and G. Turk, “Preparing for the unknown: Learning a universal policy with online system identification,” in Proceedings of Robotics: Science and Systems, July 2017.
- [36] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. The MIT Press, 2005.
- [37] J. Močkus, “On bayesian methods for seeking the extremum,” in Optimization Techniques IFIP Technical Conference. Springer, 1975, pp. 400–404.
- [38] M. Ahmed, B. Shahriari, and M. Schmidt, “Do we need “harmless” bayesian optimization and “first-order” bayesian optimization?” in NIPS BayesOPT Workshop, 2016.
- [39] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, no. Dec, pp. 3371–3408, 2010.
- [40] S. Bansal, R. Calandra, T. Xiao, S. Levine, and C. J. Tomlin, “Goal-driven dynamics learning via bayesian optimization,” in IEEE CDC, 2017.
- [41] Z. Littlefield, D. Surovik, W. Wang, and K. E. Bekris, “From quasi-static to kinodynamic planning for spherical tensegrity locomotion,” in International Symposium on Robotics Research (ISRR), Puerto Varas, Chile, 12/2017 2017.