
Contextual Reinforcement Learning of
Visuo-tactile Multi-fingered Grasping Policies

Visak Kumar, Tucker Hermans, Dieter Fox, Stan Birchfield and Jonathan Tremblay
School of Interactive Computing at Georgia Institute of Technology, visak3@gatech.edu
Robotics Center and the School of Computing at University of Utah, thermans@cs.utah.edu
NVIDIA, {dieterf, sbirchfield, jtremblay}@nvidia.com
Work performed while at NVIDIA
Abstract

Using simulation to train robot manipulation policies holds the promise of an almost unlimited amount of training data, generated safely out of harm’s way. One of the key challenges of using simulation, to date, has been to bridge the reality gap, so that policies trained in simulation can be deployed in the real world. We explore the reality gap in the context of learning a contextual policy for multi-fingered robotic grasping. We propose a Grasping Objects Approach for Tactile (GOAT) robotic hands, learning to overcome the reality gap problem. In our approach we use human hand motion demonstrations to initialize and reduce the search space for learning. We contextualize our policy with the bounding cuboid dimensions of the object of interest, which allows the policy to work on a more flexible representation than directly using an image or point cloud. Leveraging fingertip touch sensors in the hand allows the policy to overcome the reduction in geometric information introduced by the coarse bounding box, as well as pose estimation uncertainty. We show that our learned policy successfully runs on a real robot without any fine-tuning, thus bridging the reality gap.


I Introduction

Enabling robots to autonomously grasp objects of varying shape and size with multi-fingered hands stands as a fundamental challenge necessary to produce more general manipulation skills such as pick-and-place tasks, human handover, and dexterous tool use. Classical solutions to this problem take a model-based planning and control approach. A typical pipeline estimates the object pose, given either a 3D point cloud or mesh of the object, then plans a set of contact locations and a hand configuration to define the grasp, and finally generates a motion plan to reach and grasp the object. Such systems are sensitive to perception and calibration errors and often require significant computational time to plan and execute [6]. These issues can cause the system to misbehave and fail to grasp the object.

In this work we propose to overcome these constraints by learning a policy to grasp objects of varying geometry and scale with a multi-finger gripper using deep reinforcement learning (RL). A few important challenges arise in formulating multi-fingered grasping as an RL problem. First, how do we cope with the relatively high dimensionality of the multi-fingered hand’s configuration space in order to effectively explore the space of possible grasping policies? Second, how should the learner represent the object to be grasped in a way that generalizes effectively across objects of varying shape, while still being succinct enough to train efficiently? Third, how can we learn such a policy purely in simulation, with no need to fine-tune the policy for use in the physical world?

In order to efficiently search over the high-dimensional space of grasping policies, we leverage recent advancements in camera-based human hand pose estimation [10] and imitation learning [26] to provide human grasping demonstrations from an RGB camera. We use these grasping demonstrations as a component in our reward function, providing a prior for preferred grasping trajectories to the learner in simulation.

We address the problems of object representation and sim-to-real transfer by proposing a bounding-box based object representation. We extract the location of the 8 vertices of the cuboid enveloping the object to provide the object’s pose, general shape, and size as a context variable to the policy. Using these keypoints explicitly as a context variable and training over a variable set of object shapes enables our policy to adapt to different block-shaped objects upon deployment without the need for further training.

However, this does not enable the robot to robustly compensate for object geometries, such as cylinders or cones, that are not tightly captured by a bounding box. As such, we additionally make use of tactile sensing to provide contact information as part of the robot’s state. This enables the policy to learn that making, and maintaining, contact is necessary for grasping. It has the further benefit of aiding in bridging the sim-to-real gap, where tactile sensors on the physical robot compensate not only for object shape mismatch but also for localization and calibration error from visual sensing. We deploy our final learned policy onto a real-world system where visual input to the policy comes from an RGB pose estimator [36] and the contact information is retrieved from BioTac tactile sensors.

Our approach differs from many recent sim-to-real RL works, which attempt to overcome poor parameterization of the system dynamics or of object and environment appearance by learning policies robust to high variation in visual sensing [34, 35, 29]. We take an alternative approach of abstracting away the uncertain object appearance and geometry into a succinct set of geometric features. To account for the coarse approximation these features induce, we leverage tactile sensors in the robot’s fingertips to observe contacts explicitly as part of our state. This also differs from standard approaches to grasp learning, where richer visual features are leveraged to understand the object geometry at relatively high resolution, and where these features are either learned [37, 19, 21] or hand-crafted [25, 14].

This work makes the following contributions:


  • We present a system that leverages human demonstrations of grasping, reinforcement learning, and sim-to-real transfer to accomplish a multi-finger grasping task on a real-world system. We demonstrate that our system generalizes to unseen shapes in the real world without any fine-tuning.

  • We introduce a novel approach to fusing visual and tactile information in learned grasp policies, using 3D keypoints as context variables encoding object shape and binary contact signals within the robot’s state. This allows the policy to reason implicitly about object size and orientation, creating a versatile policy that can adapt locally by leveraging the sensed contact information.

We provide empirical results demonstrating the benefits of our various contributions. We show that our keypoint representation coupled with tactile feedback can successfully grasp objects of varying shape not seen in training. We additionally quantify the benefits of using human hand grasping demonstrations in learning a multi-fingered grasping policy. We show that our learned policy achieves comparable results to a hand-engineered policy on a real-world, physical robot without any fine-tuning. We further demonstrate the ability to grasp with varying grasp styles simply by changing the human demonstrations provided during training. We will release our dataset of captured human hand motions used to teach our robot to grasp with style upon publication.

II Method

We now present the details of our approach to learning grasping policies for multi-fingered hands. We begin with a brief background of contextual policy search for reinforcement learning. We then give the specifics of how we encode the grasping problem into this contextual policy search framework. Following this we discuss how we learn policies informed from demonstration using RL. We conclude the section by describing how the policy is deployed on the physical robot.

II-A Background: Contextual Policy Search

We formulate the task of multi-finger grasping as a contextual policy search problem [13]. This differs from the classic Markov Decision Process (MDP) [32] in that the agent (robot) observes a context variable $\kappa$ at the beginning of the episode which parameterizes the reward function $R_\kappa : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$, where $\mathcal{S}$ and $\mathcal{A}$ define the state and action spaces respectively. The objective of the contextual policy search problem remains the same as standard reinforcement learning, namely to find a policy $\pi_\theta(a \mid s, \kappa)$ that maximizes the expected accumulated reward, conditioned on the observed context $\kappa$:

$$J(\theta) = \mathbb{E}\left[ \sum_{t=0}^{T} \gamma^t R_\kappa(s_t, a_t) \;\middle|\; \pi_\theta, \kappa \right]$$

where $s_0 \sim \rho_0$, $a_t \sim \pi_\theta(\cdot \mid s_t, \kappa)$, and $s_{t+1} \sim \mathcal{T}(\cdot \mid s_t, a_t)$. The remaining components of the MDP also exist in our problem formulation; specifically, $\mathcal{T}$ is the transition function, $\rho_0$ is the initial state distribution, and $\gamma$ is the discount factor. We additionally make explicit the policy parameters $\theta$, which we seek to learn through roll-outs of the system.
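
To make the episode structure concrete, the following minimal sketch shows a single rollout under this formulation; the env and policy interfaces are illustrative placeholders, not our implementation.

```python
def rollout(env, policy, gamma=0.99):
    """One episode of contextual policy search: the context is observed once at
    reset and conditions every subsequent action; the return accumulates
    gamma^t * R_kappa(s_t, a_t)."""
    state, context = env.reset()            # context kappa is fixed for the episode
    episode_return, discount, done = 0.0, 1.0, False
    while not done:
        action = policy(state, context)     # pi_theta(a | s, kappa)
        state, reward, done = env.step(action)
        episode_return += discount * reward
        discount *= gamma
    return episode_return
```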

Fig. 1: Visualization of the context variable keypoints for two objects. Note the pose estimation error for both objects and the mismatch in object shape for the soup can.

II-B Grasping as Contextual Policy Search

We define the context variable, $\kappa$, for our multi-fingered grasping problem as the keypoints of a bounding box surrounding the object of interest at its pose at the beginning of the episode (see Fig. 1). This defines a low-dimensional feature representation that encodes the object geometry; there are several ways to infer these features at runtime, such as using pose estimation of known objects [36]. By providing the object’s pose only at the beginning of the trial, we remove the need to explicitly track the object during execution. We believe this to be an advantage, as stably tracking the object, even when a known model exists, remains challenging because of the inevitable (partial) occlusion of the object caused by the hand interacting with it. Since the initial estimate may be inaccurate and the object will likely move during execution, we provide binary contact information for each robot fingertip as part of the robot’s state space.

In simulation we can directly observe contacts using the model of the robot and object. On the physical system we estimate contact using the pressure sensors of the BioTac sensors embedded in each fingertip. In addition to localizing the object, we hypothesize that contact information provides an extremely useful signal for learning stable grasps that generalize across different object geometries. The state space includes the Cartesian palm location $x_p$ and orientation $r_p$, both defined in the robot base frame, the joint positions $q$ and velocities $\dot{q}$ of the 16-DOF four-fingered hand, and a contact vector $c$ containing binary contact information for the four fingertips. The context variable is 24-dimensional; it contains the Cartesian $(x, y, z)$ locations of each of the 8 corners of a cuboid in the robot base frame. We define the robot action space as the desired Cartesian hand pose and the desired joint positions of the fingers; as such, our action space has 22 dimensions.
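
As a concrete illustration, the sketch below assembles the observation and context described above; the array shapes follow the text, while the helper names and the BioTac pressure threshold are assumptions made only for this example.

```python
import numpy as np

PRESSURE_DELTA_THRESHOLD = 15.0  # placeholder threshold; tuned per sensor in practice

def binary_contacts(biotac_pressures, baseline_pressures):
    """Threshold BioTac pressure changes into 4 binary fingertip contact bits."""
    delta = np.asarray(biotac_pressures) - np.asarray(baseline_pressures)
    return (delta > PRESSURE_DELTA_THRESHOLD).astype(np.float32)           # shape (4,)

def cuboid_keypoints(position, rotation, extents):
    """8 corners of the object bounding box in the robot base frame (the context)."""
    half = np.asarray(extents) / 2.0
    signs = np.array([[sx, sy, sz] for sx in (-1, 1)
                                   for sy in (-1, 1)
                                   for sz in (-1, 1)])
    return (signs * half) @ np.asarray(rotation).T + np.asarray(position)  # (8, 3)

def build_observation(palm_pos, palm_orient, q, dq, contacts):
    """Robot state: palm pose, 16 joint positions, 16 joint velocities, 4 contacts."""
    return np.concatenate([palm_pos, palm_orient, q, dq, contacts])

def build_context(keypoints):
    """Flatten the 8 cuboid corners into the 24-D context variable."""
    return np.asarray(keypoints, dtype=np.float32).reshape(-1)
```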

II-C Reward Function

The task of reaching and grasping a wide range of objects with a multi-fingered hand is not trivial, and as such we introduce reward terms to overcome several different challenges. We present each reward term in turn below; we define the final reward as the weighted sum of these terms, with weights selected such that each component has relatively equal scale.

Hand location with respect to the object. The first reward component encourages moving the palm of the hand close enough to the object to enable contact. Assuming a valid object pose estimate, keypoint locations of the object are computed in the robot base frame. We use the average of the 4 keypoint locations on the top surface of the object, denoted $\bar{k}$, to compute the following reward:

$$r_{\mathrm{loc}} = -\left\lVert x_p - \bar{k} \right\rVert_2 \qquad (1)$$

Hand motion. The second reward component serves to focus the policy search on motions likely to work, in order to overcome the relatively high-dimensional configuration space of multi-fingered hands (16 DOF for our Allegro hand). To tackle this issue, we use human demonstrations, captured with a hand pose estimator [10], as prior information for policy learning. This, however, introduces another concern, as the kinematic structure of the human hand differs from the robot’s. Since we know the kinematic link lengths of the Allegro hand and of the human hand from which demonstrations are generated, we perform a simple re-scaling of the data to fit the robot hand dimensions. In addition, we only reward the policy when the robot’s fingertip locations $x_{f_i}$ track the fingertip locations $\hat{x}_{f_i}$ obtained from the human hand pose estimator. The purpose of the demonstrations is not to provide an accurate trajectory for the fingers to follow, but to reduce the search space of the policy.

$$r_{\mathrm{demo}} = -\sum_{i=1}^{4} \left\lVert x_{f_i} - \hat{x}_{f_i} \right\rVert_2 \qquad (2)$$

Task success. Once the robot grasps the object, we reward the policy if it can successfully lift the object to a position $h$ above its starting location $z_0$:

$$r_{\mathrm{lift}} = \mathbb{1}\left[\, z_{\mathrm{obj}} > z_0 + h \,\right] \qquad (3)$$

Contact. Our reward function also encourages the robot to make fingertip contact with the object. We hypothesize that contact information greatly improves the ability to learn a stable grasping policy across objects of varying size and geometry. Here we define the variable $c_i$ to have value 1 if fingertip $i$ is in contact and 0 otherwise:

$$r_{\mathrm{contact}} = \sum_{i=1}^{4} c_i \qquad (4)$$

The goal of our control policy is to generalize to objects of different geometry. The structure of our reward function, with multiple terms such as touch sensing and cuboid keypoints, reflects this goal. In our experiments, we found that a binary/sparse reward is not feasible for a task in which a multi-fingered robot must reach and grasp an object; the reward is too sparse to learn anything. We assume in our experimental setup that the hand’s starting location is near the object of interest.
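
Putting the four terms together, a sketch of the full reward is given below. The weights and the exact functional forms are placeholders consistent with Eqs. (1)-(4) above, not the tuned values used in training.

```python
import numpy as np

def grasp_reward(palm_pos, top_keypoint_mean,           # Eq. (1): palm-to-object distance
                 robot_fingertips, demo_fingertips,      # Eq. (2): demonstration tracking
                 object_height, start_height, lift_h,    # Eq. (3): task success
                 contacts,                               # Eq. (4): fingertip contacts
                 weights=(1.0, 1.0, 1.0, 1.0)):
    w_loc, w_demo, w_lift, w_contact = weights
    r_loc = -np.linalg.norm(palm_pos - top_keypoint_mean)
    r_demo = -np.sum(np.linalg.norm(robot_fingertips - demo_fingertips, axis=1))
    r_lift = 1.0 if object_height > start_height + lift_h else 0.0
    r_contact = float(np.sum(contacts))
    return w_loc * r_loc + w_demo * r_demo + w_lift * r_lift + w_contact * r_contact
```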

II-D Training Details

We use the proximal policy optimization (PPO) [31] algorithm to learn the policy. We represent the policy as a multi-layer perceptron (MLP) with 2 hidden layers of 128 neurons each. During training, at the beginning of each rollout we generate a new cuboid object with dimensions uniformly sampled from a pre-specified range. We estimate the keypoints of the object, adding sampled noise to the keypoint locations to simulate the sensor noise present in the physical system, and pass them as context to the policy. The keypoint values then remain the same throughout that rollout. Since we wish to deploy the policy learned in simulation on a real robot, we apply domain randomization to account for the discrepancy between the simulator and the physical world. In addition to keypoint location noise, we add uniform noise to the object mass, the friction coefficients between the fingers and the object, the PD gains of the robot, and the damping coefficients of the robot joints. The range of the uniform distribution was manually specified based on initial results on the robot. Training takes several hours with four threads on an i7, collecting samples over many iterations; these numbers are consistent across four different seeds.
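
For illustration, the per-rollout randomization could be organized as in the sketch below (reusing the cuboid_keypoints helper sketched in Sec. II-B); every sampling range shown is a placeholder, not the range used for the reported results.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_rollout_config():
    """Sample a new cuboid, its noisy keypoint context, and randomized physics
    at the start of each training rollout."""
    extents = rng.uniform([0.04, 0.04, 0.08], [0.10, 0.10, 0.25])          # box size (m)
    position = np.array([rng.uniform(0.35, 0.55),
                         rng.uniform(-0.10, 0.10),
                         extents[2] / 2])
    yaw = rng.uniform(-np.pi, np.pi)
    rotation = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                         [np.sin(yaw),  np.cos(yaw), 0.0],
                         [0.0,          0.0,         1.0]])

    keypoints = cuboid_keypoints(position, rotation, extents)              # 8 x 3 context
    keypoints = keypoints + rng.normal(scale=0.005, size=keypoints.shape)  # simulated sensor noise

    physics = {                                  # uniform randomization of simulator parameters
        "object_mass": rng.uniform(0.1, 0.5),
        "finger_object_friction": rng.uniform(0.5, 1.2),
        "pd_gain_scale": rng.uniform(0.8, 1.2),
        "joint_damping_scale": rng.uniform(0.8, 1.2),
    }
    return keypoints, physics
```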

II-E Keypoint Parameter Adaptation for Novel Geometries

A primary goal of our approach is to learn a policy that generalizes to non-cuboid shapes not seen during training. In essence, a new object implies a new context for the policy. While we can use the bounding box of a novel object to extract the keypoints defining the context variables, we find that this does not work well for objects whose shape differs significantly from the bounding box. As such, we propose optimizing over the context variables in order to find values that enable the pre-trained policy to succeed. Importantly, we remove the restriction that the keypoints define a rectilinear box, allowing each keypoint to take any position in 3D.

Given a policy trained in simulation over a uniform distribution of contexts, when presented with a new object we fix the policy network and search over the context variables using CMA-ES. We initialize the keypoints using the object’s bounding box. We evaluate the objective function by running a rollout in simulation and provide the height reached by the object once lifted as a continuous reward for the optimizer to maximize. Each iteration uses about 5 rollouts of the policy, so roughly 65-70 trajectories on the new object are needed to adapt the context. This whole process takes about 20 min of compute time. We examine the benefit of this adaptation in Section III-B.
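
A minimal sketch of this adaptation step using the off-the-shelf cma package is shown below; rollout_lift_height is a placeholder for running the fixed policy in simulation and returning the height the object reaches, and the population size and iteration budget follow the numbers reported above.

```python
import cma
import numpy as np

def adapt_context(policy, sim_env, init_keypoints, sigma0=0.02, max_iter=13):
    """Search over the 24-D keypoint context with the policy network held fixed."""
    def cost(flat_keypoints):
        context = np.asarray(flat_keypoints).reshape(8, 3)
        lift_height = rollout_lift_height(sim_env, policy, context)  # placeholder rollout
        return -lift_height                     # CMA-ES minimizes, so negate the reward

    es = cma.CMAEvolutionStrategy(np.asarray(init_keypoints).ravel(), sigma0,
                                  {"popsize": 5, "maxiter": max_iter})
    while not es.stop():
        candidates = es.ask()                   # ~5 rollouts per iteration
        es.tell(candidates, [cost(x) for x in candidates])
    return np.asarray(es.result.xbest).reshape(8, 3)
```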

III Experiments

We evaluate our method both in simulation and on the real robot. In these experiments we answer the following overarching questions. First, how important is hand demonstration data to learn an effective policy? Second, how does including contact information change the effectiveness of the grasp? Third, how sensitive is the policy learning to the object feature representation? And fourth, can our policy successfully transfer to a real robot without adaptation?

As such, this section is organized as follows. We first discuss the implications of our state representation and reward function by comparing GOAT to different baselines. We then quantify how parameter search over our keypoint representation can improve the learned policy’s performance. In addition to these experiments, we show that our method can grasp objects with 6 different styles and evaluate the effectiveness of the different grasp styles. We conclude the section with real-world experiments on the robot.

III-A Comparison Methods

In order to evaluate the proposed method, we compare it to three baselines:

Baseline 1. The policy does not use any contact information; we hypothesize that local contact information is important in adapting to non-cuboid shapes and for identifying stable grasps once the robot hand makes contact with the object.

Baseline 2. We include contact information; however, we do not reward the policy for tracking the human hand demonstrations, i.e., we set the weight of Eq. (2) to 0. This tests the importance of demonstration data for learning in this high-dimensional action space, which, combined with the sparse nature of the reward, makes for a difficult reinforcement learning problem.

Baseline 3. We change the context variable to a single 6-DoF pose vector of the object’s center. This tests our hypothesis that using keypoint information as the context variable provides a coarse representation of the object geometry enabling the policy to adapt to objects of varying shape.

To compare the effectiveness of our method to that of the policies trained using the baseline methods we perform two different tests. First, we generate 100 random objects unseen by the policies during training and test grasps for each object from 5 random poses on the table. We compare the number of successful grasps out of these 500 resulting trials.
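
This protocol amounts to the short evaluation loop sketched below; sample_unseen_object, make_grasp_env, and run_episode are placeholders standing in for the simulator interface.

```python
def success_rate(policy, n_objects=100, n_poses=5):
    """Grasp success rate over unseen objects, each tried from several random poses."""
    successes, trials = 0, 0
    for _ in range(n_objects):
        obj = sample_unseen_object()                       # random shape not seen in training
        for _ in range(n_poses):
            env = make_grasp_env(obj, randomize_pose=True)
            successes += int(run_episode(env, policy).object_lifted)
            trials += 1
    return successes / trials                              # e.g., out of 500 trials
```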

Figure 2 illustrates the number of successful grasps achieved by each method on different object types. Each bar represents the average over four training seeds for a specific category. Object Database refers to the open-source grasp database of 3D objects [12], from which we randomly selected 20 objects. Our proposed method clearly outperforms all the baselines across the different object types. Interestingly, the baselines all perform somewhat similarly, suggesting that our method provides the most informative representation for accomplishing this task. We also show the learning curves of average reward achieved by each method during training in Figure 3 for the cuboid category. Learning curve results represent the average and variance over four different seeds. It is worth noting that the weighting of the reward function remains the same across all experiments.

Fig. 2: Grasp success rate of trained policies in simulation. The experiment was done for four different seeds.
Fig. 3: Average reward achieved during learning for the different methods averaged over four initial seeds for the cuboid category.
Fig. 4: Parameter adaptation on the keypoints to improve the performance of the policy. For both cuboid and non-cuboid shapes, parameter adaptation improves policy performance.
Fig. 5: CMA-ES optimization loss curve. Convergence is achieved after 13 iterations.
Fig. 6: Representative grasps generated by our policy executed on the physical robot.

III-B Parameter Adaptation Experiments

In the previous experiment with unseen objects, we tested the trained policy with context parameters taken from the object bounding box provided by our simulator. Here we investigate the effect of the keypoint adaptation approach presented in Section II-E. Figure 4 shows the improvement in grasp success rate after parameter adaptation for both cuboid and non-cuboid objects. Figure 5 illustrates how the optimization loss decreases during the parameter adaptation process. It takes on average about 13 iterations of CMA-ES to identify keypoint inputs that enable the policy to pick up novel objects.

Fig. 7: Success rates of different grasp styles. M, R and I refer to middle, ring and index finger respectively.

III-C Grasping with Style

To leverage the hand pose data made available by the hand pose estimator, we learn different grasping styles. For the purpose of this experiment, we define a grasping style as a simple motion that the robot has to follow, e.g., grasping with only the thumb and the index finger. Figure 7 illustrates the grasp success rate of each of the different styles. As expected, two-fingered grasps are not as successful as three- or four-fingered grasps. The objects used for this test were a mixture of 50% cuboid and 50% non-cuboid shapes.

III-D Real Robot

The ultimate test for GOAT is whether the learned policy can be deployed on a real-world robot. We use an Allegro robotic hand with 4 BioTac sensors mounted on a 7-DoF Kuka LBR iiwa 7 R800 arm. We use the pressure sensing of the BioTacs, which is quite sensitive, to detect contact. We use DOPE [36] to localize the object and generate its bounding box keypoint locations. We use the 5 objects DOPE can detect from the YCB dataset [4]: cracker box, meat, mustard, soup, and sugar box. Other methods could be used to fit a bounding box around the object; similar to [20], we could leverage point-cloud sensing to fit a bounding box to the points above the work surface, assuming an uncluttered environment. During the experiment, the object was placed randomly within the robot’s workspace five times, with a random in-plane orientation (an orientation of zero meaning the object’s axis is aligned with the robot base). For each pose detection we sample zero-mean Gaussian noise with a variance of 1 mm or 10 mm to perturb the object location. We consider a grasp successful if the object stays above the work surface for at least 5 seconds.
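
On the physical system, the policy’s inputs can be populated as in the sketch below: the keypoint context comes from the DOPE detection (perturbed with the noise levels of Table I) and the contact bits from thresholded BioTac pressures, reusing the helpers sketched in Sec. II-B; all function names are illustrative.

```python
import numpy as np

def real_robot_inputs(dope_position, dope_rotation, object_extents,
                      biotac_pressures, baseline_pressures, noise=0.001):
    """Build the context and contact inputs on the real robot.

    noise is the perturbation level applied to the detected object location
    (Table I: 0.001 or 0.01)."""
    position = np.asarray(dope_position) + np.random.normal(scale=noise, size=3)
    context = build_context(cuboid_keypoints(position, dope_rotation, object_extents))
    contacts = binary_contacts(biotac_pressures, baseline_pressures)
    return context, contacts
```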

We compared our method against a hand-written grasping policy, denoted baseline. The baseline simply moves to a position 6 cm above the estimated center of the object. Once it reaches this location, the hand begins closing its fingers towards the object; each finger stops moving when it detects contact with the object. Once all fingers have touched the object, the hand exerts additional force on the object before lifting it 7 cm.
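
The baseline can be summarized as the short control sequence below; the robot interface methods are placeholders for illustration.

```python
import numpy as np

def baseline_grasp(robot, object_center):
    """Heuristic top grasp: hover above the object, close until contact, squeeze, lift."""
    robot.move_palm_to(np.asarray(object_center) + np.array([0.0, 0.0, 0.06]))  # 6 cm above

    # Close each finger until its BioTac reports contact with the object.
    while not all(robot.finger_contacts()):
        for finger, in_contact in enumerate(robot.finger_contacts()):
            if not in_contact:
                robot.close_finger(finger, step=0.01)

    robot.squeeze()        # exert additional force on the object
    robot.lift(0.07)       # lift the object 7 cm
```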

objects        noise = 0.001         noise = 0.01
               baseline   GOAT       baseline   GOAT
cracker box        5        5            3        5
meat               5        5            2        2
mustard            5        3            3        3
soup               3        4            0        1
sugar box          5        5            4        4
all              23/25    22/25        12/25    14/25
TABLE I: Grasp successes (out of 5 trials per object) for GOAT on the real-world robot against a hand-tuned baseline, under two levels of pose noise.

Table I shows that our method performs comparably to the baseline under both noise levels. The soup can is quite a challenging object for a top grasp; we were surprised to see our method move its fingers as if searching for the object, achieving a stable grasp of the cylinder even though it was never trained on such a physical object. Representative grasps generated by our policy for each object are shown in Figure 6.

IV Related Work

Robotic grasping is normally approached either through analytical, model-based methods or through data-driven methods using supervised or reinforcement learning. The former focuses on constructing grasps that satisfy specific conditions, e.g., gripper configuration, object contact points, force closure, or task completion, while modelling the robot’s world with 3D models, partial meshes, and kinematic and dynamic models [30]. The latter, learning-based methods learn from annotated datasets or from the robot interacting with its environment [19, 40]. These learned grasping behaviors tend to generalize better to unseen objects and situations.

Reinforcement learning (RL) has gained prominence for robotic manipulation in recent years; many of these works have focused on learning grasping, but the majority address the simpler 2D gripper problem [11, 33, 39, 40, 17, 28, 3, 8]. Andrychowicz et al. trained a multi-finger robotic hand policy to reorient a cube in-hand to match a desired pose [1]. Similar to our work, they leverage simulation to train a policy to be deployed in the real world; however, they do not focus on grasping, instead assuming the object already rests in the robot’s hand.

The closest previous work to ours, by Osa et al., also learns grasping policies for different grasping styles using reinforcement learning initialized by human demonstrations [25]. In their work, the grasping style is a function of the similarity of the object’s surface mesh to those seen during training and, as such, their method cannot enforce a specific style a priori.

Another work with similar goals to ours uses supervised learning, coupled with analytical planning, to plan multi-fingered grasps of different styles, i.e., precision and power [21]. They achieve this by explicitly modeling the grasp style as a decision variable in the grasp optimization. Similar to previous robotics work [37, 22, 19, 18, 12], they learn a grasp success predictor from data. Given a grasp configuration, they use gradients from the predictor to refine the proposed grasp until it is predicted to be successful (has high probability of success). Once the grasp configuration is found, it is executed by a planner. Our work differs in that we seek to learn separate grasping policies for each grasp style from a single human hand demonstration, without relying on any planning algorithms for grasp execution. Other supervised-learning works have focused on grasping objects using one-shot learning to predict contact points [14].

Representation plays an important role in learning for robotic manipulation; choosing the right one enables learning of downstream tasks. Lee et al. proposed a method that first learns a representation using self-supervised learning [16]. Once the representation is learned, they leverage the multi-sensory description to learn tasks such as peg-in-hole insertion with RL. Other works have explored touch sensing for grasping objects under different assumptions, although little work has been done on learning with this sensing modality [9, 15, 7, 24, 2]. Manuelli et al. also leverage a keypoint representation to learn an instance-agnostic representation for a class of objects, with a classical controller written to accomplish a pick-and-place task [23]. Other works have focused on learning the full 6D pose of known objects for robotic pick and place [36, 38]. Similar to [27, 5], our state representation also includes finger contact information to overcome shape and pose uncertainty; however, these works rely on hand-tuned, model-based controllers for execution. We believe our approach is the first to explore using visual keypoints coupled with tactile feedback to learn grasping behaviors with RL.

V Conclusion and Discussion

We have presented a contextual policy search approach to learning policies for grasping unknown objects with multi-fingered hands using a bounding box representation and contact sensing. We validate that our approach can train purely in simulation and be successfully deployed in the real world on a physical robot. We introduce the use of bounding box keypoints as a contextual representation for the reward and, in turn, the policy. We show that coupling this keypoint representation with contact sensing in the policy allows the robot to adapt to previously unseen shapes and to overcome uncertainty in object pose estimation arising from noisy visual sensing. For objects whose shape deviates greatly from a bounding box (e.g., a cone), we can optimize over the context variables to further improve grasping performance without needing to retrain the learned policy.

Acknowledgments

The authors would like to thank Karl Van Wyk for his amazing help in setting up the robotics system. We would also like to thank Nathan Ratliff, Rowland O’Flaherty, Ankur Handa, and Clemens Eppner for their technical help with various challenges.

References

  • [1] M. Andrychowicz, B. Baker, M. Chociej, R. Jozefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, et al. (2018) Learning dexterous in-hand manipulation. arXiv preprint arXiv:1808.00177. Cited by: §IV.
  • [2] R. Calandra, A. Owens, D. Jayaraman, J. Lin, W. Yuan, J. Malik, E. H. Adelson, and S. Levine (2018) More than a feeling: learning to grasp and regrasp using vision and touch. IEEE Robotics and Automation Letters 3 (4), pp. 3300–3307. Cited by: §IV.
  • [3] S. Caldera, A. Rassau, and D. Chai (2018) Review of deep learning methods in robotic grasp detection. Multimodal Technologies and Interaction 2 (3), pp. 57. Cited by: §IV.
  • [4] B. Calli, A. Singh, A. Walsman, S. Srinivasa, P. Abbeel, and A. M. Dollar (2015) The YCB object and model set. In IEEE Int. Conf. on Advanced Robotics, pp. 510–517. Cited by: §III-D.
  • [5] Z. Chen, T. Wimböck, M. A. Roa, B. Pleintinger, M. Neves, C. Ott, C. Borst, and N. Y. Lii (2015) An adaptive compliant multi-finger approach-to-grasp strategy for objects with position uncertainties. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 4911–4918. Cited by: §IV.
  • [6] H. Dai, A. Majumdar, and R. Tedrake (2015) Synthesis and optimization of force closure grasps via sequential semidefinite programming. In Int. Symp. on Robot. Res., pp. 1–16. Cited by: §I.
  • [7] H. Dang and P. K. Allen (2014) Semantic grasping: planning task-specific stable robotic grasps. Autonomous Robots 37 (3), pp. 301–316. Cited by: §IV.
  • [8] K. Fang, Y. Zhu, A. Garg, V. Mehta, A. Kuryenkoy, L. Fei-Fei, and S. Savarese (2018) Learning task-oriented grasping for tool manipulation with simulated self-supervision. In Robotics Science and Systems, Cited by: §IV.
  • [9] K. Hsiao, S. Chitta, M. Ciocarlie, and E. G. Jones (2010) Contact-reactive grasping of objects with partial shape information. In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1228–1235. Cited by: §IV.
  • [10] U. Iqbal, P. Molchanov, T. Breuel, J. Gall, and J. Kautz (2018) Hand pose estimation via latent 2.5D heatmap regression. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 118–134. Cited by: §I, §II-C.
  • [11] T. Johannink, S. Bahl, A. Nair, J. Luo, A. Kumar, M. Loskyll, J. A. Ojea, E. Solowjow, and S. Levine (2018) Residual reinforcement learning for robot control. arXiv preprint arXiv:1812.03201. Cited by: §IV.
  • [12] D. Kappler, J. Bohg, and S. Schaal (2015) Leveraging big data for grasp planning. In Proc. IEEE Int. Conf. Robot. Autom., pp. 4304–4311. Cited by: §III-A, §IV.
  • [13] J. Kober, E. Oztop, and J. Peters (2011) Reinforcement learning to adjust robot movements to new situations. In International Joint Conference on Artificial Intelligence, pp. 2650–2655. External Links: Document, ISBN 9781577355120, ISSN 10450823 Cited by: §II-A.
  • [14] M. Kopicki, R. Detry, M. Adjigble, R. Stolkin, A. Leonardis, and J. L. Wyatt (2016) One-shot learning and generation of dexterous grasps for novel objects. The International Journal of Robotics Research 35 (8), pp. 959–976. Cited by: §I, §IV.
  • [15] J. Laaksonen, E. Nikandrova, and V. Kyrki (2012) Probabilistic sensor-based grasping. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2019–2026. Cited by: §IV.
  • [16] M. A. Lee, Y. Zhu, K. Srinivasan, P. Shah, S. Savarese, L. Fei-Fei, A. Garg, and J. Bohg (2018) Making sense of vision and touch: self-supervised learning of multimodal representations for contact-rich tasks. arXiv preprint arXiv:1810.10191. Cited by: §IV.
  • [17] S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen (2018) Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. Int. J. Robot. Res. 37 (4-5), pp. 421–436. Cited by: §IV.
  • [18] M. Liu, Z. Pan, K. Xu, K. Ganguly, and D. Manocha (2019) Generating grasp poses for a high-dof gripper using neural networks. arXiv preprint arXiv:1903.00425. Cited by: §IV.
  • [19] Q. Lu, K. Chenna, B. Sundaralingam, and T. Hermans (2017) Planning multi-fingered grasps as probabilistic inference in a learned deep network. In International Symposium on Robotics Research, External Links: Link Cited by: §I, §IV, §IV.
  • [20] Q. Lu and T. Hermans (2019) Modeling Grasp Type Improves Learning-Based Grasp Planning. IEEE Robotics and Automation Letters. Cited by: §III-D.
  • [21] Q. Lu and T. Hermans (2019) Modeling grasp type improves learning-based grasp planning. IEEE Robotics and Automation Letters 4 (2), pp. 784–791. Cited by: §I, §IV.
  • [22] J. Mahler, F. T. Pokorny, B. Hou, M. Roderick, M. Laskey, M. Aubry, K. Kohlhoff, T. Kröger, J. Kuffner, and K. Goldberg (2016) Dex-net 1.0: a cloud-based network of 3d objects for robust grasp planning using a multi-armed bandit model with correlated rewards. In IEEE International Conference on Robotics and Automation (ICRA), pp. 1957–1964. Cited by: §IV.
  • [23] L. Manuelli, W. Gao, P. R. Florence, and R. Tedrake (2019) KPAM: keypoint affordances for category-level robotic manipulation. arXiv preprint arXiv:1903.06684. Cited by: §IV.
  • [24] E. Nikandrova and V. Kyrki (2015) Category-based task specific grasping. Robotics and Autonomous Systems 70, pp. 25–35. Cited by: §IV.
  • [25] T. Osa, J. Peters, and G. Neumann (2018) Hierarchical Reinforcement Learning of Multiple Grasping Strategies with Human Instructions. Advanced Robotics 32 (18), pp. 955–968. External Links: Link Cited by: §I, §IV.
  • [26] X. B. Peng, P. Abbeel, S. Levine, and M. van de Panne (2018) Deepmimic: example-guided deep reinforcement learning of physics-based character skills. ACM Transactions on Graphics (TOG) 37 (4), pp. 143. Cited by: §I.
  • [27] R. Platt Jr, A. H. Fagg, and R. A. Grupen (2010) Null-space grasp control: theory and experiments. IEEE Transactions on Robotics 26 (2), pp. 282–295. Cited by: §IV.
  • [28] D. Quillen, E. Jang, O. Nachum, C. Finn, J. Ibarz, and S. Levine (2018) Deep reinforcement learning for vision-based robotic grasping: a simulated comparative evaluation of off-policy methods. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 6284–6291. Cited by: §IV.
  • [29] F. Sadeghi and S. Levine (2017) CAD2RL: real single-image flight without a single real image. Cited by: §I.
  • [30] A. Sahbani, S. El-Khoury, and P. Bidaud (2012) An overview of 3d object grasp synthesis algorithms. Robotics and Autonomous Systems 60 (3), pp. 326–336. Cited by: §IV.
  • [31] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov (2017) Proximal policy optimization algorithms. ArXiv abs/1707.06347. Cited by: §II-D.
  • [32] R. S. Sutton and A. G. Barto (1998) Reinforcement Learning: An Introduction. MIT Press. Cited by: §II-A.
  • [33] G. Thomas, M. Chien, A. Tamar, J. A. Ojea, and P. Abbeel (2018) Learning robotic assembly from cad. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1–9. Cited by: §IV.
  • [34] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel (2017) Domain randomization for transferring deep neural networks from simulation to the real world. 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 23–30. Cited by: §I.
  • [35] J. Tobin, W. Zaremba, and P. Abbeel (2018) Domain randomization and generative models for robotic grasping. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3482–3489. Cited by: §I.
  • [36] J. Tremblay, T. To, B. Sundaralingam, Y. Xiang, D. Fox, and S. Birchfield (2018) Deep object pose estimation for semantic robotic grasping of household objects. In Conference on Robot Learning (CoRL), External Links: Link Cited by: §I, §II-B, §III-D, §IV.
  • [37] J. Varley, J. Weisz, J. Weiss, and P. Allen (2015) Generating multi-fingered robotic grasps via deep learning. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4415–4420. Cited by: §I, §IV.
  • [38] C. Wang, D. Xu, Y. Zhu, R. Martín-Martín, C. Lu, L. Fei-Fei, and S. Savarese (2019) DenseFusion: 6d object pose estimation by iterative dense fusion. arXiv preprint arXiv:1901.04780. Cited by: §IV.
  • [39] W. Yu, V. C. Kumar, G. Turk, and C. K. Liu (2019) Sim-to-real transfer for biped locomotion. arXiv preprint arXiv:1903.01390. Cited by: §IV.
  • [40] A. Zeng, S. Song, S. Welker, J. Lee, A. Rodriguez, and T. Funkhouser (2018) Learning synergies between pushing and grasping with self-supervised deep reinforcement learning. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4238–4245. Cited by: §IV, §IV.