Grasp selection analysis for two-step manipulation tasks
Manipulation tasks are sequential in nature. Grasp selection approaches that take into account the constraints at each task step are critical, since they make it possible both (1) to identify grasps that likely require simple arm motions through the whole task and (2) to discard grasps that, although feasible to achieve at earlier steps, might not be executable at later stages due to goal task constraints. In this paper, we study how to use our previously proposed manipulation metric for tasks in which 2 steps are required (pick-and-place and pouring tasks). Even for such simple tasks, it was not clear how to use the results of applying our metric (or any metric for that matter) to rank all the candidate grasps: Should only the start state be considered, or only the goal, or a combination of both? In order to find an answer, we evaluated the (best) grasps selected by our metric under each of these 3 considerations. Our main conclusion was that for tasks in which the goal state is more constrained (pick-and-place), using a combination of the metric measured at the start and goal states yields better performance than choosing any other candidate grasp, whereas in tasks in which the goal constraints are less rigidly defined, the metric measured at the start state should be mainly considered. We present quantitative results in simulation and validate our approach’s practicality with experimental results on our physical robot manipulator, Crichton.
Given a manipulation task, a redundant robot manipulator and a target object, many possible candidate grasps can be used to accomplish the task. Finding a suitable grasp among the infinite set of candidates is a challenging problem that has been addressed frequently in robotics, resulting in an abundance of approaches . Interestingly, the vast majority of these methods have two aspects in common: (1) The metrics used for grasp selection focus on the hand-centric aspect of a manipulation task, such as grasp robustness, and (2) Manipulation is implicitly seen as a single-step task, in which the main goal is to reach an object without further regard to what will be done with it once it is grasped.
In general, even the simplest of manipulation tasks, such as pick-and-place, has 2 or more steps. And while grasp robustness is perhaps the most important aspect to predict the success of a manipulation task, it is not the only factor to consider. For a grasp to be executable through a whole task, feasible arm motions between steps are needed. We argue that a metric that considers both the grasp robustness and arm kinematics is a more useful way to select grasps that will in turn entail arm motions that can be easily planned and executed in the real world.
In our previous work, we proposed our arm-and-grasp metric and showed its usefulness to select grasps for simple pick-up tasks that on average involved shorter and faster arm motions when compared with those produced by selecting other possible candidate grasps. In this paper we extend the use of this metric to select grasps for tasks with 2 steps, such as pick-and-place and pouring tasks. A challenge in considering 2-step vs 1-step tasks is that the way to use the metric is not straightforward: Should it be measured only at the start state, only at the goal state, or should it be a combination of measurements at both states? Intuitively, we suspect that a metric that considers the whole task (from start to goal) would be more useful; however, after analyzing the 2 aforementioned tasks, we conclude that whether to consider the start, the goal or both mainly depends on how constrained or loose the goal state is. We will show simulation experiments that support this claim in Section IV and Section V. The remainder of this paper is organized as follows: Section II presents a condensed review of existing work. In Section III we present a brief summary of the arm-and-grasp metric presented in our previous work. Section IV presents our evaluation of the metric in pick-and-place tasks, measured at the start and goal states as well as an average of both. Section V shows a similar evaluation for pouring tasks. Finally, our conclusions, including the limitations of our approach, are briefly stated in Section VI.
II Related Work
In this section we review work concerning grasp selection for manipulation tasks. For a more detailed survey of previous research in the area, the reviews presented in  and  are highly recommended.
Pioneering work on grasp selection was developed by Cutkosky , who observed that humans select grasps in order to satisfy 3 main types of constraints: Hand constraints, object constraints and task-based constraints. As pointed out by Bohg et al. in , there is little work on task-dependent grasping when compared to work focused on the first two types of constraints. Hence, the main goal for most existing planners is to find a grasp such that the robot can reach the target object, without further regard of what will be done once the object is picked.
In current common practice, grasps are generated offline and are then ranked based on their force-closure properties, which theoretically express their robustness and stability. One of the most popular metrics was proposed by Ferrari and Canny. However, it has been noted by different authors that analytical metrics do not guarantee a stable grasp when executed on a real robot. This can be explained by the fact that these classical metrics rely on assumptions that don’t always hold true in real environments (i.e. static object-hand interaction, Coulomb friction and point contact). On the other hand, studies that consider human heuristics to guide grasp search have shown remarkable results, outperforming classical approaches. Balasubramanian et al. observed that when humans kinesthetically teach a robot how to grasp objects, they strongly tend to align the robotic hand along one of the object’s principal axes, which later results in more robust grasps. The authors termed the metric measuring the axis deviation skewness. Przybylski et al. combine the latter metric with the Ferrari-Canny metric and use it to rank grasps produced with GraspIt!. Berenson et al. proposed a score combining 3 measures: the Ferrari-Canny metric, object clearance and the robot’s relative position to the object.
All the aforementioned work focuses on grasp-centric metrics, whereas we stress the importance of choosing a grasp while also taking into account the arm kinematics, so as to encourage grasps that are easily reachable. In general, the problem of grasp planning is considered in isolation from arm planning, although there are a few exceptions: Vahrenkamp et al. proposed Grasp-RRT in order to perform grasp and arm planning combined. In a similar vein, Roa et al. also proposed an approach that solves both problems simultaneously. Both approaches focus on reaching tasks. Along the same lines, Berenson et al. proposed the use of Task Space Regions, which allow planning arm movements while also searching for grasps. However, the main disadvantages of this approach are that the object needs to be known beforehand, the task regions must be explicitly defined by the user and it does not have a specific way to deal with multi-step tasks (planning occurs one step at a time, with no way to make the goal influence the grasp selection at an earlier step). Finally, a myriad of work exists that analyzes complex sequential manipulation tasks from a learning point of view, mostly in the form of imitation learning. However, most of these approaches focus solely on the arm motions and are naturally dependent on the number and variety of human demonstrations available [21, 12]. For tasks as simple as the ones considered in this paper (pick-and-place and pouring), we expect that successful results can be obtained by simply selecting the grasp to try first with a sensible metric such as our arm-and-grasp metric.
Lastly, in recent years, deep learning approaches have flourished in diverse areas of robotics, manipulation being no exception, with applications showcasing robots capable of picking up objects from a bin, opening doors and learning to push objects inside a crate. Our work, however, has as its main goal to provide a simple, online grasp selection strategy that does not require any kind of offline training and that can handle novel, simple objects for which several candidate grasps are generated on the fly.
III Arm-and-hand metric for grasp selection
In this section we briefly recap the arm-and-hand metric we will use through the rest of this paper, first describing each of its two component parts (the arm metric and the grasp metric) and then their combination. A detailed description of the metric and the results of applying it to simple pick-up tasks can be found in our previous work.
III-A Arm Metric
When humans perform simple reaching actions, they select a grasp such that their arm is comfortable at the end of the reaching movement. This inherently simple phenomenon, known as the end comfort effect, has been observed in adult humans as well as in other primates .
Our proposed arm-centric metric intends to capture the comfort factor for a given grasp. Formally, for a given grasp applied on an object located at a given pose, we define our arm metric as the number of collision-free inverse kinematic solutions that allow the robot hand to execute that grasp.
For our specific setup, the redundant robot arm presents a standard S-R-S configuration for which a pseudo-analytic solution is available, given as input an end-effector’s goal pose and a free parameter which determines the elbow pose. To compute the arm metric, the initial set of inverse kinematic solutions is calculated by discretizing the free parameter, and each solution is then checked to determine whether it is collision-free.
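The counting procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in our experiments: `ik_solver`, `in_collision`, the sampling range of the free parameter and `num_samples` are all hypothetical stand-ins for the robot-specific pseudo-analytic solver and collision checker.

```python
import math
from typing import Callable, Optional, Sequence

JointConfig = Sequence[float]


def arm_metric(
    ik_solver: Callable[[float], Optional[JointConfig]],
    in_collision: Callable[[JointConfig], bool],
    num_samples: int = 36,
) -> int:
    """Count collision-free IK solutions over a discretized free parameter.

    ik_solver(phi) is assumed to return a joint configuration reaching the
    grasp's end-effector pose for elbow parameter phi, or None if no
    solution exists for that value.
    """
    count = 0
    for i in range(num_samples):
        # Assumption: the free parameter spans [-pi, pi) and is sampled uniformly.
        phi = -math.pi + 2.0 * math.pi * i / num_samples
        q = ik_solver(phi)
        if q is not None and not in_collision(q):
            count += 1
    return count
```

A larger count means more ways for the arm to realize the grasp, which is the sense in which the metric approximates comfort.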
III-B Grasp Metric
The arm-centric metric presented above only considers the arm comfort. Consider the scenario in Figure 3, where 3 candidate grasps are depicted for a cylindrical object (a grasp here being parameterized by two elements: (1) the relative rigid transform of the end-effector frame with respect to the object frame, and (2) the fingers’ initial joint configurations). Let us assume that these grasps have similar arm metric values, hence they are all deemed equally desirable. From human experience, we can all agree that the second grasp is the most likely to be stable, since the hand is closer to the center of mass of the object being held. We incorporate this heuristic in the proposed grasp metric.
Our second metric attempts to favor grasps that hold the object near its center of gravity. We propose to quantify this heuristic as the distance between the object’s center of mass and the hand’s approach direction vector. We select this metric because it is easy to calculate, as it is just the distance between a line and a point. This metric is similar to an existing metric that measures the distance between the center of the contact polygon and the center of mass of the object. We prefer our metric over the latter mainly because our system does not provide finger contact information.
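Since the grasp metric reduces to a point-to-line distance, it can be sketched in a few lines. The function below is an illustrative sketch; the names `com`, `hand_origin` and `approach_dir` are assumptions introduced here, with the line defined by the hand origin and its approach direction.

```python
import math
from typing import Sequence


def grasp_metric(
    com: Sequence[float],
    hand_origin: Sequence[float],
    approach_dir: Sequence[float],
) -> float:
    """Distance from the object's center of mass to the hand's approach line."""
    # Vector from a point on the line (the hand origin) to the center of mass.
    v = [c - o for c, o in zip(com, hand_origin)]
    d = approach_dir
    # |v x d| / |d| is the perpendicular distance from the point to the line.
    cross = (
        v[1] * d[2] - v[2] * d[1],
        v[2] * d[0] - v[0] * d[2],
        v[0] * d[1] - v[1] * d[0],
    )
    norm_d = math.sqrt(sum(x * x for x in d))
    if norm_d == 0.0:
        raise ValueError("approach_dir must be a non-zero vector")
    return math.sqrt(sum(x * x for x in cross)) / norm_d
```

Smaller values correspond to grasps whose approach line passes closer to the center of mass.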
III-C Arm-Grasp Metric
Now that we have both metrics, we must combine them. A direct way to do this could be using a weighted sum of both. However, the two metrics have different units (the arm metric is dimensionless and the grasp metric has length units), hence adding them is not straightforward. Instead, we propose to calculate the combined metric using 2 consecutive steps (illustrated in Figure 4), each of which uses one of the metrics at a time: In the first step, the arm metric is used to divide the grasp set in 4 groups according to their quality, whereas in the second step, the grasp metric is used to further order the grasps within each subgroup. This can be explained in simple terms as:
Calculate the mean and the standard deviation of the arm metric over all the grasps in the candidate set.
Divide the grasps in 4 groups, similarly as :
Very good quality:
Within each of the 4 groups, reorder the grasps according to their grasp metric.
The final ordered set of grasps will contain 4 ordered sets based on the arm metric (very good, good, fair and bad), inside each of which grasps are ordered according to the grasp metric.
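The two-step ordering can be sketched as below. The exact group boundaries are not reproduced in the text above, so the thresholds used here (mean plus one standard deviation, mean, and mean minus one standard deviation) are an assumption made purely for illustration, as is the convention that a smaller grasp metric (distance) ranks first within a group.

```python
import statistics
from typing import List, Sequence


def rank_grasps(arm_values: Sequence[float], grasp_values: Sequence[float]) -> List[int]:
    """Return grasp indices ordered from best to worst.

    Step 1: bucket grasps into 4 quality groups using the arm metric.
    Step 2: order grasps inside each group by the grasp metric.
    """
    mean = statistics.fmean(arm_values)
    std = statistics.pstdev(arm_values)

    def group(m: float) -> int:
        # ASSUMED thresholds: 0 = very good, 1 = good, 2 = fair, 3 = bad.
        if m >= mean + std:
            return 0
        if m >= mean:
            return 1
        if m >= mean - std:
            return 2
        return 3

    # Sort by group first, then by ascending grasp metric within the group.
    return sorted(
        range(len(arm_values)),
        key=lambda i: (group(arm_values[i]), grasp_values[i]),
    )
```

For example, with arm values `[10, 2, 6, 6]` and grasp values `[0.05, 0.01, 0.03, 0.02]`, the first grasp is ranked best despite its larger grasp metric, because the arm metric decides the group and the grasp metric only orders grasps inside it.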
It is worth noticing that, rather than producing a numeric value, the arm-grasp metric in fact outputs an ordering of the grasps in the candidate set. In our previous work we showed that by selecting the grasp ranked as the best, the average arm motion length and end-effector displacement entailed were shorter than when selecting other candidate grasps. In the following sections we will analyze what is the best way to use this metric for 2-step tasks such as Pick-and-Place (Section IV) and Pouring (Section V).
IV Pick-and-Place Tasks
IV-A Task Definition
Given an object at a start pose, the robot must reposition it to a 3D goal position keeping the object upright. Both start and goal states must be inside the reachable workspace of the robot arm being used. The task is considered successful if at the end of the pick-and-place execution the object is set at the goal position within a small margin of position error.
Notice that the goal state is not fully constrained (Figure 5). Specifically, the yaw orientation of the object can adopt many possible values. This presents a challenge for testing our metric, since for it to be evaluated we need a full 6D goal pose to be defined. We chose to use this task description over a fully-constrained one as this is a very common pick-and-place variation that routinely appears in household scenarios.
In order to measure the metric at the goal state, we propose to use what we term a goal pose guess (6D) for each candidate grasp, such that this can be used as an estimation of the likely 6D pose of the object at the goal state. In the next section we explain how to generate these goal pose guesses based on a simple human heuristic.
IV-B Generating likely goal pose guesses
We designed a simple heuristic to generate goal pose guesses for candidate grasps in pick-and-place tasks. We assume that we have a set of candidate grasps that are feasible to execute on the object at the start pose, but as yet untested at the goal position, as the goal is not fully defined. For each grasp, we define its corresponding goal pose guess as follows:
Calculate a referential rotation from the start pose to the goal position. We do this by generating vectors originating at the shoulder point and pointing to both the 3D origin of the start pose and to the goal position. The referential rotation is the angle between these two vectors projected on the table plane (considering z as the up direction, this is in fact a yaw angle). Figure 6 shows a visualization of this rotation for a sample problem.
We use the referential rotation calculated above as a maximum limit for the goal’s relative rotation with respect to the start pose. In general, we assume that for manipulation tasks, only the minimum effort necessary will be used (also known as the human “economical principle”). Under this assumption, a pick-and-place operation will apply a rotation on the object only when it is necessary.
For each candidate grasp, we set the object in a goal pose such that the relative rotation varies between 0 (minimum rotation) and the referential rotation. We discretize this interval in a small number of samples and test if the grasp is feasible at each of them. As soon as a feasible sample is found, we stop the testing and store this goal pose guess for future use in the grasp selection process. Notice that we start the search from zero, as we assume that rotations are minimal.
The algorithmic version of the heuristic is depicted in Algorithm 1. As shown there, the output of this procedure is one goal pose guess for each candidate grasp. We use these guess poses instead of the original goal positions to calculate our metric and evaluate it accordingly.
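The heuristic can be sketched roughly as follows. This is an illustrative sketch rather than a transcription of Algorithm 1: the function names, the planar treatment of the goal pose as a single yaw value, the `is_feasible` callback (standing in for a grasp feasibility check at the goal) and `num_samples` are all assumptions.

```python
import math
from typing import Callable, Optional, Sequence


def referential_rotation(
    shoulder: Sequence[float], start: Sequence[float], goal: Sequence[float]
) -> float:
    """Signed yaw angle between shoulder->start and shoulder->goal on the table plane."""
    # Projecting on the table plane means only x and y are used (z is up).
    a_start = math.atan2(start[1] - shoulder[1], start[0] - shoulder[0])
    a_goal = math.atan2(goal[1] - shoulder[1], goal[0] - shoulder[0])
    diff = a_goal - a_start
    # Wrap the difference into (-pi, pi].
    return math.atan2(math.sin(diff), math.cos(diff))


def goal_pose_guess(
    start_yaw: float,
    theta_ref: float,
    is_feasible: Callable[[float], bool],
    num_samples: int = 5,
) -> Optional[float]:
    """Return the first feasible goal yaw, searching from zero relative rotation."""
    for i in range(num_samples):
        frac = i / (num_samples - 1) if num_samples > 1 else 0.0
        yaw = start_yaw + frac * theta_ref
        if is_feasible(yaw):
            return yaw  # stop at the smallest rotation that works
    return None  # no feasible guess found for this grasp
```

Searching from zero upward encodes the minimum-effort assumption: a rotation is only applied to the object when smaller rotations are infeasible for the grasp.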
IV-C Calculating the metric as an average for both start and goal states
Once we have calculated our goal guess poses, we can measure the metric at both the start and goal states. However, we would also like to get a measurement that considers both of them at the same time. As we pointed out, our metric produces as an output an ordering rather than a numerical value, so a standard averaging is not possible. We observe, however, that although our metric is composed of two components, the arm metric and the grasp metric, only the arm metric changes when evaluated at different object poses (the grasp metric remains the same as the grasps are rigid). Given this, a simple way to calculate an average version of the metric consists of averaging the arm metric values at the start and goal states and then using this average in combination with the unchanging grasp metric to produce the final ordering, thereby incorporating both the start and goal information in the process. The algorithmic version corresponding to this explanation can be seen in Algorithm 2.
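A minimal sketch of this averaging step is given below. It is not a transcription of Algorithm 2: `rank` is a hypothetical callable standing in for the grouping-based ordering of Section III-C, and `simple_rank` is a deliberately simplified placeholder used only to make the example runnable.

```python
from typing import Callable, List, Sequence

RankFn = Callable[[Sequence[float], Sequence[float]], List[int]]


def rank_with_average(
    arm_start: Sequence[float],
    arm_goal: Sequence[float],
    grasp_values: Sequence[float],
    rank: RankFn,
) -> List[int]:
    """Order grasps using the arm metric averaged over start and goal states."""
    if not (len(arm_start) == len(arm_goal) == len(grasp_values)):
        raise ValueError("all inputs must have one entry per candidate grasp")
    # Only the arm metric depends on the object pose, so only it is averaged.
    arm_avg = [(s + g) / 2.0 for s, g in zip(arm_start, arm_goal)]
    # The grasp metric is pose-independent and is passed through unchanged.
    return rank(arm_avg, grasp_values)


def simple_rank(arm_values: Sequence[float], grasp_values: Sequence[float]) -> List[int]:
    """Hypothetical stand-in: higher arm metric first, ties broken by grasp metric."""
    return sorted(
        range(len(arm_values)),
        key=lambda i: (-arm_values[i], grasp_values[i]),
    )
```

With `arm_start = [8, 2, 4]` and `arm_goal = [0, 6, 4]`, all three grasps average to the same arm value, so a grasp that looks best at the start alone no longer dominates once the goal is taken into account.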
IV-D Evaluation of the metric at start, goal and as an average
We evaluate our metric in a series of random pick-and-place experiments in simulation. Using a simple tabletop scenario, such as the one shown in Figure 7, we generate a set of candidate grasps and select which grasp to use according to our metric measured under 3 modalities: Start state, goal guess state and average. We performed 250 random experiments for each of the 6 objects evaluated, so as to make sure the results were not affected by the object’s geometry (Figure 8).
The results of the simulation experiments are shown in Table I. We use 3 measures to compare the performance of the grasps selected under the metric measured at different steps:
Success rate: The main point of ranking the grasps is to avoid having to try multiple grasps before finding one that produces a feasible arm motion. The success rate measures if the grasp ranked the best produces a solution (no further grasps are tried).
Planning time: Ideally, the grasp selected must be easy to reach and execute, hence short planning times should be expected.
End Effector displacement: Related to planning time. Easy arm motions should imply short end-effector translation in the workspace.
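Two of these measures are straightforward to compute from logged trials. The sketch below assumes, hypothetically, that each trial records a boolean outcome for the top-ranked grasp and a sequence of end-effector positions along the executed path; neither data layout is taken from our experimental code.

```python
import math
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of trials in which the top-ranked grasp yielded a solution."""
    if not outcomes:
        raise ValueError("at least one trial is required")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def end_effector_displacement(positions: Sequence[Sequence[float]]) -> float:
    """Total translation of the end effector, summed over consecutive waypoints."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
```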
Table I (columns: Object | Success | Hand Disp. (m) | Plan time (s))
The following observations are made from the aforementioned results table:
The average metric presents higher success rates for the objects evaluated, with the metric measured at the goal state coming a close second. Interestingly, the metric measured at the start pose produces the lowest success rates.
The end effector distance traveled during the pick-and-place task is in general shorter for the metric measured at the goal and average cases.
The planning times are consistently lower for the metric measured at the goal state, although the difference is very small with respect to the other two cases.
In general, we could say that using the metric measured either at the goal state or as an average of the start and the goal produces good results. Table II gives a better idea of the advantage of using this metric with respect to not using any metric at all.
Table II (columns: Object | Success | Hand Disp. (m) | Plan time (s))
To illustrate the applicability of our approach, we tested it on our robotic platform, a Schunk LWA4 bimanual manipulator. The video accompanying this paper shows the pick-and-place tasks being executed for 3 different scenarios (place object inside a box, on top of a box or on the table surface) with 9 objects of different geometries. Figure 9 and Figure 10 show the start and goal states for 2 of the evaluated objects in each of the 3 pick-and-place experiment variations. The grasps used in each example were selected by using the average metric discussed earlier. An interesting observation that can be drawn from both of these figures is that, even when the object geometries are different, the grasps selected by the robot are visually similar: When the object must be put inside a box, the grasp chosen is one coming from the top, likely to avoid possible collisions with the box walls. In the on-box case, the grasp chosen has the arm coming at an angle, which makes the goal configuration more comfortable. Finally, in the simplest on-table case, the grasp comes from the side, which corresponds to a relaxed arm configuration. We consider this a significant advantage of using simple measurements such as our metric: Although the arm motion planning is sampling-based (RRT), the grasp selection introduces a certain level of determinism, since the grasp selected will always be the same as long as the environmental conditions are similar.
V Pouring Tasks
V-A Task Definition
Given a pouring object in a starting pose and a second container object, the goal of the task is to position the pouring object over the container such that transfer of contents from the former to the latter is feasible. In a geometrical context, pouring in this paper is considered as a state in which the pouring object is placed above the container and its main principal axis forms an angle with the horizontal, such that the pouring object is effectively pointing down.
In the pouring task we found that, analogously to the pick-and-place task, the goal is not fully constrained. For a pouring object and a container considered as symmetric objects, there exists a manifold of possible locations where the pouring is feasible to execute (as long as the pouring object is located above the container, keeping its top surface above the opening of the container, the tilting is likely to succeed). Given this, we also have to generate possible goal grasp poses (that allow tilting) in order to evaluate our metric in a similar manner as in the pick-and-place case.
V-B Generation of possible grasp poses
Given a receiver container, many possible hand poses near it can be considered as candidate goal poses (nearest the container’s center, nearest its border). We generate a set of discretized goal poses that allow the hand to finish in an orientation suitable for pouring from symmetrical objects by setting the hand orientation such that its approach direction is tangent to the container’s perimeter and has its orientation up. The position of the hand is set as the average between the minimum and maximum distance the pouring object can be from the container such that the object’s perimeter is on top of the container. Figure 12 shows example goal guess poses generated for a given container location. Some parts of the container’s border are unreachable for the hand, hence no grasps will be considered in that area. The formal algorithm of this process is shown in Algorithm 3. An example of the end-effector poses generated for a sample scenario can be seen in Figure 12.
V-C Evaluation of the metric at start, goal and as an average
In a manner similar to Section IV-D, we evaluate the metric measured at the start pose, at the guess goal pose and as an average of both. For these pouring tasks, however, since there are two objects involved (the pouring object and the container), we test different combinations of these. The object models used are shown in Figure 13.
The results are shown in Tables III, IV and V. Surprisingly, for the pouring tasks evaluated we found that the metric with the best performance, both in success rate and end-effector displacement, was the one evaluated at the start location. This partly contradicts our findings in the previous pick-and-place scenario, where the measure taken as the average between start and goal provided the best performance. Nonetheless, as in the previous cases, we observe that the performance when using the metric (under any modality) is in general better than when using any other grasp. We hypothesize that the main reason why in this case the start pose is more decisive for the results is that the goal pose for the end-effector does not directly depend on the start pose. Rather, independently of where the hand starts, the goal pose is mostly defined by the position of the container in the robot workspace (the hand ends up in a pose with respect to the container that is more comfortable). Given this, it is not surprising that the best performance is obtained by considering only the start location.
Table III (columns: Object | Success | Hand Disp. (m) | Plan time (s))
Table IV (columns: Object | Success | Hand Disp. (m) | Plan time (s))
Table V (columns: Object | Success | Hand Disp. (m) | Plan time (s))
As for the pick-and-place case, the results shown here mostly involve simulation. We also tested our grasp selection approach on our physical robot, performing 54 runs involving 3 containers, 3 objects to pour from (with different geometry) and different start and goal locations. Some of these results are shown in the accompanying video, and a few snapshots are shown in Figure 15, which depicts the robot’s final goal state during pouring tasks at 2 locations. For each object, the start location was different; however, we can see that the final state is similar. As we discussed earlier in this section, the goal for the pouring tasks is loosely constrained in such a way that the goal state is not strongly tied to the start state, hence it does not matter too much for the grasp selection process.
In this paper we have presented a quantitative analysis of the advantages of using a manipulation metric to select a grasp from a set of possible candidates. By using our metric, we observed that the grasp selected in most of the cases entails arm motions that present 3 advantageous characteristics: (1) Shorter end-effector path lengths, (2) Shorter planning times, and (3) Higher success rate with respect to other grasps in the candidate set. We evaluated our metric under 3 different modalities: At the start step, at the goal step and as an average of both. We found that for pick-and-place tasks, in which the goal is more constrained, the metric measured as an average (that is, considering both the start and the goal states) produced better results. For pouring tasks, in which the goal state is more loosely defined, we found that by simply measuring our metric at the start step, the results were better than with any of the other grasps in the candidate set.
As stated in the abstract, this paper focuses on two-step tasks. Even in this simple case we found that there is no single one-size-fits-all strategy to select adequate grasps that present comparatively better properties such as the ones discussed in this paper (brief planning times, short end-effector paths, high success rate). The approach presented here is not directly transferable to tasks with 3 or more steps. A possible way to circumvent this issue could be dividing complex manipulation tasks into simple one- and two-step tasks (in which our metric could be used to select adequate grasps). This is a matter of future work.
-  R. Balasubramanian, L. Xu, P. D Brook, J.R. Smith, and Y. Matsuoka. Physical human interactive guidance: Identifying grasping principles from human-planned grasps. In The Human Hand as an Inspiration for Robot Hand Development. Springer, 2014.
-  D. Berenson, R. Diankov, K. Nishiwaki, S. Kagami, and J. Kuffner. Grasp planning in complex scenes. In Humanoids, 2007.
-  D. Berenson, S. Srinivasa, and J. Kuffner. Task space regions: A framework for pose-constrained manipulation planning. The International Journal of Robotics Research, 2011.
-  J. Bohg, A. Morales, T. Asfour, and D. Kragic. Data-driven grasp synthesis: A survey. IEEE Transactions on Robotics, 2014.
-  M.R. Cutkosky. On grasp choice, grasp models, and the design of hands for manufacturing tasks. IEEE Transactions on Robotics and Automation, 1989.
-  C. Ferrari and J. Canny. Planning optimal grasps. In ICRA, 1992.
-  C. Finn and S. Levine. Deep visual foresight for planning robot motion. arXiv preprint arXiv:1610.00696, 2016.
-  J. Fontanals, B.A. Dang-Vu, O. Porges, J. Rosell, and M. Roa. Integrated grasp and motion planning using independent contact regions. Humanoids, 2014.
-  S. Gu, E. Holly, T. Lillicrap, and S. Levine. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. arXiv preprint arXiv:1610.00633, 2016.
-  A. Huamán Quispe, H. Ben Amor, and H.I. Christensen. Combining arm and hand metrics for sensible grasp selection. In CASE. IEEE, 2016.
-  A. Huamán Quispe, B. Milville, M. Gutiérrez, C. Erdogan, M. Stilman, H.I. Christensen, and H. Ben Amor. Exploiting symmetries and extrusions for grasping household objects. In ICRA, pages 3702–3708. IEEE, 2015.
-  O. Kroemer, C. Daniel, G. Neumann, H. van Hoof, and J. Peters. Towards learning hierarchical skills for multi-phase manipulation tasks. In ICRA, 2015.
-  B. León, C. Rubert, J. Sancho-Bru, and A. Morales. Characterization of grasp quality measures for evaluating robotic hands prehension. In ICRA, pages 3688–3693. IEEE, 2014.
-  S. Levine, P. Pastor, A. Krizhevsky, and D. Quillen. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. arXiv preprint arXiv:1603.02199, 2016.
-  H Nagasaki. Asymmetric velocity and acceleration profiles of human arm movements. Experimental Brain Research, 74(2):319–326, 1989.
-  M. Przybylski, T. Asfour, R. Dillmann, R. Gilster, and H. Deubel. Human-inspired selection of grasp hypotheses for execution on a humanoid robot. In IEEE-RAS Humanoids, 2011.
-  D. A. Rosenbaum, C. M. van Heugten, and G. E. Caldwell. From cognition to biomechanics and back: The end-state comfort effect and the middle-is-faster effect. Acta psychologica, 94(1):59–85, 1996.
-  A. Sahbani, S. El-Khoury, and P. Bidaud. An overview of 3d object grasp synthesis algorithms. Robotics and Autonomous Systems, 60(3):326–336, 2012.
-  M. Shimizu, H. Kakuya, W. Yoon, K. Kitagaki, and K. Kosuge. Analytical inverse kinematic computation for 7-DOF redundant manipulators with joint limits and its application to redundancy resolution. Transactions on Robotics, 24(5):1131–1142, 2008.
-  N. Vahrenkamp, T. Asfour, and R. Dillmann. Simultaneous grasp and motion planning: Humanoid robot armar-iii. Robotics & Automation Magazine, 2012.
-  A. Yamaguchi, C.G. Atkeson, and T. Ogasawara. Pouring skills with planning and learning modeled from human demonstrations. International Journal of Humanoid Robotics, 12(03):1550030, 2015.