Data-Driven Approach to Simulating Realistic Human Joint Constraints

Yifeng Jiang and C. Karen Liu
Yifeng Jiang is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA (yjiang340@gatech.edu). C. Karen Liu is with the School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA (karenliu@cc.gatech.edu).
Accompanying video: https://youtu.be/wzkoE7wCbu0
Github repositories: https://github.com/jyf588/Human-Joint-Constraints-Training and https://github.com/dartsim/dart/pull/1016
Abstract

Modeling realistic human joint limits is important for applications involving physical human-robot interaction. However, setting appropriate human joint limits is challenging because they are pose-dependent: the range of motion of a joint varies depending on the positions of other bones. This paper introduces a new technique to accurately simulate human joint limits in physics simulation. We propose to learn an implicit equation, $C(q) = 0$, to represent the boundary of valid human joint configurations from real human data. The function $C$ is represented by a fully connected neural network whose gradients can be efficiently computed via back-propagation. Using these gradients, we can efficiently enforce realistic human joint limits through constraint forces in a physics engine or as constraints in an optimization problem.

I Introduction

As robots move from industrial applications to providing personal assistance alongside people, the emergence of collaborative robotics demands more generalizable yet tractable methodologies to model human movements and behaviors. The robot control policy that considers human state based on such methodologies has the potential to provide more effective assistance when applied in the real world. For a collaborative scenario that involves physical contacts and forceful interactions between the robot and the human, accurately predicting human movements is paramount to not only the functionality of the robot but also the safety of the human.

Modeling human behaviors in response to robot actions highly depends on the task of interest, the functionality of the robot, and the features and constraints of the environment. However, one common concern faced by most collaborative scenarios involving physical interaction is awareness of the range of human motion due to physical joint limits. If the human joint limits are modeled too conservatively, the robot might miss out on many effective strategies to assist people. On the other hand, overly relaxed human joint limits might lead to a robot control policy that is unsafe to humans. Setting appropriate human joint limits is challenging not only because different joints have different ranges of angles, but also because the biomechanics literature [1][2][3] suggests that the range of each angle varies depending on the positions of other joints (inter-joint dependency) or of other degrees-of-freedom in the same joint (intra-joint dependency). For example, the range of flexion of our elbow depends on whether the upper arm is in front of or behind the body.

The paper introduces a new technique to enhance a general-purpose physics engine, such as Bullet [4], MuJoCo [5], or DART [6], to accurately simulate a human as a dynamic system. As joint limits are one of the most important kinematic constraints that give rise to the unique characteristics of human movements, our technique formulates realistic joint limits from real human data and enforces them through constraint forces in a dynamic system. Our work is built on the comprehensive study of the range of human motion conducted by Akhter and Black [7], who captured human motion that includes an extensive variety of stretching poses performed by trained athletes. Using the dataset, they developed a procedure to determine whether a human configuration is within the valid range. This validity procedure involves discrete operations such as "if statements" and table lookups. While it is sufficient for determining the validity of a given pose, the lack of an analytical representation and the non-differentiable nature of this procedure prevent it from being incorporated into the process of physics simulation.

We propose to learn an implicit equation, $C(q) = 0$, to represent the boundary of valid human joint configurations, where $C$ is an analytical and differentiable function. We utilize the validity procedure developed by Akhter and Black to provide unlimited labeled data for learning $C$, represented as a fully connected neural network. The main benefit of a neural network representation is that the gradient of the function can be easily computed via back-propagation. Using gradients, we can efficiently enforce realistic human joint limits through constraint forces in a physics engine. For any model-free policy learning method, human joint limits will be enforced as part of the "black-box" physics simulation, similar to the way contact constraints are handled. In addition to dynamic applications, we can also utilize the gradients of the neural network to solve inverse kinematics (IK) problems whose solutions are confined to the set of valid human poses.

Our method is general, computationally efficient, and easy to implement. The learned joint-limit constraint achieves high classification accuracy on held-out test data. We evaluate the joint-limit constraint as a dynamic constraint in a physics simulation and as a kinematic constraint in a pose optimization.

II Related Work

Human joint ranges vary significantly, from facet joints that can rotate merely a few degrees [8] to shoulder joints capable of near 180-degree movements in all three degrees-of-freedom [9]. The intra-joint and inter-joint dependencies further complicate the range of each degree-of-freedom [1][2][3]. For example, Hatze [3] showed the inter-joint coupling between the elbow range and the shoulder orientation. A model of viscoelastic torques of the elbow was also proposed, but the method requires estimating numerous subject-specific parameters from repeated experiments, making it difficult to apply as a general joint-limit constraint.

An accurate and tractable computational model of human joint limits has many applications in computer vision and graphics. Enforcing joint limits has been found useful for disambiguating 3D poses estimated from 2D images [10][11][12][13]. However, most methods in this area assume a fixed range for each degree-of-freedom of a joint, overly simplifying the range of human motion to a few box constraints. Human pose distributions learned from motion capture data have also been heavily used in motion reconstruction in computer animation [14][15][16][17][18]. For example, Grochow et al. [14] used such a distribution as a prior to solve constrained IK problems in order to generate human-like, natural poses. Nevertheless, none of these methods aim to precisely define the boundary of the range of human motion. In contrast, our work enforces the learned joint limits in physics simulation, which might enable applications in robotics, such as the design of collaborative robots.

A few more realistic joint-limit models have been proposed recently. Brau and Jiang [19] incorporated both joint orientation and self-occlusion as prior distributions to estimate 3D poses, because joint limits are caused by anatomical constraints and by the physical presence of other body parts. However, it is not clear how one can enforce the validity of a pose from the priors. Herda et al. [20] modeled arm joint limits as implicit hypersurfaces learned from motion capture data. The range of the elbow is modeled as an array of implicit equations, indexed by discretized shoulder orientations. While this method might suffice for kinematic applications, it does not directly apply to dynamic applications using physics simulation. Akhter and Black [7] also utilized human data to develop a procedure that determines whether a full-body pose is within the valid range of human motion. Their discontinuous model can be used as an inequality constraint, but it cannot utilize efficient gradient-based methods to solve the optimization. Built upon Akhter and Black's work, our analytical and differentiable model can be incorporated widely in any constraint-solving or optimization problem that demands accurate gradients.

III Method

We take a data-driven approach to learn a hypersurface represented by an implicit equation, $C(q) = 0$, where $q$ indicates the joint configuration of the agent in generalized coordinates. The hypersurface represents the boundary of the range of motion a human can achieve. If $q$ is a valid configuration, $C(q) \geq 0$. Otherwise, $C(q)$ is negative.

Once a differentiable, analytical function $C(q)$ is learned, we can utilize its gradient in a number of robotic applications. For example, we can create a dynamic constraint to enforce the learned joint limits in a physics simulator. We can also enforce the joint limits as constraints in planning or trajectory optimization problems.

III-A Learning joint limits

The function $C$ is represented by a fully connected neural network, which maps a joint configuration $q$ to a scalar that indicates the validity of $q$. We utilize the method developed by Akhter and Black [7] to generate unlimited training data. Using an extensive motion capture dataset of human poses, Akhter and Black introduced a procedure, isValid($P$), which maps an array of 3D joint positions $P$, derived from the bone segments, to an array of binary numbers indicating the validity of the orientation of each bone segment. Their algorithm only evaluates the validity of the arms without hands, the legs, and the head. It also assumes that the validity of each limb is independent of the other limbs and the head. As such, we define two functions, $C_{arm}(q)$ and $C_{leg}(q)$, for the four limbs of the agent. The input of $C_{arm}$ consists of a shoulder with 3 degrees-of-freedom (3-DOF) and a 1-DOF elbow, while the input of $C_{leg}$ considers a 3-DOF hip, a 1-DOF knee, and a 2-DOF ankle. Evaluating the validity of the neck is not presented in this paper but is a trivial extension.

The procedure for generating training samples is shown in Algorithm 1. A training sample is a pair consisting of a joint configuration and its label. For each limb with $n$ bone segments, we first initialize $n+1$ buffers, $B_0, \ldots, B_n$, for storing the training samples categorized by the validity of the joint configuration (Line 1). According to Akhter and Black, if a bone segment is invalid, all of its offspring bones are considered invalid. This treatment can be justified by the fact that the validity of a child segment might depend on its parent segment but not vice versa. As such, there are only $n$ categories of invalid configurations, which are stored in $B_1$ to $B_n$. The samples with a valid configuration are stored in $B_0$. For example, the types of validity for the arm include valid upper arm and valid lower arm (stored in $B_0$), valid upper arm and invalid lower arm (stored in $B_1$), and invalid upper arm (stored in $B_2$).

To generate each training sample, the algorithm samples a vector $q$ from the joint configuration space (Line 3), computes the Cartesian positions of the joints, $P$, from $q$ via forward kinematics (Line 4), and evaluates the validity of the joint positions by calling isValid (Line 5). If all the bone segments are valid, the sample is labeled as valid and stored in $B_0$ (Lines 6-7). Otherwise, we label the sample as invalid and store it in the buffer indexed by the most significant bit (MSB) of the inverted binary vector returned by isValid (Lines 8-10). Note that the joint configuration is represented by the sine and cosine of each joint angle. This is a common practice for representing cyclic variables in the input space of a neural network to improve learning efficiency.

1: Initialize buffers: $B_0, \ldots, B_n \leftarrow \emptyset$
2: while $\min_i |B_i| < N$ do
3:     Uniformly sample $q$ from the joint configuration space
4:     $P$ = ForwardKinematics($q$)
5:     $v$ = isValid($P$)
6:     if all($v$) then
7:         Store ($q$, valid) in $B_0$
8:     else
9:         $i$ = MSB($\neg v$)
10:        Store ($q$, invalid) in $B_i$
11: end while
12: Keep the first $N$ samples in each buffer
13: return $B_0, \ldots, B_n$
Algorithm 1 Generating Training Data

As one might expect, uniform sampling in the joint configuration space results in an unbalanced distribution across the categories $B_i$. For example, only a small fraction of the uniformly sampled leg configurations are fully valid, while the overwhelming majority fall into the invalid categories. The unbalanced samples would significantly bias the learning results. We use rejection sampling to balance the number of samples in each category. In practice, we sample uniformly until every buffer has at least $N$ samples and reject the extra samples in the buffers that have more than $N$ samples.
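To make the sampling procedure concrete, the sketch below outlines Algorithm 1 in Python under simplifying assumptions: `forward_kinematics` and `is_valid` are hypothetical stand-ins for the forward-kinematics routine and the validity procedure of Akhter and Black [7], `q_lo`/`q_hi` are the loose box limits discussed later in this section, and the buffer indexing follows one consistent interpretation of the MSB rule on Line 9.

```python
import numpy as np

def generate_balanced_samples(n_bones, q_lo, q_hi,
                              forward_kinematics, is_valid,
                              samples_per_buffer):
    """Sketch of Algorithm 1: uniform sampling into per-category buffers,
    followed by rejection to balance the categories.

    forward_kinematics(q) -> 3D joint positions of the limb (hypothetical)
    is_valid(P)           -> per-bone validity flags, proximal bone first (hypothetical)
    """
    buffers = [[] for _ in range(n_bones + 1)]            # B_0 (valid), B_1..B_n (invalid)

    while min(len(b) for b in buffers) < samples_per_buffer:
        q = np.random.uniform(q_lo, q_hi)                 # Line 3
        P = forward_kinematics(q)                         # Line 4
        v = np.asarray(is_valid(P), dtype=bool)           # Line 5
        if v.all():                                       # Lines 6-7
            buffers[0].append((q, 1.0))                   # label: valid
        else:                                             # Lines 8-10
            first_bad = int(np.argmax(np.logical_not(v))) # most proximal invalid segment
            buffers[n_bones - first_bad].append((q, 0.0)) # e.g. invalid upper arm -> B_n

    # Rejection: keep only the first samples_per_buffer entries per buffer.
    buffers = [b[:samples_per_buffer] for b in buffers]

    # Encode each joint angle as (sin, cos) for the network input.
    def encode(q):
        return np.concatenate([np.sin(q), np.cos(q)])

    X = np.array([encode(q) for b in buffers for q, _ in b])
    y = np.array([label for b in buffers for _, label in b])
    return X, y
```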

We train two separate fully connected feed-forward neural networks, one for $C_{arm}$ and one for $C_{leg}$. Each network has three hidden layers with 128 hidden units per layer. All activations are tanh except for the sigmoid function at the final layer. The output of the neural network is shifted down by 0.5 such that $C$ ranges between $-0.5$ and $0.5$. Since the procedure for generating training data is completely automatic, the number of training samples can be as large as necessary. In our experience, a modest training budget suffices: each network reaches high accuracy on the testing set after a small number of epochs, and training takes only a few minutes on a Core i5 CPU without GPU support.
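As a concrete reference, the following is a minimal sketch of one such network, written with a tf.keras-style API (an assumption; the paper only states that TensorFlow [22] is used). The input dimension corresponds to the sin/cos encoding of the limb's DOFs (e.g., 8 for the 4-DOF arm, 12 for the 6-DOF leg), and the 0.5 shift is applied outside the network so that the saved model outputs the raw sigmoid.

```python
import tensorflow as tf

def build_limit_network(input_dim):
    """Fully connected classifier: three tanh hidden layers of 128 units,
    sigmoid output in [0, 1]. The constraint value is C(q) = net(x) - 0.5,
    so C ranges in [-0.5, 0.5] and C >= 0 marks a valid configuration."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='tanh', input_shape=(input_dim,)),
        tf.keras.layers.Dense(128, activation='tanh'),
        tf.keras.layers.Dense(128, activation='tanh'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])

# Example: the 4-DOF arm (3-DOF shoulder + 1-DOF elbow) has an 8-dim sin/cos input.
arm_net = build_limit_network(input_dim=8)
arm_net.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# arm_net.fit(X, y, validation_split=0.1)   # X, y from the Algorithm 1 sketch above
```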

Although the learned functions $C_{arm}$ and $C_{leg}$ can represent isValid well, there is a potential issue due to the ambiguity caused by the transformation from the joint configuration to the Cartesian joint positions (i.e., forward kinematics). For example, a configuration in which the elbow bends forward produces the same 3D joint positions as a configuration in which the shoulder twist is offset by 180 degrees and the elbow bends backward by the same angle. isValid deems both configurations valid, but bending the elbow backward is clearly not achievable by humans. Fortunately, this ambiguity can be easily resolved if we also enforce standard box constraints on the joints during usage. These box constraints are set wider than the actual range of motion and are only used to exclude clearly impossible poses. For example, in our experiments we loosely bound the twist DOF of the shoulder and the flexion DOF of the elbow, and do not set box constraints on the other two DOFs of the shoulder. A valid joint configuration must satisfy both the learned joint-limit constraint and these default box constraints. This also implies that sampling outside the box constraints during training is unnecessary.
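Putting the two tests together, a pose check might look like the sketch below, where `q_box_lo`/`q_box_hi` are hypothetical loose box limits as described above and `limit_net` is a trained network from the previous snippet.

```python
import numpy as np

def is_pose_valid(q, limit_net, q_box_lo, q_box_hi):
    """A configuration must satisfy both the loose box limits and the
    learned constraint C(q) = net([sin(q), cos(q)]) - 0.5 >= 0."""
    q = np.asarray(q, dtype=np.float32)
    if np.any(q < q_box_lo) or np.any(q > q_box_hi):
        return False                                   # clearly impossible pose
    x = np.concatenate([np.sin(q), np.cos(q)])[None, :]
    c = float(limit_net(x).numpy()[0, 0]) - 0.5        # learned constraint value
    return c >= 0.0
```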

III-B Enforcing joint limits as dynamic constraints

Given the learned function $C(q)$, the implicit equation $C(q) = 0$ represents a hypersurface in the joint configuration space, where one side of the surface ($C(q) < 0$) is infeasible. In physics simulation, this is analogous to a contact constraint that prevents two objects from inter-penetrating each other. To maintain such dynamic constraints during simulation, we must apply constraint impulses to the simulated bodies when they are exactly on the hypersurface (i.e., $C(q) = 0$). The constraint impulses can be computed in different ways, one of which formulates a Linear Complementarity Problem (LCP) to solve all the active constraints at the current time step. In this paper, we incorporate our learned joint-limit constraints in the LCP framework, but they can also be applied with other constraint-handling methods, such as the optimization-based approach proposed by Todorov [21].

The governing equation in the LCP is the relationship between the rate of constraint violation, $\dot{C}$, and the constraint impulse that stops the violation, $\lambda$. In an implicit-stepping formulation, we define $\dot{C}$ as the rate of joint-limit violation at the next time step:

$$\dot{C} = \frac{\partial C}{\partial q} \dot{q}_{n+1} = J \dot{q}_{n+1},$$

where $\dot{q}_{n+1}$ is the joint velocity at the next time step. The Jacobian, $J = \partial C / \partial q$, can be efficiently calculated via back-propagation through the trained network.
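The gradient computation is a standard automatic-differentiation call. The sketch below, assuming the tf.keras model and sin/cos encoding from the Section III-A sketches, evaluates $C(q)$ and the row Jacobian $\partial C / \partial q$; the chain rule through the encoding is handled by differentiating with respect to $q$ directly.

```python
import numpy as np
import tensorflow as tf

def constraint_value_and_jacobian(limit_net, q):
    """Return C(q) and J = dC/dq (a length-n row vector) via back-propagation."""
    q = tf.convert_to_tensor(np.asarray(q, dtype=np.float32))
    with tf.GradientTape() as tape:
        tape.watch(q)
        x = tf.concat([tf.sin(q), tf.cos(q)], axis=0)[None, :]  # network input
        c = limit_net(x)[0, 0] - 0.5                             # C(q)
    J = tape.gradient(c, q)                                      # dC/dq, shape (n,)
    return float(c.numpy()), J.numpy()

# Rate of constraint violation for a candidate next-step velocity qdot_next:
#   Cdot = J @ qdot_next
```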

When the constraint is active, i.e., $C(q) \leq 0$, the LCP solves for $\dot{C}$ and $\lambda$ simultaneously such that the following three conditions are met:

  1. $\dot{C} \geq 0$: The unilateral joint-limit constraint must be satisfied at the next time step.

  2. $\lambda \geq 0$: The constraint impulse must only push in the direction that prevents constraint violation.

  3. $\dot{C} \lambda = 0$: Either $\dot{C}$ or $\lambda$ must be zero. If $\dot{C} > 0$, the joint limit will no longer be violated and thus there should be no constraint impulse ($\lambda = 0$). If $\lambda > 0$, the joint-limit constraint must continue to be active ($\dot{C} = 0$).

Once the LCP is solved, we can compute the corresponding joint torques by

$$\tau = J^T \lambda,$$

where $\tau$ is a vector in $\mathbb{R}^n$ and $n$ is the number of degrees-of-freedom. By applying $\tau$ to the corresponding joints, the pose-dependent joint limits are enforced via dynamics. Note that the joint-limit constraints are solved together with the other constraints (e.g., contact constraints) and the dynamic equations of motion, such that the next state of the agent is dynamically valid and within the range of motion.
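For intuition, the sketch below works out the complementarity conditions for a single active joint-limit constraint in isolation (no contacts or other constraints, which in practice the engine solves jointly). The inputs are assumptions supplied by the simulator: the mass matrix `M`, the unconstrained next-step velocity `qdot_free`, and the Jacobian `J` from the previous snippet; for this one-constraint case the LCP has a closed-form solution, which is then mapped to generalized forces as in the equation above.

```python
import numpy as np

def solve_single_limit_lcp(M, qdot_free, J):
    """Closed-form LCP solution for one active joint-limit constraint.

    Velocity model:   qdot_next = qdot_free + inv(M) @ J^T * lam
    Complementarity:  0 <= lam  perp  Cdot = J @ qdot_next >= 0
    """
    Minv_JT = np.linalg.solve(M, J)            # inv(M) @ J^T for the single row J
    denom = float(J @ Minv_JT)                 # effective inverse inertia along J
    cdot_free = float(J @ qdot_free)           # violation rate with no impulse
    lam = 0.0 if cdot_free >= 0.0 else -cdot_free / denom
    qdot_next = qdot_free + Minv_JT * lam      # satisfies Cdot >= 0, lam >= 0, Cdot*lam = 0
    tau = J * lam                              # generalized forces tau = J^T * lam
    return lam, qdot_next, tau
```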

III-C Enforcing joint limits in inverse kinematics (IK)

Satisfying joint-limit constraints is important in kinematic trajectory planning. A typical planning algorithm uses IK to compute a reference trajectory, which is then tracked by feedback controllers. Incorporating the joint-limit constraint in IK planning prevents the agent from hitting joint limits during execution:

$$\min_{q} \; \| p(q) - \bar{p} \|^2 \quad \text{subject to} \quad C(q) \geq 0,$$

where $p(q)$ is the forward kinematics routine that computes the Cartesian location of a body point from a joint configuration and $\bar{p}$ is the target location for that body point.

This non-convex constrained optimization can be solved by various optimization techniques, such as Sequential Quadratic Programming (SQP) or interior-point methods. Here we simply enforce the constraint as a penalty in the objective function when the inequality is violated:

$$\min_{q} \; \| p(q) - \bar{p} \|^2 + w \, \min\big(0, \; C(q) - \epsilon\big)^2,$$

where $w$ is the penalty weight. In practice, we make the inequality constraint slightly tighter, $C(q) \geq \epsilon$ for a small margin $\epsilon > 0$, to strictly enforce the joint limits.
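A minimal gradient-descent version of this penalized IK might look like the sketch below. Here `fk_point(q)` is a hypothetical forward-kinematics routine returning the end-effector position and its Jacobian, the constraint value and gradient come from `constraint_value_and_jacobian` defined in the Section III-B sketch, and the margin `eps`, weight `w`, and step size `lr` are illustrative values rather than the paper's settings.

```python
import numpy as np

def ik_with_joint_limits(q0, target, fk_point, limit_net,
                         constraint_value_and_jacobian,
                         w=10.0, eps=0.02, lr=0.05, iters=500):
    """Minimize ||p(q) - target||^2 + w * min(0, C(q) - eps)^2 by gradient descent."""
    q = np.array(q0, dtype=np.float32)
    for _ in range(iters):
        p, Jp = fk_point(q)                        # position (3,) and Jacobian (3, n)
        c, Jc = constraint_value_and_jacobian(limit_net, q)
        grad = 2.0 * Jp.T @ (p - target)           # gradient of the reaching term
        if c < eps:                                # penalty active only when violated
            grad += 2.0 * w * (c - eps) * Jc       # gradient of w * (C - eps)^2
        q = q - lr * grad
    return q
```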

IV Evaluation

We first validated that the learned networks are an adequate differentiable substitute for isValid by reporting the accuracy on the test datasets. We then evaluated the joint-limit constraints represented by the networks in two applications: physics simulation and inverse kinematics.

IV-A Accuracy of classification

After training the neural networks using TensorFlow [22], we tested each network on a set of random joint configurations sampled from the same distribution as the training data. The confusion matrices for $C_{arm}$ and $C_{leg}$ are given in Fig. 1. Rejection sampling reduces bias in the trained classifiers by balancing the false negative and false positive ratios. Fig. 2 shows the learning curves of $C_{arm}$ and $C_{leg}$.

Fig. 1: Confusion matrices of $C_{arm}$ (Left) and $C_{leg}$ (Right).
Fig. 2: Learning curves of $C_{arm}$ (Left) and $C_{leg}$ (Right).

IV-B Physics simulation

We used the open-source physics engine DART [6], which provides APIs for implementing user-defined constraints without modifying the core code that formulates and solves the LCP. For feed-forward and back-propagation operations on a neural network, we incorporated the lightweight C++ library tiny-dnn [23], with the trained weights imported from TensorFlow. To clearly visualize the effect of the joint-limit constraints, the following experiments only simulate one limb, with the torso fixed in place.

Satisfaction of joint limits

To evaluate how precisely the joint-limit constraints are enforced in the simulation, we applied random joint torques to the agent and reported the statistics of joint-limit violation. With a large number of joint configurations sampled along the simulated trajectory, Fig. 3 shows the histogram of $C(q)$, indicating the distribution of joint-limit violation. It is evident that the random joint torques frequently drive the agent to invalid joint configurations when the proposed joint-limit constraints are not enforced. We use the trained $C$ as ground truth because Section IV-A established that $C$ is an accurate approximator of isValid.
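The violation statistics in Fig. 3 can be gathered with a loop of the following form; this is only a sketch, with a hypothetical `simulate_step` wrapper around the physics engine and an illustrative torque magnitude, recording $C(q)$ at every sampled state and histogramming the values.

```python
import numpy as np

def collect_violation_histogram(q0, qdot0, simulate_step, limit_net,
                                constraint_value_and_jacobian,
                                n_steps=10000, torque_scale=5.0):
    """Apply random joint torques and record C(q) along the simulated trajectory."""
    q, qdot = np.array(q0, np.float32), np.array(qdot0, np.float32)
    c_values = []
    for _ in range(n_steps):
        tau = np.random.uniform(-torque_scale, torque_scale, size=q.shape)
        q, qdot = simulate_step(q, qdot, tau)          # hypothetical engine step
        c, _ = constraint_value_and_jacobian(limit_net, q)
        c_values.append(c)
    return np.histogram(np.array(c_values), bins=50)
```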

Fig. 3: Histogram of joint-limit violation. The orange bars show the simulated joint configurations with only box constraints on joints enforced, which results in a large portion of invalid joint configurations. With our learned joint-limit constraint enforced (blue bars), all the tested configurations stay in the valid region of the constraint.

Pose-dependent joint range

Akhter and Black [7] used the term "pose-conditioned" to emphasize that some joint ranges depend on the configuration of their parent joints. To test whether our joint-limit constraints exhibit the same dependency, we simulated elbow flexion with different fixed upper arm positions. Fig. 4 shows the moments when the elbow limit is reached. When the upper arm is beside the torso, the elbow can flex much more than when the upper arm is behind the torso. Similarly, Fig. 5 shows that the knee has a wider range of motion when the hip is flexing as opposed to abducting.

Fig. 4: Pose-dependent elbow flexion limits. Left: When the arm is beside the torso, the elbow can flex through a much larger angle. Right: When the arm is behind the torso, the elbow can only flex through a much smaller angle.
Fig. 5: Pose-dependent knee flexion limits. Left: When the hip is abducting, the knee can only flex through a smaller angle. Right: When the hip is flexing, the knee can flex through a larger angle.

Emergence of realistic motion

In this experiment, the agent applies torque on the elbow while the arm is behind the back. Unlike the previous example shown in Fig. 4 (Right), we unlocked the upper arm and left it completely passive. As the elbow continues to flex, we observe that the upper arm starts to abduct and rotate back towards the front, adjusting its position to allow more elbow flexion (Fig. 6, Top). In contrast, without our joint-limit constraint, the upper arm stays in the same position while the elbow flexes beyond the range of human motion (Fig. 6, Bottom).

Fig. 6: Shoulder unlocking. Torque is applied on the elbow when the arm is behind the back. Top: With the proposed joint-limit constraint, as the elbow continues to flex, the upper arm starts to abduct and rotate back towards the front, adjusting its position to allow more elbow flexion. Bottom: Without the joint-limit constraint, the upper arm stays at its initial position while the elbow flexes beyond the range of human motion.

IV-C Inverse kinematics

In this set of experiments, the agent is commanded to reach a 3D location by solving an inverse kinematics problem. We formulated the IK problem as an optimization and used gradient descent to solve it.

Satisfaction of joint limits

We randomly generated a set of 3D target locations and solved IK for each location sequentially. The joint configurations generated by solving these IK problems were then evaluated by the learned constraint function $C$. Fig. 7 shows the histogram of $C(q)$. The joint configurations strictly stay in the valid region when the optimization incorporates the learned joint-limit constraints as a penalty. In contrast, most of the joint configurations are invalid when only box joint-limit constraints are enforced.

Fig. 7: Histogram of joint-limit violation. The orange bars show the IK solutions without enforcing our joint-limit constraint. With the constraint enforced (blue bars), all the IK solutions are valid joint configurations.

Emergence of realistic motion

Fig. 8 demonstrates a situation where the target is unreachable within the range of human motion. Without our joint-limit constraint, the IK solution reaches the target with an unrealistic pose (Bottom). The IK solution that satisfies our joint limits "correctly" fails to reach the target (Top). In addition, we observe that our joint-limit constraints can mitigate the issue of self-collision, because many of the self-penetrating poses are not present in the dataset from which the joint-limit model is learned. As a result, our joint-limit constraints also classify these poses as invalid. Fig. 9 shows that, without activating collision detection and handling, our joint-limit constraints alone can make the IK solution collision-free.

Fig. 8: Unreachable target within the range of human motion. Top: With the learned joint-limit constraint, the IK solution that satisfies the joint limits "correctly" fails to reach the target. Bottom: With only box limits, the IK solution reaches the target with an unrealistic pose.
Fig. 9: Self-collision. Top: Without activating collision detection and handling, our joint-limit constraint alone can make the IK solution collision-free. Bottom: The IK solution with only box limits exhibits self-penetration.

V Conclusions

This paper has proposed a general, computationally efficient, and easy-to-implement method for modeling accurate human joint limits. We learn a joint-limit constraint function from real-world data and represent it as a neural network. The differentiability of the function allows us to compute the gradients required for enforcing the joint limits through constraint forces in a physics simulation. In addition, we have shown that the joint-limit constraint can be incorporated into pose optimization problems and solved by gradient-based optimization methods.

Our method can be further improved. The trained network is specific to a particular joint configuration space, with specific joint types, axis orders, and rest pose. However, the proposed algorithm is general enough to learn the function for different joint configuration spaces. For example, one future direction is to use quaternions to represent ball joints in order to avoid gimbal lock. Another limitation is that our learned function evaluates the feasibility of a pose but not the comfort level associated with it; learning a function that captures the preference among valid human poses is another possible future direction.

References

  • [1] A. Engin and S.-M. Chen, “Statistical data base for the biomechanical properties of the human shoulder complex Part I: Kinematics of the shoulder complex,” Journal of Biomechanical Engineering, vol. 108, no. 3, pp. 215–221, 1986.
  • [2] X. Wang, M. Maurin, F. Mazet, N. D. C. Maia, K. Voinot, J. P. Verriest, and M. Fayet, “Three-dimensional modelling of the motion range of axial rotation of the upper arm,” Journal of Biomechanics, vol. 31, no. 10, pp. 899–908, 1998.
  • [3] H. Hatze, “A three-dimensional multivariate model of passive human joint torques and articular boundaries,” Clinical Biomechanics, vol. 12, no. 2, pp. 128–135, 1997.
  • [4] Bullet Physics Development Team, “Bullet Physics Library: Real Time Physics Simulation.” http://bulletphysics.org.
  • [5] Roboti LLC, “MuJoCo Physics Engine.” http://mujoco.org.
  • [6] J. Lee, M. X. Grey, S. Ha, T. Kunz, S. Jain, Y. Ye, S. S. Srinivasa, M. Stilman, and C. K. Liu, “DART: Dynamic animation and robotics toolkit,” The Journal of Open Source Software, 2018.
  • [7] I. Akhter and M. J. Black, “Pose-conditioned joint angle limits for 3D human pose reconstruction,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1446–1455, 2015.
  • [8] M. Kozanek, S. Wang, P. G. Passias, Q. Xia, G. Li, C. M. Bono, K. B. Wood, and G. Li, “Range of motion and orientation of the lumbar facet joints in vivo,” Spine, vol. 34, no. 19, pp. E689–E696, 2009.
  • [9] R. S. Snell, Clinical Anatomy by Systems, pp. 427–428. Lippincott Williams & Wilkins, 2007.
  • [10] J. Chen, S. Nie, and Q. Ji, “Data-free prior model for upper body pose estimation and tracking,” IEEE Transactions on Image Processing, vol. 22, no. 12, pp. 4627–4639, 2013.
  • [11] J. M. Rehg, D. D. Morris, and T. Kanade, “Ambiguities in visual tracking of articulated objects using two and three-dimensional models,” The International Journal of Robotics Research, vol. 22, no. 6, pp. 393–418, 2003.
  • [12] C. Sminchisescu and B. Triggs, “Estimating articulated human motion with covariance scaled sampling,” The International Journal of Robotics Research, vol. 22, no. 6, pp. 371–391, 2003.
  • [13] D. Demirdjian, “Enforcing constraints for human body tracking,” in Computer Vision and Pattern Recognition Workshop, 2003., vol. 9, pp. 102–102, IEEE, 2003.
  • [14] K. Grochow, S. L. Martin, A. Hertzmann, and Z. Popović, “Style-based inverse kinematics,” ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 522–531, 2004.
  • [15] P. Zhang, K. Siu, J. Zhang, C. K. Liu, and J. Chai, “Leveraging depth cameras and wearable pressure sensors for full-body kinematics and dynamics capture,” ACM Transactions on Graphics (TOG), vol. 33, no. 6, p. 221, 2014.
  • [16] J. Chai and J. K. Hodgins, “Constraint-based motion optimization using a statistical dynamic model,” ACM Transactions on Graphics (TOG), vol. 26, no. 3, p. 8, 2007.
  • [17] J. Min, Y.-L. Chen, and J. Chai, “Interactive generation of human animation with deformable motion models,” ACM Transactions on Graphics (TOG), vol. 29, no. 1, p. 9, 2009.
  • [18] M. Brand and A. Hertzmann, “Style machines,” in Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 183–192, 2000.
  • [19] E. Brau and H. Jiang, “A bayesian part-based approach to 3D human pose and camera estimation,” Pattern Recognition (ICPR), 2016 23rd International Conference on, pp. 1762–1767, 2016.
  • [20] L. Herda, R. Urtasun, and P. Fua, “Hierarchical implicit surface joint limits for human body tracking,” Computer Vision and Image Understanding, vol. 99, no. 2, pp. 189–209, 2005.
  • [21] E. Todorov, “Convex and analytically-invertible dynamics with contacts and constraints: Theory and implementation in MuJoCo,” Robotics and Automation (ICRA), 2014 IEEE International Conference on, pp. 6054–6061, 2014.
  • [22] The TensorFlow Developers, “TensorFlow: Large-scale machine learning on heterogeneous systems.” https://tensorflow.org/.
  • [23] tiny-dnn Authors, “tiny-dnn: Header only, dependency-free deep learning framework in C++14.” http://tiny-dnn.readthedocs.io.