DataDriven Approach to Simulating Realistic Human Joint Constraints
Abstract
Modeling realistic human joint limits is important for applications involving physical humanrobot interaction. However, setting appropriate human joint limits is challenging because it is posedependent: the range of joint motion varies depending on the positions of other bones. The paper introduces a new technique to accurately simulate human joint limits in physics simulation. We propose to learn an implicit equation to represent the boundary of valid human joint configurations from real human data. The function in the implicit equation is represented by a fully connected neural network whose gradients can be efficiently computed via backpropagation. Using gradients, we can efficiently enforce realistic human joint limits through constraint forces in a physics engine or as constraints in an optimization problem.
I Introduction
As robots move from industrial applications to providing personal assistance alongside people, the emergence of collaborative robotics demands more generalizable yet tractable methodologies to model human movements and behaviors. The robot control policy that considers human state based on such methodologies has the potential to provide more effective assistance when applied in the real world. For a collaborative scenario that involves physical contacts and forceful interactions between the robot and the human, accurately predicting human movements is paramount to not only the functionality of the robot but also the safety of the human.
Modeling human behaviors in response to robot actions highly depends on the task of interest, the functionality of the robot, and the environment features and constraints. However, one common concern faced by most collaborative scenarios involving physical interactions is the awareness of the range of human motion due to physical joint limits. If the human joint limits are modeled too conservatively, the robot might miss out many effective strategies to assist people. On the other hand, overly relaxed human joint limits might lead to a robot control policy unsafe to humans. Setting appropriate human joint limits is challenging because not only that different joints have different ranges of angles, biomechanics literature [1][2][3] suggests that the range of angle varies depending on the positions of other joints (interjoint dependency) or other degreesoffreedom in the same joint (intrajoint dependency). For example, the range of flexion of our elbow depending on whether it is in front of or behind the body.
The paper introduces a new technique to enhance a generalpurpose physics engine, such as Bullet [4], MuJoCo [5], or DART [6], to accurately simulate human as a dynamic system. As joint limits are one of the most important kinematic constraints that give rise to the unique characteristics of human movements, our technique formulates realistic joint limits from real human data and enforces them through constraint forces in a dynamic system. Our work is built on the comprehensive study on the range of human motion conducted by Akhter and Black [7], who captured human motion that includes an extensive variety of stretching poses performed by trained athletes. Using the dataset, they developed a procedure to determine whether a human configuration is within the valid range. This validity procedure involves discrete operations such as âif statementsâ and table lookups. While it is sufficient for determining the validity of a given pose, the lack of analytical representation and the nondifferentiable nature of this procedure prevents it from being incorporated into the process of physics simulation.
We propose to learn an implicit equation, , to represent the boundary of valid human joint configurations, where is an analytical and differentiable function. We utilize the validity procedure developed by Akhter and Black to provide unlimited labeled data for learning represented as a fully connected neural network. The main benefit of a neural network representation is that the gradient of the function can be easily computed via backpropagation. Using gradients, we can efficiently enforce realistic human joint limits through constraint forces in a physics engine. For any modelfree policy learning method, human joint limits will be enforced as part of the ”blackbox” physics simulation, similar to the way contact constraints are handled. In addition to dynamic applications, we can also utilize the gradients of the neural network to solve inverse kinematics (IK) problems where solutions are confined in the set of valid human poses.
Our method is general, computationally efficient, and easy to implement. The learned jointlimit constraint can achieve accuracy. We evaluate the jointlimit constraint as a dynamic constraint in a physics simulation and as a kinematic constraint in a pose optimization.
Ii Related Work
Human joint ranges vary significantly from facet joints being able to merely rotate several degrees [8] to shoulder joints capable of near 180 degree movements in all three degreesoffreedom [9]. The intrajoint and interjoint dependencies further complicate the range of each degreeoffreedom [1][2][3]. For example, Hatze [3] showed the interjoint coupling between the elbow range and the shoulder orientation. A model of viscoelastic torques of the elbow was also proposed, but the method requires to estimate numerous subjectspecific parameters from repeated experiments, making it difficult to be applied as a general jointlimit constraint.
An accurate and tractable computational model of human joint limits has many applications in computer vision and graphics. Enforcing joint limits has been found useful to disambiguate 3D poses estimated from 2D images [10][11] [12][13]. However, most methods in this area assume a fixed range for each degreeoffreedom of the joint, overly simplifying the range of human motion to a few box constraints. Human pose distribution learned from motion capture data has also been heavily used in motion reconstruction in computer animation [14][15][16][17][18]. For example, Grochow et al. [14] used this distribution as a prior to solve constrained IK problems in order to generate humanlike natural poses. Nevertheless, none of these methods aim to precisely define the boundary of the range of human motion. In contrast, our work enforces the learned joint limits in physics simulation, which might enable applications in robotics, such as the design of collaborative robots.
A few more realistic jointlimit models have been proposed recently. Brau and Jiang [19] incorporated both joint orientation and selfocclusion as prior distributions to estimate 3D poses, because joint limits are caused by anatomical constraints and by the physical presence of other body parts. However, it is not clear how one can enforce the validity of a pose from the priors. Herda et al. [20] modeled arm joint limits as implicit hypersurfaces learned from motion capture data. The range of elbow is modeled as an array of implicit equations, indexed by discretized shoulder orientations. While this method might suffice kinematic applications, it does not directly apply to dynamic applications using physics simulation. Akhter and Black [7] also utilized human data to develop a procedure that determines whether a fullbody pose is within the valid range of human motion. Their discontinuous model can be used as an inequality constraint, but cannot utilize efficient gradientbased methods to solve the optimization. Built upon Akhter and Black’s work, our analytical and differentiable model can be incorporated widely in any constraint solving or optimization problems that demand accurate gradients.
Iii Method
We take a datadriven approach to learn a hypersurface represented by an implicit equation, , where indicates the joint configuration of the agent in generalized coordinates. The hypersurface represents the boundary of the range of motion a human can achieve. If is a valid configuration, . Otherwise, is negative.
Once a differentiable, analytical function, is learned, we can utilize its gradient in a number of robotic applications. For example, we can create a dynamic constraint to enforce the learned joint limits in a physics simulator. We can also enforce joint limits as constraints in planning or trajectory optimization problems.
Iiia Learning joint limits
The function is represented by a fully connected neural network, which maps a joint configuration to a scalar which indicates the validity of . We utilize the method developed by Akhter and Black [7] to generate unlimited training data. Using an extensive motion capture dataset of human poses, Akhter and Black introduced a procedure, , which maps an array of 3D joint positions, from bone segments, to an array of binary numbers indicating the validity of the orientation of each bone segment. Their algorithm only evaluates the validity of the arms without hands, the legs, and the head. It also assumes that the validity of each limb is independent of other limbs or the head. As such, we define two functions and for the four limbs of the agent. The input of consists of a shoulder with 3 degreesoffreedom (3DOF) and a 1DOF elbow, while the input of considers a 3DOF hip, a 1DOF knee, and a 2DOF ankle. Evaluating the validity of the neck is not presented in this paper but is a trivial extension.
The procedure for generating training samples is shown in Algorithm 1. A training sample is a pair of joint configuration and its label. For each limb with bone segments, we first initialize buffers, , for storing the training samples categorized by the validity of the joint configuration (Line 1). According to Akhter and Black, if a bone segment is invalid, all the offspring bones are considered invalid. This treatment can be justified by the fact that the validity of a child segment might depend on its parent segment but not vice versa. As such, there are only categories of invalid configurations, which are stored in to . The samples with a valid configuration are stored in . For example, the types of validity for the arm include valid upper arm and valid lower arm (stored in ), valid upper arm and invalid lower arm (stored in ), and invalid upper arm (stored in ).
To generate each training sample, the algorithm samples a vector, , from the joint configuration space (Line 3), computes the Cartesian positions of the joints from via forward kinematics (Line 4), and evaluates the validity of the joint positions by calling the function (Line 5). If all the bone segments are valid, the sample is labeled with value and stored in (Line 67). Otherwise, we label the sample and store it in the buffer corresponding to the most significant bit (MSB) of the inverted binary vector returned by (Line 810). Note that the joint configuration is represented by the sine and cosine of each joint angle. This is a common practice to represent cyclic variables in the input space of a neural network to improve learning efficiency.
As one might expect, the uniform sampling in the joint configuration space results in unbalanced distribution across categories ’s. For example, only of the uniformly sampled leg samples fall in and in . On the other hand, of samples fall in and in . The unbalanced samples will significantly bias the learning results. We use rejection sampling to balance the number of samples in each category. In practice, we sample uniformly until every buffer has at least samples and reject the extra samples in the buffers that have more than samples.
We train two separate fully connected feedforward neural networks for and for . Each network has three hidden layers with 128 hidden units per layer. All activations are tanh except for the sigmoid function at the final layer. The output of the neural network is subtracted by 0.5 such that ranges between and . Since the procedure to generate training data is completely automatic, the number of training samples can be as large as necessary. In our experiences, we find that samples in total is sufficient to reach accuracy on the testing set after epochs. The training of each neural network takes only a few minutes on a Core i5 CPU without GPU support.
Despite that the learned functions, and , can well represent , there is a potential issue due to the ambiguity caused by the transformation from the joint configuration to the Cartesian joint positions (i.e. ). For example, the joint configuration of shoulder rotation and elbow bending results in the same 3D joint positions as those from the configuration of shoulder rotation and elbow bending . deems both configurations valid, but bending the elbow backward is clearly not achievable by humans. Fortunately, this ambiguity can be easily resolved if we also enforce standard box constraints on the joints during usage. These box constraints are set wider than the range of motion and are only used to exclude clearly impossible poses. For example, in our experiments we loosely define the box constraint of the rotation DOF in the shoulder to be and of the DOF in the elbow to be . We do not set box constraints on the other two DOFs in the shoulder. A valid joint configuration must satisfy both the learned jointlimit constraint and the default box constraints. This also implies that sampling outside the box constraints during training is unnecessary.
IiiB Enforcing joint limits as dynamic constraints
Given the learned function, , the implicit equation represents a hyperpsurface in the joint configuration space where one side to the surface () is infeasible. In physics simulation, this is analogous to a contact constraint that prevents two objects interpenetrating each other. To maintain such dynamic constraints during simulation, we must apply constraint impulses to the simulated bodies when they are exactly on the hypersurface (i.e. ). The constraint impulses can be computed in different ways, one of which formulates a Linear Complementarity Problem (LCP) to solve all the active constraints at the current time step. In this paper, we incorporate our learned jointlimit constraints in the LCP framework, but they can be applied to other constraint handling methods, such as the optimizationbased approach proposed by Todorov [21].
The governing equation in LCP is the relationship between the rate of constraint violation () and the constraint impulse that stops the violation (). In an implicitstepping formulation, we define as the rate of jointlimit violation in the next time step:
where is the joint velocity at the next time step. The Jacobian, , can be efficiently calculated via back propagation of the trained network.
When the constraint is active, i.e. , LCP solves for and simultaneously such that the following three conditions are met:

: The unilateral jointlimit constraint must be satisfied at the next time step.

: The constraint impulse must only push in the direction that prevents constraint violation.

: Either or must be zero. If , the joint limit will no longer be violated and thus there should be no constraint impulse (). If , the jointlimit constraint must continue to be active ().
Once the LCP is solved, we can compute the corresponding joint torques by:
where is a vector in . By applying to the corresponding joints, the posedependent joint limits are enforced via dynamics. Note that the jointlimit constraints are solved together with other constraints (e.g. contact constraints) and the dynamic equations of motion, such that the next state of the agent is dynamically valid within the range of motion.
IiiC Enforcing joint limits in inverse kinematics (IK)
Satisfying jointlimit constraints is important in kinematic trajectory planning. A typical planning algorithm uses IK to compute a reference trajectory which is then tracked by feedback controllers. Incorporating the jointlimit constraint in IK planning prevents the agent from hitting joint limits during execution:
where is the forward kinematics routine that computes the Cartesian location of a body point from a joint configuration and is the target location for the body point.
This nonconvex constrained optimization can be solved by various optimization techniques, such as Sequential Quadratic Programming (SQP) or interior point methods. Here we simply enforce the constraint as a penalty in the objective function when the inequality is violated:
In practice, we make the inequality constraint slightly tighter, , to strictly enforce the joint limits.
Iv Evaluation
We first validated that the learned networks are an adequate differentiable substitute for by reporting the accuracy on the test datasets. We then evaluated the jointlimit constraints represented by the networks in two applications: physics simulation and inverse kinematics.
Iva Accuracy of classification
After training the neural networks using TensorFlow [22], we tested each neural network on a set of random joint configurations sampled from the same distribution as the training data. The confusion matrices for and are given in Fig. 1. Rejection sampling reduces bias in the trained classifiers by balancing the false negative and false positive ratio. Fig. 2 shows the learning curves of and .
IvB Physics simulation
We used the open source physics engine DART [6], which provides APIs to implement userdefined constraints without modifying the core code that formulates and solves LCP. For feedforward and backpropagation operations on a neural network, we incorporated the lightweight, C++ library tinydnn [23], with the trained weights imported from TensorFlow. To clearly visualize the effect of jointlimit constraints, the following experiments only simulate one limb with the torso fixed in place.
Satisfaction of joint limits
To evaluate how precisely the jointlimit constraints are enforced in the simulation, we applied random joint torques to the agent and reported the statistics of jointlimit violation. With over joint configurations sampled along the simulated trajectory, Fig. 3 shows the histogram of , indicating the distribution of jointlimit violation. It is evident that the random joint torques frequently drive the agent to invalid joint configurations without the proposed jointlimit constraints. We use trained as ground truth because it has been established in IVA that is an accurate approximator of .
Posedependent joint range
Akhter and Black [7] used the term ”poseconditioned” to emphasize that some joint ranges depend on the configuration of their parent joints. To test whether our jointlimit constraints exhibit the same dependency, we simulated elbow flexion with different fixed upper arm positions. Fig. 4 shows the moments when the elbow limit is reached. When the upper arm is beside the torso, the elbow can flex much more than when the upper arm is behind the torso. Similarly, Fig. 5 shows that the knee has a wider range of motion when the hip is flexing as opposed to abducting.
Emergence of realistic motion
In this experiment, the agent applied torque on the elbow when the arm is behind the back. Unlike the previous example shown in Fig. 4 Right, we unlocked the upper arm and left it completely passive. As the elbow continues to flex, we observe that the upper arm starts to abduct and rotate back towards the front, adjusting its position to allow more elbow flexion (Fig. 6 Top). In contrast, without our jointlimit constraint, the upper arm stays in the same position while the elbow flexes beyond the range of human motion (Fig. 6 Bottom).
IvC Inverse kinematics
In this set of experiments, the agent is commanded to reach a 3D location by solving an inverse kinematics problem. We formulated the IK problem as an optimization and used gradient descent to solve it.
Satisfaction of joint limits
We randomly generated a set of 3D target locations and solved IK for each location sequentially. The joint configurations generated by solving these IK problems were then evaluated by the constraint functions . Fig. 7 shows the histogram of . The joint configurations strictly stay in the valid region when the optimization incorporates the learned jointlimit constraints as penalty. In contrast, most of the joint configurations are invalid when only box jointlimit constraints are enforced.
Emergence of realistic motion
Fig. 8 demonstrates the situation where the target is unreachable within the range of human motion. Without our jointlimit constraint, the IK solution reaches the target by an unrealistic pose (Bottom). The IK solution that satisfies our joint limits ”correctly” fails to reach the target (Top). In addition, we observe that our jointlimit constraints can mitigate the issue of selfcollision because many of the selfpenetrating poses are not presented in the database learns from. As a result, our jointlimit constraints also classify these poses as invalid. Fig. 9 shows that, without activating collision detection and handling, our jointlimit constraints alone can make IK solution collisionfree.
V Conclusions
This paper has proposed a general, computationally efficient, and easy to implement method to model accurate human joint limits. We have learned a jointlimit constraint function from real world data and represented it as a neural network. The differentiability of the function allows us to compute the gradient which is required for enforcing the joint limits using constraint forces in a physics simulation. In addition, we have shown that the jointlimit constraint can be incorporated into pose optimization problems and solved by gradientbased optimization methods.
Our method can be further improved. The trained network is specific to a particular joint configuration space with specific joint types, axis orders, and the rest pose. However, the proposed algorithm is general to learn the function for different joint configuration spaces. For example, one future direction is to use quaternion to represent ball joints to avoid the Gimbal lock. For another possible improvement, our learned function can evaluate the feasibility of a pose, but not the comfort level associated with it. Learning a function to address the preference of the human poses can be a possible future direction.
References
 [1] A. Engin and S.M. Chen, “Statistical data base for the biomechanical properties of the human shoulder complex Part I: Kinematics of the shoulder complex,” Journal of Biomechanical Engineering, vol. 108, no. 3, pp. 215–221, 1986.
 [2] X. Wang, M. Maurin, F. Mazet, N. D. C. Maia, K. Voinot, J. P. Verriest, and M. Fayet, “Threedimensional modelling of the motion range of axial rotation of the upper arm,” Journal of Biomechanics, vol. 31, no. 10, pp. 899–908, 1998.
 [3] H. Hatze, “A threedimensional multivariate model of passive human joint torques and articular boundaries,” Clinical Biomechanics, vol. 12, no. 2, pp. 128–135, 1997.
 [4] Bullet Physics Development Team, “Bullet Physics Library: Real Time Physics Simulation.” http://bulletphysics.org.
 [5] Roboti LLC, “MuJoCo Physics Engine.” http://mujoco.org.
 [6] J. Lee, M. X. Grey, S. Ha, T. Kunz, S. Jain, Y. Ye, S. S. Srinivasa, M. Stilman, and C. K. Liu, “DART: Dynamic animation and robotics toolkit,” The Journal of Open Source Software, 2018.
 [7] I. Akhter and M. J. Black, “Poseconditioned joint angle limits for 3D human pose reconstruction,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1446–1455, 2015.
 [8] M. Kozanek, S. Wang, P. G. Passias, Q. Xia, G. Li, C. M. Bono, K. B. Wood, and G. Li, “Range of motion and orientation of the lumbar facet joints in vivo,” Spine, vol. 34, no. 19, pp. E689–E696, 2009.
 [9] R. S. Snell, Clinical Anatomy by Systems, pp. 427–428. Lippincott Williams & Wilkins, 2007.
 [10] J. Chen, S. Nie, and Q. Ji, “Datafree prior model for upper body pose estimation and tracking,” IEEE Transactions on Image Processing, vol. 22, no. 12, pp. 4627–4639, 2013.
 [11] J. M. Rehg, D. D. Morris, and T. Kanade, “Ambiguities in visual tracking of articulated objects using two and threedimensional models,” The International Journal of Robotics Research, vol. 22, no. 6, pp. 393–418, 2003.
 [12] C. Sminchisescu and B. Triggs, “Estimating articulated human motion with covariance scaled sampling,” The International Journal of Robotics Research, vol. 22, no. 6, pp. 371–391, 2003.
 [13] D. Demirdjian, “Enforcing constraints for human body tracking,” in Computer Vision and Pattern Recognition Workshop, 2003., vol. 9, pp. 102–102, IEEE, 2003.
 [14] K. Grochow, S. L. Martin, A. Hertzmann, and Z. Popović, “Stylebased inverse kinematics,” ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 522–531, 2004.
 [15] P. Zhang, K. Siu, J. Zhang, C. K. Liu, and J. Chai, “Leveraging depth cameras and wearable pressure sensors for fullbody kinematics and dynamics capture,” ACM Transactions on Graphics (TOG), vol. 33, no. 6, p. 221, 2014.
 [16] J. Chai and J. K. Hodgins, “Constraintbased motion optimization using a statistical dynamic model,” ACM Transactions on Graphics (TOG), vol. 26, no. 3, p. 8, 2007.
 [17] J. Min, Y.L. Chen, and J. Chai, “Interactive generation of human animation with deformable motion models,” ACM Transactions on Graphics (TOG), vol. 29, no. 1, p. 9, 2009.
 [18] M. Brand and A. Hertzmann, “Style machines,” in Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 183–192, 2000.
 [19] E. Brau and H. Jiang, “A bayesian partbased approach to 3D human pose and camera estimation,” Pattern Recognition (ICPR), 2016 23rd International Conference on, pp. 1762–1767, 2016.
 [20] L. Herda, R. Urtasun, and P. Fua, “Hierarchical implicit surface joint limits for human body tracking,” Computer Vision and Image Understanding, vol. 99, no. 2, pp. 189–209, 2005.
 [21] E. Todorov, “Convex and analyticallyinvertible dynamics with contacts and constraints: Theory and implementation in MuJoCo,” Robotics and Automation (ICRA), 2014 IEEE International Conference on, pp. 6054–6061, 2014.
 [22] The TensorFlow Developers, “TensorFlow: Largescale machine learning on heterogeneous systems.” https://tensorflow.org/.
 [23] tinydnn Authors, “tinydnn: Header only, dependencyfree deep learning framework in C++14.” http://tinydnn.readthedocs.io.