Robot self-calibration using multiple kinematic chains – a simulation study on the iCub humanoid robot
Abstract
Mechanism calibration is an important and non-trivial task in robotics. Advances in sensor technology make affordable but increasingly accurate devices such as cameras and tactile sensors available, making it possible to perform automated self-contained calibration relying on redundant information in these sensory streams. In this work, we use a simulated iCub humanoid robot with a stereo camera system and end-effector contact emulation to quantitatively compare the performance of kinematic calibration employing different combinations of intersecting kinematic chains, either through self-observation or self-touch. The parameters varied were: (i) type and number of intersecting kinematic chains used for calibration, (ii) parameters and chains subject to optimization, (iii) amount of initial perturbation of the kinematic parameters, (iv) number of poses/configurations used for optimization, (v) amount of measurement noise in end-effector positions / cameras. The main findings are: (1) calibrating the parameters of a single chain (e.g., one arm) by employing multiple kinematic chains (“self-observation” and “self-touch”) is superior in terms of optimization results as well as observability; (2) when using multi-chain calibration, fewer poses suffice to get similar performance compared to when, for example, only observation from a single camera is used; (3) parameters of all chains (here 86 DH parameters) can be subject to calibration simultaneously, and with 50 (100) poses, an end-effector error of around 2 (1) mm can be achieved; (4) adding noise to a sensory modality degrades the performance of all calibrations employing the chains relying on this information.
Humanoid robots, calibration and identification, force and tactile sensing, kinematics, optimization and optimal control.
1 Introduction
Robots performing manipulation tasks rely on models of their bodies, and their success is largely determined by the accuracy of these models. However, inaccuracies arise in many ways: in the assembly process, through mechanical elasticity, or simply because of cheap design of components. Therefore, the actual model parameters of every robot instance have to be found by means of a calibration procedure, usually relying on external metrology systems. For kinematic calibration, such apparatuses can measure one or more of the components of the end-effector pose employing mechanical, visual, or laser systems (see [1] for a survey). Different arrangements have different accuracy, requirements on the environment, and cost, and these conditions have to be available again whenever recalibration is to be performed.
Current trends in the robotics industry make classical calibration procedures less practical: with the advent of so-called “collaborative robots”, for example, the machines are becoming cheaper, lightweight, and compliant, and they are being deployed in more versatile ways according to the needs of customized production of smaller batches, rather than being fixed in a single production line for their entire lifetime. All these factors increase the need for calibration to be performed more frequently. At the same time, the machines, including home and service robots, often come with richer sets of powerful sensory devices that are affordable and not difficult to operate. Both these trends speak for alternative solutions to the self-calibration problem that are more “self-contained” and can be performed autonomously by the robot.
Hollerbach et al. [1] classify different calibration methods into open-loop, where one or more of the components of the end-effector pose are measured employing mechanical, visual, or laser systems, and closed-loop, where physical constraints on the end-effector position or orientation can substitute for measurements. Observing the end-effector (or in general any other point on the kinematic chain) using a camera falls into the open-loop calibration family, although components of the end-effector pose can be observed only indirectly through projection into the camera frame. Self-touch configurations employing the two arms of a humanoid robot could be framed as a constraint if only a contact measurement (e.g., from force/torque sensors) was available, and hence treated as closed-loop. In this work, we follow up on [2] and emulate sensitive skin measurements, which provide the position of contact (and hence fit more naturally with open-loop calibration).
Our work is a simulation study that draws on calibration in the real world, such as different approaches to kinematic calibration of the iCub humanoid robot relying on self-observation [3, 4] and self-touch [2]. Using a model of the robot with identical parameters, but exploiting the fact that we have complete knowledge of the system and the capacity to emulate different levels of model perturbation and measurement noise, our goal is to get insights into the pros and cons of different optimization problem formulations. In particular, we study how the calibration performance depends on the type and number of intersecting kinematic chains, the number of parameters calibrated, the number of robot configurations, and the measurement noise. An accompanying video is available at https://youtu.be/zP3c7Eq8yVk and the dataset at [5].
This article is structured as follows. Related work is reviewed in the next section, followed by Materials and Methods, Data Acquisition and Description, and Simulation Results. We close with a Discussion and Conclusion.
2 Related Work
We focus on humanoid robots or humanoid-like setups with many Degrees of Freedom (DoF), with two arms that can possibly self-touch, equipped with cameras and tactile or inertial sensors. These are challenging setups for calibration, but they create new opportunities for automated self-contained calibration based on closing kinematic loops by touch (self-contact) and vision.
Most often, the loops are closed through self-observation of the end-effector using cameras located in the robot head (an open-loop calibration method per [1]). Hersch et al. [6] and Martinez-Cantin et al. [7] present online methods to calibrate humanoid torso kinematics relying on gradient descent and recursive least squares estimation, respectively. The iCub humanoid was employed in [3, 4]. Vicente et al. [4] used a model of the hand’s appearance to estimate its 6D pose and used that information to calibrate the joint offsets. Fanello et al. [3] had the robot observe its fingertip and learned essentially only a single transformation to account for the discrepancy between the forward kinematics of the arm and the projection of the finger into the cameras.
Next to cameras, inertial sensors also contain information that can be exploited for calibration. Kinematic calibration was demonstrated exploiting 3-axis accelerometers embedded in artificial skin modules distributed on the robot body [8, 9] or in the control boards of the iCub [10] or CMU/Sarcos [11].
The advent of robotic skin technologies [12, 13] opens up the possibility of a new family of approaches, whereby the chain is closed through contact like in closed-loop calibration, but the contact position can be extracted from the tactile array. Roncone et al. [2] showed this on the iCub robot performing autonomous self-touch, using a finger with a sensitive fingertip to touch the skin-equipped forearm of the contralateral arm; Li et al. [14] employed a dual KUKA arm setup with a sensorized “finger” and a tactile array on the other manipulator. Forward kinematics together with skin calibration provide the contact position, which can then be used for robot kinematic calibration. In this sense, the skin provides a pose measurement rather than a constraint, and as such this may fall under open-loop calibration. In this way, one arm of a humanoid can be used to calibrate the other. Khusainov et al. [15] exploit this principle using an industrial manipulator to calibrate the legs of a humanoid robot. Another variant is exploiting the sensitive fingertips to touch a known external surface [16].
Birbach et al. [17] were, to our knowledge, the only ones to employ truly “multi-sensorial” or “multi-modal” calibration. Using the humanoid robot Justin observing its wrist, error functions comparing the wrist’s position from forward kinematics with its projection into the left and right camera images, the Kinect image, and the Kinect disparity map, together with an inertial term, were aggregated into a single cost function to be minimized. It is claimed that while pairwise calibration can lead to inconsistencies, calibrating everything together in a “mutually supportive way” is most efficient.
In this work, we compare calibration through self-observation (with projection into the cameras) and calibration through self-touch, as well as the effect of their synergy. Our work makes a unique contribution also compared to [17], who, first, employ essentially only “hand–eye” kinematic chains terminating in different vision-like sensors in the robot head and, second, consider only the case where all chains are combined together using a single cost function.
3 Materials and Methods
3.1 iCub robot kinematic model and camera parameters
In this work, we use the upper body of the iCub humanoid robot (see Fig. 1) and its kinematic model expressed in the Denavit–Hartenberg (DH) convention, where every link is described by 4 parameters: $(a, d, \alpha, \theta)$. In this platform, all joints are revolute. We will consider several kinematic chains: all start in a single inertial or base frame, denoted the iCub Root Reference Frame here. For every chain, the DH parameters uniquely define a chain of transformation matrices from the inertial frame to the end-effector. The position and orientation of the end-effector in the Root frame is thus given by $\mathbf{T}_{Root}^{EE} = \mathbf{A}_1(\theta_1)\,\mathbf{A}_2(\theta_2)\cdots\mathbf{A}_n(\theta_n)$, where the homogeneous transformation matrices $\mathbf{A}_i$ can be constructed from the DH representation and $\theta_i$ are the current joint angles of the robot actuators.
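The study itself was implemented in MATLAB; purely as an illustration of the forward-kinematics computation just described, a minimal NumPy sketch follows (all names are ours, and the standard DH convention is assumed):

```python
import numpy as np

def dh_transform(a, d, alpha, theta):
    """Homogeneous transform of one link in the standard DH convention."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.,       sa,       ca,      d],
        [0.,       0.,       0.,     1.],
    ])

def forward_kinematics(dh_params, joint_angles, offsets):
    """Chain the per-link transforms from the Root frame to the end-effector.

    dh_params    -- iterable of (a, d, alpha) per link
    joint_angles -- encoder readings theta_i
    offsets      -- offsets o_i aligning the encoders with the DH convention
    """
    T = np.eye(4)
    for (a, d, alpha), th, o in zip(dh_params, joint_angles, offsets):
        T = T @ dh_transform(a, d, alpha, th + o)
    return T  # T[:3, 3] is the end-effector position in the Root frame
```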
The links are schematically illustrated in Fig. 1. iCub kinematics version 1 was used [18] with the following modification: the Root was moved from the waist area to the third torso joint, which is the new inertial frame for our purposes.
The four chains under consideration are:

- Left Arm (LA). DH parameters in Table 1. Short names used to denote the links/joints: Root-to-LA-shoulder, LA Shoulder Pitch, LA Shoulder Roll, LA Shoulder Yaw, LA Elbow, LA Wrist Prosup (for pronosupination), LA Wrist Pitch, LA Wrist Yaw.
- Right Arm (RA). DH parameters analogous to LA (see [18]). Link/joint names: Root-to-RA-shoulder, RA Shoulder Pitch, RA Shoulder Roll, RA Shoulder Yaw, RA Elbow, RA Wrist Prosup, RA Wrist Pitch, RA Wrist Yaw.
- Left Eye (LEye). DH parameters in Table 2. Link/joint names: Root-to-neck, Neck Pitch, Neck Roll, Neck Yaw, Eyes Tilt, Left Eye Pan.
- Right Eye (REye). DH parameters in Table 3 (only the links differing from LEye are listed). Link/joint names: Root-to-neck, Neck Pitch, Neck Roll, Neck Yaw, Eyes Tilt, Right Eye Pan.
Links or parameters not subject to calibration are shown shaded in grey in the corresponding tables. The first link always originates in the Root frame, is fixed in all chains (the torso joint is not moving), and is therefore excluded from calibration. The α parameter of the last link in the arm chains is also excluded from calibration, as it is not observable: we observe only the position and not the orientation of the end-effectors. The right arm chain is further extended with a fixed transform from the end-effector in the palm to the tip of the index finger, which is not subject to calibration. The eye chains differ in the last link only.
Table 1: DH parameters of the left arm (LA) chain.

Link(i)   a(i) [mm]   d(i) [mm]   α(i) [rad]   θ(i) [rad]
1         23.36       143.3
2         0           107.74
3         0           0
4         15          152.28
5         15          0           0
6         0           137.3
7         0           0
8         62.5        16          0            0
Table 2: DH parameters of the left eye (LEye) chain.

Link(i)   a(i) [mm]   d(i) [mm]   α(i) [rad]   θ(i) [rad]
1         2.31        -193.3
2         33          0
3         0           1
4         -54         82.5
5         0           -34
6         0           0
Table 3: DH parameters of the right eye (REye) chain (links differing from LEye).

Link(i)   a(i) [mm]   d(i) [mm]   α(i) [rad]   θ(i) [rad]
5         0           34
6         0           0
The camera intrinsic parameters (image resolution, focal lengths, and principal point) were taken from the real robot cameras and were not subject to calibration.
3.2 Optimization problem formulation
By calibration we mean estimation of the parameter vector $\phi = \{a_i, d_i, \alpha_i, o_i\}$, $i \in \mathcal{I}$, where $\mathcal{I}$ is a set of indices identifying individual links; $a_i$, $d_i$, and $\alpha_i$ are the first three parameters of the DH formulation and $o_i$ is the offset that specifies the positioning of the encoders on the joints with respect to the DH representation. We often estimate only a subset of these parameters, assuming that the others are known. This subset can, for example, consist of a subset of links (e.g., only parameters of one arm are to be calibrated) or a subset of the parameters, e.g., only the offsets (sometimes dubbed “daily calibration” [19]).
The estimation of the parameter vector is done by optimizing a given objective function:

$$\phi^* = \arg\min_{\phi} \sum_{m=1}^{M} \left\lVert \mathbf{p}^{real}_m - \mathbf{p}_m(\phi, \boldsymbol{\theta}_m) \right\rVert^2 \qquad (1)$$

where $M$ is the number of robot configurations and corresponding end-effector positions used for calibration (hereafter often referred to as “poses” for short), $\mathbf{p}^{real}_m$ is a real (observed) end-effector position, and $\mathbf{p}_m(\phi, \boldsymbol{\theta}_m)$ is an estimated end-effector position computed using the forward kinematic function for a given parameter estimate $\phi$ and joint angles $\boldsymbol{\theta}_m$ from the joint encoders. For chains involving cameras, the reprojection error is used instead, as described in the next section.
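In least-squares form, each pose contributes a 3D residual; below is a minimal sketch of the residual vector behind Eq. (1), reusing forward_kinematics from the sketch above (unpack is a hypothetical helper that splits the flat parameter vector into per-link DH values and offsets):

```python
import numpy as np

def single_chain_residuals(phi, poses):
    """Residual vector whose squared norm is the cost of Eq. (1)."""
    dh_params, offsets = unpack(phi)   # hypothetical helper, see lead-in
    res = []
    for theta, p_real in poses:        # joint angles, observed 3D position
        T = forward_kinematics(dh_params, theta, offsets)
        res.append(p_real - T[:3, 3])  # observed minus predicted position
    return np.concatenate(res)
```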
3.3 Kinematic chain calibration
We study different combinations of intersecting chains and their performance in calibrating one another.
3.3.1 Two-arm chain (LA-RA)
This corresponds to the self-touch scenario, with touch occurring directly at the end-effectors (the right arm end-effector being shifted from the palm to the tip of the index finger using a fixed transform). The newly established kinematic chain for the upper body includes both arms, while head and eyes are excluded. To optimize the parameters describing this chain, we minimize the distance between the estimated 3D positions of the left and right arm end-effectors. In this case, the parameter vector consists of the following parameters: $\phi = \{\phi_{RA}, \phi_{LA}\}$, where $\phi_{RA}$ and $\phi_{LA}$ are the parameters corresponding to the robot right and left arm, respectively. The objective function to be optimized is:

$$\phi^* = \arg\min_{\phi} \sum_{m=1}^{M} \left\lVert \mathbf{p}^{RA}_m(\phi, \boldsymbol{\theta}_m) - \mathbf{p}^{LA}_m(\phi, \boldsymbol{\theta}_m) \right\rVert^2 \qquad (2)$$

where $M$ is the number of poses used for calibration, and $\mathbf{p}^{RA}_m$ and $\mathbf{p}^{LA}_m$ are the $m$-th estimated end-effector positions in the Root frame for the right and left arm, respectively, computed using a given parameter estimate and joint angles $\boldsymbol{\theta}_m$ from the joint encoders.
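The self-touch residual of Eq. (2) compares the two arms' predicted end-effector positions instead of an external measurement; a sketch follows (fk_position is a hypothetical helper returning the Root-frame end-effector position of the named chain under the current parameter estimate):

```python
import numpy as np

def self_touch_residuals(phi, poses):
    """Eq. (2): at self-touch, the two end-effectors should coincide."""
    res = []
    for theta in poses:  # joint angles of both arms in one configuration
        p_ra = fk_position(phi, theta, chain="RA")  # index fingertip
        p_la = fk_position(phi, theta, chain="LA")  # palm
        res.append(p_ra - p_la)
    return np.concatenate(res)
```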
3.3.2 Hand-to-eye chains (LA-LEye, LA-REye, RA-LEye, RA-REye)
To predict the position of the end-effector in each of the robot cameras (similar to [17]), the estimated end-effector position $\mathbf{p}^{Root}$ is given by the current hypothetical robot calibration of the parameter vector $\phi$ and is computed via forward kinematics. $\mathbf{p}^{Root}$ is then mapped to left camera coordinates, $\mathbf{p}^{Cam} = (x, y, z)^T$, using a transformation matrix $\mathbf{T}^{Cam}_{Root}$. Then we use a pinhole camera model to transform the 3D point $(x, y, z)$ into image coordinates $(u, v)$:

$$u = f_x \frac{x}{z}, \qquad v = f_y \frac{y}{z} \qquad (3)$$

where $f_x$, $f_y$ are the focal lengths of the camera. Radial distortion of the cameras was not considered.
This approach does not require information from both eyes and enables us to estimate parameters of only one side of the robot body (e.g., parameters of the left arm and left eye). For example, the estimated parameter vector in the case of the kinematic chain connecting left arm and left eye consists of the following parameters: $\phi = \{\phi_{LA}, \phi_{LEye}\}$, where $\phi_{LA}$ and $\phi_{LEye}$ are the parameters corresponding to the robot left arm and to the left eye, respectively. The objective function is then defined as:

$$\phi^* = \arg\min_{\phi} \sum_{m=1}^{M} \left\lVert \mathbf{u}^{LA}_m(\phi, \boldsymbol{\theta}_m) - \mathbf{u}^{real}_m \right\rVert^2 \qquad (4)$$

where $\mathbf{u}^{LA}_m$ is the $m$-th 2D position of the estimated left arm end-effector projected into left eye image coordinates and $\mathbf{u}^{real}_m$ is the $m$-th 2D position of the observed left arm end-effector in the left camera. For two arms and two eyes we get four possible combined chains: left/right arm to left/right eye. Since the results are similar due to symmetry, we present in the experimental section results only for the Left arm–Left eye (LA-LEye) chain.
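A sketch of the projection of Eq. (3) and the reprojection residual of Eq. (4); root_to_cam is a hypothetical helper returning the Root-to-camera transform $\mathbf{T}^{Cam}_{Root}$ for the current parameter estimate and head/eye joint angles:

```python
import numpy as np

def project_pinhole(p_cam, fx, fy):
    """Eq. (3): pinhole projection of a camera-frame 3D point to pixels."""
    x, y, z = p_cam
    return np.array([fx * x / z, fy * y / z])

def hand_eye_residuals(phi, poses, fx, fy):
    """Eq. (4): reprojection error of the left-arm end-effector."""
    res = []
    for theta, uv_observed in poses:
        p_root = fk_position(phi, theta, chain="LA")  # 3D, Root frame
        T = root_to_cam(phi, theta)                   # hypothetical helper
        p_cam = (T @ np.append(p_root, 1.0))[:3]      # to camera frame
        res.append(project_pinhole(p_cam, fx, fy) - uv_observed)
    return np.concatenate(res)
```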
3.3.3 Combining multiple chains (LA-RA-LEye, LA-RA-LREye)
In order to estimate all kinematic parameters of the robot, we can take advantage of combining some or all of the above-mentioned kinematic chains. For example, in the case that we combine the LA-RA, LA-LEye, and LA-REye chains together into LA-RA-LREye, the estimated parameter vector consists of the following parameters: $\phi = \{\phi_{LA}, \phi_{RA}, \phi_{REye}, \phi_{LEye}\}$, where $\phi_{LA}$, $\phi_{RA}$, $\phi_{REye}$, and $\phi_{LEye}$ are the parameters corresponding to the left arm, right arm, right eye, and left eye, respectively. The objective function is in this case defined as:

$$\phi^* = \arg\min_{\phi} \sum_{m=1}^{M} \bigg( k^2 \left\lVert \mathbf{p}^{RA}_m - \mathbf{p}^{LA}_m \right\rVert^2 + \sum_{e \in \{L,R\}} \left\lVert \mathbf{u}^{LA,e}_m - \mathbf{u}^{real,LA,e}_m \right\rVert^2 \bigg) \qquad (5)$$

where $M$ is the number of poses (configurations) used for calibration, and $\mathbf{p}^{RA}_m$ and $\mathbf{p}^{LA}_m$ are the $m$-th estimated end-effector positions in the Root frame for the right and left arm, respectively. These are computed using a given parameter estimate and joint angles $\boldsymbol{\theta}_m$ from the joint encoders. The values $\mathbf{u}^{LA,L}_m$ and $\mathbf{u}^{LA,R}_m$ are the $m$-th positions of the estimated left arm end-effector projected into left eye and right eye image coordinates, respectively, and $\mathbf{u}^{real,LA,L}_m$ and $\mathbf{u}^{real,LA,R}_m$ are the $m$-th 2D positions (pixel coordinates) of the left arm end-effector observed in the left and right eye/camera, respectively (the analogous variables $\mathbf{u}^{RA,e}_m$ and $\mathbf{u}^{real,RA,e}_m$ correspond to the right arm and are added when the right-arm-to-eye chains are included in the combination). Since the cost function contains both 3D and reprojection errors, the distances in space were multiplied by a coefficient $k$ determined from the intrinsic parameters of the cameras and the distance of the end-effector from the eye, so that metric and pixel errors have comparable magnitude.
3.4 Nonlinear least squares optimization
The objective functions (Eqs. (1)–(5)) defined for the optimization problem described in Section 3.2 are of the least-squares form and can therefore be minimized by the Levenberg–Marquardt algorithm for nonlinear least squares optimization (we used the MATLAB implementation of the algorithm, same as in [17]). This iterative local algorithm minimizes a nonlinear objective function by linearizing it around the current estimate in every iteration. It interpolates between the Gauss–Newton and gradient-descent methods, combining the advantages of both.
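The study itself used MATLAB's solver; as a rough illustration of how the pieces fit together, the following Python/SciPy sketch stacks the residual functions sketched above into one vector and hands it to a Levenberg–Marquardt solver (phi_initial, data, and the scaling coefficient k are assumed to be prepared by the caller):

```python
import numpy as np
from scipy.optimize import least_squares

def combined_residuals(phi, data, k):
    # Eq. (5)-style stacking: the 3D self-touch residuals are scaled by k
    # so that millimetre and pixel errors are commensurate.
    return np.concatenate([
        k * self_touch_residuals(phi, data["touch_poses"]),
        hand_eye_residuals(phi, data["cam_poses"], data["fx"], data["fy"]),
    ])

# 'lm' selects Levenberg-Marquardt; it requires at least as many residuals
# as free parameters (cf. the rank-deficient case discussed in Section 5.2).
result = least_squares(combined_residuals, phi_initial,
                       args=(data, k), method="lm")
phi_star = result.x  # calibrated parameter vector
```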
3.5 Error metrics
For comparing the results achieved for individual settings, we make use of the following error metrics:
3.5.1 Cartesian error between poses (position)
The Cartesian position error between two generic poses A and B, where $\mathbf{p}_A$ and $\mathbf{p}_B$ are the 3D Cartesian positions of the end-effector, is defined as:

$$e_{A,B} = \left\lVert \mathbf{p}_A - \mathbf{p}_B \right\rVert \qquad (6)$$

We evaluate the Cartesian error over the set of testing poses, which are selected as described in Section 4.2.
3.5.2 Quality of estimated parameters
For each estimated parameter $\phi_j$ we compute the mean difference $\bar{\Delta}\phi_j$ of the estimated parameter from the target (ground-truth) parameter value, averaged over $R$ repetitions of the experiment:

$$\bar{\Delta}\phi_j = \frac{1}{R} \sum_{r=1}^{R} \left( \hat{\phi}^{\,r}_j - \phi^{target}_j \right) \qquad (7)$$

as well as the standard deviation of the parameter estimate.
4 Data acquisition and description
4.1 Pose set generation
With the goal of comparing different calibration methods on a humanoid robot, we chose a dataset where the two arms of the robot are in contact, thereby physically closing the kinematic chain through self-touch. At the same time, the robot gazes at the contact point (self-observation). The points were chosen from a cubic volume in front of the robot. For each target, using the Cartesian solver and controller [20], the iCub moves the left hand, with the end-effector in the palm, to the specified point. Then it moves the right hand, with the end-effector at the tip of the index finger, to the same point, with the additional constraint that the finger can deviate only by a bounded angle from the direction perpendicular to the palm. 5055 points and corresponding joint configurations were thus generated, with a small residual difference between the left and right end-effector positions in every configuration (see Fig. 2, right). The gaze controller [21] was used to command the neck and eyes of the robot to gaze at the same target (code and video can be accessed at [22]). The full dataset thus consists of 5055 data vectors composed of the target point coordinates ($\mathbf{p}^t$), the corresponding right and left arm end-effector positions ($\mathbf{p}^{RA}$, $\mathbf{p}^{LA}$), and the joint angles for every joint of the torso, arms, neck, and eyes ($\boldsymbol{\theta}$). Note that the solvers work with a given tolerance, and hence $\mathbf{p}^{RA}$ and $\mathbf{p}^{LA}$ do not coincide exactly.
This way of dataset generation draws on previous work [2] and is hence feasible on the real robot, provided sufficient quality of the initial model. Li et al. [14] provide an alternative control method: “tactile servoing”. The robot could also be manipulated into the desired configurations while in gravity compensation mode.
4.2 Training and testing dataset
We had 5055 configurations with a small residual distance between the two end-effectors, given by the solver tolerance. This error will at the same time constitute the lower bound on the maximum achievable calibration accuracy using the closure of the kinematic chain through self-touch. For the case of loop closure through the cameras, we employ the neck and eye joint values obtained from the solver in the simulator but reproject the end-effector positions directly and accurately into the cameras simulated in MATLAB. The 5055 data points were further divided into training and testing datasets in the following way: 4755 poses are used as a training set on which the optimization process is performed (with a subset of 10, 20, 50, or 1000 poses chosen at random in different experiments) and 300 poses are used for testing purposes. Fig. 2, left, shows the distribution of joint values for individual joints in the dataset; this may impact the identifiability of individual parameters.
4.3 Measurement error
Measurement noise with a Gaussian distribution was added, motivated by the sensory accuracy of the real robot. Since the distance between individual taxels on the real iCub sensitive skin is around 5 mm, we decided to use Gaussian noise with zero mean and a standard deviation of 5 mm for touch as a baseline. For the cameras, we introduce a 5 px error (Gaussian noise with zero mean and a standard deviation of 5 px), inspired by the setup in [3] where the iCub detects its fingertip in the camera frame. These errors are used in all experiments in the Simulation Results section if not stated otherwise. In Section 5.3 we evaluate how changing the magnitude of these measurement errors affects the resulting accuracy of end-effector position detection for individual chains.
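A minimal sketch of how such noise can be injected into the simulated measurements (the numeric measurement values here are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng()

p_touch = np.array([100.0, 50.0, 200.0])  # noiseless contact position [mm]
uv_cam = np.array([160.0, 120.0])         # noiseless projection [px]

# Baseline noise: sigma = 5 mm on touch, sigma = 5 px on the cameras
p_touch_noisy = p_touch + rng.normal(0.0, 5.0, size=3)
uv_cam_noisy = uv_cam + rng.normal(0.0, 5.0, size=2)
```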
4.4 Perturbation of the initial parameter estimates
To evaluate the dependence of the optimization performance on the quality of the initial estimates of the parameters, we perturbed all estimated parameters by a perturbation factor $p$. All initial offset values were perturbed by a zero-mean random amount scaled by the perturbation factor:

$$o^{pert}_i = o_i + p\,\epsilon_i \qquad (8)$$

where $\epsilon_i$ is a zero-mean random variable. It is reasonable to expect that the remaining DH parameters ($a$, $d$, and $\alpha$) will in general be more accurate, as they can be extracted from CAD models and there is no moving part and no encoder involved. Therefore, their perturbation was chosen correspondingly smaller:

$$x^{pert}_i = x_i + p\,\epsilon'_i, \qquad x \in \{a, d, \alpha\} \qquad (9)$$

where $\epsilon'_i$ is a zero-mean random variable with a smaller spread than $\epsilon_i$.
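A sketch of this perturbation scheme (the scale constants below are illustrative assumptions, not the study's values):

```python
import numpy as np

def perturb_initial_estimate(offsets, other_dh, p, rng,
                             offset_scale=0.1, other_scale=0.01):
    """Perturb initial parameter estimates by a factor p (cf. Eqs. (8)-(9)).

    offsets  -- joint offsets o_i [rad]
    other_dh -- remaining DH parameters (a, d in mm; alpha in rad)
    The two scale constants encode that offsets are trusted less than
    the CAD-derived parameters; their values here are assumptions.
    """
    o_pert = offsets + p * offset_scale * rng.standard_normal(offsets.shape)
    x_pert = other_dh + p * other_scale * rng.standard_normal(other_dh.shape)
    return o_pert, x_pert
```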
5 Simulation Results
In this section we present the calibration results. We evaluated our approach using both the error of the end-effector position (the cost function optimized; or the distance in the camera frame for projections into the eyes) and the error in the individual parameters (vs. their correct values). We compared the kinematic chains used for calibration, the number of free parameters estimated by the optimization process, different perturbation factors of the individual parameters, the number of training poses (data points), as well as the measurement noise levels. Performance is always evaluated on the testing dataset.
5.1 Results for different chain combinations and number of training poses
Fig. 3 (top) shows the performance in terms of end-effector position estimation when the DH parameters of the left arm (LA) chain are calibrated, utilizing different kinematic chain combinations: “self-observation” from a single camera (LA-LEye) and “self-touch” only (LA-RA) are outperformed by “stereo self-observation” (LA-LREye), and all the chains together provide the best results (LA-RA-LREye). Clearly, more training poses (50 vs. 20) improve the calibration results; 1000 poses should be sufficient to reach an optimal value and serve as a lower bound on the error. The effect of the initial parameter perturbation factor is also shown; for all perturbation levels, the performance is stable (low error variance).
In Fig. 3 (bottom) only the largest “multi-chain” LA-RA-LREye is employed for training, but the chains whose parameters are subject to calibration are varied. The error of end-effector position estimation increases with the number of parameters estimated; however, even if the parameters of all chains (86 DH parameters) are perturbed and subject to calibration simultaneously, an end-effector error of around 2 mm (1 mm) can be achieved with 50 (100) poses.
To investigate the distribution of errors for the individual chains, we examined the error residuals for every testing pose. For a higher number of training poses, the error residuals have zero mean and a Gaussian distribution. For lower numbers of poses (especially for higher perturbations), the residuals are bigger and skewed, and the resulting calibration also strongly depends on the initialization. In Fig. 4, the end-effector error residuals for a fixed perturbation factor are shown for two of their coordinates (the other 2D projections were qualitatively similar), for different chains and different numbers of training poses.
5.2 Observability analysis of individual chains
We conducted an observability analysis using Singular Value Decomposition (SVD) of the identification Jacobian matrix $J$, with elements $J_{kl} = \partial e_k / \partial \phi_l$, where $\phi_l$ is the $l$-th parameter to be estimated and $e_k$ denotes the error between the real/observed and estimated value of the $k$-th coordinate in the given chain, collected over the $M$ configurations in the training pose set (e.g., for LA-LEye, each pose contributes 2 errors, on the two image coordinates of the reprojection of the end-effector position into the camera; for the LA-RA chain, each pose contributes 3 numbers, the distances in the x, y, and z coordinates between the right and left arm end-effector 3D positions). The Jacobian matrix represents the sensitivity of the end-effector positions or their camera reprojections to changes in the individual parameters. Using SVD, we can obtain a vector of singular values $(\sigma_1, \ldots, \sigma_L)$. A comparison of the obtained singular values for the individual chains for the task of estimating all DH parameters of the left arm (using the same training pose set) can be seen in Fig. 5. We also evaluated the observability indices $O_1$ [23] and $O_4$ [24] (the performance of observability indices for industrial robot calibration was evaluated by Joubair [25]). The $O_1$ index is defined as $O_1 = \sqrt[L]{\sigma_1 \sigma_2 \cdots \sigma_L} / \sqrt{M}$, where $L$ is the number of independent parameters to be identified, $\sigma_i$ is the $i$-th singular value, and $M$ is the number of calibration configurations. The $O_4$ index is defined as $O_4 = \sigma_L^2 / \sigma_1$. See Fig. 5 (bottom panels). The chain LA-LEye for 10 poses has very low observability, caused by a rank-deficient Jacobian (we have 24 parameters to estimate but only 20 equations). The highest observability is achieved in all cases for the largest chain LA-RA-LREye, where the information from touch and both cameras was used.
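A sketch of the index computation from the singular values (the identification Jacobian J is assumed to be estimated beforehand, e.g., numerically; the definitions follow [23] and [24]):

```python
import numpy as np

def observability_indices(J, n_params, n_poses):
    """O1 [23] and O4 [24] from the identification Jacobian J.

    J must have at least n_params rows (residuals); otherwise it is
    rank-deficient, as in the LA-LEye case with 10 poses.
    """
    sigma = np.linalg.svd(J, compute_uv=False)  # descending order
    # geometric mean of the first n_params singular values, via logs
    o1 = np.exp(np.log(sigma[:n_params]).mean()) / np.sqrt(n_poses)
    o4 = sigma[n_params - 1] ** 2 / sigma[0]    # sigma_min^2 / sigma_max
    return o1, o4
```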
5.3 Evaluation of error based on measurement noise
We evaluated the effect of measurement noise in the individual sensors (touch, cameras) on the accuracy of the end-effector position error on the testing dataset (see Fig. 6). With the baseline magnitudes of error in pixels on the cameras and in millimetres on the “touch sensors” (first two columns of Fig. 6), the LA-LREye chain (both eyes, no touch) and LA-RA-LREye (both eyes and touch) have the smallest final end-effector errors, the “multi-chain” one even smaller. When the error on the cameras increases (three levels tested), the camera chains (LA-LEye, LA-LREye) are affected, whereas the performance of the chain with touch (LA-RA-LREye) is not degraded. Conversely, more error on “touch” (three levels tested) impacts the “touch only” chain (LA-RA), but LA-RA-LREye remains robust.
5.4 Quality of DH parameter estimates
To get further insight and to take advantage of the simulation study, where we have access to the ground-truth values of all parameters, we also studied whether the optimization based on end-effector error also leads to correct estimates of all DH parameters, focusing on the left arm (LA) chain.
Fig. 7 shows the results for all estimated parameters when the LA-RA (“self-touch”) chain was used for calibration, using different numbers of training poses. The errors on the length parameters (top panel) are on average distributed between approximately 1 and 10 mm. For the angular quantities, the errors for the proximal joints likewise fall within a narrow range (Fig. 7, bottom panel).
Finally, having shown above that the “self-touch and self-observation” chain (LA-RA-LREye) slightly outperforms the “stereo self-observation” only chain (LA-LREye) in end-effector error (Fig. 3 top, Fig. 6) as well as in observability (Fig. 5), here in Fig. 8 we can observe a similar trend in the estimated parameters of the LA chain compared against their ground-truth values. The parameter estimates obtained from LA-RA-LREye are significantly better for some of the DH parameters: for all joints except wristPr and elbow in one case, and for all shoulder joints in another. The other parameter estimates are comparable. The wrist joint calibration seems to be sensitive to the selection of training poses and will need further study.
6 Discussion and Conclusion
We quantitatively and systematically investigated the potential of automatic self-contained kinematic calibration (DH parameters including camera extrinsic parameters) of a humanoid robot employing different kinematic chains, in particular relying on self-observation and self-touch. The parameters varied were: (i) type and number of intersecting kinematic chains used for calibration, (ii) parameters and chains subject to optimization, (iii) amount of initial perturbation of the kinematic parameters, (iv) number of poses/configurations used for optimization, (v) amount of measurement noise in end-effector positions / cameras. We also tracked the computation time, and while the details differ depending on the settings (chain calibrated, number of poses, etc.), a typical optimization run would not take more than tens of seconds on an older laptop PC. Next to the results w.r.t. the cost function itself (error on the end-effector or camera reprojection), a number of additional analyses were performed, including error residuals, errors on estimated parameters compared to ground truth, and observability analysis.
While some results were expected (such as improvement when more configurations are added, or poor performance when using self-observation from a single camera), the most notable findings are: (1) calibrating parameters of a single chain (e.g., one arm) by employing multiple kinematic chains (“self-observation” and “self-touch”) is superior in terms of optimization results (Fig. 3 top) as well as observability (Fig. 5); (2) when using multi-chain calibration, fewer poses suffice to get similar performance compared to when, e.g., only observation from a single camera is used (Fig. 3 top); (3) parameters of all chains (here 86 DH parameters) can be subject to calibration simultaneously, and with 50 (100) poses, an end-effector error of around 2 (1) mm can be achieved (Fig. 3 bottom); (4) adding noise to a sensory modality degrades the performance of all calibrations employing the chains relying on this information (Fig. 6). The last point is interesting to discuss in relation to Birbach et al. [17], who put forth the hypothesis that calibrating multiple chains simultaneously is superior to pairwise sequential calibration. Our results support this provided that the measurement noise is small. Instead, if a certain modality is noisy, it may be beneficial to preferentially employ chains that rely on more accurate measurements first and then calibrate a “noisy chain” in a second step.
We have only reported results from simulation; however, we maintain that this was the right tool for this type of investigation. At the same time, our setup and choice of parameters drew on experiments performed on the real robot (self-touch [2] and self-observation [3, 4] in particular), which grounds the results in a real setting and should inform future experimentation on the iCub. The method to combine chains and analyze the results presented here can be transferred to other platforms as well.
There are several aspects that we want to investigate further in the future. First, we note that while we did control for the angle between the palm and the contralateral finger for self-touch in the dataset generation, we did not monitor whether the contact point would also be visible. Additional analyses revealed that the contact point would not be occluded, and hence be visible by both cameras, in 35% of the poses, and by one of the cameras in 53%. We recomputed the observability with this subset of the dataset only and found no decrease. In the future, configurations with occlusions should be excluded during dataset generation. Second, we found that around 50 configurations (data points) suffice for a reasonable calibration. Finding an optimal subset of not more than 10 configurations would be desirable, such that recalibration can be performed rapidly. Here, clever pose selection will be necessary to warrant adequate and stable performance. Third, the information from the two cameras can be used to reproject the observed position of the end-effector from the image coordinates of both eyes to 3D space (similar to [3, 26]), leading to yet another formulation of the optimization problem. Fourth, our investigation can be extended by considering also the contribution of inertial sensors, either in the robot head [17] or distributed on the robot body [10, 8]. Fifth, the present method can be compared with filtering approaches [4, 16] or with methods that make fewer assumptions about the initial model available (e.g., [27]). Finally, the self-touch scenario can also be turned around: from using a tactile array to calibrate kinematics [2, 14] to calibrating the skin itself [28].
Acknowledgment
We thank Alessandro Roncone for assistance with the models of the iCub robot in MATLAB and for the source files leading to Fig. 1 left, and Ugo Pattacini for discussions, tips, and assistance with the use of the Cartesian solvers leading to the generation of the self-touch configurations.
References
 [1] J. Hollerbach, W. Khalil, and M. Gautier, “Model identification,” in Springer Handbook of Robotics, 2nd ed., B. Siciliano and O. Khatib, Eds. Springer, 2016, pp. 113–138.
 [2] A. Roncone, M. Hoffmann, U. Pattacini, and G. Metta, “Automatic kinematic chain calibration using artificial skin: self-touch in the iCub humanoid robot,” in Robotics and Automation (ICRA), 2014 IEEE International Conference on, 2014, pp. 2305–2312.
 [3] S. R. Fanello, U. Pattacini, I. Gori, V. Tikhanoff, M. Randazzo, A. Roncone, F. Odone, and G. Metta, “3D stereo estimation and fully automated learning of eye–hand coordination in humanoid robots,” in 2014 IEEE-RAS Int. Conf. on Humanoid Robots (HUMANOIDS ’14), 2014.
 [4] P. Vicente, L. Jamone, and A. Bernardino, “Online body schema adaptation based on internal mental simulation and multisensory feedback,” Frontiers in Robotics and AI, vol. 3, p. 7, 2016.
 [5] 2019 (Project webpage). [Online]. Available: https://sites.google.com/site/karlastepanova/robotselfcalibrationicub
 [6] M. Hersch, E. Sauser, and A. Billard, “Online learning of the body schema,” International Journal of Humanoid Robotics, vol. 5, pp. 161–181, 2008.
 [7] R. Martinez-Cantin, M. Lopes, and L. Montesano, “Body schema acquisition through active learning,” in Proc. Int. Conf. on Robotics and Automation (ICRA), 2010.
 [8] P. Mittendorfer and G. Cheng, “Open-loop self-calibration of articulated robots with artificial skins,” in Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012, pp. 4539–4545.
 [9] E. DeanLeon, K. RamirezAmaro, F. Bergner, I. Dianov, and G. Cheng, “Integration of robotic technologies for rapidly deployable robots,” IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1691–1700, 2018.
 [10] N. Guedelha, N. Kuppuswamy, S. Traversaro, and F. Nori, “Self-calibration of joint offsets for humanoid robots using accelerometer measurements,” in Humanoid Robots (Humanoids), 2016 IEEE-RAS 16th International Conference on. IEEE, 2016, pp. 1233–1238.
 [11] K. Yamane, “Practical kinematic and dynamic calibration methods for force-controlled humanoid robots,” in Humanoid Robots (Humanoids), 2011 11th IEEE-RAS International Conference on. IEEE, 2011, pp. 269–275.
 [12] C. Bartolozzi, L. Natale, F. Nori, and G. Metta, “Robots with a sense of touch,” Nature materials, vol. 15, no. 9, pp. 921–925, 2016.
 [13] R. S. Dahiya and M. Valle, Robotic Tactile Sensing. Springer, 2013.
 [14] Q. Li, R. Haschke, and H. Ritter, “Towards body schema learning using training data acquired by continuous self-touch,” in Humanoid Robots (Humanoids), 2015 IEEE-RAS 15th International Conference on. IEEE, 2015, pp. 1109–1114.
 [15] R. Khusainov, A. Klimchik, and E. Magid, “Humanoid robot kinematic calibration using industrial manipulator,” in Mechanical, System and Control Engineering (ICMSC), 2017 International Conference on. IEEE, 2017, pp. 184–189.
 [16] R. Zenha, P. Vicente, L. Jamone, and A. Bernardino, “Incremental adaptation of a robot body schema based on touch events,” in Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), 2018.
 [17] O. Birbach, U. Frese, and B. Bäuml, “Rapid calibration of a multi-sensorial humanoid’s upper body: An automatic and self-contained approach,” The International Journal of Robotics Research, vol. 34, no. 4-5, pp. 420–436, 2015.
 [18] 2017. [Online]. Available: http://wiki.icub.org/wiki/ICubForwardKinematics
 [19] K. Nickels, “HandEye calibration for Robonaut,” NASA Summer Faculty Fellowship Program Final Report, Tech. Rep., 2003.
 [20] U. Pattacini, F. Nori, L. Natale, G. Metta, and G. Sandini, “An experimental evaluation of a novel minimum-jerk Cartesian controller for humanoid robots,” in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2010.
 [21] A. Roncone, U. Pattacini, G. Metta, and L. Natale, “A Cartesian 6DoF gaze controller for humanoid robots,” in Robotics: Science and Systems, vol. 2016, 2016.
 [22] 2017. [Online]. Available: https://github.com/matejhof/icubselftouchwithgazegenerator
 [23] J.-H. Borm and C.-H. Menq, “Experimental study of observability of parameter errors in robot calibration,” in Robotics and Automation, 1989. Proceedings., 1989 IEEE International Conference on. IEEE, 1989, pp. 587–592.
 [24] A. Nahvi and J. M. Hollerbach, “The noise amplification index for optimal pose selection in robot calibration,” in Robotics and Automation, 1996. Proceedings., 1996 IEEE International Conference on, vol. 1. IEEE, 1996, pp. 647–654.
 [25] A. Joubair, A. Tahan, and I. A. Bonev, “Performances of observability indices for industrial robot calibration,” in Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on. IEEE, 2016, pp. 2477–2484.
 [26] H. Hirschmuller, “Stereo processing by semiglobal matching and mutual information,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 328–341, 2008.
 [27] P. Lanillos and G. Cheng, “Adaptive robot body learning and estimation through predictive coding,” in Intelligent Robots and Systems (IROS), 2018 IEEE/RSJ International Conference on, 2018.
 [28] A. Albini, S. Denei, and G. Cannata, “Towards autonomous robotic skin spatial calibration: A framework based on vision and self-touch,” in Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on. IEEE, 2017, pp. 153–159.