Unsupervised preprocessing for Tactile Data
Tactile information is important for gripping, stable grasp, and in-hand manipulation, yet the complexity of tactile data prevents widespread use of such sensors. We make use of an unsupervised learning algorithm that transforms the complex tactile data into a compact, latent representation without the need to record ground truth reference data. These compact representations can either be used directly in a reinforcement learning based controller or can be used to calibrate the tactile sensor to physical quantities with only a few datapoints. We show the quality of our latent representation by predicting important features and with a simple control task.
- stochastic gradient variational Bayes
- variational inference
- KL divergence
- Kullback-Leibler divergence
- variational auto-encoder
Tactile sensors are essential for proper in-hand manipulation, and many other tasks where fingers have to grasp, hold, and handle objects. Yet tactile sensor data is not easy to process. The sensors are often deformable, while the data is high-dimensional, very nonlinear, and difficult to relate to physical properties such as grip force or object shape. Their soft nature creates highly correlated data since stimulation of the sensor array activates nearby sensor points.
We therefore propose to use unsupervised learning techniques for transforming the tactile information into a compact, common space, independent of the physical properties of the tactile sensor. In contrast to standard calibration procedures, these algorithms do not need any ground truth, and they are able to pre-process the tactile data and transform it into a decorrelated, compressed format. This enables the use of tactile sensors in various control tasks as well as as measurement devices in other applications.
I-A Related Work
There is a reasonable body of existing literature on extracting features from tactile sensors. In one study, the physical quantities force vector, curvature, and point of contact were extracted; ground truth was recorded for this supervised task. The authors also made a first attempt at preprocessing using ICA and PCA, and questioned whether raw data should be used instead of tactile sensors calibrated to ground truths.
In another study, external stimuli such as point of contact, forces and torques were applied and recorded together with the tactile data. The paper describes the use of supervised machine learning algorithms to predict those stimuli from tactile data. The results show that the sensors can measure physical quantities roughly as well as humans can.
Such physical quantities, extracted from tactile sensors in a supervised way, have also been used for control. A further line of work designed a pipeline for preprocessing tactile data from a BioTac tactile sensor; each step in the pipeline is carefully designed and implemented, and requires manual tuning. All of those applications of tactile sensors used mostly supervised methods, including calibration. These types of methods have several problems, which are discussed in the following.
Another problem with tactile sensors is that their sensor information is the result of several external modalities. Recordings of micro-vibrations from sliding over silk, suede and sandpaper surfaces showed that different force levels change the spectrum significantly. The two pieces of information, surface type and force, can therefore not be separated easily. This shows that learning only the mappings from raw tactile data to those ground truths in a supervised way will lead to a substantial loss of information.
I-B Unsupervised Learning
Calibrating tactile sensors in this supervised way poses several problems. The procedure of recording the labelled data needs to be redone for each individual sensor and task and is very time-consuming. Another problem lies in the choice of ground truth applied to the sensor, as one might miss important features if the stimuli are not carefully chosen. Unsupervised learning algorithms might be able to solve both problems. They do not require any ground truth and can find intrinsic structures in the tactile data itself. We are looking for models that create a mathematically compact or dimensionally reduced representation of the data. This representation is better suited for control algorithms than the raw, high-dimensional, nonlinear tactile data. In the case where a calibration to physical values is still required, only a small amount of labelled data is needed to find the relation between the compressed representation and the physical ground truth.
Our approach is based on graphical models with latent variables. These latent variables represent the compact representation and can be found by probabilistic inference.
II Variational Auto-Encoder
II-A Linear latent models
Latent variable models offer a mathematically well-founded framework for the extraction of features from data. Sparse coding, independent component analysis (ICA) and principal component analysis (PCA) are linear variants thereof and can be formulated as solutions to the optimisation of the marginal likelihood of the data $p(\mathbf{x})$. The observations $\mathbf{x}$ are conditioned on the latent variables $\mathbf{z}$, which are subsequently marginalised out:
$$p(\mathbf{x}) = \int p(\mathbf{x} \mid \mathbf{z})\, p(\mathbf{z})\, \mathrm{d}\mathbf{z}.$$
When the aforementioned methods are cast into this framework, approximations are typically used, such as maximum a posteriori inference for the latent variables in sparse coding.
A drastic limitation of linear latent variable models is the fact that $\mathbf{x}$ depends only linearly on $\mathbf{z}$, e.g. $\mathbf{x} = W\mathbf{z} + \mathbf{b} + \boldsymbol{\epsilon}$, where $W$ is a matrix, $\mathbf{b}$ a vector offset and $\boldsymbol{\epsilon}$ i.i.d. noise variables. Arguably, these models are unable to perform nonlinear transformations of the data. In settings where complex sensors (e.g., an array of spherically arranged sensors) or nonlinear relationships between physics and sensor are involved, this is clearly not sufficient. Representing the observations through a nonlinear transformation of the latent variables $\mathbf{z}$, i.e. $\mathbf{x} = f(\mathbf{z}) + \boldsymbol{\epsilon}$, is appealing but challenging. We will review a method for that case in the next section.
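As a minimal sketch of the linear case, the following draws one observation from a model of the form x = W z + b + eps with standard-normal latent variables. The symbol names, the helper `sample_linear_latent` and the example matrix are illustrative assumptions, not values from the experiments.

```python
import random


def sample_linear_latent(W, b, noise_std, rng):
    """Draw one (z, x) pair from the linear model x = W z + b + eps."""
    latent_dim = len(W[0])
    # Latent variables: independent standard-normal draws.
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    x = []
    for row, offset in zip(W, b):
        mean = sum(w * zj for w, zj in zip(row, z)) + offset
        # i.i.d. Gaussian observation noise on every observed dimension.
        x.append(mean + rng.gauss(0.0, noise_std))
    return z, x


rng = random.Random(0)
W = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]  # hypothetical: 3 observed, 2 latent dims
b = [0.1, -0.2, 0.0]
z, x = sample_linear_latent(W, b, noise_std=0.0, rng=rng)
```

With the noise switched off, every observed dimension is an exact linear combination of the latent values, which is the restriction the nonlinear model is meant to lift.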
II-B Stochastic Gradient Variational Bayes
Recently, an efficient method to estimate such nonlinear functions called stochastic gradient variational Bayes (SGVB) has been proposed [8, 9]. In SGVB, the latent variables have to be continuous random variables, and are typically chosen as zero-centred Gaussians with identity covariance matrix.
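Variational inference (VI) lies at the basis of SGVB. In VI, probability distributions are approximated by finding the closest member of a restricted family of distributions by means of optimisation. It turns out that in the case of latent variable models, a tractable objective function can be found: the variational upper bound on the negative log-likelihood. The derivation can be summarised as follows:
$$-\log p(\mathbf{x}) = -\log \int q(\mathbf{z})\, \frac{p(\mathbf{x} \mid \mathbf{z})\, p(\mathbf{z})}{q(\mathbf{z})}\, \mathrm{d}\mathbf{z}$$
using Jensen’s inequality:
$$\le -\int q(\mathbf{z}) \log \frac{p(\mathbf{x} \mid \mathbf{z})\, p(\mathbf{z})}{q(\mathbf{z})}\, \mathrm{d}\mathbf{z} = \mathrm{KL}\big(q(\mathbf{z}) \,\|\, p(\mathbf{z})\big) - \mathbb{E}_{q(\mathbf{z})}\big[\log p(\mathbf{x} \mid \mathbf{z})\big] =: \mathcal{L}.$$
Here, $\mathrm{KL}(q \,\|\, p)$ is the Kullback-Leibler divergence between two probability densities, and expresses how different they are.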
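SGVB takes this one step further and implements the approximate distribution $q$ as a neural network conditioned on the input, i.e., $q_\phi(\mathbf{z} \mid \mathbf{x})$, where $\phi$ is a set of weights of the neural network. The neural network with input $\mathbf{x}$ and weights $\phi$ generates a mean and a standard deviation for modelling $q_\phi(\mathbf{z} \mid \mathbf{x})$ as a Gaussian distribution. The likelihood $p_\theta(\mathbf{x} \mid \mathbf{z})$ is also implemented as a neural network, with weights $\theta$; hence the name variational auto-encoder (VAE). The objective loss function then is
$$\mathcal{L}(\theta, \phi) = \mathrm{KL}\big(q_\phi(\mathbf{z} \mid \mathbf{x}) \,\|\, p(\mathbf{z})\big) - \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x})}\big[\log p_\theta(\mathbf{x} \mid \mathbf{z})\big].$$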
Given a solution to this problem, we obtain an efficient means to evaluate $q_\phi(\mathbf{z} \mid \mathbf{x})$ with a simple forward pass through a neural network. Further, it can be shown that $q_\phi(\mathbf{z} \mid \mathbf{x})$ will be close to the true but intractable posterior $p(\mathbf{z} \mid \mathbf{x})$. SGVB therefore provides a means to efficiently extract latent variables $\mathbf{z}$ from observations $\mathbf{x}$.
Obtaining a solution can be done by stochastic gradient descent: by sampling from $q_\phi(\mathbf{z} \mid \mathbf{x})$, we can approximate the expectation in the loss. In the case of both $q_\phi(\mathbf{z} \mid \mathbf{x})$ and $p(\mathbf{z})$ being diagonal Gaussians, the KL-divergence can be evaluated efficiently in closed form.
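The two ingredients of this estimate can be sketched in a few lines, assuming a standard-normal prior, a diagonal Gaussian approximate posterior and an isotropic Gaussian likelihood. The function names and the `decoder` argument are illustrative assumptions; the sample is drawn in the reparameterised form z = mu + sigma * eps that is standard for SGVB.

```python
import math
import random


def kl_diag_gaussian_to_standard(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(s * s + m * m - 1.0 - 2.0 * math.log(s)
                     for m, s in zip(mu, sigma))


def sample_latent(mu, sigma, rng):
    """Reparameterised sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def gaussian_log_likelihood(x, mean, std):
    """log N(x | mean, std^2 I) for an isotropic Gaussian likelihood."""
    return sum(-0.5 * math.log(2.0 * math.pi * std * std)
               - 0.5 * ((xi - mi) / std) ** 2
               for xi, mi in zip(x, mean))


def loss_estimate(x, mu, sigma, decoder, std, rng, n_samples=1):
    """Monte Carlo estimate of KL(q || p) - E_q[log p(x | z)]."""
    expected_ll = 0.0
    for _ in range(n_samples):
        z = sample_latent(mu, sigma, rng)
        expected_ll += gaussian_log_likelihood(x, decoder(z), std)
    return kl_diag_gaussian_to_standard(mu, sigma) - expected_ll / n_samples


rng = random.Random(0)
# Hypothetical decoder: the identity map, used only to exercise the code.
example_loss = loss_estimate([0.2, -0.1], [0.2, -0.1], [0.5, 0.5],
                             decoder=lambda z: z, std=1.0, rng=rng, n_samples=10)
```

The KL term needs no sampling at all; only the reconstruction term is estimated from samples, which is why a single sample per data point is often enough in practice.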
III-A Tactile Sensors
We used two types of tactile sensors: the BioTac sensor [10, 11, 12] and the tactile sensor from the iCub robot. The BioTac sensor consists of a soft, liquid-filled silicone membrane over a hard core, while the iCub sensor is comparatively stiffer, with a soft but very thin coating. The two sensors also differ in their measurement principle, with the BioTac measuring the electrical impedance of the liquid and the iCub measuring the capacitance of its coating. The BioTac sensor also features sensors measuring the pressure, vibrations and temperature of the liquid, but these values were ignored in the following experiments. Further details on the number and types of sensors can be found in Table I.
|Property||BioTac||iCub|
|DC Pressure Range||0–100 kPa||–|
|DC Temp. Range||0–75 °C||–|
|AC Pressure spectrum||10–1040 Hz||–|
|AC Temp. spectrum||0.45–22.6 Hz||–|
III-B Test Bed
To verify our unsupervised learning methods we performed several experiments with different stimuli and recorded the tactile data. These stimuli include force, Shore hardness, surface angles and curvatures. They were chosen to represent information that is relevant for grip and manipulation. To verify the representation in the latent space learned by the neural networks, we also recorded ground truth while measuring the datasets. These ground-truth data are not used in the neural network training process.
For measuring accurate and repeatable datasets, a small 3-DoF robot with an additional linear actuator was set up to fit our needs.
It was equipped with a mount for holding different types of tactile sensors as well as a force-torque sensor (ATI Nano 17) to control the force applied with the linear actuator. Additionally, a gimbal platform was added for measuring materials at different angles. The robotic setup is shown in Fig. 3. The two degrees of freedom of this platform were actuated by computer-controlled stepper motors. The centre platform of the gimbal axis is replaceable, allowing us to evaluate different materials. The electronics of the robot are connected to a PC using an FPGA PCI card (Mesa 5i25) to support real-time control using Matlab Simulink. The operating system for this desktop computer is the Matlab Simulink Real-Time XPC operating system. All sensors are connected to this FPGA card either directly or through a microprocessor. This ensures proper time synchronisation and a constant delay of 30 ms between sampled data points.
The external stimuli were chosen to include shore hardness, surface normal and curvature.
Surface angle estimation.
The angular dataset was created by setting the gimbal axis to a fixed angle and then linearly increasing the force with which the tactile sensor presses against the gimbal platform. After reaching 5 N, the force was decreased at the same speed to capture possible hysteresis effects. The angles of the gimbal platform were then changed and the application of the tactile sensor was repeated. A flat plastic surface was used as the material in the centre of the gimbal axis. The angle ranges from ° to 19° in the roll direction and from ° to 18° in the pitch direction.
Shore hardness estimation.
The material samples for shore hardness were created using a two component silicone (Smooth On Ecoflex) and then calibrated using a Shore A measurement tool. Using this technique we managed to get a uniform distribution between 0 and 30 Shore A. These material samples are shaped as cylindrical plates with a diameter of 4 cm and a height of 1 cm. A 3-D printed plastic holder for receiving these silicone plates was mounted inside the centre of the gimbal axis. The dataset was not only recorded for a planar angle, but also measured at different angles using the gimbal platform. See Fig. 4, Dataset B: each Shore hardness shown in the last plot in that row is recorded together with the shown variation of angles.
Curvature estimation.
The curvature dataset is obtained from six different spherical curvatures, with five radii ranging from 40 mm to 5 mm. The force was controlled to be uniformly distributed between 0 N and 5 N for both sensors. These curvature samples are shown in Fig. 5.
An overview of these datasets, their attribute ranges and distributions is shown in Fig. 4. The rows represent the different datasets and the columns the different attributes. Note that every dataset consists of all possible combinations of its shown attribute values.
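For all experiments and for both tactile sensors we used the following common VAE network configuration. The generative model is defined as
$$p_\theta(\mathbf{x} \mid \mathbf{z}) = \mathcal{N}\big(\mathbf{x} \mid f_\theta(\mathbf{z}),\, \sigma^2 I\big),$$
where $\sigma$ is a constant in all dimensions of $\mathbf{x}$ and $f_\theta$ is a neural network with two layers, each 512 elements wide. $\sigma$ is part of the parameters $\theta$ and thus subject to the optimisation.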
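The recognition model is defined as
$$q_\phi(\mathbf{z} \mid \mathbf{x}) = \mathcal{N}\big(\mathbf{z} \mid \boldsymbol{\mu}_z,\, \mathrm{diag}(\boldsymbol{\sigma}_z^2)\big),$$
where a neural network model with input $\mathbf{x}$ and weights $\phi$ outputs $\boldsymbol{\mu}_z$ and $\boldsymbol{\sigma}_z$ as a concatenated vector. The recognition model neural network has the same size, transfer functions and number of hidden layers as the generative model. We used the identity function for all output transfer functions of the neural networks.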
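The following is a rough sketch of the forward passes of such a configuration, not the implementation used in the experiments. It uses two hidden layers of equal width, identity output units, and a recognition network whose output vector is split into a mean half and a standard-deviation half. The small layer sizes are chosen only to keep the example fast (the text uses 512 hidden units and a 128-dimensional latent space), and taking the absolute value to obtain a positive standard deviation is an assumption, since that mapping is not described here.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, inputs, transfer):
    """Apply one dense layer followed by an element-wise transfer function."""
    weights, biases = layer
    return [transfer(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def identity(a):
    return a


def init_mlp(n_in, n_hidden, n_out, rng):
    """Two hidden layers of equal width followed by an output layer."""
    return [init_layer(n_in, n_hidden, rng),
            init_layer(n_hidden, n_hidden, rng),
            init_layer(n_hidden, n_out, rng)]


def mlp_forward(layers, inputs, hidden_transfer):
    """Hidden layers use hidden_transfer; the output layer is the identity."""
    h = inputs
    for layer in layers[:-1]:
        h = dense(layer, h, hidden_transfer)
    return dense(layers[-1], h, identity)


def recognise(layers, x, latent_dim, hidden_transfer):
    """Split the concatenated network output into mean and standard deviation."""
    out = mlp_forward(layers, x, hidden_transfer)
    mu = out[:latent_dim]
    # Assumption: absolute value plus a small offset keeps sigma positive.
    sigma = [abs(v) + 1e-6 for v in out[latent_dim:]]
    return mu, sigma


rng = random.Random(0)
n_taxels, n_hidden, latent_dim = 19, 16, 4  # illustrative sizes only
recognition = init_mlp(n_taxels, n_hidden, 2 * latent_dim, rng)
generative = init_mlp(latent_dim, n_hidden, n_taxels, rng)
x = [rng.random() for _ in range(n_taxels)]
mu, sigma = recognise(recognition, x, latent_dim, sigmoid)
reconstruction = mlp_forward(generative, mu, sigmoid)
```

Passing a rectifier instead of `sigmoid` as `hidden_transfer` would give the other hidden-layer variant without changing the rest of the sketch.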
The only differences between the BioTac and iCub VAE networks lie in the transfer functions and the optimiser used. For the BioTac we used sigmoid transfer functions for all hidden layers in both the recognition and the generative model networks, whereas the network for iCub data used rectifier transfer functions for all hidden layers. The optimiser used for the iCub VAE was adadelta with step rate 0.1, and for the BioTac VAE rmsprop with step rate 0.001.
To show the differences between preprocessed and raw data, and to verify that all information is still encoded in the features after applying the VAE, we evaluated the prediction quality on the measured ground truth. We used linear regression, decision tree regression and multilayer perceptrons to evaluate the quality of the latent space.
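To illustrate the kind of comparison meant here, the sketch below fits a one-dimensional linear regression to two synthetic signals: one that saturates nonlinearly with force and one that is linear in force. Both signals are invented for illustration and are not measured or learned data; they only show why a linearly organised representation yields a lower regression error.

```python
import math


def fit_line(features, targets):
    """Ordinary least squares for a single feature with intercept."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    sff = sum((f - mean_f) ** 2 for f in features)
    sft = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    slope = sft / sff
    return slope, mean_t - slope * mean_f


def rmse(features, targets, slope, intercept):
    """Root mean squared error of the fitted line on the given points."""
    errors = [(slope * f + intercept - t) ** 2 for f, t in zip(features, targets)]
    return math.sqrt(sum(errors) / len(errors))


# Synthetic ground truth: forces between 0 N and 5 N.
forces = [5.0 * i / 49 for i in range(50)]
# Hypothetical stand-ins for a raw taxel value and a latent variable.
raw_signal = [1.0 - math.exp(-f) for f in forces]
latent_signal = [0.4 * f - 1.0 for f in forces]

train_idx = range(0, 50, 2)
test_idx = range(1, 50, 2)


def evaluate(signal):
    """Fit on the training half, report the error on the held-out half."""
    slope, intercept = fit_line([signal[i] for i in train_idx],
                                [forces[i] for i in train_idx])
    return rmse([signal[i] for i in test_idx],
                [forces[i] for i in test_idx], slope, intercept)


rmse_raw = evaluate(raw_signal)
rmse_latent = evaluate(latent_signal)
```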
|Linear Regression||Decision Trees|
|Linear Regression||Decision Tree Regression|
|Linear Regression||Decision Trees|
|Shore [Shore A]||2.07||1.94||2.21||2.21|
|Linear Regression||Decision Tree Regression|
|Shore [Shore A]||1.99||1.37||2.35||1.76|
IV-A Linear Regression
We noticed a large difference in the results between raw and pre-processed data when using linear regression. As seen in Tables II, III, IV and V, linear regression on the VAE pre-processed data almost always outperforms the results on raw data for both the surface and the Shore datasets. The inferior quality of linear regression on raw data can be explained by the highly nonlinear, sparse representation of the stimuli in the raw tactile data, as shown in Fig. 6. The plots show the raw tactile data recorded for both sensors under the same force profile. The force profile consists of linearly increasing the force from 0 N to 5 N over approximately 15 seconds and then decreasing it again at the same speed.
The BioTac sensor shows a highly non-linear relation to the applied force and almost all of the 19 taxels respond to the force change. The iCub sensor shows a different reaction with less nonlinearity and only a few taxels active at the same time. Both nonlinearity and the selective activation of taxels are disadvantageous for algorithms like linear regression. The improved results after preprocessing make sense as the VAE is factorising important features into individual latent variables which helps the linear regression to predict the ground truth.
IV-B Decision Tree Regression
Smaller differences between results can be seen in the case of Decision Tree Regression. The Decision Trees can represent more complex relations than linear transformations and can therefore incorporate the non-linear transformations which would otherwise be applied by the Variational Auto-Encoder.
IV-C Linear Classification on Curvature
The curvature dataset consisted of data from five discrete curvature samples. We used linear classification for evaluating the predictive capabilities of the latent and the raw sensor space. Results are shown in Fig. 7 and Fig. 8. The left plot shows the confusion matrix for the raw sensor space and the right plot shows the result for the latent sensor space. A clear diagonal represents a good prediction result. We see that a better result for curvature classification can be achieved by transforming the raw sensor data into a more compact latent state.
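For readers unfamiliar with confusion matrices, the following sketch shows how one is built from true and predicted class indices and how the diagonal relates to accuracy. The label lists are made-up examples for five classes, not classifier outputs from the experiments.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows index the true class, columns the predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels for five curvature classes.
true_labels = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
predicted_labels = [0, 1, 2, 3, 4, 0, 2, 2, 3, 3]
matrix = confusion_matrix(true_labels, predicted_labels, n_classes=5)
example_accuracy = accuracy(matrix)
```

Off-diagonal entries such as `matrix[1][2]` count samples of one curvature that were mistaken for another, so a sharper diagonal corresponds directly to a higher accuracy.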
IV-D Evaluation of latent space
We saw that all information about the applied stimulation is still present after preprocessing using the VAE, and that the tactile information is now in a representation suited for linear regression or a linear controller. In a real-world robotics setup with several tactile sensors we will not have any access to the real ground truth unless we perform a tedious calibration. We would therefore benefit from a method capable of preprocessing tactile data into a format similar to that of a sensor calibrated to physical quantities.
We found that the VAE algorithm we used is able to capture and separate features in the same way as the real physical stimuli are represented: forces, angles and Shore hardness are learned by individual latent variables without supervision. Fig. 9 shows a nearly linear relation between each real attribute and the latent variable with the highest correlation to it.
The other elements of the latent representation correspond less to the physical value and feature a very high variance. This makes it possible to reduce the dimension of the latent space to the minimum needed for the current dataset in an unsupervised way. Even though the latent space is 128 elements wide, only some elements will contain information about the current tactile state. This happens because the VAE tries to compress the data and factorises independent components of the data. The preprocessing also manages to find the same linear relations independent of the sensor, as shown in Fig. 9. Even though both sensors are very different in their raw sensor space, they now show almost the same relation between the real physical value and the corresponding latent value.
V Sensor calibration
Even though the VAE is able to represent and factorise the physical quantities without supervision, it is unknown where exactly in the latent representation such quantities can be found. This may not hinder control algorithms such as reinforcement learning, but it can cause problems when specific features such as force are needed in a certain physical unit. This can, however, be solved with a simple calibration procedure which requires only a few labelled sensor measurements, since it is only required to find the index of the element with the highest correlation to the desired feature. We nevertheless recommend using the full latent representation together with a suitable control algorithm, since (1) the VAE also encodes tactile features which are not definable by simple physical descriptions; (2) the full resolution for specific features is only obtainable when the full latent space is used, since some information is still spread among the other elements of the latent space; and (3) it completely eliminates the need to record a ground truth together with the tactile data.
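The calibration step described above can be sketched as follows: compute the correlation of every latent element with a handful of labelled values, pick the element with the highest absolute correlation, and fit a line that maps it to physical units. The final linear fit is an assumption motivated by the nearly linear relation reported for Fig. 9, and the latent vectors and force labels below are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def calibrate(latents, targets):
    """Return (index, slope, intercept) mapping one latent element to targets."""
    n_dims = len(latents[0])
    correlations = [pearson([z[d] for z in latents], targets)
                    for d in range(n_dims)]
    best = max(range(n_dims), key=lambda d: abs(correlations[d]))
    # Least-squares line from the selected latent element to the target.
    xs = [z[best] for z in latents]
    mean_x = sum(xs) / len(xs)
    mean_t = sum(targets) / len(targets)
    slope = (sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, targets))
             / sum((x - mean_x) ** 2 for x in xs))
    return best, slope, mean_t - slope * mean_x


# Hypothetical labelled measurements: four latent vectors and their forces in N.
forces = [0.0, 1.0, 3.0, 5.0]
latents = [[0.3, -1.0, 1.0],
           [-0.2, -0.5, 1.0],
           [0.1, 0.5, 1.0],
           [-0.1, 1.5, 1.0]]
index, slope, intercept = calibrate(latents, forces)
```

In this toy example the second latent element is an exact linear function of force, so it is selected and the fitted line recovers the force values.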
We used the VAE preprocessing to stabilise an inverted pendulum using model predictive control in latent space, in order to show that the features learned without supervision are suitable for controlling a robot. The gimbal platform was extended with an inverted pole, with a BioTac sensor touching the tip of the pole as shown in Fig. 11. The task was to bring the pole into an upright position.
For this control task we used a neural network to model the system dynamics as a one-step predictor. This is done by using the state of the robot together with the current action as the neural network input and training it to predict the next state. The network consisted of one hidden layer of 20 neurons with a rectifier activation function in each hidden unit. We used the mean squared error as the loss for training this network. An action is chosen by evaluating the neural network for all possible actions from a discrete set. The action chosen for controlling the robot is the one for which the predicted next state is the best in terms of the reward function. We chose the reward function to be maximal at zero in the latent tactile space. This zero position corresponded to an angle close to the centre position in angular sensor space.
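The action selection can be sketched as a search over the discrete action set. In the sketch below, `toy_dynamics` is a hypothetical stand-in for the trained one-step predictor, and the exponential reward is an assumed form that is merely consistent with a maximum of 1.0 at zero in latent space.

```python
import math


def reward(latent_state):
    """Assumed reward: 1.0 at zero latent value, decaying with distance."""
    return math.exp(-sum(v * v for v in latent_state))


def choose_action(state, actions, predict_next_state):
    """Pick the action whose predicted next state has the highest reward."""
    return max(actions, key=lambda a: reward(predict_next_state(state, a)))


def toy_dynamics(state, action):
    """Hypothetical one-step model: the action shifts the latent state."""
    return [v + action for v in state]


actions = [-0.2, -0.1, 0.0, 0.1, 0.2]  # illustrative discrete action set
state = [0.35]
best_action = choose_action(state, actions, toy_dynamics)
```

Because only a single step is predicted, the controller is greedy with respect to the learned model; replacing `toy_dynamics` with a trained network leaves `choose_action` unchanged.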
The results using this method can be seen in Fig. 10. The plot shows the average reward over 10 experiments. Training the model is performed after each of the 30 rollouts during one experiment. As shown in Fig. 10 the reward is steadily increasing until it almost reaches the maximum of 1.0 at the end of each full experiment.
We showed that unsupervised learning can overcome the difficult data representations posed by high-dimensional tactile sensors. The preprocessing algorithm that we propose, based on the variational auto-encoder, transforms the high-dimensional, sparse, nonlinear tactile space into an easy-to-use, compact latent space which can be used directly for control tasks. The latent space automatically factorises the tactile features into independent components which are linearly related to real physical ground truths. These effects can be observed in two fundamentally different tactile sensors, indicating that the method does not depend on the specific tactile sensor. This reduces the effort of manually designing or tuning the preprocessing and makes it possible to work completely sensor-independently. A small control task demonstrates the applicability of our preprocessing together with model predictive control.
This work has been supported in part by the TACMAN project, EC Grant agreement no. 610967, within the FP7 framework programme.
-  M. Karl, A. Lohrer, D. Shah, F. Diehl, M. Fiedler, S. Ognawala, J. Bayer, and P. van der Smagt, “ML-based tactile sensor calibration: A universal approach,” arXiv preprint arXiv:1606.06588, 2016.
-  N. Wettels and G. Loeb, “Haptic feature extraction from a biomimetic tactile sensor: Force, contact location and curvature,” in 2011 IEEE International Conference on Robotics and Biomimetics (ROBIO), Dec. 2011, pp. 2471–2478.
-  C.-H. Lin, J. A. Fishel, and G. E. Loeb, “Estimating point of contact, force and torque in a biomimetic tactile sensor with deformable skin.”
-  Z. Su, J. A. Fishel, T. Yamamoto, and G. E. Loeb, “Use of tactile feedback to control exploratory movements to characterize object compliance,” Frontiers in Neurorobotics, vol. 6, p. 7, 2012.
-  S. Zhe, K. Hausman, Y. Chebotar, A. Molchanov, G. E. Loeb, G. S. Sukhatme, and S. Schaal, “Force Estimation and Slip Detection for Grip Control using a Biomimetic Tactile Sensor.”
-  V. Ciobanu, A. Petrescu, N. Hendrich, and J. Zhang, “Tactile sensor value preprocessing pipeline,” in System Theory, Control and Computing (ICSTCC), 2013 17th International Conference, Oct. 2013, pp. 674–680.
-  N. Wettels, J. Fishel, Z. Su, C. Lin, and G. Loeb, “Multi-modal synergistic tactile sensing,” Tactile Sensing in Humanoids - Tactile Sensors and Beyond Workshop, 2009.
-  D. P. Kingma and M. Welling, “Stochastic Gradient VB and the Variational Auto-Encoder,” arXiv:1312.6114 [cs, stat], Dec. 2013. [Online]. Available: http://arxiv.org/abs/1312.6114
-  D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic Backpropagation and Approximate Inference in Deep Generative Models,” arXiv:1401.4082 [cs, stat], Jan. 2014. [Online]. Available: http://arxiv.org/abs/1401.4082
-  J. Fishel, G. Lin, B. Matulevich, and G. Loeb, “Biotac product manual,” 2015. [Online]. Available: http://www.syntouchllc.com/Products/BioTac/_media/BioTac_Product_Manual.pdf
-  J. Fishel, “Design and use of a biomimetic tactile microvibration sensor with human-like sensitivity and its application in texture discrimination using Bayesian exploration,” Ph.D. dissertation, University of Southern California, 2012.
-  J. Fishel and G. Loeb, “Sensing tactile microvibrations with the BioTac - Comparison with human sensitivity,” in 2012 4th IEEE RAS EMBS International Conference on Biomedical Robotics and Biomechatronics (BioRob), Jun. 2012, pp. 1122–1127.
-  A. Schmitz, M. Maggiali, L. Natale, B. Bonino, and G. Metta, “A tactile sensor for the fingertips of the humanoid robot iCub,” in 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2010, pp. 2212–2217.
-  M. D. Zeiler, “ADADELTA: An Adaptive Learning Rate Method,” arXiv:1212.5701 [cs], Dec. 2012, arXiv: 1212.5701. [Online]. Available: http://arxiv.org/abs/1212.5701
-  T. Tieleman and G. Hinton, “Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning, 2012.