A Novel Adaptive Controller for Robot Manipulators based on Active Inference
Corrado Pezzato (a,*), Riccardo Ferrari (b), Carlos Hernandez (a)
(a) Cognitive Robotics (CoR), TU Delft, The Netherlands
(b) Delft Center for Systems and Control (DCSC), TU Delft, The Netherlands
(*) Corresponding author
More adaptive controllers for robot manipulators are needed, which can deal with large model uncertainties. This paper presents a novel active inference controller (AIC) as an adaptive control scheme for industrial robots. This scheme is easily scalable to high degrees-of-freedom, and it maintains high performance even in the presence of large unmodeled dynamics. The proposed method is based on active inference, a promising neuroscientific theory of the brain, which describes a biologically plausible algorithm for perception and action. In this work, we formulate active inference from a control perspective, deriving a model-free control law which is less sensitive to unmodeled dynamics. The performance and the adaptive properties of the algorithm are compared to a state-of-the-art model reference adaptive controller (MRAC) in an experimental setup with a real 7-DOF robot arm. The results showed that the AIC outperformed the MRAC in terms of adaptability, providing a more general control law for robot manipulators. This confirmed the relevance of active inference for robot control.
1 Introduction
Traditional control approaches for industrial manipulators rely on an accurate model of the plant. However, there is an increasing demand in industry for robot controllers that are more flexible and adaptive to run-time variability. Often, robot manipulators are placed in dynamically changing surroundings, and they are subject to noisy sensory input and unexpected events. In these new applications, obtaining an accurate model is a major problem. For example, in pick and place tasks, the dynamics of the robot manipulator can change unpredictably while handling unknown objects. Recent research has focused on the use of machine learning methods to obtain accurate inverse dynamic models [24, 20]. In general, learning models using Neural Networks (NN) requires experts to define the best topology for a particular problem. Even though it is possible to exploit the physical knowledge of the system to simplify and improve the learning performance, the need for large amounts of training data and several learning iterations remains a problem, and the learned models are hard to generalise [13, 12]. Controllers that can dynamically adapt are required, but existing solutions in adaptive control either need an accurate model, or are difficult to tune and to scale to a higher number of DOFs. In this paper, we present a novel adaptive controller for robot manipulators, inspired by a recent theory of the brain, which does not require a detailed dynamical model of the plant and is less sensitive to large parameter variations.
The proposed control scheme is based on the general free-energy principle proposed by Karl Friston, and redefined in engineering terms. The main idea at the basis of Friston's neuroscientific theory is that the brain's cognition and motor control functions could be described in terms of energy minimization. It is supposed that we, as humans, have a set of sensory data and a specific internal model to characterize how the sensory data could have possibly been generated. Then, given this internal model, the causes of sensory data are inferred. Usually, the environment acts on humans to produce sensory impressions, and humans can act on the environment to change it. In this view, the motor control of the human body can be considered as the fulfillment of a prior expectation about proprioceptive sensations. The fact that this theory tries to capture the adaptive nature of humans' sensorimotor control suggested the use of the free-energy principle to obtain adaptive control schemes for robotics. In practice, in a robotic application, the available sensory input can be used to infer the most probable states of the robot through the minimization of the free-energy as cost function. The same minimisation scheme can be used to obtain the control actions for the motors in order to fulfill a prior expectation about a specific desired goal. The use of active inference for robot control thus allows state estimation and control using only sensory data and internal models for these data. The contributions of this paper are twofold:
Derivation of an online active inference control law for the control of a generic n-DOF robot manipulator in joint space.
Comparison of the adaptability performance of the AIC with a state-of-the-art model reference adaptive controller.
Both contributions have been experimentally validated in a setup with a 7-DOF collaborative industrial manipulator performing point to point motions.
1.1 Related work
At present, the application of active inference for robot control is still limited, and no actual comparison with other advanced adaptive control techniques has been carried out. In , the authors simulated the behaviour of a PR2 robot in a reaching task. The manipulator was controlled in Cartesian space, but the generative models describing the relation between sensory data and states of the robot were supposed to be known exactly. In addition, the computational complexity of the algorithm precluded the use of active inference for any online applications. More recent work formalised the use of the free-energy for static state estimation, using a real UR5 robot arm equipped with proprioceptive and visual sensors. Even though the results of the state estimation were promising, no control actions were included. The same authors presented in  the body estimation and control in the joint space of a simulated 2-DOF robot arm through active inference. This solution included state-of-the-art regressors to estimate the generative models online. However, during the simulations, the estimation of the acceleration was unreliable and was substituted with the ground truth. Although only forward dynamics models had to be learned, the authors pointed out that this approach is not simpler than classical inverse dynamics techniques. In a parallel, related work on active inference, the authors successfully controlled a real 3-DOF robot arm using velocity commands. Our approach is focused on the adaptability properties for low-level torque control, providing a comparison with a state-of-the-art controller and insights for controller design and tuning.
The adaptive control branch of control theory offers solutions to deal with robotic manipulators subject to parameter variations and abrupt changes in the dynamics. Within adaptive controllers, two main categories can be identified: the model reference adaptive systems, and the self-tuning regulators. The first technique studied for robot manipulators was the model reference adaptive control (MRAC). The idea behind this technique is to derive a control signal to be applied to the robot actuators which will force the system to behave as specified by a chosen reference model. Furthermore, the adaptation law is designed to guarantee stability using either Lyapunov theory or hyperstability theory. The other most common approach for robot control is the self-tuning adaptive control. The main difference between this technique and the MRAC is that the self-tuning approach represents the robot as a linear discrete-time model and estimates the unknown parameters online, substituting them in the control law. The literature on adaptive control of robot manipulators shows the ability of these techniques to perform well in the presence of uncertain dynamics and varying payloads. However, the complexity of the controller usually increases with the number of DOF.
Among all the possible adaptive controllers, in this paper we choose the MRAC with hyperstability theory  for comparison. This choice is motivated by the fact that this approach provides adaptability to abrupt changes in the robot dynamics, and it does not require the kinematic or dynamic description of the manipulator. These characteristics make the MRAC suitable for a fair comparison with the AIC.
1.2 Paper structure
The paper is organised as follows: In Sec. 2 we present the free-energy principle and active inference in control engineering terms. In Sec. 3 we derive an active inference controller for a 7-DOF robot manipulator, and we explain the model assumptions and simplifications. In Sec. 4 a model reference adaptive controller is presented for comparison. In Sec. 5 we compare the performance of the two control architectures in a simulated pick and place task, and we evaluate their adaptive properties. The simulations are then validated in the real setup in Sec. 5.4. Finally, in Sec. 6 we discuss the advantages and the adaptability properties of the novel active inference controller, highlighting its relevance and applicability for online robotic applications.
2 The active inference framework
2.1 The free-energy principle
The free-energy principle is formulated in terms of Bayesian inference. In this view, body perception for state estimation is framed using Bayes rule:
$$p(x|y) = \frac{p(y|x)\,p(x)}{p(y)} \tag{1}$$
where $p(x|y)$ is the probability of being in the $n$-dimensional state $x$ given the current $m$-dimensional sensory input $y$. However, instead of exactly inferring the posterior, which often involves intractable integrals, an auxiliary probability distribution $r(x)$, called recognition density, is introduced. By minimizing the Kullback-Leibler divergence ($D_{KL}$) between the true posterior $p(x|y)$ and $r(x)$, the most probable state given a sensory input is inferred. $D_{KL}$ is defined as:
$$D_{KL}\big(r(x)\,\|\,p(x|y)\big) = \int r(x) \ln \frac{r(x)}{p(x|y)}\,dx = F + \ln p(y) \tag{2}$$
In the equation above, the scalar $F$ is the so-called free-energy. By minimizing $F$, $D_{KL}$ is also minimized and the recognition density approaches the true posterior. According to the Laplace approximation, the controller only parametrises the sufficient statistics (e.g. mean and variance) of the recognition density. $r(x)$ is then assumed Gaussian and sharply peaked at its mean value $\mu$. This approximation simplifies the expression for the free-energy, which becomes:
$$F \approx -\ln p(\mu, y) \tag{3}$$
The mean $\mu$ is the internal belief about the true states $x$. Minimizing $F$, the controller is then continuously adapting the internal belief $\mu$ about the states $x$ based on the current sensory input $y$. Exploiting the product rule, $F$ can be further simplified as:
$$F \approx -\ln p(y|\mu) - \ln p(\mu) \tag{4}$$
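The relation between the Kullback-Leibler divergence and the free-energy can be checked numerically. The following minimal Python sketch does so on a small discrete example, where sums replace the integrals. The three-state prior, the likelihood values and the recognition density are hypothetical numbers chosen only for illustration, and the free-energy is taken in its standard variational form $F = \sum_x r(x) \ln \frac{r(x)}{p(x,y)}$, which is an assumption about the definition intended here.

```python
import math

# Hypothetical discrete example: three states x, one observed sensory input y.
prior = [0.5, 0.3, 0.2]          # p(x)
likelihood = [0.9, 0.4, 0.1]     # p(y | x) for the observed y
recognition = [0.6, 0.3, 0.1]    # r(x), an arbitrary recognition density

joint = [l * p for l, p in zip(likelihood, prior)]   # p(x, y)
evidence = sum(joint)                                # p(y)
posterior = [j / evidence for j in joint]            # p(x | y), Bayes rule


def kl_divergence(r, p):
    """D_KL(r || p) for discrete distributions with strictly positive entries."""
    return sum(ri * math.log(ri / pi) for ri, pi in zip(r, p))


def free_energy(r, joint_density):
    """Variational free-energy F = sum_x r(x) ln(r(x) / p(x, y))."""
    return sum(ri * math.log(ri / ji) for ri, ji in zip(r, joint_density))


d_kl = kl_divergence(recognition, posterior)
F = free_energy(recognition, joint)

# D_KL = F + ln p(y): ln p(y) does not depend on r, so minimising F minimises D_KL.
gap = d_kl - (F + math.log(evidence))

# Choosing r equal to the true posterior drives D_KL to zero and F to -ln p(y).
F_at_posterior = free_energy(posterior, joint)
```

In this sketch `gap` vanishes up to rounding for any choice of `recognition`, which is why the free-energy can stand in for the divergence as the quantity to minimise.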
2.2 Free-energy equation
Eq. 4 is still general and it has to be adapted to the specific control case to be able to numerically evaluate $F$. To do so, the two probability densities $p(y|\mu)$ and $p(\mu)$ have to be defined. This is done by introducing two generative models, one to predict the sensory data $y$ according to the current belief $\mu$, and another to describe the dynamics of the evolution of the belief $\mu$.
2.2.1 Generative model of the sensory data
The sensory data is modeled using the following expression:
$$y = g(\mu) + z \tag{5}$$
where $g(\mu)$ represents the non-linear mapping between sensory data and states of the environment, and $z$ is Gaussian noise, $z \sim \mathcal{N}(0, \Sigma_y)$. Note that the covariance matrix $\Sigma_y$ also represents the controller's confidence about each sensory input.
2.2.2 Generative model of the state dynamics
In the presence of time-varying states $x$, the controller has to encode a dynamic generative model of the evolution of the belief $\mu$. The generative function can then be defined as:
$$\frac{d\mu}{dt} = f(\mu) + w \tag{6}$$
where $f$ is a generative function dependent on the belief about the states $\mu$, and $w$ is Gaussian noise, $w \sim \mathcal{N}(0, \Sigma_\mu)$.
2.2.3 Generalised motions
To describe the dynamics of the states, or better the belief about these dynamics, we have to introduce the concept of generalised motions. Generalised motions are used to represent the states of a dynamical system, using increasingly higher order derivatives of the states of the system itself. They apply to sensory inputs as well, meaning that the generalised motions of a position measurement, for example, correspond to its higher order temporal derivatives (velocity, acceleration, and so on). The use of generalised motions allows a more accurate description of the system's states. More precisely, the generalised motions of the belief under local linearity assumptions are:
$$\mu' = \mu^{(1)} = f(\mu) + w, \qquad \mu'' = \mu^{(2)} = \frac{\partial f}{\partial \mu}\mu' + w', \qquad \ldots \tag{7}$$
We indicate the generalised motions of the states up to order $n_d$ as $\tilde{\mu} = [\mu, \mu', \mu'', \ldots, \mu^{(n_d)}]$.
Footnote 1: Note that the generalised motions can extend up to infinite order. However, the noise related to high orders is predominant, and this makes it possible to decide on the number of derivatives to consider.
The generalised motions of the sensory input are:
$$y = g(\mu) + z, \qquad y' = \frac{\partial g}{\partial \mu}\mu' + z', \qquad y'' = \frac{\partial g}{\partial \mu}\mu'' + z'', \qquad \ldots \tag{8}$$
We indicate the generalised motions of the sensory input up to order $n_d$ as $\tilde{y} = [y, y', y'', \ldots, y^{(n_d)}]$.
2.2.4 General free-energy expression
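Equipped with the extra theoretical knowledge about the generalised motions, we can define an expression for the free-energy for a multivariate case in a dynamically changing environment. Under the assumption of Gaussian noise, combining Eq. 5 and Eq. 6 with Eq. 4 leads to expressing $F$ as a sum of prediction errors:
$$F = \frac{1}{2} \sum_{i=0}^{n_d - 1} \left[ \varepsilon_y^{(i)\top} \Sigma_{y^{(i)}}^{-1} \varepsilon_y^{(i)} + \varepsilon_\mu^{(i)\top} \Sigma_{\mu^{(i)}}^{-1} \varepsilon_\mu^{(i)} \right] + K \tag{9}$$
where $n_d$ is the number of generalised motions chosen, $K$ is a constant term that does not depend on the belief, and:
$$\varepsilon_y^{(i)} = y^{(i)} - g^{(i)}(\mu^{(i)}), \qquad \varepsilon_\mu^{(i)} = \mu^{(i+1)} - f^{(i)}(\mu^{(i)}) \tag{10}$$
The minimisation of this expression can be done by refining the internal belief, thus performing state estimation, but also by computing the control actions to fulfill the prior expectations and achieve a desired motion. The next two subsections describe the approach proposed by Friston to minimise $F$ using gradient descent.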
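To make the structure of this cost concrete, the following Python sketch evaluates a free-energy of the form "sum of precision-weighted squared prediction errors", with the constant term dropped. The function name, the argument layout and the 1-DOF numbers are hypothetical and serve only as an illustration; the sensory error of order $i$ is taken as the sensory input minus its prediction from the belief, and the state error as the next-order belief minus its prediction from the generative model of the dynamics.

```python
import numpy as np


def free_energy(y_tilde, mu_tilde, g_funcs, f_funcs, sigma_y, sigma_mu):
    """Sum of precision-weighted squared prediction errors (constant term dropped).

    y_tilde  : list of n_d sensory vectors      [y, y', ...]
    mu_tilde : list of n_d + 1 belief vectors   [mu, mu', ..., mu^(n_d)]
    g_funcs  : list of n_d callables, g_funcs[i](mu_i) predicts y_tilde[i]
    f_funcs  : list of n_d callables, f_funcs[i](mu_i) predicts mu_tilde[i + 1]
    sigma_y, sigma_mu : lists of n_d covariance matrices
    """
    F = 0.0
    for i in range(len(y_tilde)):
        eps_y = y_tilde[i] - g_funcs[i](mu_tilde[i])         # sensory prediction error
        eps_mu = mu_tilde[i + 1] - f_funcs[i](mu_tilde[i])   # state prediction error
        F += 0.5 * eps_y @ np.linalg.solve(sigma_y[i], eps_y)
        F += 0.5 * eps_mu @ np.linalg.solve(sigma_mu[i], eps_mu)
    return float(F)


# Hypothetical 1-DOF example with n_d = 2: the sensors measure the state directly,
# and the belief is expected to relax towards a goal mu_d.
mu_d = np.array([1.0])
g_funcs = [lambda m: m, lambda m: m]
f_funcs = [lambda m: mu_d - m, lambda m: -m]
identity = [np.eye(1), np.eye(1)]

y_tilde = [np.array([1.0]), np.array([0.0])]
mu_tilde = [np.array([0.0]), np.array([0.0]), np.array([0.0])]
F_example = free_energy(y_tilde, mu_tilde, g_funcs, f_funcs, identity, identity)

# When belief, measurement and goal coincide, every prediction error is zero.
mu_at_goal = [np.array([1.0]), np.array([0.0]), np.array([0.0])]
F_at_goal = free_energy(y_tilde, mu_at_goal, g_funcs, f_funcs, identity, identity)
```

In the first case the position error and the first state error each contribute 0.5, so `F_example` equals 1.0, while `F_at_goal` is zero.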
2.3 Belief update for state estimation
2.4 Control actions
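In the free-energy principle the control actions play a fundamental role in the minimisation process. In fact, the control input $u$ makes it possible to steer the system to a desired state while minimising the prediction errors in $F$. This is done as before using gradient descent. Since the free-energy is not a function of the control actions directly, but the actions can influence the free-energy by modifying the sensory input, we can write:
$$\dot{u} = -\kappa_a \frac{\partial F(\tilde{\mu}, \tilde{y}(u))}{\partial u} = -\kappa_a \frac{\partial \tilde{y}(u)}{\partial u} \frac{\partial F(\tilde{\mu}, \tilde{y}(u))}{\partial \tilde{y}(u)}$$
Dropping the dependencies for a more compact notation, the dynamics of the control actions can be written as:
$$\dot{u} = -\kappa_a \frac{\partial \tilde{y}}{\partial u} \frac{\partial F}{\partial \tilde{y}}$$
where $\kappa_a$ is the tuning parameter to be chosen.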
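The chain rule behind the action update can be illustrated on a deliberately simple scalar case. The Python sketch below assumes a hypothetical static sensor whose reading equals the action, $y(u) = u$, and a free-energy with a single sensory term $F = \frac{1}{2}(y - \mu)^2 / \sigma^2$; the names `action_descent`, `kappa_a` and the numerical values are illustrative assumptions, not the controller used in this paper.

```python
def action_descent(u0, mu, sigma, kappa_a, dt, steps):
    """Euler-integrate du/dt = -kappa_a * (dy/du) * dF/dy for the toy sensor y = u."""
    u = u0
    for _ in range(steps):
        y = u                           # hypothetical static sensor: y(u) = u
        dF_dy = (y - mu) / sigma**2     # gradient of F = 0.5 * (y - mu)^2 / sigma^2
        dy_du = 1.0                     # exact for this toy sensor
        u += dt * (-kappa_a * dy_du * dF_dy)
    return u


# The action changes the sensory input until it matches the belief mu.
u_final = action_descent(u0=0.0, mu=1.0, sigma=1.0, kappa_a=10.0, dt=0.01, steps=100)
```

With these values each Euler step shrinks the error between sensory input and belief by a factor of 0.9, so the action never appears in $F$ explicitly and still drives the prediction error towards zero.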
3 Robot arm control with active inference
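The theory presented so far is now adapted to derive an active inference control scheme for a generic $n$-DOF robot manipulator.
The robot manipulator is equipped with position and velocity sensors, which respectively provide the two variables $y_q, y_{\dot{q}} \in \mathbb{R}^n$. The covariance matrices of the sensory input and of the internal belief are taken as:
$$\Sigma_{y_q} = \sigma_q^2 I_n, \qquad \Sigma_{y_{\dot{q}}} = \sigma_{\dot{q}}^2 I_n, \qquad \Sigma_{\mu} = \sigma_{\mu}^2 I_n, \qquad \Sigma_{\mu'} = \sigma_{\mu'}^2 I_n$$
where we supposed that the controller associates four different variances to describe its confidence about sensory input and internal belief.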
The states of the environment are set as the joint positions of the robot arm. Doing so, we can control the robot arm in joint space through free-energy minimization, and simplify the equations for states update and control actions.
3.1 Generative models
In order to numerically evaluate the free-energy as in Eq. 9, the two functions $g(\mu)$ and $f(\mu)$ still have to be chosen.
3.1.1 Generative model of the sensory data
$g(\mu)$ indicates the relation between the sensed values and the states. Since we chose the states to be the joint positions, and the sensory data directly provides the noisy joint positions $y_q$ and velocities $y_{\dot{q}}$, it holds:
$$g(\mu) = \mu$$
3.1.2 Dynamic generative model of the world
$f(\mu)$ is defined following the one-dimensional example presented in . In particular, the world dynamics are chosen such that the robot is steered to a desired position $\mu_d$. In other words, the controller believes that the states will evolve in such a way that they will reach the goal with the dynamics of a first order system with unitary time constant:
$$f(\mu) = \mu_d - \mu$$
The value $\mu_d$ is a constant corresponding to the desired set-point for the joints of the manipulator.
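The behaviour encoded by a first order system with unitary time constant can be checked with a few lines of Python. The sketch below assumes the noise-free belief dynamics $\dot{\mu} = \mu_d - \mu$ for a single joint and integrates them with a forward Euler scheme; the function name, step size and goal value are hypothetical choices for illustration.

```python
import math


def simulate_belief(mu0, mu_d, dt, t_end):
    """Forward-Euler integration of d(mu)/dt = mu_d - mu (noise-free, one joint)."""
    mu = mu0
    for _ in range(round(t_end / dt)):
        mu += dt * (mu_d - mu)
    return mu


mu_d = 1.0
mu_after_one_tau = simulate_belief(mu0=0.0, mu_d=mu_d, dt=1e-3, t_end=1.0)

# Analytic response of a first order system with unit time constant at t = 1.
expected = mu_d * (1.0 - math.exp(-1.0))
```

After one time constant the belief has covered roughly 63% of the distance to the set-point, as expected for first order dynamics. This is the response the controller expects, not necessarily the one the physical joint exhibits.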
3.2 Free-energy for a robot manipulator
3.3 Belief update and state estimation for a robot manipulator
3.4 Control actions for a robot manipulator
The final step in order to be able to steer the joints of a robot manipulator to a desired value $\mu_d$ is the definition of the control actions.
3.4.1 General considerations
Having said that, the actions update is expressed as:
$$\dot{u} = -\kappa_a \left[ \frac{\partial y_q}{\partial u}^{\top} \Sigma_{y_q}^{-1} (y_q - \mu) + \frac{\partial y_{\dot{q}}}{\partial u}^{\top} \Sigma_{y_{\dot{q}}}^{-1} (y_{\dot{q}} - \mu') \right] \tag{23}$$
Active inference then requires defining the change in the sensory input with respect to the control actions, namely $\partial y_q / \partial u$ and $\partial y_{\dot{q}} / \partial u$. This is usually a hard task and it can be seen as a forward dynamic problem. One approach to compute these relations is through online learning using high-dimensional space regressors. However, this increases the complexity of the overall scheme and can produce unreliable results, as shown by the authors in . In this paper we propose to approximate the partial derivatives relying on the high adaptability of the active inference controller against unmodeled dynamics, as suggested in the conclusive remarks in .
3.4.2 Approximation of the true relation between actions and sensory input
Let us first analyse the structure of the partial derivative matrices in Eq. 23. The control action $u$ is a vector of torques applied to the joints of the robot manipulator. Each torque has a direct effect only on the corresponding joint to which it is applied. This allows us to conclude that $\partial y_q / \partial u$ and $\partial y_{\dot{q}} / \partial u$ are diagonal matrices. Furthermore, considering Newton's second law, the total torque applied to a rotational joint equals the moment of inertia times the angular acceleration. The diagonal terms of the partial derivative matrices are then time-varying positive values which depend on the current robot configuration. In other words, this means that a positive torque applied to a joint will always result in a positive contribution for both position and velocity of that specific joint. In this control scheme we propose to approximate the true positive time-varying relation with a positive constant, making use of the learning rate $\kappa_a$ as tuning parameter to achieve a sufficiently fast actions update. The control update law is finally given by:
$$\dot{u} = -\kappa_a \left[ C_q \Sigma_{y_q}^{-1} (y_q - \mu) + C_{\dot{q}} \Sigma_{y_{\dot{q}}}^{-1} (y_{\dot{q}} - \mu') \right]$$
The positive definite diagonal constant matrices $C_q$ and $C_{\dot{q}}$ are then set to the identity, meaning that we only encode the sign of the relation between $u$ and the change in $\tilde{y}$.
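The update laws of this section can be collected into a single integration step. The Python sketch below is a minimal illustration under several stated assumptions and is not the released implementation: the free-energy is the sum of the four precision-weighted squared errors $(y_q - \mu)$, $(y_{\dot{q}} - \mu')$, $(\mu' + \mu - \mu_d)$ and $(\mu'' + \mu')$; the belief follows a generalised gradient descent of the form $\dot{\tilde{\mu}} = D\tilde{\mu} - \kappa_\mu \, \partial F / \partial \tilde{\mu}$, where $D$ shifts each belief to its next-order derivative (an assumption, since the belief update equations are not reproduced in this text); the partial derivatives of the sensory input with respect to the torques are replaced by the identity; and integration is forward Euler. All names (`aic_step`, `var`, `kappa_mu`, `kappa_a`) are hypothetical.

```python
import numpy as np


def aic_step(state, y_q, y_dq, mu_d, var, kappa_mu, kappa_a, dt):
    """One Euler step of belief and torque updates for an n-DOF arm.

    state : dict with arrays 'mu', 'mu_p', 'mu_pp' (beliefs) and 'u' (torques)
    var   : dict with scalar variances 'q', 'dq', 'mu', 'mu_p'
    """
    mu, mu_p, mu_pp, u = state["mu"], state["mu_p"], state["mu_pp"], state["u"]

    # Precision-weighted prediction errors.
    e_q = (y_q - mu) / var["q"]              # position sensor vs. belief
    e_dq = (y_dq - mu_p) / var["dq"]         # velocity sensor vs. belief
    e_mu = (mu_p + mu - mu_d) / var["mu"]    # mu' - f(mu), with f(mu) = mu_d - mu
    e_mu_p = (mu_pp + mu_p) / var["mu_p"]    # mu'' - (df/dmu) mu' = mu'' + mu'

    # Belief update: shift of generalised motions minus the free-energy gradient.
    d_mu = mu_p + kappa_mu * (e_q - e_mu)
    d_mu_p = mu_pp + kappa_mu * (e_dq - e_mu - e_mu_p)
    d_mu_pp = -kappa_mu * e_mu_p

    # Torque update with dy/du approximated by the identity (sign only).
    d_u = -kappa_a * (e_q + e_dq)

    return {
        "mu": mu + dt * d_mu,
        "mu_p": mu_p + dt * d_mu_p,
        "mu_pp": mu_pp + dt * d_mu_pp,
        "u": u + dt * d_u,
    }


# Hypothetical 2-DOF call: beliefs and torques start at zero, sensors read q = 1.
n = 2
state0 = {key: np.zeros(n) for key in ("mu", "mu_p", "mu_pp", "u")}
unit_var = {"q": 1.0, "dq": 1.0, "mu": 1.0, "mu_p": 1.0}
state1 = aic_step(state0, y_q=np.ones(n), y_dq=np.zeros(n), mu_d=np.ones(n),
                  var=unit_var, kappa_mu=2.0, kappa_a=3.0, dt=0.1)
```

Only element-wise operations appear, because all covariances are scalar multiples of the identity and the partial derivatives are approximated by the identity. This is what keeps the cost of the step linear in the number of joints.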
3.4.3 Tuning parameters AIC
The tuning parameters for the active inference controller are:
$\sigma_q$, $\sigma_{\dot{q}}$, $\sigma_{\mu}$, $\sigma_{\mu'}$: the standard deviations representing the confidence of the controller regarding its sensory input and internal belief about the states;
$\kappa_\mu$, $\kappa_a$: the learning rates for state update and control actions respectively.
4 Model reference adaptive controller
The controller chosen for comparison is an MRAC. This adaptive controller makes it possible to obtain decoupled joint dynamics, forcing every single joint to respond as a second order linear system with transfer function:
The control architecture is taken from , where the control is specified in terms of feedforward and feedback adaptive gain matrices. These time-varying gain matrices are adjusted by means of adaptation laws to guarantee closed loop stability in case of large parameters perturbation. Supposing zero initial conditions for the gains, and neglecting the derivative terms as described in , it holds:
The variables and are the desired references to track. The diagonal matrices and , and the vector with and , are the tuning parameters for the proportional-integral adaptation law. The term is called modified joint angle error vector :
with and diagonal weighting matrices. The MRAC, similarly to the AIC, does not need the dynamic description of the robot manipulator, and it is scalable to high DOF. However, the number of the tuning parameters increases with the degrees of freedom, unlike for the AIC.
5 Experimental Evaluation
This section presents the performance comparison of the two controller architectures described before. To analyse the adaptability of the algorithms against unmodeled dynamics, the controllers are tuned using an approximated model of the robot, and then tested with the more accurate system description. This makes it possible to evaluate the performance degradation. The tests are based on a pick and place cycle using the Franka Emika Panda 7-DOF robot manipulator, as depicted in Fig. 1. The desired joint values to perform the task are chosen such that the arm simulates the pick and place of an object from one bin to the other. More specifically, the following sequence of set-points is given to the robot arm:
From the initial position of the robot at , the goal is set to be to reach the first bin A;
The goal is set to at , to move to the central position B;
The goal is set to at , to reach the second bin C;
At the goal is set to to move back to the central position, and at the goal is set again to to re-start the cycle.
5.1 Remarks about the tuning procedure for the controllers
In the previous sections we introduced the structure and the tuning parameters for the MRAC and AIC. In the following, some remarks regarding the number of parameters and the tuning procedure are reported.
5.1.1 Number of tuning parameters
The number of tuning parameters for the MRAC equals the number of DOFs times the number of weighting terms. According to Sec. 4, this results in parameters to be tuned. Regarding the AIC, instead, the number of tuning parameters is independent from the DOF and it equals 6, following the formulation presented in Sec. 3. The lower number of parameters resulted in an overall easier tuning procedure for the active inference controller. As a final remark, to modify the behaviour of the step response for the AIC, such as rise time and settling time, one should change the internal model instead of fine tuning the controller’s parameters.
5.1.2 AIC tuning procedure
To obtain a satisfactory response for the AIC, we followed the tuning procedure reported below.
We set the controller confidence about sensory input and internal belief to one;
We disabled the control actions and increased the learning rate of the state update until the state estimation in a static situation was fast enough;
We included the control actions and increased the learning rate of the control actions until the robot was steered to the desired position, showing significant oscillations;
We dampened the oscillatory behaviour by decreasing the sensory confidence about the noisiest sensors and the internal belief about velocities.
5.2 Pick and place cycle with approximated model
The pick and place performance is now presented. The controllers have been tuned using a considerably inaccurate model of the robot arm on purpose. The links have been approximated as cuboids, and 20% random uncertainty in each link's mass has been assumed. This makes it possible to evaluate, later on, the adaptability of the same controller when using an accurate description of inertia tensors and masses for the manipulator. The joint values, and the computed control actions for the controlled system using AIC and MRAC, are depicted in Fig. 2 and Fig. 3 respectively. Note that, for the MRAC, saturation of the control input at  is reached for some of the joints, after providing the new goal position.
5.3 Performance in case of large parameters variations
The same controllers, tuned using the approximated model of the 7-DOF robot arm, are now applied to control the manipulator for which accurate dynamics have been specified. For clarity, we present the performance by analysing the difference between the responses of the models with approximated and accurate dynamics. The two control architectures should adapt to the large changes, and keep the difference between the responses limited. The results are presented in Fig. 4. As can be seen, the performance degradation using the AIC is one order of magnitude smaller than that of the MRAC. The convergence of the error to zero is also faster with the AIC.
5.4 Experiments in the real setup
The simulation results were validated in the real robot setup. The two controllers, tuned in simulation with the approximated model, are applied to the real 7-DOF Franka Emika Panda.
It is important to notice that, besides having different physical parameters, the real setup is already gravity compensated. The AIC and MRAC are simply applied on top of this intrinsic controller. This is already a considerable change in the system's dynamics, but to further increase the level of uncertainty, an end effector is attached to the robot. From a modeling point of view, the system used for tuning the controllers in simulation is completely different from the real one. The goal is to perform a point to point motion in the joint space using AIC and MRAC. The start and end points are set such that the robot moves from a low position to a higher position in the workspace. In more detail, we set  and , as defined in the simulation task.
A controller tuned in simulation will rarely work directly on a real setup, especially if the initial model was not accurate. In fact, to be able to control the robot with the MRAC, a severe re-tuning of the controller had to be performed to stabilise the response. The level of unmodeled dynamics was simply too high compared to the adaptability of the controller. The AIC, however, did not require this substantial re-tuning. Only one parameter, the learning rate of the control actions, was reduced to conform with the physical limits of the robot in terms of torque rates.
5.5 Results and discussion
Step responses to the A-B point to point motions are reported in Fig. 5. The AIC provides a less oscillatory behaviour, and a slightly faster response. Joints 2 and 3 are the most stressed during the motion, thus they present the highest oscillations, especially using the MRAC. This is due to the lack of derivative action in the adaptation law for the MRAC. Better performance could probably have been achieved with an accurate fine tuning of the parameters of the MRAC. However, the purpose of this study was to show the robustness of the AIC, which required essentially no re-tuning from simulation to the real setup, even though the latter had completely different system dynamics. The source code for simulations (https://github.com/cpezzato/panda_simulation) and experiments (https://github.com/cpezzato/active_inference) is freely available on GitHub.
6 Conclusions
In this paper we presented a novel model-free adaptive controller for robot manipulators, using active inference. Our approach makes use of the alleged adaptability and robustness of active inference to introduce simplifications for the generative model of the state dynamics and for the relation between the sensory input and the action, reducing the computational complexity of previous approaches. As a result, we derive a scheme for online control in joint space, which does not require any dynamic or kinematic model of the robot, is less sensitive to unmodeled dynamics, and is easily scalable to high DOF. Results from simulations and experiments in a real setup with a 7-DOF robot arm show that our active inference controller is suitable for tasks in which the dynamic model of the plant is unknown or subject to large changes. The performance of our novel AIC has been compared with that of a state-of-the-art MRAC. The active inference controller shows better adaptability in the case of large parameter variations, with performance degradation due to unmodeled dynamics more than ten times lower for the AIC. In addition, the active inference controller proved easier to tune. With this work we confirmed the value of active inference for developing more adaptive control of robot manipulators. This is only the first step in this direction. Future work should address a proof of stability of the closed-loop active inference scheme, the definition of generative functions for the state evolution that account for dynamic requirements and motion constraints, and the extension to other control modalities, such as control in Cartesian space or impedance control.
-  (1983) Theory and applications of adaptive control - a survey. Automatica Vol. 19, No. 5, pp. 471–486. Cited by: §1.1.
-  (2015) A tutorial on the free-energy framework for modelling perception and learning. Journal of mathematical psychology. Cited by: §1.
-  (2017) The free energy principle for action and perception: a mathematical review. Journal of Mathematical Psychology 81, pp. 55–79. Cited by: §1, §2.1, §2.2.1, §2.2.2, §2.3, §2.4, §2, §3, §3.1.2.
-  (2010) Action and behavior: a free-energy formulation. Biological cybernetics 102(3), pp. 227–260. Cited by: §1.
-  (2011) Action understanding and active inference. Biological cybernetics 104(1-2), pp. 137–160. Cited by: §1.
-  (2010) The free-energy principle: a unified brain theory?. Nature Reviews Neuroscience 11(2), pp. 127–138. Cited by: §1, §2.2.4, §2.
-  (2009) Reinforcement learning or active inference?. PloS one 4(7), e6421. Cited by: §2.2.4.
-  (2007) Variational free energy and the laplace approximation. Neuroimage 34(1), pp. 220–234. Cited by: §2.1, §2.2.3, §3.
-  (2010) Generalised filtering. Mathematical Problems in Engineering. Cited by: §2.2.3, §2.3.
-  (2008) Hierarchical models in the brain. PLoS computational biology 4(11), e1000211. Cited by: footnote 1.
-  (1986) Adaptive control of robot manipulators - a review. In Proc of IEEE international conference on robotics and automation (ICRA), pp. 183–189. Cited by: §1.1.
-  (2014) Incremental learning of context-dependent dynamic internal models for robot control. In Proc of the IEEE International Symposium on Intelligent Control (ISIC), Cited by: §1.
-  (2017) A new data source for inverse dynamics learning. In Proc of IEEE/RSJ Conference on Intelligent Robots and Systems, Cited by: §1.
-  (1983) A review on model reference adaptive control of robotic manipulators. IEEE Transactions on Automatic Control AC-28, pp. 162–171. Cited by: §1.1.
-  (2018) Adaptive robot body learning and estimation through predictive coding. In Proc of the IEEE International Conference on Intelligent Robots and Systems (IROS), Cited by: §1.1.
-  (2018) Active inference with function learning for robot body perception. In International Workshop on Continual Unsupervised Sensorimotor Learning (ICDL-Epirob), Cited by: §1.1, §3.4.1.
-  (2017) First-order-principles-based constructive network topologies: an application to robot inverse dynamics. In Proc of the IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), Cited by: §1.
-  (1972) Bayesian statistics, a review. SIAM 2. Cited by: §2.1.
-  (2006) ELeaRNT: evolutionary learning of rich neural network topologies. Note: Carnegie Mellon UniversityTechnical repository Cited by: §1.
-  (2008) Local gaussian process regression for real time online model learning. In Proc of Neural Information Processing Systems (NIPS2008), pp. 1193–1200. Cited by: §1.
-  (2019) Active inference body perception and action for humanoid robots. Note: arXiv:1906.03022v2 Cited by: §1.1.
-  (2016) Active inference and robot control: a case study. Journal of The Royal Society Interface 13(122). Cited by: §1.1.
-  (1991) Hyperstability approach to the synthesis of adaptive controllers for robot manipulators. In Proc of IEEE international conference on robotics and automation (ICRA), Cited by: §1.1, §1.1, §4.
-  (2000) Locally weighted projection regression: incremental real time learning in high dimensional space. In Proc. of International Conference on Machine Learning (ICML2000), pp. 1079–1086. Cited by: §1.
-  (1991) Application of a self-tuning pole-placement regulator to an industrial manipulator. In Proc of 21st IEEE Conference on Decision and Control, pp. 323–329. Cited by: §1.1.
-  (2017) A review on model reference adaptive control of robotic manipulators. Annual Reviews in Control 43, pp. 188–198. Cited by: §1.1.