A memory of motion for visual predictive control tasks
This paper addresses the problem of efficiently achieving visual predictive control tasks. To this end, a memory of motion, containing a set of trajectories built off-line, is used for leveraging precomputation and dealing with difficult visual tasks. Standard regression techniques, such as k-nearest neighbors and Gaussian process regression, are used to query the memory and provide on-line a warm-start and a way point to the control optimization process. The proposed technique allows the control scheme to achieve high performance and, at the same time, keep the computational time limited. Simulation and experimental results, carried out with a 7-axis manipulator, show the effectiveness of the approach.
Image-based visual servoing (VS) is a well-established technique to control robots using visual information. Its classic formulation consists in the simple control law $\mathbf{v} = -\lambda \widehat{\mathbf{L}}^{+} \mathbf{e}$, where $\mathbf{v}$ is the velocity of the camera, $\lambda$ is the control gain and $\widehat{\mathbf{L}}^{+}$ is the pseudo-inverse of the image Jacobian (or interaction matrix) $\mathbf{L}$; the hat symbol denotes an approximation. This control law ensures an exponential convergence to zero of the visual error $\mathbf{e} = \mathbf{s} - \mathbf{s}^{*}$, i.e., the difference between the measured and desired visual features ($\mathbf{s}$ and $\mathbf{s}^{*}$, respectively). Although the VS control law is easy to implement and fast to execute, it has some limitations. For large values of the error, the behavior can be unstable, and for some configurations the Jacobian can become singular, causing dangerous commands. Being purely reactive, VS does not perform any sort of anticipatory behavior that would improve the tracking performance. Furthermore, it cannot easily include (visual or Cartesian) constraints, which are very useful in real-life robotic experiments.
Planning techniques can be employed to compute trajectories that achieve the desired visual task while respecting constraints. Alternatively, VS can be formulated as an optimization process, allowing constraints to be easily included. In , VS is written as a quadratic program (QP) so that it can account for the constrained whole-body motion of humanoid robots. Similarly, a virtual VS written as a QP is proposed in  to achieve manipulation tasks. Visual planning and control can be solved together using a model to predict the feature motion and the corresponding commands over a preview window. Indeed, the model predictive control (MPC) technique can be applied to the VS case, obtaining the so-called visual predictive control (VPC) framework. The main drawback of VPC is its computation time. The flatness property can be used to reduce the problem complexity, but it is not applicable to all kinds of dynamics.
In this work, we propose to use a dataset of pre-processed solutions to improve VPC performance (recalled in Sect. II). To this end, an initialization and a way point are inferred on-line from the dataset. Section III reviews the literature on methods used to exploit stored data; the proposed approach is detailed in Sect. IV. Simulations and experiments, showing the effectiveness of the approach, are presented in Sect. V. Section VI concludes the paper and discusses future work.
where the cost function is defined as
$$J(\bar{\mathbf{u}}) = \sum_{j=1}^{N_p} \left\| \mathbf{s}(k+j) - \mathbf{s}^{*} \right\|_{\mathbf{Q}}^{2} + \sum_{j=0}^{N_c-1} \left\| \mathbf{u}(k+j) \right\|_{\mathbf{R}}^{2} \qquad (2)$$
and the optimization variable consists in the sequence of control actions to take along the preview window
$$\bar{\mathbf{u}} = \left[ \mathbf{u}(k)^{\top}, \ldots, \mathbf{u}(k+N_c-1)^{\top} \right]^{\top}. \qquad (3)$$
In (2) and (3), $N_p$ is the number of iterations defining the size of the preview window, while $N_c$ is the control horizon, defined such that from $N_c$ to $N_p$ the control is constant and equal to $\mathbf{u}(k+N_c-1)$; $\mathbf{Q}$ and $\mathbf{R}$ are two matrices used to weight the error and penalize the control effort, respectively. In the preview window, i.e., for $j = 1, \ldots, N_p$, the problem is subject to
$$\mathbf{s}(k+j) = \mathbf{s}(k+j-1) + T_s\, \widehat{\mathbf{L}}\, \mathbf{u}(k+j-1) + \boldsymbol{\varepsilon} \qquad (4)$$
with $\boldsymbol{\varepsilon}$ the difference between the measured and the first previewed feature, constant over the preview window, and $T_s$ the sampling time. Constraints on the optimization variable
$$\mathbf{u}_{\min} \le \mathbf{u}(k+j) \le \mathbf{u}_{\max} \qquad (5)$$
account for actuation limits, while the ones on the visual features
$$\mathbf{s}_{\min} \le \mathbf{s}(k+j) \le \mathbf{s}_{\max} \qquad (6)$$
$$\mathbf{s}(k+j) \notin \mathcal{S}_{\mathrm{occ}} \qquad (7)$$
achieve visibility constraints: (6) forces the features to stay in an area, e.g., to prevent them from leaving the image plane, while (7) allows avoiding occlusions or spots on the lens. The ensemble of (5)-(7) composes the set of non-linear constraints in (1).
Following the MPC rationale, at each iteration $k$, VPC measures the visual features $\mathbf{s}(k)$, predicts the motion over the preview window using the model in (4), minimizes the cost function (2) and finally computes the commands $\bar{\mathbf{u}}$. Only the first control of this sequence is applied to the real system, which moves, providing a new set of features. Then, the loop starts again. To achieve a satisfactory behavior, the control is usually kept constant over the preview window ($N_c = 1$), while $N_p$ is tuned as a trade-off between a long (better tracking performance) and a short preview window (lower computational cost). More constraints (e.g., on the camera position) can be added. In (4) a local model of the visual features is used, but a global model of the camera motion can also be considered. More details can be found in .
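To make the receding-horizon mechanism concrete, the loop above can be sketched in Python on a toy linear feature model. Here the "solver" is a closed-form least-squares step over a single constant control ($N_c = 1$); the interaction matrix, gains and dimensions are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def vpc_iteration(s, s_star, L, Ts, Np, Q, R):
    # With a constant control u over the window, the previewed feature after
    # Np steps is s + Np*Ts*L @ u, so minimizing the quadratic cost in u has
    # the closed-form solution below (a stand-in for the nonlinear solver).
    A = Np * Ts * L
    return np.linalg.solve(A.T @ Q @ A + R, A.T @ Q @ (s_star - s))

def run_vpc(s0, s_star, L, Ts=0.1, Np=5, n_iters=200):
    Q = np.eye(len(s0))              # error weight
    R = 1e-3 * np.eye(L.shape[1])    # control-effort weight
    s = s0.copy()
    for _ in range(n_iters):
        u = vpc_iteration(s, s_star, L, Ts, Np, Q, R)
        s = s + Ts * L @ u           # only the first control is applied;
                                     # the system moves and provides new features
    return s

L = np.array([[1.0, 0.2],            # toy interaction matrix
              [0.0, 1.0]])
s_final = run_vpc(np.array([2.0, -1.0]), np.zeros(2), L)
```

The loop contracts the feature error at each iteration, mimicking the receding-horizon behavior described above.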
Solving (1) with the constraints (5)-(7) is a non-convex optimization problem. As such, the solution depends on the solver initialization: if it is far from the global optimum, convergence can be slow, or the solver can get stuck in local minima, providing unsatisfactory results. Thus, it is important to provide the solver with a warm-start, i.e., an initial command sequence already close to the optimal solution. To avoid the constraints, the warm-start can momentarily guide the motion away from the target. However, providing only warm-starts may not be sufficient: a solver with a short time horizon might consider the warm-start sub-optimal and modify it to move towards the goal, consequently getting stuck at the local optima induced by the constraints. One possible solution is to consider a long preview window and set the cost only at the end of the horizon, but this is computationally expensive. A better idea is to adjust the cost function with a proper way point as a sub-target to follow.
We propose to use a memory of motion, i.e., a dataset of precomputed trajectories, to infer both a warm-start and a way point during the on-line VPC execution. In this way, we leverage precomputation to shorten the VPC preview window while maintaining high performance.
Leveraging information stored in a memory to control or plan robotic motions has been the object of lively research. In , a library of trajectories is queried by k-nearest neighbor (k-NN) to infer the control action to take during the experiment. A similar method selects from the library a control which is then refined by differential dynamic programming. As an alternative to planning from scratch, the framework in  starts the planner from a trajectory learned from experience. In , Gaussian process regression (GPR) is used to adapt the motion, stored as dynamic motion primitives, to the actual situation perceived by the robot. The line of works [28, 11] considers a robot motion database built from human demonstrations, which gives the controller a guess of the motion to make, possibly modified by the presence of obstacles. Demonstrations and optimization techniques are used in  to handle constraints in a visual planner.
To improve the convergence of planning or control frameworks written as optimization problems, the memory can be used to provide the solvers with a warm-start. In , a memory is iteratively built, expanding a probabilistic road map (PRM) using a local planner. A neural network (NN) is trained, in parallel, with the current trajectories stored in the PRM and used to give the local planner a warm-start to better connect the map. The final NN is then used to infer the warm-start for the on-line controller. In the context of a trajectory optimizer, the initialization is computed by applying k-NN and locally weighted regression to a set of pre-optimized trajectories. In , a k-NN infers from a memory of motion the warm-starts for a planner. The same kind of problem is addressed in  with different techniques, i.e., k-NN, GPR and Bayesian Gaussian mixture regression, which also allow coping with multi-modal solutions.
Other approaches consider the possibility of reshaping the cost function to guide the solver towards an optimal solution. For example, the interior point method solves an inequality-constrained problem by adding a logarithmic barrier function to the cost. In this way, the search for the solution starts from the inner region of the feasible space and then moves to the boundary region. In humanoid motion planning , heuristic sub-goals are introduced in the early stage of the optimization, based on the zero-moment point stability criterion. In , to avoid discontinuity, the contact dynamics are smoothed such that virtual contact forces can exist at a distance. In reinforcement learning, it is common to augment a sparse reward function, which is difficult to learn from, with intermediate rewards acting as way points .
To build our framework and successfully achieve VPC tasks, we took inspiration from the different approaches existing in the literature. In particular, we decided to exploit the information contained in a memory of motion to infer: (i) a warm-start to properly initialize our optimization solver; and (ii) a way point to be used in the cost in lieu of the final target.
IV The Proposed Approach
As recalled in Sect. II, VPC computes a control sequence by solving a minimization problem. To efficiently find an optimal solution, the process has to converge fast and avoid local minima. Thus, it is important to initialize the solver with a warm-start and to reshape the cost function using a way point in place of the target. This section explains how to infer the warm-start and the way point from a memory.
The memory of motion is a dataset of input–output samples.
Each input describes a particular visual configuration and is composed of a set of visual features $\mathbf{s}$, the area $a$ and the orientation $\alpha$ of the visual pattern
$$\mathbf{x} = \left[ \mathbf{s}^{\top},\; a,\; \alpha \right]^{\top}. \qquad (8)$$
The dimension of the input is thus that of the visual feedback plus two. The area and the orientation are considered along with the visual features in (8) to make the samples distinguishable, not only in terms of visual appearance but also w.r.t. the corresponding camera poses. The output variable contains the proper control action to take and the way point to follow as a function of the input. Since the control is constant in the preview window ($N_c = 1$, see Sect. II), it is enough to store the single command
where the dimension of the stored command equals the number of actuated degrees of freedom of the camera. All the samples are collected in two matrices, stacking inputs and outputs row-wise.
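A possible way to assemble such a sample is sketched below. The shoelace formula for the pattern area and the angle of the first edge for its orientation are illustrative choices, as the paper does not detail how these quantities are computed.

```python
import numpy as np

def pattern_area_angle(pts):
    # pts: (4, 2) array of image points forming the visual pattern.
    # Shoelace formula for the polygon area; first-edge angle as a
    # simple orientation measure (both are assumptions for illustration).
    x, y = pts[:, 0], pts[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    d = pts[1] - pts[0]
    return area, np.arctan2(d[1], d[0])

def make_sample(pts, u, s_wp):
    # Input as in (8): flattened features, area, orientation;
    # output: single stored command plus flattened way-point features.
    area, angle = pattern_area_angle(pts)
    x = np.concatenate([pts.ravel(), [area, angle]])
    y = np.concatenate([u, s_wp.ravel()])
    return x, y
```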
The whole process computing warm-start and way point consists of building the memory off-line and querying it on-line.
IV-A Building the memory of motion
The memory of motion is built by running VPC off-line for different sets of initial visual features. The aim is to compute successful trajectories able to achieve the visual task. To this end, the same solver as in the on-line executions is used to build the memory. However, since the aim is to build 'high-quality' samples and there is no strict constraint on the execution time (the memory is built off-line), the solver is set up with low thresholds on the solution optimality, a high maximum number of iterations, and a large VPC preview window.
The process building the memory of motion is presented in the algorithm of Fig. 1. For each random initial condition, if the VPC solver succeeds in finding a feasible solution (no constraint is violated) and the task is achieved (the features converge to the target in the given time), then all the visual features along the resulting trajectory are saved. Then, for each sample of the trajectory, the following actions are executed:
the area and angle of the corresponding visual pattern are computed;
the way point is computed as the visual features a fixed number of samples ahead along the trajectory, clamped at the final sample;
the corresponding solution is selected.
With this information, the input and output vectors are obtained and finally stored in the two matrices. The initial value of the visual features is generated randomly at the start of the memory building, while at a later stage it is biased towards the distributions corresponding to the set of unsuccessful initial conditions (estimated by a Gaussian mixture model), so that the solver attempts to solve the difficult cases once the database contains a sufficient number of samples. The algorithm uses the function 'Find Solution', which tries to find an optimal solution employing the strategies detailed in the algorithm of Fig. 2. It implements an iterative mechanism by which the memory building process benefits from the current status of the memory itself. Indeed, if there are enough trajectories in the memory, and the features are close to the constraints (in which case the function 'Is_Close' returns True), the solver is provided with a warm-start and way point inferred by a k-NN algorithm (details in Sect. IV-B). Otherwise, the algorithm tries to solve the VPC using the previous solution as warm-start. If the solver does not manage to find a successful solution, two recovery strategies are executed: the solver is warm-started with either (i) 12 pre-defined or (ii) 10 random camera velocity directions. In the presented algorithms, 'AND' and 'NOT' denote the corresponding logic operators. Once the memory is built, it is ready to be queried on-line.
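The two recovery strategies can be sketched as follows. The choice of the 12 pre-defined directions as plus/minus the unit axes of a 6-DoF camera velocity, as well as the speed magnitude, are assumptions for illustration; the paper does not list them.

```python
import numpy as np

rng = np.random.default_rng(0)

def recovery_warm_starts(n_random=10, speed=0.1):
    # (i) 12 pre-defined directions: +/- each unit axis of the 6-DoF
    # camera velocity (an assumed choice).
    dirs = []
    for i in range(6):
        e = np.zeros(6)
        e[i] = 1.0
        dirs += [speed * e, -speed * e]
    # (ii) random camera velocity directions, normalized to the same speed
    for _ in range(n_random):
        v = rng.standard_normal(6)
        dirs.append(speed * v / np.linalg.norm(v))
    return dirs
```

Each candidate direction is tried in turn as a warm-start until the solver reports a successful solution.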
IV-B Querying the memory of motion
The aim of querying the memory of motion is to infer from the dataset a proper initial guess and way point for the on-line VPC solver, given the current visual feature configuration. This means that we need to learn the map from inputs to outputs, so that an estimate of the output can be computed for a novel input. The map is learned using standard regression techniques, i.e., k-NN and GPR, as also proposed in . In what follows, we describe the adaptations required for the VPC application.
The k-NN algorithm is a simple non-parametric method selecting the $k$ closest samples in the dataset, given a new input. The distance between samples is computed as the Euclidean norm. The outputs corresponding to the $k$ closest inputs are then averaged to provide the estimated output
$$\hat{\mathbf{y}} = \frac{1}{k} \sum_{i \in \mathcal{N}_k} \mathbf{y}_i,$$
where $\mathcal{N}_k$ is the set of indices of the $k$ nearest samples.
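A minimal implementation of this query is:

```python
import numpy as np

def knn_query(X, Y, x_new, k=1):
    # Euclidean distance from the query to every stored input
    d = np.linalg.norm(X - x_new, axis=1)
    nearest = np.argsort(d)[:k]
    return Y[nearest].mean(axis=0)   # average of the k closest outputs
```

With `k = 1` this simply returns a stored sample as it is in the memory.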
In the case of GPR , the inference is computed by
$$\hat{\mathbf{y}} = \mathbf{k}_{*}^{\top} \left( \mathbf{K} + \sigma_{n}^{2} \mathbf{I} \right)^{-1} \mathbf{Y},$$
where $\mathbf{K}$ is the kernel matrix of the stored inputs, $\mathbf{k}_{*}$ is the vector of kernel values between the novel input and the stored ones, and $\mathbf{I}$ is the identity matrix. The term $\left( \mathbf{K} + \sigma_{n}^{2} \mathbf{I} \right)^{-1} \mathbf{Y}$ can be computed off-line, so that only the kernel vector computation and a matrix multiplication, fast to compute, are left for the on-line estimation.
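The off-line/on-line split described above can be sketched as follows, with an RBF kernel as an assumed choice (the kernel and the function names `fit`/`predict` are illustrative, not GPy's API):

```python
import numpy as np

def rbf(A, B, ell=1.0):
    # Squared-exponential kernel between two sets of inputs
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def fit(X, Y, sigma=1e-2, ell=1.0):
    # Off-line: precompute alpha = (K + sigma^2 I)^-1 Y once
    K = rbf(X, X, ell)
    return np.linalg.solve(K + sigma**2 * np.eye(len(X)), Y)

def predict(X, alpha, x_new, ell=1.0):
    # On-line: only the kernel vector and one matrix product are needed
    k_star = rbf(x_new[None, :], X, ell)    # shape (1, N)
    return (k_star @ alpha)[0]
```

The expensive matrix inversion thus never happens during the control loop.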
Finally, recalling that the control is constant in the preview window, the warm-start is built from the first entries of the estimated output:
$$\bar{\mathbf{u}}_{w} = \mathbf{1} \otimes \hat{\mathbf{u}},$$
where '$\otimes$' is the Kronecker product. The way point, instead, is obtained from the remaining elements of the estimated output.
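As a small illustration of these two steps (the dimensions and the horizon length are assumed values):

```python
import numpy as np

m = 6                       # actuated camera DoF (assumption)
Nc = 4                      # length of the control sequence to initialize (illustrative)
y_hat = np.arange(10.0)     # inferred output: control (first m entries) + way point
u_hat, s_wp = y_hat[:m], y_hat[m:]

# Repeat the single inferred command along the horizon via a Kronecker product
warm_start = np.kron(np.ones(Nc), u_hat)
```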
Note that in the absence of constraints, the solution found at the previous iteration is already a good warm-start for the solver and there is no need to reshape the cost with a way point. Thus, the memory-based strategy is activated only when the visual features are “close” to one of the visibility constraints, i.e., when the distance between any feature and the border of the constraints is lower than a given threshold.
In this section we present the results obtained with the proposed framework to efficiently achieve VPC tasks.
As visual features, we considered four points. The visual task consisted in making them match four corresponding desired points. The image Jacobian in (4) was approximated using the point depths at the target, known in advance. The approach was implemented in Python. As optimization solver, we used the SLSQP method available in the open-source library SciPy . Actuation and visibility constraints were implemented as bounds and non-linear inequality constraints. To be implemented, the OR logic operation in (7) was converted into an AND with a norm-based formulation . The value of $k$ chosen for our k-NN is small, so that it mainly selects samples as they are in the memory; we used the GPy library  as the GPR implementation. As explained in Sect. II, VPC was set up with the sampling time matching the camera's nominal framerate.
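As a minimal sketch of how such a constrained step can be posed with SciPy's SLSQP, the toy problem below optimizes a constant camera velocity under bounds and a visibility-style inequality constraint keeping the previewed feature outside a circular occlusion. The dimensions, interaction matrix, occlusion geometry and weights are all illustrative assumptions, not the paper's setup.

```python
import numpy as np
from scipy.optimize import minimize

L = np.eye(2)                              # toy interaction matrix: 2 features, 2 DoF
s, s_star = np.array([0.0, 0.0]), np.array([1.0, 0.0])
Ts, Np = 0.1, 5
occ_c, occ_r = np.array([0.2, 0.3]), 0.2   # occlusion center and radius (assumed)

def cost(u):
    # Tracking error of the previewed feature plus a small control penalty
    s_pred = s + Np * Ts * L @ u
    return np.sum((s_pred - s_star) ** 2) + 1e-3 * np.sum(u ** 2)

def visibility(u):
    # >= 0 when the previewed feature stays outside the occlusion disc
    s_pred = s + Np * Ts * L @ u
    return np.linalg.norm(s_pred - occ_c) - occ_r

res = minimize(cost, x0=np.zeros(2), method="SLSQP",
               bounds=[(-1.0, 1.0)] * 2,
               constraints=[{"type": "ineq", "fun": visibility}])
```

The actuation limits appear as `bounds` and the visibility constraint as a non-linear inequality, mirroring the implementation choices described above.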
For the simulations, we considered a hand-held camera free to move in the Cartesian space, with a fixed image resolution. As visibility constraints, we considered four convex and concave areas on the image plane simulating occlusions and spots on the lens. As actuation constraints, we limited the linear and angular velocity components of the camera. The error and control weights were set accordingly, with the error weight decreased towards the convergence.
The memory of motion was generated following the procedure of Sect. IV-A. In particular, the solver was set up with a tight optimality precision and a high maximum number of iterations, and VPC was set with a large preview window. The choice of these parameters was driven by the need to store 'high-quality' samples, at the cost of a high computational time that we were willing to pay since the memory is built off-line. We generated the trajectories composing the memory; Fig. 3 shows the visual feature trajectories stored. The visibility constraints are depicted as shadowed areas, while the targets are the red circles. For the on-line executions, we relaxed the solver parameters, with a looser optimality precision and fewer maximum iterations. This set-up, along with a smaller preview window, allowed faster computations. However, thanks to the memory-based strategies presented in Sect. IV, performance is not invalidated, but even improved.
The approach was first evaluated with a statistical analysis, comparing: VPC warm-started with the previous-iteration solution (for brevity denoted "prev.-it.") (i) with a short and (ii) with a long preview window; and VPC using the warm-start and way point provided (iii) by k-NN and (iv) by GPR, both with the short preview window. The memory-based strategies were activated at a given pixel distance from the occlusions. For GPR, the data were sub-sampled. The comparison is performed w.r.t. the success rate, the average solver convergence time, and the average cost over all (successful and unsuccessful) trajectories. Each execution is considered successful if no constraint is violated (with a small pixel tolerance) and the visual task is achieved (the features converge to the target in the given time). Each strategy was tested using the same random initial configurations. The results, run on a 4-core i7 laptop, are reported in Table I. The prev.-it. strategy with the short preview window obtained the lowest success rate (note that among the test samples, many had an easy task execution). In order to improve the success rate, for the considered scenario, we had to increase the preview window, but this also increased the computation time. The proposed memory-based strategies allowed us to keep the preview window short, so that both the convergence time and the cost remain low, while increasing the success rate at the same time. This is due to the effect of warm-start and way point, which help the execution of the task.
The main reason for the prev.-it. strategy failures is that the solution gets stuck at the visual occlusions. The memory-based strategies reduce the occurrence of these situations. As an example, in Fig. 4 we present the plots related to a single task execution, where the big blue dot is the initial value of the features, the smaller blue dots are the VPC solutions at each iteration, and the red circles are the target. The prev.-it. strategy stops at an occlusion border (see Fig. 4(a)), as an effect of conflicting gradients that produce zero velocity commands (Fig. 4(d)). Instead, the memory-based approaches manage to overcome the occlusion, as shown in Figs. 4(b) and 4(c). In particular, the GPR solution, thanks to its interpolation capabilities, produces a smoother behavior w.r.t. our k-NN implementation (cf. Figs. 4(b) and 4(e) with Figs. 4(c) and 4(f)).
V-B Robot experiments
For the experiments, we used the 7 degrees-of-freedom robot arm Panda by Franka Emika, with an Intel RealSense RGB-D sensor mounted at the end-effector. The sensor, used as a monocular camera, outputs images at its nominal resolution and framerate. The image processing, used to detect the point features, was implemented using the open-source library OpenCV . A calibration procedure computed the intrinsic camera parameters and the camera–end-effector displacement. The camera velocity commands, computed by VPC, were transformed into the robot frame and sent to the robot Cartesian controller. As a task, the robot had to place an object inside a box on which we placed four known markers. Without knowing the box pose, VPC was used to drive the robot over the box and, after convergence, release the object. On the image we considered two constraints: one to take into account the occlusion caused by the object grasped by the robot, and one to emulate a spot in the center of the lens as a blurred area. VPC was set up with a short preview window, an error weight decreased when approaching the convergence, and bounds on the velocity commands.
The memory was built with a large preview window, a tight solver optimality tolerance and a high maximum number of iterations. The iterative building and the adaptive sampling were not used. To be conservative, the spot considered in the memory was bigger than the one in the experiments. Given the simulation results, we decided to use the GPR-based strategy, with the data subsampled. The trigger signal to query the memory was activated at a given pixel distance from the occlusions.
For the on-line experiments, we set a short preview window and relaxed the solver's maximum iterations and optimality tolerance. With this setting, and for some initial robot–box configurations, the previous-iteration strategy was not capable of achieving the task, as shown in the snapshots of Fig. 5. While moving the visual features (blue dots, see Fig. 5(a)) towards the target (red circles), the features met the blurred spot (Fig. 5(b)), causing the loss of a feature and the consequent failure of the task (Fig. 5(c)). The same experiment was carried out with the GPR-based approach, see Fig. 6, where both the robot and the camera view are shown. Starting from the same initial condition (Fig. 6(a)), in the proximity of the constraint (Fig. 6(b)), the memory provides a proper way point (depicted as red crosses on the image plane) and warm-start, which allow the desired task to be successfully achieved (Fig. 6(c)). Fig. 7 shows the velocity commands sent to the robot during the execution. The experiments are shown in the accompanying video.
VI Conclusion and Future Work
In this paper, we addressed the problem of efficiently achieving visual predictive control tasks. Using a memory of motion, we could exploit previous solutions to better fulfill on-line tasks. Furthermore, leveraging the pre-computation contained in the memory, we could set a short VPC preview window without invalidating the results. The algorithm's performance relies on the pre-computed dataset; we plan to improve the quality of the memory using a global optimizer or a planner. Furthermore, more sophisticated active learning paradigms can be employed to build a minimal memory, containing fewer but more informative samples. In the presented work, the memory is queried using k-NN and GPR. As shown in both simulations and experiments, these methods were able to outperform the standard VPC scheme. However, we believe that the performance could be further improved by considering other kinds of regressors that can cope with multimodality, as done in . We also showed that the use of a memory of motion helps to keep the computation time limited. However, more effort will be devoted to ensuring full real-time performance. Finally, further developments will aim to include the proposed scheme within the optimization framework of more complex systems such as humanoids.
- For example, if point features are used, the visual pattern is the polygon having the visual features as vertices.
- Hereafter, $\mathbf{I}$, $\mathbf{1}$ and $\mathbf{0}$ refer to the identity, all-ones and null matrices, respectively. When not explicitly marked, the dimensions are inferred from the context.
- (2017) Visual servoing in an optimization framework for the whole-body control of humanoid robots. IEEE Robot. and Autom. Lett. 2 (2), pp. 608–615.
- (2010) Predictive control for constrained image-based visual servoing. IEEE Trans. Robot. 26 (5), pp. 933–939.
- (2010) Visual servoing via nonlinear predictive control. In Visual Servoing via Advanced Numerical Methods, G. Chesi and K. Hashimoto (Eds.), pp. 375–393.
- (2008) Real-time visual predictive controller for image-based trajectory tracking of a mobile robot. IFAC Proceedings Volumes 41 (2), pp. 11244–11249.
- (2012) A robot path planning framework that learns from experience. In IEEE Int. Conf. on Robotics and Automation, pp. 3671–3678.
- (2006) Visual servo control, Part I: basic approaches. IEEE Robot. Autom. Mag. 13 (4), pp. 82–90.
- (2007) Visual servo control, Part II: advanced approaches. IEEE Robot. Autom. Mag. 14 (1), pp. 109–118.
- (1998) Potential problems of stability and convergence in image-based and position-based visual servoing. In The Confluence of Vision and Control, pp. 66–78.
- (2012) On-line motion synthesis and adaptation using a trajectory database. Robotics and Autonomous Systems 60 (10), pp. 1327–1339.
- GPy: a Gaussian process framework in Python.
- (2019) Avoidance of convex and concave obstacles with convergence ensured through contraction. IEEE Robot. and Autom. Lett. 4 (2), pp. 1462–1469.
- (2017) A new framework for optimal path planning of rectangular robots using a weighted norm. IEEE Robot. and Autom. Lett. 2 (3), pp. 1460–1465.
- (2009) Trajectory prediction: learning to map situations to robot trajectories. In Int. Conf. on Machine Learning, pp. 449–456.
- SciPy: open source scientific tools for Python.
- (2009) Standing balance control using a trajectory library. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 3031–3036.
- (2018) Using a memory of motion to efficiently warm-start a nonlinear predictive controller. In IEEE Int. Conf. on Robotics and Automation, pp. 2986–2993.
- (2018) Leveraging precomputation with problem encoding for warm-starting trajectory optimization in complex environments. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 5877–5884.
- (2002) Path planning for robust image-based control. IEEE Trans. Robot. Autom. 18 (4), pp. 534–549.
- (2012) Discovery of complex behaviors through contact-invariant optimization. ACM Trans. Graph. 31 (4), pp. 1–8.
- OpenCV: open source computer vision library.
- (2018) Interlinked visual tracking and robotic manipulation of articulated objects. IEEE Robot. and Autom. Lett. 3 (4), pp. 2746–2753.
- (2018) DeepMimic: example-guided deep reinforcement learning of physics-based character skills. ACM Trans. Graph. 37 (4), pp. 1–14.
- (2018) Vision-based reactive planning for aggressive target tracking while avoiding collisions and occlusions. IEEE Robot. and Autom. Lett. 3 (4), pp. 3725–3732.
- (2010) Interior point methods for nonlinear optimization. In Nonlinear Optimization, pp. 215–276.
- (2006) Gaussian processes for machine learning. MIT Press, Cambridge, MA, USA.
- (2020) Memory of motion for warm-starting trajectory optimization. IEEE Robot. and Autom. Lett. 5 (2), pp. 2594–2601.
- (2006) Image based visual servoing through nonlinear model predictive control. In IEEE Conf. on Decision and Control, pp. 1776–1781.
- (2014) Distance based dynamical system modulation for reactive avoidance of moving obstacles. In IEEE Int. Conf. on Robotics and Automation, pp. 5618–5623.
- (2018) Optimized vision-based robot motion planning from multiple demonstrations. Autonomous Robots 42 (6), pp. 1117–1132.
- (2006) Policies based on trajectory libraries. In IEEE Int. Conf. on Robotics and Automation, pp. 3344–3349.
- (2011) A convex, smooth and invertible contact model for trajectory optimization. In IEEE Int. Conf. on Robotics and Automation, pp. 1071–1076.