Using a memory of motion to efficiently achieve visual predictive control tasks
Abstract
This paper addresses the problem of efficiently achieving visual predictive control tasks. To this end, a memory of motion, containing a set of trajectories built offline, is used to leverage precomputation and deal with difficult visual tasks. Regression techniques, such as k-nearest neighbors and Gaussian process regression, are used to query the memory and provide the online control optimization process with a warm-start and way points. The proposed technique allows the robot to achieve difficult tasks while keeping the execution time limited. Simulation and experimental results, carried out with a 7-axis manipulator, show the effectiveness of the approach.
I Introduction
Image-based visual servoing (VS) is a well-established technique to control robots using visual information [6] [7]. Its classic formulation consists in the simple control law $\mathbf{v} = -\lambda \hat{\mathbf{L}}^{+} \mathbf{e}$, where $\mathbf{v}$ is the velocity of the camera, $\lambda$ the control gain and $\hat{\mathbf{L}}^{+}$ is the pseudoinverse of the image Jacobian (or interaction matrix) $\mathbf{L}$; the hat symbol denotes an approximation. This control law ensures an exponential convergence to zero of the visual error $\mathbf{e} = \mathbf{s} - \mathbf{s}^{*}$, i.e., the difference between the measured and desired visual features ($\mathbf{s}$ and $\mathbf{s}^{*}$, respectively). Although the VS control law is easy to implement and fast to execute, it has some limitations. For large values of the error, the behavior can be unstable, and for some configurations the Jacobian can become singular, causing dangerous commands [8]. Being purely reactive, VS does not perform any sort of anticipatory behavior that would improve the tracking performance. Furthermore, it cannot easily include (visual or Cartesian) constraints, which are very useful in real-life robotic experiments.
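As a concrete illustration of the classic law, whose standard form in the VS literature is v = -λ L̂⁺ e, the following minimal Python sketch computes the camera velocity for point features. It assumes normalized image coordinates, the standard point-feature interaction matrix, and a Jacobian approximated at the desired features with known depths; the function names, the gain and the example values are hypothetical and not taken from the paper.

```python
import numpy as np

def interaction_matrix(points, depths):
    """Stack the standard 2x6 interaction matrices of normalized image points."""
    rows = []
    for (x, y), Z in zip(points, depths):
        rows.append([-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y])
        rows.append([0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x])
    return np.array(rows)

def ibvs_velocity(s, s_star, depths, gain=0.5):
    """Classic law v = -gain * pinv(L_hat) @ e, with e = s - s_star."""
    e = (np.asarray(s) - np.asarray(s_star)).ravel()
    L_hat = interaction_matrix(s_star, depths)  # approximation at the target (assumed)
    return -gain * np.linalg.pinv(L_hat) @ e

# Hypothetical example: four points, all at 1 m depth at the target.
s_star = np.array([[-0.1, -0.1], [0.1, -0.1], [0.1, 0.1], [-0.1, 0.1]])
s = s_star + 0.05
v = ibvs_velocity(s, s_star, depths=np.ones(4))
```

The command is purely reactive: it depends only on the current error, which is why constraints and anticipation have to be handled by a different formulation.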
Planning techniques [18] can be employed to compute trajectories that achieve the desired visual task while respecting constraints. Alternatively, VS can be formulated as an optimization process, which makes it easy to include constraints. In [1], VS is written as a quadratic program (QP) so that it can account for the constrained whole-body motion of humanoid robots. Similarly, a virtual VS written as a QP is proposed in [21] to achieve manipulation tasks. Visual planning and control can be solved together using a model to predict the feature motion and the corresponding commands over a preview window [27]. Indeed, the model predictive control (MPC) technique can be applied to the VS case, obtaining the so-called visual predictive control (VPC) framework [2] [3]. The main drawback of VPC is the computation time. The flatness property [23] [4] can be used to reduce the problem complexity, but it is not applicable to all kinds of dynamics.
In this work we propose to improve the VPC performance (recalled in Sect. II) by using a dataset of pre-processed solutions to provide, online, a proper initialization and sub-target. Section III reports the related literature, while the proposed approach is detailed in Sect. IV. Simulation and experimental results, showing the effectiveness of the approach, are presented in Sect. V. Finally, Section VI concludes the paper and discusses future developments.
II Background
The VPC paradigm [2] [3] aims at solving planning and control simultaneously. To this end, it computes a control sequence over a preview window by solving the optimization
(1) 
where the cost function is defined as
(2) 
and the optimization variable consists in the sequence of control actions to take along the preview window
(3) 
In (2) and (3), is the number of iterations defining the size of the preview window, while is the control horizon defined such that from to the control is constant and equal to . In the preview window, i.e., for , the problem is subject to
(4) 
with the difference between the measured and the first previewed feature, constant over the preview window. is the sampling time. Constraints on the optimization variable
(5) 
account for actuation limits, while those on the visual features
(6)  
(7) 
enforce visibility constraints: (6) forces the features to stay in an area, e.g., to prevent them from leaving the image plane, while (7) allows the features to avoid occlusions or spots on the lens. Together, (5)–(7) compose the set of nonlinear constraints in (1).
Following the MPC rationale, at each iteration , VPC measures the visual features , predicts the motion over the preview window using the model in (4), minimizes the cost function (2) and finally computes the commands . Only the first control of this sequence is applied to the real system which moves providing a new set of features. Then, the loop starts again. To achieve a satisfactory behavior, the control is usually kept constant over the preview window (), while is tuned as a tradeoff between a long (better tracking performance) and a short preview window (lower computational cost). More constraints (e.g., on camera position) can be added. In (4) a local model of the visual features is used, but a global model of the camera motion can also be considered. More details can be found in [2] [3].
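The receding-horizon loop described above can be sketched as follows. This is a toy example under stated assumptions, not the authors' implementation: a single point feature, a constant 2x2 image Jacobian `L_HAT`, a control kept constant over the preview window, only actuation bounds as constraints, and hypothetical values for the sampling time, window length and bounds.

```python
import numpy as np
from scipy.optimize import minimize

T = 0.1              # sampling time (assumed value)
N_P = 5              # preview window length (assumed value)
V_MAX = 0.5          # actuation bound (assumed value)
L_HAT = -np.eye(2)   # toy constant image Jacobian (assumption)

def predict(s0, v):
    """Roll out the local model s_{j+1} = s_j + T * L_hat @ v with constant control."""
    traj, s = [], s0.copy()
    for _ in range(N_P):
        s = s + T * L_HAT @ v
        traj.append(s)
    return np.array(traj)

def cost(v, s0, s_star):
    """Sum of squared visual errors over the preview window."""
    e = predict(s0, v) - s_star
    return float(np.sum(e * e))

def vpc_step(s0, s_star, v_init):
    """One VPC optimization, initialized with v_init."""
    res = minimize(cost, v_init, args=(s0, s_star), method="SLSQP",
                   bounds=[(-V_MAX, V_MAX)] * 2, options={"ftol": 1e-10})
    return res.x

def run(s0, s_star, n_iter=60):
    s, v = s0.copy(), np.zeros(2)
    for _ in range(n_iter):
        v = vpc_step(s, s_star, v)   # initialized with the previous solution
        s = s + T * L_HAT @ v        # only this control reaches the "real" system
    return s

s_final = run(np.array([0.3, -0.2]), np.zeros(2))
```

Here the "real" system is the same model used for prediction, so the example only illustrates the measure, predict, optimize and apply cycle.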
Solving (1) with the constraints (5)–(7) is a nonconvex optimization problem. As such, the solution depends on the solver initialization. If the initialization is far from the global optimum, the convergence can be slow, or the solver can get stuck in local minima and provide unsatisfactory results. Thus, it is important to provide the solver with a warm-start, i.e., an initial command sequence already close to the optimal solution. To avoid the constraints, the warm-start can guide the motion away from the target momentarily. However, providing only warm-starts may not be sufficient. In fact, a solver with a short time horizon might consider the warm-start to be suboptimal and modify it to move towards the goal and, as a consequence, get stuck at a local optimum at the constraints. One possible solution is to consider a long preview window and set the cost only at the end of the horizon, but this is computationally expensive. A better idea is to adjust the cost function with proper way points as sub-targets to follow. We propose to use a memory of motion, i.e., a dataset of offline precomputed trajectories, to infer a proper warm-start and way points during the online VPC execution. In this way, we leverage precomputation to shorten the VPC preview window while keeping the performance high.
III State of the Art
Leveraging information stored in a memory to control or plan robotic motions has been the subject of lively research. In [29], a library of trajectories is queried by k-nearest neighbor (kNN) to infer the control action to take during the experiment. A similar method [15] selects from the library a control which is then refined by differential dynamic programming. As an alternative to planning from scratch, the framework in [5] starts the planner from a trajectory learned from experience. In [9] Gaussian process regression (GPR) is used to adapt the motion, stored as dynamic motion primitives, to the actual situation perceived by the robot. The line of work in [28, 11] considers a robot motion database built from human demonstrations. This gives the controller a guess of the motion to make, possibly modified by the presence of obstacles.
To improve the convergence of planning or control frameworks written as optimization problems, the memory can be used to provide the solvers with a warm-start. In [16], a memory is iteratively built, expanding a probabilistic road map (PRM) using a local planner. A neural network (NN) is trained, in parallel, with the current trajectories stored in the PRM and used to give the local planner a warm-start to better connect the map. The final NN is then used to infer the warm-start for the online controller. In the context of a trajectory optimizer, the initialization is computed by applying kNN and locally weighted regression to a set of pre-optimized trajectories [13]. In [17] a kNN infers from a memory of motion the warm-starts for a planner. The same kind of problem is addressed in [26] with different techniques, i.e., kNN, GPR and Bayesian Gaussian mixture regression, which allow the system to also cope with multi-modal solutions.
Other approaches consider the possibility of reshaping the cost function to guide the solver towards an optimal solution. For example, the interior point method [24] solves an inequality-constrained problem by adding a logarithmic barrier function to the cost. In this way, the search for the solution starts from the inner region of the feasible space and then moves to the boundary region. In humanoid motion planning [19], heuristic sub-goals are introduced in the early stage of the optimization based on the zero-moment point stability criterion. In [30], to avoid discontinuity, the contact dynamics are smoothed such that virtual contact forces can exist at a distance. In reinforcement learning, it is common to modify a sparse reward function, which is difficult to achieve, by providing intermediate rewards as way points [22].
To build our framework and successfully achieve VPC tasks, we took inspiration from the different approaches existing in the literature. In particular, we decided to exploit the information contained in a memory of motion to infer: (i) a warm-start to properly initialize our optimization solver; and (ii) way points to be used in the cost in lieu of the final target.
IV A Memory of Motion for the VPC
As recalled in Sect. II, VPC computes a control sequence by solving a minimization problem. To efficiently find an optimal solution, the process has to converge fast and avoid local minima. Thus, it is important to initialize the solver with a warm-start, and to reshape the cost function using way points in place of the target. This section explains how to infer the warm-start and way point from a memory.
The memory of motion is a dataset , of samples.
Each feature describes a particular visual configuration and is composed of a set of visual features, the area and the orientation of the visual pattern
(8) 
where , is the dimension of the visual feedback . The output variable contains the proper control action to take and the way point to follow in function of . Since the control is constant in the preview window (, see Sect. II), it is enough to store the single command
(9) 
where , with the actuated degrees of freedom of the camera. All the samples are collected in the matrices
(10) 
The whole process of computing the warm-start and way point consists in building the memory offline and querying it online.
IV-A Building the memory of motion
The memory of motion is built by running VPC offline for different sets of initial visual features. The aim is to compute successful trajectories that achieve the visual task. To this end, the same solver used for the online executions is used to build the memory. However, since the aim is to build ‘high-quality’ samples and there is no strict constraint on the execution time (the memory is built offline), the solver is set up with small thresholds on the solution optimality, a high maximum number of iterations, and a large VPC preview window.
The whole process of building the memory of motion is presented in the Algorithm of Fig. 1. For initial conditions , if the VPC solver succeeds in finding a feasible solution (no constraint is violated) and the task is achieved ( converges to in the given time), then all the visual features from to are saved ( is the time iterator along the trajectory). Thus, , the following actions are executed:
(i) the area and angle of the corresponding visual pattern are computed;
(ii) the way point is computed as the visual features at samples ahead (); if , ;
(iii) the corresponding solution is selected.
With this information, the vectors and are obtained and finally stored in and . The initial value of the visual features is generated randomly at the beginning, while at a later stage it is biased toward the distributions corresponding to the set of unsuccessful initial conditions (estimated using a Gaussian mixture model), so that the solver attempts to solve the difficult cases once the database contains a sufficient number of samples. The algorithm uses the function ‘Find Solution’, which tries to find an optimal solution by employing the strategies detailed in the Algorithm of Fig. 2. It implements an iterative mechanism by which the memory building process benefits from the current status of the memory itself. Indeed, if there are enough trajectories in the memory, and the features are close to the constraints, the solver is provided with a warm-start and way point inferred by a kNN algorithm (details in Sect. IV-B). Otherwise, the algorithm tries to solve the VPC using the previous solution as warm-start. If the solver does not manage to find a successful solution, two recovery strategies are executed, in which the solver is warm-started with one of: (i) 12 predefined; or (ii) 10 random camera velocity directions. In the presented algorithms, ‘∧’, ‘∨’ and ‘¬’ denote the AND, OR and NOT logic operators, respectively. Once the memory is built, it is ready to be queried online.
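The per-sample processing of a successful trajectory can be sketched as below. This is a minimal illustration under several assumptions: point features forming a polygon, the area computed with the shoelace formula, the pattern angle taken as the direction of the first polygon edge, and the way point clamped to the last stored features when the look-ahead exceeds the trajectory length. The function names and the look-ahead argument `n_ahead` are hypothetical.

```python
import numpy as np

def polygon_area(pts):
    """Shoelace formula for the polygon whose vertices are the point features."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def pattern_angle(pts):
    """Orientation of the pattern, taken here as the angle of its first edge (assumption)."""
    d = pts[1] - pts[0]
    return float(np.arctan2(d[1], d[0]))

def trajectory_to_samples(features, controls, n_ahead):
    """Turn one trajectory into input rows (features, area, angle) and output rows (control, way point)."""
    N = len(features)
    X, Y = [], []
    for k in range(N):
        pts = features[k]
        x_k = np.concatenate([pts.ravel(), [polygon_area(pts), pattern_angle(pts)]])
        way_point = features[min(k + n_ahead, N - 1)]  # clamped at the end (assumption)
        y_k = np.concatenate([controls[k], way_point.ravel()])
        X.append(x_k)
        Y.append(y_k)
    return np.array(X), np.array(Y)

# Hypothetical trajectory: a unit square translating in the image, zero commands.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
features = np.array([square + 0.1 * k for k in range(5)])
controls = np.zeros((5, 6))
X, Y = trajectory_to_samples(features, controls, n_ahead=2)
```

Stacking the rows returned for every successful trajectory gives the two matrices that are later queried online.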
IV-B Querying the memory of motion
The aim of querying the memory of motion is to infer from the dataset proper initial guess and way point for the online VPC solver, given the current visual features configuration. This means that we need to learn the map from so that an estimate can be computed for a novel feature . The map is learned using regression techniques. In particular, kNN and GPR have been employed.
The kNN algorithm is a simple nonparametric method selecting the closest samples in the dataset , given a new feature . The distance between samples is computed as Euclidean norm. The corresponding closest outputs are thus averaged to provide the estimated output
(11) 
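A minimal sketch of this kNN query is given below: the estimate is the plain average of the outputs associated with the closest inputs under the Euclidean distance. The array names `X`, `Y` and the toy data are assumptions made for illustration.

```python
import numpy as np

def knn_query(X, Y, x_new, k):
    """Average the outputs of the k samples whose inputs are closest to x_new."""
    dist = np.linalg.norm(X - x_new, axis=1)   # Euclidean distance to every sample
    nearest = np.argsort(dist)[:k]             # indices of the k closest samples
    return Y[nearest].mean(axis=0)

# Toy memory with scalar inputs and 2-D outputs (hypothetical data).
X = np.array([[0.0], [1.0], [2.0], [10.0]])
Y = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [10.0, 20.0]])
y_hat = knn_query(X, Y, np.array([0.9]), k=2)
```

For large memories a spatial index (for instance a k-d tree) would avoid the full distance computation, at the price of an offline build step.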
In the case of GPR [25], the inference is achieved by computing
(12) 
where can be computed offline, so that only a vector sum and a matrix multiplication, fast to compute, are left for the online estimation; is the identity matrix
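The split between offline and online computation can be illustrated with the standard GPR predictive mean, written here in plain NumPy. The sketch assumes a constant mean (the average of the stored outputs), a squared-exponential kernel with hand-picked hyperparameters, and hypothetical function names; the paper's experiments rely on the GPy library instead, and the kernel they use is not stated here.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel between the rows of A and B (assumed kernel)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * d2 / ell**2)

def gpr_fit(X, Y, noise=1e-8, ell=1.0):
    """Offline part: solve (K + noise * I) alpha = Y - mean once."""
    mean = Y.mean(axis=0)
    K = rbf(X, X, ell) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, Y - mean)
    return mean, alpha

def gpr_predict(x_new, X, mean, alpha, ell=1.0):
    """Online part: one kernel row, one matrix product and one vector sum."""
    k_star = rbf(np.atleast_2d(x_new), X, ell)
    return mean + (k_star @ alpha)[0]

# Hypothetical 1-D inputs with 2-D outputs.
X = np.linspace(0.0, 3.0, 5)[:, None]
Y = np.column_stack([np.sin(X[:, 0]), np.cos(X[:, 0])])
mean, alpha = gpr_fit(X, Y)
y_hat = gpr_predict(np.array([1.5]), X, mean, alpha)
```

Since `alpha` has as many rows as stored samples, subsampling the memory directly reduces the online cost of the product.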
Finally, recalling that the control is constant in the preview window, the warm-start is built from the first entries of :
(13) 
where ‘⊗’ is the Kronecker product. The way point, instead, is obtained from the remaining elements of :
(14) 
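A short sketch of how the estimated output could be split into the two quantities: the first entries (the single command) are repeated over the preview window through a Kronecker product with an all-ones vector, and the remaining entries form the way point. The names `n_v` (command dimension) and `n_p` (preview window length) are placeholders chosen for this example.

```python
import numpy as np

def split_estimate(y_hat, n_v, n_p):
    """Return the warm-start (command repeated n_p times) and the way point."""
    v_hat = y_hat[:n_v]
    warm_start = np.kron(np.ones(n_p), v_hat)  # [v_hat, v_hat, ..., v_hat]
    way_point = y_hat[n_v:]
    return warm_start, way_point

# Hypothetical estimate: a 2-D command followed by one 2-D way point.
y_hat = np.array([0.1, -0.2, 320.0, 240.0])
warm_start, way_point = split_estimate(y_hat, n_v=2, n_p=3)
```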
Note that in the absence of constraints, the problem has a continuous evolution. In this case, the solution found at the previous iteration is already a good warm-start for the solver and there is no need to reshape the cost with way points. For this reason, the memory-based strategy is activated only when the visual features are “close” to one of the visibility constraints, i.e., when the distance between any feature and the border of the constraints is lower than a given threshold.
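The activation rule can be sketched as a simple distance test. The example assumes pixel coordinates, a rectangular image border, and occlusions modeled as circles; the actual constraint shapes used in the paper (convex and concave areas) would need their own distance functions, and all names and values below are hypothetical.

```python
import numpy as np

def near_constraint(points, image_size, occlusions, threshold):
    """True if any feature is closer than `threshold` to the border or to an occlusion."""
    w, h = image_size
    border = np.min(np.concatenate([points[:, 0], w - points[:, 0],
                                    points[:, 1], h - points[:, 1]]))
    dists = [border]
    for center, radius in occlusions:  # circular occlusions (assumption)
        d = np.linalg.norm(points - np.asarray(center), axis=1) - radius
        dists.append(np.min(d))
    return bool(min(dists) < threshold)

occlusions = [((320.0, 240.0), 50.0)]
far = np.array([[100.0, 100.0], [200.0, 100.0]])
close = np.array([[320.0, 300.0]])
use_memory_far = near_constraint(far, (640, 480), occlusions, threshold=20.0)
use_memory_close = near_constraint(close, (640, 480), occlusions, threshold=20.0)
```

When the test is false, the solver would simply be initialized with the solution of the previous iteration, as described above.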
V Results
In this section we present the results obtained with the proposed framework to efficiently achieve VPC tasks.
As visual features , we considered four points (). The visual task consisted in making them match four corresponding desired points . The image Jacobian in (4) has been approximated using the depth of the points at the target, known in advance. The approach has been implemented in Python. As optimization solver, we used the SLSQP method available in the open-source library SciPy [14]. Actuation and visibility constraints were implemented as bounds and nonlinear inequality constraints. To be implemented, the OR logic operation in (7) was converted into an AND with a norm formulation [12]. We chose for our kNN and used the GPy library [10] for the GPR implementation. As explained in Sect. II, VPC was set up with and s (since Hz is the camera nominal framerate).
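To make the solver setup concrete, the sketch below shows one way an "outside a rectangular area" condition, naturally an OR over the four sides, can be expressed as a single smooth inequality using a weighted p-norm, and passed to SciPy's SLSQP together with bounds. This is only one plausible reading of the norm formulation, not necessarily the one of [12] or of the authors; the region, the exponent, the bounds and the toy objective are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize

CENTER = np.array([0.0, 0.0])  # center of the forbidden area (assumed)
HALF = np.array([1.0, 0.5])    # half-width and half-height (assumed)
P = 4                          # norm exponent; larger values approach a rectangle

def outside_region(p):
    """Non-negative when p lies outside the weighted p-norm ball around CENTER."""
    return float(np.sum(np.abs((p - CENTER) / HALF) ** P) - 1.0)

def objective(p, target):
    """Squared distance to the target, standing in for the visual error cost."""
    return float(np.sum((p - target) ** 2))

target = np.array([0.0, 0.0])  # deliberately placed inside the forbidden area
res = minimize(objective, x0=np.array([2.0, 0.3]), args=(target,), method="SLSQP",
               bounds=[(-3.0, 3.0), (-3.0, 3.0)],
               constraints=[{"type": "ineq", "fun": outside_region}])
p_opt = res.x
```

In SciPy, an `"ineq"` constraint is satisfied when its function is non-negative, so the solver stops on the border of the area instead of reaching the target. This mirrors the situation in which a short-horizon solver gets stuck at a constraint.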
V-A Simulations
For the simulations, we considered a handheld camera free to move in the Cartesian space (), with an image resolution of pixels. As visibility constraints, we considered four convex and concave areas on the image ( plane) simulating occlusions and spots on the lens. As actuation constraints, we limited the linear and angular velocity components of the camera to m/s and rad/s. We set (decreasing to towards the convergence) and with .
The memory of motion was generated following the procedure of Sect. IV-A. In particular, the solver was set up with an optimality precision of and maximum iterations. VPC was set with . The choice of these parameters was driven by the need to store ‘high-quality’ samples, at the cost of a high computational time that we were willing to pay since the memory is built offline. We generated trajectories, for a total of samples. Fig. 3 shows the visual feature trajectories stored in the memory. The visibility constraints are depicted as shaded areas, while the targets are the red circles. For the online executions, we relaxed the solver parameters with as optimality precision and maximum iterations. This setup, along with a smaller , allowed faster computations. However, thanks to the memory-based strategies presented in Sect. IV, performance is not degraded but even improved.
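Table I
Strategy                                      Success rate (%)   Avg. solver time (s)   Avg. normalized cost
Previous-iteration (shorter preview window)   80                 0.085                  49.3
Previous-iteration (longer preview window)    83                 0.550                  52.9
kNN-based                                     92                 0.074                  19.6
GPR-based                                     93                 0.080                  16.4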
The approach was first evaluated with a statistical analysis, comparing: (i) VPC warm-started with the previous-iteration solution (for brevity denoted “prev.it.”) using and (ii) using ; (iii) VPC using the warm-start and way point provided by the kNN; and (iv) by the GPR, both with . The memory-based strategies were activated at pixels from the occlusions and we chose . For GPR, data were subsampled by a factor . The comparison is performed w.r.t. the success rate , the average of the solver convergence time and , the average of the cost divided by , for all (successful and unsuccessful) trajectories. Each execution is considered successful if no constraint is violated (with a tolerance of pixels) and the visual task is achieved ( converges to in the given time of s). Each strategy was tested using the same random initial configuration. The results, run on a laptop with an i7 GHz core and GiB RAM, are reported in Table I. The prev.it. strategy with achieved a success rate of 80% (note that among the test samples, many had an easy task execution). In order to improve , for the considered scenario, we had to increase the preview window to , but this also increased the computation time. The proposed memory-based strategies allowed us to keep the preview window short, so that both and have low values, and to increase at the same time. This is due to the effect of the warm-start and way point, which help the execution of the task.
The main reason for the failures of the prev.it. strategy is that the solution gets stuck at the visual occlusions. The memory-based strategies, instead, can better handle these situations. As an example, in Fig. 4 we present the plots related to a single task execution, where the big blue dot is the initial value of the features, the smaller blue dots are the VPC solutions at each iteration, and the red circles are the targets. The prev.it. strategy stops at an occlusion border (see Fig. 4(a)), as an effect of the conflicting gradients that produce zero velocity commands (Fig. 4(d)). Instead, the memory-based approaches manage to overcome the occlusion, as shown in Figs. 4(b) and 4(c). In particular, the GPR solution, thanks to its interpolation capabilities, produces a smoother behavior w.r.t. our kNN implementation (cf. Figs. 4(b) and 4(e) with Figs. 4(c) and 4(f)).
V-B Robot experiments
For the experiments, we used the 7-degrees-of-freedom robot arm Panda by Franka Emika, with an Intel RealSense RGB-D sensor mounted at the end-effector. The sensor, used as a monocular camera, outputs images with a resolution of pixels at a nominal framerate of Hz. The image processing, used to detect the point features, was implemented using the open-source library OpenCV [20]. A calibration procedure computed the intrinsic camera parameters and the camera–end-effector displacement. The camera velocity commands, computed by VPC, were then transformed into the robot frame and sent to the robot Cartesian controller. As a task, the robot had to place an object inside a box where we placed four known markers. Since the box pose was unknown, VPC was used to drive the robot over the box and, after convergence, release the object. On the image we considered two constraints, to take into account the occlusion of the object grasped by the robot and to emulate a spot in the center of the lens as a blurred area. VPC was set with , (decreasing it to approaching the convergence), while the command bounds were set to m/s and rad/s.
The memory ( trajectories for a total of samples) was built with , solver optimality tolerance of and maximum iterations. The iterative building and the adaptive sampling were not used. To be conservative, the spot considered in the memory was bigger than the one in the experiments. Given the simulation results, we decided to use the GPR-based strategy. Data were subsampled by a factor , with . The trigger signal to query the memory was activated at pixels from the occlusions.
For the online experiments, we set , the solver was given maximum iterations and as optimality tolerance. With this setting, and for some initial robot-box configurations, the previous-iteration strategy was unable to achieve the task, as shown in the snapshots of Fig. 5. While moving the visual features (blue dots, see Fig. 5(a)) towards the target (red circle), the features met the blurred spot (Fig. 5(b)), causing the loss of a feature and the consequent failure of the task (Fig. 5(c)). The same experiment has been carried out with the GPR-based approach, see Fig. 6, where both the robot and the camera view are shown. Starting from the same initial condition (Fig. 6(a)), in the proximity of the constraint (Fig. 6(b)), the memory provides a proper way point (depicted as red crosses on the image plane) and warm-start, which allow the desired task to be successfully achieved (Fig. 6(c)). Fig. 7 shows the velocity commands sent to the robot during the execution. The experiments are shown in the accompanying video.
VI Conclusion and Future Work
In this paper, we addressed the problem of efficiently achieving visual predictive control tasks. Using a memory of motion, we could exploit previous solutions to better fulfill online tasks. Furthermore, leveraging the precomputation contained in the memory, we could set a short VPC preview window without degrading the results. The performance of the algorithm relies on the precomputed dataset; we plan to improve the quality of the memory using a global optimizer or a planner. Furthermore, more sophisticated active learning paradigms can be employed to build a minimal memory, containing fewer but more informative samples. In the presented work, the memory is queried using kNN and GPR. As shown with both simulations and experiments, these methods were able to outperform the standard VPC scheme. However, we believe that the performance could be further improved by considering other kinds of regressors that can cope with multi-modality, as done in [26]. In the presented results we show that the use of a memory of motion also helps to keep the computation time limited. However, more effort will be made to ensure full real-time performance. Finally, further developments will be devoted to including the proposed scheme within the optimization framework of more complex systems such as humanoids.
Footnotes
 For example, if point features are used, the visual pattern is the polygon having the visual features as vertices.
 Hereafter, $\mathbf{I}$, $\mathbf{1}$ and $\mathbf{0}$ refer to the identity, all-ones and null matrices. When not explicitly marked, the dimensions are inferred from the context.
References
[1] (2017) Visual servoing in an optimization framework for the whole-body control of humanoid robots. IEEE Robot. and Autom. Lett. 2 (2), pp. 608–615.
[2] (2010) Predictive control for constrained image-based visual servoing. IEEE Trans. Robot. 26 (5), pp. 933–939.
[3] (2010) Visual servoing via nonlinear predictive control. In Visual Servoing via Advanced Numerical Methods, G. Chesi and K. Hashimoto (Eds.), pp. 375–393.
[4] (2008) Real-time visual predictive controller for image-based trajectory tracking of a mobile robot. IFAC Proceedings Volumes 41 (2), pp. 11244–11249.
[5] (2012) A robot path planning framework that learns from experience. In IEEE Int. Conf. on Robotics and Automation, pp. 3671–3678.
[6] (2006) Visual servo control, Part I: basic approaches. IEEE Robot. Autom. Mag. 13 (4), pp. 82–90.
[7] (2007) Visual servo control, Part II: advanced approaches. IEEE Robot. Autom. Mag. 14 (1), pp. 109–118.
[8] (1998) Potential problems of stability and convergence in image-based and position-based visual servoing. In The Confluence of Vision and Control, pp. 66–78.
[9] (2012) Online motion synthesis and adaptation using a trajectory database. Robotics and Autonomous Systems 60 (10), pp. 1327–1339. ISSN 0921-8890.
[10] GPy: a Gaussian process framework in Python.
[11] (2019) Avoidance of convex and concave obstacles with convergence ensured through contraction. IEEE Robot. and Autom. Lett. 4 (2), pp. 1462–1469.
[12] (2017) A new framework for optimal path planning of rectangular robots using a weighted norm. IEEE Robot. and Autom. Lett. 2 (3), pp. 1460–1465.
[13] (2009) Trajectory prediction: learning to map situations to robot trajectories. In Int. Conf. on Machine Learning, pp. 449–456.
[14] SciPy: open source scientific tools for Python.
[15] (2009) Standing balance control using a trajectory library. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 3031–3036.
[16] (2018) Using a memory of motion to efficiently warm-start a nonlinear predictive controller. In IEEE Int. Conf. on Robotics and Automation, pp. 2986–2993.
[17] (2018) Leveraging precomputation with problem encoding for warm-starting trajectory optimization in complex environments. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 5877–5884.
[18] (2002) Path planning for robust image-based control. IEEE Trans. Robot. Autom. 18 (4), pp. 534–549.
[19] (2012) Discovery of complex behaviors through contact-invariant optimization. ACM Trans. Graph. 31 (4), pp. 1–8.
[20] Open source computer vision library.
[21] (2018) Interlinked visual tracking and robotic manipulation of articulated objects. IEEE Robot. and Autom. Lett. 3 (4), pp. 2746–2753.
[22] (2018) DeepMimic: example-guided deep reinforcement learning of physics-based character skills. ACM Trans. Graph. 37 (4), pp. 143.
[23] (2018) Vision-based reactive planning for aggressive target tracking while avoiding collisions and occlusions. IEEE Robot. and Autom. Lett. 3 (4), pp. 3725–3732.
[24] (2010) Interior point methods for nonlinear optimization. In Nonlinear optimization, pp. 215–276.
[25] (2006) Gaussian processes for machine learning. MIT Press, Cambridge, MA, USA.
[26] (2019) Memory of motion for warm-starting trajectory optimization. Submitted.
[27] (2006) Image based visual servoing through nonlinear model predictive control. In IEEE Conf. on Decision and Control, pp. 1776–1781.
[28] (2014) Distance based dynamical system modulation for reactive avoidance of moving obstacles. In IEEE Int. Conf. on Robotics and Automation, pp. 5618–5623.
[29] (2006) Policies based on trajectory libraries. In IEEE Int. Conf. on Robotics and Automation, pp. 3344–3349.
[30] (2011) A convex, smooth and invertible contact model for trajectory optimization. In IEEE Int. Conf. on Robotics and Automation, pp. 1071–1076.