MPCbased Controller with Terrain Insight for Dynamic Legged Locomotion
Abstract
We present a novel control strategy for dynamic legged locomotion in complex scenarios, that considers information about the morphology of the terrain in contexts when only onboard mapping and computation are available. The strategy is built on top of two main elements: first a contact sequence task that provides safe foothold locations based on a convolutional neural network to perform fast and continuous evaluation of the terrain in search of safe foothold locations; then a model predictive controller that considers the foothold locations given by the contact sequence task to optimize target ground reaction forces. We assess the performance of our strategy through simulations of the hydraulically actuated quadruped robot HyQReal traversing rough terrain under realistic onboard sensing and computing conditions.
 RCF
 Reactive Controller Framework
 CNN
 Convolutional Neural Network
 MPC
 Model Predictive Control
 LF
 left front
 RF
 right front
 LH
 left hind
 RH
 right hind
 IMU
 Inertial Measurement Unit
 LO
 Leg Odometry
 EKF
 Extended Kalman Filter
 DDT
 Drift per Distance Traveled
 AICP
 Autotunned Iterative Closest Point
 FF
 feedforward
 GRFs
 ground reaction forces
 GRF
 ground reaction force
 COM
 Center of Mass
 CTT
 Center of Mass (COM) tracking task
 CST
 contact sequence task
 VFA
 Visionbased Foothold Adaptation
 LF
 leftfront
 RF
 rightfront
 LH
 lefthind
 RH
 righthind
 HPU
 Hydraulic Pump Unit
 HPUs
 Hydraulic Pump Units
 DOFs
 degrees of freedom
 DOF
 degree of freedom
 ISA
 Integrated Smart Actuator
 IIT
 Istituto Italiano di Tecnologia
 VO
 Visual Odometry
 REU
 Remote Electronics Unit
 LIP
 Linear Inverted Pendulum
 QP
 Quadratic Program
 NMPC
 Nonlinear Model Predictive Control
 iLQR
 Iterative Linear Quadratic Regulator
 NLP
 Nonlinear Program
 MICP
 MixedInteger Convex Program
 ZOH
 zeroorder hold
 JSIM
 jointspace inertia matrix
 RMS
 root mean square
See pages  of preprintCover
I Introduction
Considering terrain morphology allows legged robots to traverse more complex scenarios (e.g., [kalakrishnan09iros, fankhauser18icra, belter11jfr]). Nevertheless, building a model of the terrain is often computationally costly, mainly because of the dense nature of visual data. On top of the mapping problem, to traverse the terrain safely feasible contact sequences are needed. Computing these contact sequences can also be costly [lin19icra, tonneau18tro]. In general, strategies that consider visual information of the terrain are mostly focused on trajectory optimization [farshidian17icra, winkler18ral, fernbach18iros, herzog15humanoids]. In most approaches, contact sequences and COM trajectory are computed prior to the motion, or are limited to (quasi) statically stable gaits to not compromise stability due to these time constraints [fankhauser18icra, kalakrishnan2011, belter19icra]. In this work, we combine the low computational time from our previous Visionbased Foothold Adaptation (VFA) strategy from [villarreal19ral] with a Model Predictive Control (MPC)based trunk controller. These two approaches are mutually beneficial to each other. On one hand, we exploit the computational gain that we obtain from the Convolutional Neural Network (CNN) in the VFA strategy, to generate safe contact sequences to be used in the MPCbased COM tracking controller. On the other hand, the VFA benefits from the MPCbased controller with respect to the foothold prediction. A foothold prediction is a future landing position based on the nominal trajectory of the legs and the trunk velocity. We stress that the foothold predictions are different from the state predictions computed using the MPC. We start from the premise that optimizing ground reaction forces (GRFs) accounting for future states using MPC will lead to better foothold predictions, since they depend both on foot trajectories and robot states. If the robot states have large acceleration peaks, the foothold prediction is affected negatively. An improved selection of the target GRFs would reduce acceleration peaks, improving foothold predictions. This allows the robot to handle more difficult scenarios, such as changes in elevation and orientation, in a safer and more reliable way, as demonstrated in simulations. We perform simulations using the quadruped robot HyQReal [hyqreal19irim]. Its four legs weigh in total (between 37% and 45% of the total weight of the robot, with and without onboard hydraulic/electric power units, respectively). This means that when fast motions are required, swing legs play a significant role in the robot dynamics. In this paper, we directly compensate for these effects by computing the wrench on the body due to the desired joint accelerations of the legs, improving state tracking and foothold prediction. The result is a more stable locomotion strategy, which is robust to a wider range of disturbances and is able to act preemptively to obstacles based on visual information.
We summarize the contributions of this paper as follows:

We devised a locomotion strategy that evaluates the terrain, generates safe contact sequences and allows for dynamic locomotion in difficult scenarios. The new strategy displayed an improvement in foothold prediction with respect to [villarreal19ral], reducing the prediction error by a percentage between 7% to 36% (approximately between and ).

We combine a CNNbased foothold adaptation strategy and an MPCbased trunk controller and show how they mutually benefit from each other. To the best of our knowledge, this is the first time that an MPCbased locomotion controller makes usage of the terrain geometry in combination with a machine learning strategy.

We improve the performance of the MPCbased controller, by directly compensating for the wrench exerted by the legs during swing phase due to their large inertia with respect to the total weight of the robot. This compensation renders the model used for the state prediction of the MPCbased controller more representative, since the leg inertia is handled separately, further reducing the error in foothold prediction.
This paper is structured as follows: Section II summarizes the previous work relevant to this research; Section III details the methodology used to derive our locomotion strategy; Section IV summarizes our results. Conclusions and future work are presented in Section V.
Ii Related Work
The spectrum of strategies when using MPC varies mostly depending on the tradeoff between model accuracy and computational cost. The work of Di Carlo et al. [iros18dicarlo] considers a simplified version of the centroidal dynamics model, neglecting leg inertia, and ignoring the effects of nonzero roll/pitch on the dynamics of the body. The optimization problem is still convex and it is solved in realtime as a Quadratic Program (QP). The strategy keeps the robot in balance during a range of highly dynamic motions (e.g., trot, bound, and gallop) on the Cheetah 3 robot [bledt18iros]. Some other approaches tackle the tradeoff between computational cost and model accuracy not by simplifying the wholebody dynamics, but by relying on reducing the computational cost of the optimization problem solver. A remarkable example is the one devised by Neunert et al. in [neunert17ral]. Their approach performs Nonlinear Model Predictive Control (NMPC) and relies on a custom solver based on the Iterative Linear Quadratic Regulator (iLQR) algorithm and exploits automatic differentiation [giftthaler17ar]. Using MPC has provided a systematic and robust way to deal with the quadruped locomotion problem. Nevertheless, in general it does not take into account future terrain within their prediction. Instead, it reacts to the terrain and relies on the fact that the continuous update of the state for the initialization of the optimization provides enough robustness. It is still challenging to leverage terrain information to improve the performance of MPC strategies. There are some trajectory optimization methods that consider terrain information for quadrupeds. The method devised by Winkler et al. in [winkler18ral] is able to optimize gait, COM trajectory and contacts on nonflat terrain based on a simplified centroidal dynamics model using an offtheshelf Nonlinear Program (NLP) solver. In a similar fashion, Aceituno et al. showcase a motion planning algorithm that computes gait pattern, contact sequences and COM trajectory as an outcome of a MixedInteger Convex Program (MICP) on several nonplanar convex surfaces in [aceituno2017ral]. However, in their work either the trajectory is computed only once before the motion is executed, or the terrain is assumed to be known and there are no experiments with vision sensors during the execution of the motion. The early works of Kalakrishnan et al. [kalakrishnan09iros, kalakrishnan2011] on LittleDog pioneered methods to include vision sensors for locomotion by relying on external motion capture systems. Belter et al. [belter11jfr] and Fankhauser et al. [fankhauser18icra] devised control architectures that allowed their legged platforms to traverse complex scenarios only using onboard sensing enhanced with vision, which were mostly demonstrated for statically stable motions. On our previous work [villarreal19ral], we presented a strategy that was able to adapt footholds based on a CNN. The approach generated swing leg trajectory adaptations in less than . This allowed us to execute dynamic locomotion in complex scenarios.
Iii Locomotion Strategy
Our goal is to produce robust and stable locomotion in complex scenarios using terrain information provided by onboard vision sensors. We combine MPC with a CNNbased foothold adaptation strategy [villarreal19ral]. The combination of these two strategies benefits each other. State predictions in the MPC are computed using the centroidal dynamics model and safe contact sequences are based on the VFA. Future footholds can be continuously computed by VFA approximately every , enabling the MPC to reason about the effects of future contacts, without having to consider them in the optimization. We then build upon these contacts to provide the reference pose for the robot along the prediction horizon. The block diagram shown in Fig. 1 describes our locomotion strategy. It entails three main elements: the contact sequence task, the COM tracking task and the Reactive Controller Framework (RCF) [barasuol13icra]. The contact sequence task provides the future contact locations according to the robot current states and the gait timing parameters. The COM tracking task is in charge of both generating and following a COM trajectory according to the contact sequence task, the robot current states, and the gait parameters. We use the RCF [barasuol13icra] as controller interface. This modular framework allows us to combine the RCF reactive layer block in Fig. 1 with our visionbased strategy. This layer is comprised by several modules that allow the robot to perform robust locomotion in rough terrain only using proprioception. The RCF helps us to seamlessly combine this reactive modules with MPC and the VFA. The user commands (see Fig. 1) are: forward velocity (composed of and velocities), yaw rate , duty factor , step frequency and gait sequence . and are provided via joystick commands. The rest of the parameters are preset by the user according to the desired gait and range of speeds. Below we explain the two main elements of our strategy: the contact sequence and the COM tracking tasks.
Iiia Contact Sequence Task
We extend the usage of the VFA [villarreal19ral] to provide the subsequent eight reference footholds (two strides). These footholds are used to generate the COM reference trajectory and provide the contacts to be used in the model described in Section IIIB.
Visionbased Foothold Adaptation
the purpose of the VFA is to continuously compute adjustments for the trajectory of the feet in order to avoid collisions, and unstable or unreachable landing positions. For a more detailed description on this method we refer the reader to [villarreal19ral]. For a leg in swing phase, we initially compute a prediction of its landing position based on the current velocity of the trunk (taken from the state estimator) and the trajectory of the foot (in our case a halfellipse) using the approximation:
(1) 
where is the predicted foothold position of leg , is the center of the ellipse of leg , is the time remaining to the next stance change of leg (for ), is the step length vector, and corresponds to the velocity of the base. All vector variables are given in world coordinates. In the case of the next touchdown of a swing leg, , where is the duty factor, is the step frequency and is the elapsed swing time since the latest liftoff of leg . The first two terms in (1) are related to the leg trajectory, while the third term is related to the displacement of the base. After computing the prediction of the next foothold, a 2D representation of the terrain around that foothold is acquired, namely a heightmap. We pretrain a CNN to learn the optimal footholds from heightmaps [villarreal19ral]. The CNN takes on average to evaluate a heightmap and output a safe foothold. This fast computation time allows us to continuously adapt the trajectory of the swing leg to reach the adapted foothold.
Reference Contact Sequence
We use the computational gain obtained by the VFA to evaluate further ahead in the terrain. Knowing that the gait is periodic and defined by the step frequency and the duty factor , we can estimate the timings for the nonimmediate foot contacts. Using these timings, one can compute the predicted foothold locations for each of the legs at every stance change (liftoff or touchdown) replacing them for in (1). We then use our CNNbased foothold adaptation to adjust the predicted foothold location. This is done for the next two gait cycles (eight contacts in total and 16 stance changes). On the right side of the series of snapshots of Fig. 2 an example of a safe foothold sequence can be seen. Namely, is the contact location of leg at stance change , for . In (1), is assumed constant in between stance changes. The CNN continuously provides safe contact sequences at task frequency (). These contact sequences are used both as future foot positions and to inform the MPC controller to improve the COM regulation, as explained in Section IIIB. This interaction is depicted in Fig. 1. One key feature of this approach is that the safe footholds are computed without including them as optimization variables in the MPC controller, which significantly decreases the complexity of the problem.
IiiB Com Tracking Task
Com Reference Generation
To provide the reference trajectory for the COM along the prediction horizon, we compute its location at every stance change based on the desired gait timings using and . For two gait cycles, there are a total of 16 stance changes, so we compute a total of 16 COM positions. Similarly to the third term of (1), we compute the reference yaw using the desired yaw rate as
(2) 
where is the yaw reference at stance change , is the current yaw angle of the body, and is the remaining time before the next stance change of . Using the reference for the yaw angle, we compute the reference position of the COM with respect to the world
(3) 
where is the reference position for the COM at stance change , is the rotation matrix around the axis about (with ) and is the reference velocity obtained from and . This provides the reference for the next and positions of the COM with respect to the world. The reference for the body roll and pitch relies on the contact configuration at each stance change. We estimate the orientation of the terrain and define that orientation as reference for the body. We also use the contacts to define a height reference position for the body (namely, ), setting it to remain at a constant distance from the center position of the approximated plane in the direction of the world axis. To obtain , and we derive numerically between samples of , and , respectively. Finally, we evenly sample the 16 reference points given by the stance changes, filling the gaps in between samples using a zeroorder hold (ZOH). We define a desired reference vector at evenly sampled time as
(4) 
with and . A series of COM references can be seen in Fig. 2.
Dynamic Model
our MPCbased balance controller is inspired by the work of Di Carlo et al. [iros18dicarlo]. We also model the robot as a rigid body subject to contact patches at each stance foot and we neglect the effects of precession and nutation as in [focchi2016]. However, there are two key differences in our approach: firstly, we do not define the reference roll and pitch angles to be zero. Additionally, although we do not explicitly consider the leg inertia in our model for control, we compensate for it by computing the wrench exerted by the legs using the actuated part of the jointspace inertia matrix and the desired accelerations of the joints. We explain how this is done by the end of this section. The dynamic model of the rigid body and its rotational kinematics are given by
(5)  
(6)  
(7) 
where is the position of the COM, is the ground reaction force (GRF) at foot , is the robot mass, is the gravitational acceleration, is the inertia tensor of the robot, is the th foot contact position, is the rotation matrix from body to world coordinates according to roll , pitch and yaw angles and is the robot’s angular velocity. The operator is the skewsymmetric matrix such that . In (6) we are neglecting precession and nutation effects, namely . We rewrite equations (5), (6) and (7) in statespace representation. Initially, from (7) we can obtain the angular velocity in terms of the body’s Euler angles as
(8) 
which can be rewritten as
(9) 
where and is the matrix that maps from euler angle rates to angular velocities. The only condition on to be invertible is , which in practice does not happen (it implies that the robot is pointed vertically). Thus, the angular rate can be obtained as
(10) 
We then define state vector , and rearranging (5), (6) and (10) the system can be described in statespace form as
(11) 
where is the vector of GRFs. Note that no assumptions are made about the orientation of the robot^{1}^{1}1If then: (except for ) and that we explicitly denote the dependence of and with respect to . In a similar fashion to [iros18dicarlo], we approximate the system dynamics in (11) to a discretetime linear system. Namely, for each reference vector (for , where is the prediction horizon length), we compute the approximate linear, discrete system matrices and . We first substitute the feet locations obtained from the contact sequence task into matrix for every contact configuration at time instant . However, matrices and are still dependent on the body orientation (in a nonlinear fashion). To obtain the linear, discrete time versions of these matrices, we follow a similar argument to [iros18dicarlo]. Assuming that the MPCbased controller will follow sufficiently close the reference trajectory given by , we substitute the values of into system matrices and . We also consider the values of and computed by the COM reference trajectory. We then discretize the system matrices using a ZOH. The discretetime linear system dynamics can be described as
(12) 
Model Predictive Control
we can obtain a discrete time evolution of the system by successive substitution of states into (12) to obtain the state evolution from to . Then, we can describe the dynamics as
(13) 
where is the stacked vector of states along the prediction horizon , and are the matrices built by successive substitution along the prediction horizon, is the actual robot state vector and is the stacked vector of ground reaction forces along the prediction horizon . We formulate the optimization problem to minimize the weighted leastsquares error between the states and the reference along the prediction horizon. We enforce the gait pattern and friction consistency by setting appropriate constraints. Namely, we solve the following optimization problem
subject to  
(14) 
where is the stacked vector of desired states along the prediction horizon^{2}^{2}2We redefine to match the state definition of , vectors , and correspond to the components of vector associated to coordinates , and , respectively, of the GRFs, is the friction coefficient between the ground and the feet, and are the limits on the component of , matrix is a matrix that selects the components of the GRFs that are in contact according to gait , and matrices and are weighting matrices. The optimization problem defined by (14) is a QP and can be efficiently solved by several offtheshelf solvers. After solving the problem in (14), we take the first 12 entries of the optimized control input vector , which correspond to the set of GRFs at time instant and compute the desired wrench coming from the MPCbased controller as
(15) 
where is the optimized GRF of foot .
Leg Inertia Compensation
the MPC model used for prediction neglects leg inertia. This assumption is acceptable for quasistatic motions. However, if the legbody weight ratio is significantly large, the wrench exerted by the legs on the body plays a significant role in the dynamics. We compensate for these effects in a simple, yet effective, manner. The floating base dynamics of a robot can be described by
(16) 
where is the floatingbase robot velocity, is the joint configuration, and are the direct unactuated and actuated parts of the jointspace inertia matrix, whereas and correspond to the cross terms between actuated and unactuated parts of the jointspace inertia matrix, and are the unactuated and actuated vectors of Coriolis, centrifugal and gravitational terms, is the vector of joint torques, and are the unactuated and actuated contact Jacobians and is the vector of GRFs. The crossterm matrix maps the joint accelerations to the robot spatial force acting on the floatingbase of the robot, namely
(17) 
In (17), the can be computed directly using measurements coming from the sensors. However, using the actual joint acceleration might lead to high frequency wrench signals. Instead, We use the desired joint acceleration coming from the torque mapper (see Fig. 1). Then, the leg inertia compensation wrench is given by . Thus, the total desired wrench is given by .
Iv Results
We performed simulations on HyQReal [hyqreal19irim], a hydraulically actuated quadruped robot^{3}^{3}3Link to video: https://youtu.be/CqlLRdohFwM. The leg and joint configuration of the robot are shown on the right side of Fig. 1. We use Gazebo [gazebo] to perform our simulations. Control commands are executed at a frequency of . Wrench values from the MPCbased controller are sent at a maximum frequency of and we use a ZOH in between control signals. The prediction horizon is set to comprise 2 gait cycles, partitioned in 20 samples. We solve the QP in (14) with a modified version of uQuadProg++ [quadprog++] to work with the C++ linear algebra library, Eigen. The leg inertia compensation wrench is computed at task frequency. The mapping is done using the Grid Map interface from [fankhauser14clawar]. Weighting marices are chosen as and , where defines the identity matrix.
Iva Simulations
LF  RF  LH  RH  

QP+LI+GC  0.012  0.012  0.012  0.012  
0.095  0.095  0.088  0.082  
MPC+IC  0.009  0.007  0.007  0.009  
0.0740  0.0611  0.0604  0.0761 
We perform three different simulations to assess the improvements in foothold prediction and locomotion robustness. Below we explain in detail the outcome of these tests.
Leg Inertia Compensation
we perform simulations commanding the robot to trot on flat terrain with a forward velocity of , a step frequency of , and a duty factor of 0.6. We start the simulation with our previous trunk controller [focchi2016]. We keep and change the controller configuration as the robot continues to trot. There are six possible configurations shown in Fig. 3 which combine the following control components: a) QP: standard QP trunk controller b) LI: stance leg impedance c) GC: gravity compensation d) IC: leg inertia compensation and e) MPC: model predictive controller. Figure 3 shows the error in velocity with respect to the commanded for one of the trials. The vertical dashed red lines indicate the moments when the controller configuration was changed. We check six different configurations, although we would like to stress that configuration C3 acts merely as a a transition between the standard QP and the MPC. This is because the MPC controller already compensates for the gravitational effects in the model. It can be noticed that when the inertia compensation wrench is applied, the accelerations of the body are greatly reduced. The best performing configuration corresponds to the combination C5 (fifth portion of graph in Fig. 3). Under this configuration, the robot dynamics resemble more those of the MPC model (which neglects leg inertia), since the leg inertia is being accounted for outside of the optimization.
Foothold Predictions and Robustness in the Presence of Disturbances
for the second simulation the robot is also commanded to trot on flat terrain with the same gait parameters as in the first simulation. This time we perturb it three times with of force with a duration of . Table I shows the root mean square (RMS) error and the maximum absolute value of the foothold prediction error for this simulation. The table helps us to compare the previous controller configuration with the MPCbased controller with leg inertia compensation. The RMS error when using MPC and leg inertia compensation is between 25% and 41% less with respect to the previous controller configuration. This represents between and of improvement. However, even if the average of the error is low in both cases, a single wrong prediction compromises the robot stability. In this scope, the maximum absolute value of the error is more representative of the reliability of the prediction under disturbances. In this case, the reduction of the error is between 7% and 36%. This represents between and of reduction of the foothold prediction error when using the MPCbased controller in combination with the leg inertia compensation.
Locomotion on Challenging Terrain
to verify the improvement in performance regarding locomotion on difficult terrain, we designed the challenging scenario shown in Fig. 2. The robot is commanded to trot with a forward velocity of , a step frequency of and a duty factor of 0.6. In order to select appropriate footholds, we use the VFA with two different control configurations a) QP + LI + GC and b) MPC + IC. To test the performance repeatability we did four trials with each control configuration. Figure 4 contains the plots corresponding to pitch angle, forward velocity, body height and an example of the foot trajectories for one of the trials with the MPC + IC configuration. Figure 2 shows 11 overlapped snapshots of the RVIZ visualization as the robot crosses the scenario, builds the map, and adjusts its footholds on the fly. Figure 2 also shows the reference trajectory of the center of mass given at the specific moment when the snapshot was taken, and the foot trajectories as the robot moves through the scenario. From Fig. 4 it can be seen that all four different trials using the MPC + IC strategy were successful and the variations in linear velocity, pitch and body height are significantly reduced with respect to the QP + IC + GC configuration. For this last configuration, the robot was not able to reach the end of the scenario for any of the trials. This task clearly shows the mutual benefits between the VFA and the MPCbased controller. The foothold prediction error is reduced when using the strategy here presented. Specifically, in the case of the MPC + IC, for all four trials and all legs (LF, RF, LH and RH), the maximum absolute value of the error in foothold prediction in all four trials was , while in the case of the previous controller configuration the error was up to .
V Conclusions and future work
We developed a dynamic locomotion strategy to traverse difficult terrain using visual information only coming from onboard sensors. We based this strategy on the combination of an MPCbased controller and a CNNbased foothold adaptation (namely the VFA). We showed that the interaction between these approaches is mutually beneficial and improves locomotion reliability and robustness. We also demonstrated that considering a compensation term accounting for the wrench due to the inertia of the legs, improves the performance of the MPCbased controller, due to a closer resemblance to the model used for state prediction. The various simulations validated these improvements. As future work we plan to validate the strategy here developed on the hydraulically actuated quadruped robot HyQReal.
References
LN@col1 writefiletoc LN@col2 \grid