Rolling in the Deep – Hybrid Locomotion for WheeledLegged
Robots using Online Trajectory Optimization
Abstract
Wheeledlegged robots have the potential for highly agile and versatile locomotion. The combination of legs and wheels might be a solution for any realworld application requiring rapid, and longdistance mobility skills on challenging terrain. In this paper, we present an online trajectory optimization framework for wheeled quadrupedal robots capable of executing hybrid walkingdriving locomotion strategies. By breaking down the optimization problem into a wheel and base trajectory planning, locomotion planning for high dimensional wheeledlegged robots becomes more tractable, can be solved in realtime onboard in a model predictive control fashion, and becomes robust against unpredicted disturbances. The reference motions are tracked by a hierarchical wholebody controller that sends torque commands to the robot. Our approach is verified on a quadrupedal robot that is fully torquecontrolled, including the nonsteerable wheels attached to its legs. The robot performs hybrid locomotion with different gait sequences on flat and rough terrain. In addition, we validated the robotic platform at the Defense Advanced Research Projects Agency (DARPA) Subterranean Challenge, where the robot rapidly maps, navigates, and explores dynamic underground environments.
I Introduction
Legged robots offer the possibility of negotiating challenging environments and thus, are versatile platforms for various types of terrains [bellicoso2018jfr]. In research and industry, there is an emphasis on replicating nature to improve the hardware design and algorithmic approach of robotic systems [eckert2015comparing, nyakatura2019reverse]. Even with extensive research, matching the locomotion skills of conventional legged robots to their natural counterparts remains elusive. In contrast, wheels offer a chance to extend some capabilities, particularly speed, of these legged robotic systems beyond those of their natural counterparts, which can be crucial for any task requiring rapid and longdistance mobility skills in challenging environments. With this motivation, the central contribution of this work involves locomotion planning on a wheeledlegged robot to perform dynamic hybrid walkingdriving motions on various terrains, as shown in Fig. 1.
Ia Related Work
Finding control policies for performing walking motions in an articulated mobile robot is an involved task because of the system’s many degrees of freedom (DOF) and its nonlinear dynamics. This demands substantial computational power and introduces the challenge of overcoming local minima, making onthefly computations hard. Therefore, the online generation of optimal solutions for dynamic motions has been an active research area for conventional legged robots. Methods like trajectory optimization (TO) and model predictive control (MPC) are pervasive and recommended in the literature for aiding robots to be reactive against external disturbances and modeling errors.
In the literature concerning wheeledlegged robots, hybrid walkingdriving motions are scarce. The focus is mostly on staticallystable driving motions where the legs are used for active suspension alone [reid2016actively, giordano2009kinematic, cordes2014active, giftthaler2017efficient, suzumura2014real, grand2010motion]. These applications do not show any instance of wheel liftoffs, and hence sophisticated motion planning for the wheels is unnecessary and hence usually skipped.
Agile motions over steps and stairs were demonstrated for the first time in our previous work [bjelonic2019keep] where the motion trajectories were tracked by a hierarchical wholebody controller (WBC) that included the rolling conditions associated with the wheels. The robot was able to execute walking and driving motions, but not simultaneously due to the missing wheel trajectory generation. The work in [viragh2019trajectory] extends the approach by computing base and wheel trajectories in a single optimization framework. This approach, however, decreased the update rate to 50 Hz, and no hybrid walkingdriving motions were shown on the real robot.
CENTAURO, a wheeledlegged quadruped with a humanoid upperbody, performs a walking gait with automatic footstep placement using a linear MPC framework [laurenzi2018quadrupedal]. The authors, however, only perform walking maneuvers without making use of the wheels. Among the robots that employ hybrid walkingdriving motions, Jet Propulsion Laboratory’s (JPL) Robosimian uses a TO framework [bellegarda2019trajectory], but for passive wheels and results are only shown in a simulation. Skaterbots [geilinger2018skaterbots] provide a generalized approach to motion planning by solving a nonlinear programming (NLP) problem. This approach, however, is impractical to update online in a receding horizon fashion, i.e., in a MPC fashion, due to excessive computational demand.
Given the state of the art, we notice a research gap in trajectory generation methods for hybrid walkingdriving motions on legged robots with actuated wheels, which can be both reliable on various terrains and be used onthefly. Fortunately, research in traditional legged locomotion offers solutions to bridge this gap. The quadrupedal robot ANYmal (without wheels) performs highly dynamic motions using MPC [bellicoso2017dynamic, grandia2019feedback] and TO [winkler2018gait, carius2019trajectory] approaches. Impressive results are shown by MIT Cheetah, which performs blind locomotion over stairs [di2018dynamic] and jumps onto a desk with the height of 0.76 m [nguyen2019optimized]. The quadrupedal robot HyQ shows an online, dynamic foothold adaptation strategy based on visual feedback [magana2019fast]. Therefore, we conjecture that extending these approaches to wheeledlegged systems can aid in producing robust motions.
IB Contribution
We present an online TO framework for wheeledlegged robots capable of running in a MPC fashion by breaking the problem down into separate wheel and base TOs. For dynamicallyconsistent motions, our wheel TO takes the rolling constraints of the wheels into account, while our base TO accounts for the robot’s balance during locomotion using the idea of the zeromoment point (ZMP) [vukobratovic2004zero]. A hierarchical WBC [bjelonic2019keep] tracks these motions by computing torque commands for all joints. Our hybrid locomotion framework extends the capabilities of wheeledlegged robots in the following ways:

Our framework is versatile over a wide variety of gaits, such as, pure driving, statically stable gaits, dynamically stable gaits, and gaits with fullflight phases.

We generate wheel and base trajectories for hybrid walkingdriving motions in the order of milliseconds. Thanks to these fast update rates, the resulting motions are robust against unpredicted disturbances, making realworld deployment of the robot feasible. Likewise, we demonstrate the performance of our system at the DARPA Subterranean Challenge, where the robot autonomously maps, navigates, and searches dynamic underground environments.
Ii Motion Planning
The wholebody motion planner is based on a task synergy approach [farshidian2017planning] which decomposes the optimization problem into wheel and base TOs. By breaking down the problem into these two tasks, we hypothesize that the problem of locomotion planning for highdimensional (wheeled)legged robots becomes more tractable. The optimization can be solved in realtime in a MPC fashion. With high update rates, the locomotion is able to cope with unforeseen disturbances.
The main idea behind our approach is visualized in Fig. 2. Given a fixed gait pattern and the reference velocities ^{1}^{1}1The reference velocities are generated from an external source, e.g., an operator device, or a navigation planner as used in the DARPA Subterranean Challenge. with respect to (w.r.t.) the robot’s base frame as shown in Fig. 3, i.e., the linear velocity vector of its center of mass (COM) and the angular velocity vector , desired motion plans are generated in two steps, where the wheel TO is followed by a base TO which satisfies the ZMP [vukobratovic2004zero] stability criterion. The latter simplifies the system dynamics for motion planning of the COM to enable realtime computations onboard. Finally, a controller tracks these motion plans by generating torque commands which are sent to the robot’s motor drives. Due to this decomposition of the locomotion problem, the wheel TO, the base TO, and the tracking controller can run in parallel.
The following two sections discuss the main contribution of our work and show how the locomotion of the independent wheel and base TOs are synchronized to generate feasible motion plans.
Iii Wheel Trajectory Optimization
We formulate the task of finding the wheel trajectories, i.e., the , and trajectories w.r.t. a wheel coordinate frame as illustrated in Fig. 3, as a quadratic programming (QP) problem given by
(1)  
subject to 
where is the vector of optimization variables. The quadratic objective is minimized while respecting the linear equality and inequality constraints. In the following, the parameterization of the optimization variable is presented, and we introduce each of the objectives, equality constraints and inequality constraints which form the optimization problem.
Iiia Parameterization of Optimization Variables
We describe the wheel trajectories as a sequence of connected splines. In our implementation, one spline is allocated for segments where the wheel is in contact with the ground, and two splines are used for describing the trajectory of the wheels in air. Therefore, the total number of splines for one gait sequence is (see Fig. 3). These two types of trajectory segments, i.e., corresponding to leg in air and in contact, are defined by different parameterizations as described next.
IiiA1 Wheel segments in air
We parameterize each coordinate of the wheel trajectory in air as quintic splines. Thus, the position vector at spline segment is described by
(2) 
where and contains the polynomial coefficients. Here, describes the time interval of spline with a duration of , where is the sum of all the previous splines’ durations (see example of the fourth spline in Fig. 3). We seek to optimize the polynomial coefficients for all coordinates of spline segment and hence contain them in the vector .
IiiA2 Wheel segments in contact
As shown in our previous work [viragh2019trajectory], we employ a different parameterization for wheel segments in contact, such that they inherently capture the velocity constraints corresponding to nolateralslip of the wheel. For this purpose, we represent the wheel’s velocity in the coordinate, i.e., the rolling direction, as a quadratic polynomial, while the velocities of the remaining directions are set to zero. Thus, the velocity vector of the th spline is
(3) 
and the position vector is obtained by integrating w.r.t. and adding the initial position and of the trajectory as
(4) 
where the rotation matrix describes the change in the wheel’s orientation caused by the reference yaw rate, i.e., the vector . By assuming a constant reference yaw rate over the optimization horizon, the integration is solved analytically, giving a linear expression w.r.t. the coefficients . Thus, the velocity and acceleration trajectories of spline are described by and , respectively.
IiiB Formulation of Trajectory Optimization
To achieve robust locomotion, we deploy an online TO which is executed in a MPC fashion, i.e., the optimization is continuously reevaluated providing a motion over a time horizon of seconds, where can be chosen as the stride duration of the locomotion gait.
The complete TO of the wheel trajectories is formulated as a QP problem as follows,
(5)  
if leg in contact:  
if leg in air:  
s.t.  
where each element is described in more detail in the following sections.
IiiC Objectives
The following sections describe the cost terms introduced in (5) in detail.
IiiC1 Acceleration minimization
The acceleration of the entire wheel trajectory is minimized to generate smooth motions. The cost term for a wheel in air over the time duration of spline is given by
(6) 
where is the hessian matrix, and is the corresponding weight matrix. Here, the linear term of (1) is null, i.e., . Similar, for a spline segment in contact, the hessian matrix, , is obtained by squaring and integrating the acceleration of the wheel trajectory over the time duration . Here, the time matrix , and hence, is dependent on the reference yaw rate as discussed in (4).
IiiC2 Minimize deviations from previous solution
For a TO with high update rate, large deviations between successive solutions can produce quivering motions. To avoid this, we add a cost term that penalizes deviations of kinematic states between consecutive solutions. We penalize the position deviations between the optimization variables from the current solution and the previous solution as
(7) 
where is the position vector of the wheel from the previous solution shifted by the elapsed time since computing the last solution, and is the corresponding weight matrix. This cost is penalized over the time horizon with sampling points, where is the time at time step and . Objectives for minimizing velocity and acceleration deviations are added in a similar formulation.
IiiC3 Track reference velocity of wheels in contact
As shown in (3), the velocity along the rolling direction of the wheel trajectory is described by a quadratic polynomial which inherently satisfies the noslip constraint. To track the reference velocity , we minimize the norm which gives
(8) 
where .
IiiC4 Minimize deviations from default wheel positions
When a wheel is in contact, differences in heading velocities of the wheels and the base can lead to configurations where the corresponding leg can get extended in the forward or backward direction. To guide the optimizer towards solutions within a desired leg configuration, we minimize the distance of the wheel from a default position along the rolling direction as
(9) 
where is the corresponding weight, and the sampling over the th contact segment’s time duration is the same as shown in the paragraph below (7).
IiiC5 Foothold projection
The placement of the wheel after a swing phase is crucial for hybrid locomotion (and for legged locomotion in general) because it contributes to maintaining balance and reacting to external disturbances. As shown in (5), the cost term to guide the foothold placement is given by , where is the weight matrix, and is the touchdown time of spline segment in air, i.e., at the end of the spline in air representing the second half of the swing phase (see Fig. 3). The subscript indicates that only footholds on the terrain plane are considered, i.e., the component is given by the height of the terrain estimation.
The position vector guides the locomotion depending on the reference velocity given by the linear velocity vector and the angular velocity vector as
(10) 
where is a specified default wheel position similar to (9), and is the position vector from the robot’s COM to the projection of the measured wheel position onto the terrain plane.
Decoupling the locomotion problem into wheel and base TOs requires an additional heuristic to maintain balance. Balancing is achieved by adding a feedback term to the foothold obtained from reference velocities, through an inverted pendulum model [gehring2016practice, raibert1986legged] given by
(11) 
where and are the reference and the measured velocity between the associated hip and base frame, respectively. Here, is the height of the hip above the ground, represents the gravitational acceleration, and is the gain for balancing.
IiiC6 Swing height
Similar to the objective in Section IIIC3, we guide the wheel TO to match a predefined height. The objective given in (5) can be expanded, with a weight of , to
(12) 
with , and is the time at maximum swing height of spline segment in air, i.e., at the end of the spline in air representing the first half of the swing phase (see Fig. 3).
Similarly, we set the and coordinates of the swing trajectory at maximum swing height to match the midpoint of liftoff and touchdown position.
IiiD Equality Constraints
The following sections describe the equality constraints introduced in (5) in more detail.
IiiD1 Initial states
To achieve a reactive behaviour, every optimization is initialized with the current state of the robot. As discussed in (4), the initial position of the wheel segments in contact are set as equality constraints given by
(13) 
where the initial values and are the measured positions of the wheel.
If the optimization problem begins with a wheel trajectory in air, we set the initial position, velocity, and acceleration to the measured state of the wheels, i.e., , , and .
IiiD2 Spline continuity
We constrain the position, velocity and acceleration at the junction of the two wheel trajectory segments and in air as
(14) 
Junction constraints between air and contact phases are only formulated on position and velocity level. Here, the acceleration is not constrained so that the optimizer accepts abrupt changes in accelerations, allowing liftoff and touchdown events.
IiiE Inequality Constraints
The following section describes the inequality constraint introduced in (5) in more detail.
IiiE1 Avoid kinematic limits
To avoid overextension of the legs, we keep the wheel trajectories in a kinematic feasible space which is approximated by a rectangular cuboid centered around the default positions as in (9). As introduced in (5), the kinematic limits , , and are enforced over the full time horizon as , , , with a fixed sampling time similar to (7).
Iv Base Trajectory Optimization
The online TO of the base motion relies on a ZMP [vukobratovic2004zero]based optimization which continuously updates reference trajectories for the freefloating base. Here, we extend the approach shown in our previous work [bjelonic2019keep] which originates from the motion planning problem of traditional legged robots [bellicoso2017dynamic]. Given the wheel TO in (5), we are now able to generalize the idea of the ZMP to wheeledlegged systems taking into account the moving contact points when in contact.
As shown in Figure 2, the motion planner of the freefloating base is described by a nonlinear optimization problem, which minimizes a nonlinear cost function subjected to nonlinear equality and inequality constraints . Here, the vector of optimization variables is composed of the position of the COM and the yawpitchroll Euler angles of the base .
Iva Parameterization of Optimization Variables
The trajectories for each DOF of the freefloating base is represented as a sequence of quintic splines, which allows to set position, velocity and acceleration constraints. Thus, the parameterization is formulated similar to the definition of the wheel trajectories in air given in Section IIIA1.
IvB Formulation of Trajectory Optimization
The online TO of the base has a similar setup as the TO described in (5). Cost terms are added to maintain smooth motions, and to track the reference velocity. The equality constraints initialize the variables with the current measured state of the base, and add junction constraints between consecutive splines. For balancing, we add a ZMP inequality constraint, which is described in more detail in the next section, since this is the only part of the base optimization problem which is affected by the computed wheel trajectories in Section III. A complete list of each objective and constraint can be obtained in [bjelonic2019keep].
IvC Generalization of ZMP Inequality Constraint
The ZMP position is constrained to lie inside the support polygon^{2}^{2}2A support polygon is defined by the convex hull of the expected wheels’ contact locations. throughout the optimization horizon to ensure dynamic stability of the robot [bellicoso2017dynamic]. This nonlinear inequality constraint is given by
(15) 
where [sardain2004forces] and is the the terrain normal. The gravitoinertial wrench [caron2017zmp] is given by and , where is the mass of the robot, is the angular momentum of the COM, and is the gravity vector. In contrast to [bellicoso2017dynamic, bjelonic2019keep], the line coefficients that describe an edge of a support polygon depend on time , since the contact points of wheeledlegged robots continue to move even when a leg is in contact, unlike conventional legged robots. The ZMP inequality constraint is sampled over the time horizon with a fixed sampling time .
V Experimental Results and Discussion
To validate the performance of our hybrid locomotion framework, this section reports experiments and realworld applications conducted on ANYmal equipped with nonsteerable, torquecontrolled wheels (see Fig. 1). A video^{3}^{3}3Available at https://youtu.be/ukY0vyMyfY showing the results accompanies this paper.
Va Implementation
The wheel TO, base TO, tracking controller, and state estimator are running on a single PC (Intel i77500U, 2.7 GHz, dualcore 64bit). All computation regarding the autonomy, i.e., perception, mapping, localization, path planning, path following, and object detection, is carried out by three different PCs. The robot is fully selfcontained in terms of computation and perception.
As shown in our previous work [bjelonic2019keep], the computed trajectories in Section III and Section IV are tracked by a hierarchical WBC which generates torque commands for each actuator by accounting for the full rigid body dynamics and physical constraints, i.e., nonholonomic rolling constraint, friction cone, and torque limits. The WBC runs together with state estimation [bloesch2018two] in a 400 Hz loop. Similar to [bloesch2013stateconsistent], we fuse the inertial measurement unit (IMU) reading and the kinematic measurements from each actuator to acquire the robot’s state. Moreover, the frame in Fig. 3 requires an estimate of the terrain normal. In this work, the robot is locally modeling the terrain as a threedimensional plane, which is estimated by fitting a plane through the most recent contact locations [bjelonic2019keep].
We model and compute the kinematics and dynamics of the robot based on the opensource Rigid Body Dynamics Library (RBDL) [rbdl] which uses the algorithms described in [featherstone2014rigid]. The nonlinear optimization problem in Section IV is solved with a custom sequential quadratic programming (SQP) algorithm, which solves the problem by iterating through a sequence of QP problems. Each QP problem including the optimization problem in Section III is solved using QuadProg++ [quadProg], which internally implements the GoldfarbIdnani activeset method [goldfarb1983numerically]. To maintain a positive definite Hessian in (1) and to ensure the convexity of the resulting QP problem, a regularizer is added to its diagonal elements, e.g., as in [bellicoso2017dynamic].
VB Solver Time of Different Contact Scheduler and Gait Switching
As shown in Table I, the wheel and base optimizations are solved in the order of milliseconds, and a great variety of gaits from driving, i.e., all legs in contact, up to gaits with fullflight phases are possible. Besides, the accompanying video shows manual gait switches between driving and hybrid walkingdriving gaits, which can be useful for future works regarding automatic gait switches.
Gait  / (s)  Wheel TO / (ms)  Base TO / (ms) 

Driving  1.7  0.14  6.93 
Hybrid walk  2.0  0.81  14.83 
Hybrid pace  0.95  0.42  1.88 
Hybrid trot  0.85  0.47  2.4 
Hybrid running trot  0.64  0.58  5.77 
VC Rough Terrain Negotiation
The robot is capable of blind locomotion in a high variety of unstructured terrains, e.g., inclines, steps, gravel, mud, and puddles. Fig. 1 and the accompanied video shows the performance of the robot in these kinds of environments.
VD High Speed and Cost of Transport
On flat terrain, the robot achieves a mechanical cost of transport (COT) [bjelonic2018skating] of 0.2 while hybrid trotting at the speed of 2 m/s and the mechanical power consumption is 156 W. The COT is by a factor of two higher than a pure driving gait at the same speed. A comparison to traditional walking and skating with passive wheels [bjelonic2018skating] shows that the COT is lower by 42 % w.r.t. the traditional trotting gait and by 9 % w.r.t. skating motions.
VE DARPA Subterranean Challenge: Tunnel Circuit
The first DARPA Subterranean Challenge, the Tunnel Circuit, was held close to Pittsburgh in the NIOSH mine. The main objective was to autonomously search, detect, and provide spatially referenced locations of artifacts inside the underground mine. The wheeled version of ANYmal participated in two runs as part of the CERBERUS team [cerberus] alongside flying and other mobile platforms. Moreover, the wheeled quadrupedal robot was deployed next to the traditional version of ANYmal without wheels.
As depicted in the lower images of Fig. 1, the terrain consisted of hilly, bumpy, and muddy terrain and in some parts of the mine, the robot needed to cross puddles. Throughout both runs, the robot traversed the terrain with a hybrid trot. In the first run, the wheeled version of ANYmal managed to traverse 70 m without major issues, and the robot successfully reported the correct location of one artifact. In the end, however, one of the wheels started slipping on the muddy terrain before the fall. As can be seen in the accompanying video, the robot manages to balance after the first slip because of our implementation of the inverted pendulum model in (11). The mechanical design was improved after the first run by adding a chain around the wheels to increase the friction coefficient while traversing the mud (see the right middle image of Fig. 1). Fig. 4 and 5 show the measured and desired trajectories of the COM and wheels for a few meters of the whole run. Here, it can be seen that the robot executes a hybrid trotting gait since during ground contact, the wheel moves along its rolling direction. The robot traveled for more than 100 m before it fell due to a hardware issue on one of the motors.
Due to the time limitation of the challenge, the speed of mobile platforms becomes an important factor. Most of the wheeled platforms shown from the other competing teams were faster than our traditional legged robot by a factor of two or more. The upcoming Urban Circuit of the Subterranean Challenge includes stairs and other challenging obstacles. Therefore, we believe, only a wheeledlegged robot is capable of combining speed and versatility. At the Tunnel Circuit, the wheeled version of ANYmal traversed with an average speed of 0.5 m/s which was more than double the average speed of the traditional legged system. Our chosen speed was limited by the update frequency of our mapping approach or otherwise could have traversed the entire terrain with much higher speeds without any loss in agility. On the whole, the performance validation for realworld applications is satisfying, and a direct comparison with the traditional ANYmal reveals the advantages of wheeledlegged robots.
Vi Conclusions
This work presents an online TO for generating hybrid walkingdriving motions on a wheeled quadrupedal robot. The optimization problem is broken down into wheel and base trajectory generation. The independent wheel and base TOs are synchronized to generate feasible motions by time sampling the prior generated wheel trajectories which form the support polygons of the ZMP inequality constraint of the base TO. The presented algorithm makes the locomotion planning for high dimensional wheeledlegged robots more tractable, enables us to solve the problem in realtime onboard in a MPC fashion, and increases the robustness in the robot’s locomotion against unforeseen disturbances.
To the best of our knowledge, this is the first time that a hybrid walkingdriving robot is deployed for realworld missions at one of the biggest robotics competition. To improve the reliability of our newly developed platform, the hardware maturity needs to be increased, e.g., improve the grip of the wheels. Additionally, an algorithmic approach with a higherlevel intelligence to judge when to switch between a pure driving gait and a hybrid walkingdriving gait can be beneficial. The robot uses, throughout the whole mission, a hybrid trotting gait and the switching to a pure driving gait could increase the speed and robustness of the locomotion.
Acknowledgment
The authors would like to thank the CERBERUS team consisting of University of Nevada, Reno, ETH Zurich, Sierra Nevada Corporation, University of California, Berkeley, and Flyability for helping to integrate the robot’s autonomy. Special thanks in this regards go to Eris Sako, Markus Stäuble, Giorgio Valsecchi, Marco Tranzatto, Fabian Tresoldi, Fabian Jenelten, Russell Buchanan, Takahiro Miki, Lorenz Wellhausen, Jan Carius, and Samuel Zimmermann from the Robotic Systems Lab of ETH Zürich.