Trajectory Optimization for Contact-rich Motions using Implicit Differential Dynamic Programming

Trajectory Optimization for Contact-rich Motions using Implicit Differential Dynamic Programming


This paper presents a novel approach using sensitivity analysis for generalizing Differential Dynamic Programming (DDP) to systems characterized by implicit dynamics, such as those modelled via inverse dynamics and variational or implicit integrators. It leads to a more general formulation of DDP, enabling for example the use of the faster recursive Newton-Euler inverse dynamics. We leverage the implicit formulation for precise and exact contact modelling in DDP, where we focus on two contributions: (1) Contact dynamics in acceleration level that enables high-order integration schemes; (2) Formulation using an invertible contact model in the forward pass and a closed-form solution in the backward pass to improve the numerical resolution of contacts. The performance of the proposed framework is validated (1) by comparing implicit versus explicit DDP for the swing-up of a double pendulum, and (2) by planning motions for two tasks using a single leg model making multi-body contacts with the environment: standing up from ground, where a priori contact enumeration is challenging, and maintaining balance under an external perturbation.


leftmargin=* \addbibresourceref.bib \AtBeginBibliography

This is a preprint / authors’ own draft version of paper prior to peer-reviewed publication.

I Introduction

The long-standing research goal of creating robots capable of physically interacting with our environment remains elusive. Typical tasks, such as moving around the environment (locomotion) and modifying the surroundings (manipulation), ultimately require a complex sequence of physical contacts between the robot and the external world. Achieving such capabilities requires effective solutions for producing contact-rich motions.

To date, we still have limited technologies to replicate animal- or human-level interaction skills on robots. This observation forces us to rethink the root of these limitations, which is more at an algorithmic and theoretical level rather than in hardware; it is nowadays possible yet difficult to validate a large range of physical capabilities in high-fidelity physics simulation. The scope here is on producing contact-rich motions for robot locomotion, leaving the applicability and adaptation on robot manipulation for future work.

Fig. 1: Complex multi-contact motions of a single leg robot computed by the proposed framework in time-lapsed snapshots: dynamic standing up from the ground (top), and balancing against an external perturbation (bottom).

I-a Machine Learning

Machine learning techniques can be integrated with optimization, where cost-preference learning alleviates the burden of manual design of cost functions for optimization-based planning and control. This achieved rough-terrain quadrupedal locomotion for Little-Dog robot [zucker2011]. Going beyond such a hierarchy of complex integration of learning, footstep planning, trajectory optimization and reflex control, reinforcement learning for continuous control provides an alternative in an end-to-end fashion. A reinforcement learning approach trained by an adaptive terrain curriculum demonstrated robust single-skill trotting that traversed a variety of indoor and outdoor unstructured environments [lee2020]. An architecture of multi-expert reinforcement learning is able to extend capabilities to multi-skill and multi-modal locomotion with coherent fall recovery, trotting, and all dynamic transitions in-between different modes [yang2020]. These neural network based feedback policies were trained in simulation and then deployed on real robots, but still demonstrated robustness under scenarios that were never encountered during training.

However, these neural network policies act as reactive feedback control that responds to the proprioceptive state feedback. It is hard to incorporate future target objectives for long-term temporal planning. Even though they are computationally fast to run in real time, it is difficult to guarantee the long-term stability and optimality of motions, i.e. whether the robot will fall in the future or whether a successful sequence of motions is more optimal in terms of energy efficiency and sufficient stability margins against uncertainties. Moreover, for safety critical applications, such an approach is not able to provide verifiable validity before execution. To overcome these limitations, a more mathematically principled approach enables to take into account knowledge about the constraints of the robot and environments, and to provide verifiable long-term stability and feasibility.

I-B Model-based Mathematical Optimization

Trajectory optimization (TO) has attracted increasing research interest for motion planning and control of highly dynamical, underactuated robots [kelly2017, xin2020]. Similar to learning approaches, it also has the potential of generating complex motions in a high-level manner: A user can design and specify a desired task using physical terms with associated weights via a cost function [todorov2018], which can also be automatically tuned [yuan2019]. This approach is quite flexible, encompassing a wide-range of cases; for example, additional contact points can be included in the optimization to increase the robustness against perturbations for loco-manipulation tasks [wolfslag2020].

This is particularly interesting for robotic systems that need to compute through-contact motion plans, which involve multiple contact interactions. Physical contacts are traditionally difficult to model and incorporate in motion planning frameworks. Most approaches are hybrid, in the sense that the contact schedule pattern [chatzinikolaidis2018, wang2020] or corresponding timings are provided a priori, while contacts are desirable with the end-effectors only. This leads to difficulties in practical implementations because selection of locations and timings is in general non-trivial, while restricting contacts to end-effectors only limits the motion repertoire.

DDP—a prominent shooting TO methodology [mayne1966]—is among the most promising approaches in terms of efficiency for through-contact motion planning. This is demonstrated by a multitude of previous works that used DDP as backbone: From impressive results in simulation [tassa2012], to real-time applications for high-dimensional legged robots [neunert2017a, carius2019]. However, properly modelling contacts has proven a considerable challenge; most DDP implementations resort to approximations and simplifications that require well-tuned contact parameters. A fundamental reason is that contact phenomena are canonically described implicitly.

The original DDP algorithm and its subsequent studies assume that the discrete-time systems considered are explicitly defined. While this is valid in broad terms, it excludes implicitly defined dynamical systems [hairer1993]. These are typically more challenging because they require the solution of nonlinear equations. However, they offers computational advantages, e.g. providing stability even for stiff differential equations. Further, handling implicitly defined systems allows more principled contact modelling in DDP.

I-C Contributions

In this work, we focus on the optimization paradigm and provide theoretical and algorithmic contributions as:

  • An extension to the DDP algorithm that handles explicitly and implicitly defined systems in a unified manner.

  • Based on this, we propose an approach leveraging an invertible model [todorov2014] for exact contact resolution in DDP.

  • Results demonstrating the possibility of exploiting properties of implicit integrators in DDP settings.

We benchmark our extension by applying it on implicitly and explicitly defined models, and on two cases of multi-contact whole-body motion planning for a planar single leg robot that makes multi-body contacts: Standing-up from ground and balancing from an initial perturbation in a receding horizon fashion (Fig. 1). Our approach is equally applicable to models with large degrees of freedom and arbitrary contact configurations, such as using multiple legs.

The remaining sections are organized as: Sec. II discusses prior work and extensions of the DDP algorithm, and applications of DDP for through-contact motion planning. Sec. III summarizes DDP and how contacts are typically resolved in simulation. In Sec. IV, we present our extension of DDP and, in Sec. V, how to utilize it for through-contact planning. Sec. VI provides comparisons between explicit and implicit systems in the context of DDP, and two motion planning studies for a single leg standing up and balancing in multi-contact settings. We summarize and conclude in Sec. VII.

Ii Prior Work on Differential Dynamic Programming

Ii-a Differential Dynamic Programming

DDP was originally introduced in [mayne1966]. Its main advantage with respect to the Dynamic Programming algorithm [bellman1962] is that it does not suffer from the curse of dimensionality by sacrificing global optimality. Subsequently, a number of improvements and extensions of DDP have been introduced. Recently, there was a resurgence of interest due to its potential for efficient planning for high-dimensional systems.

DDP is a second-order algorithm that exhibits quadratic convergence similar to Newton’s methods [liao1992]. Thus, it requires second-order information, which can be computationally challenging for high-dimensional models. To resolve this, the iLQR variant performs a Gauss-Newton approximation of the Hessian based on first-order information only, albeit with superlinear convergence [todorov2005].

The original DDP algorithm is concerned with unconstrained discrete dynamical systems only. Control bounds can be considered via a projected Newton quadratic programming (QP) solver [tassa2014]. More general nonlinear inequality constraints via an active-set method [xie2017]. In robotics, it is common to consider multiple tasks in a hierarchical fashion, which is possible to do for DDP too [geisert2017]. In legged locomotion, the discontinuous nature of contact phenomena has led to the development of tailored approaches. For example, a predefined gait pattern and centroidal dynamics model was considered in [budhiraja2018], and more general hybrid systems in [li2020]. We underline that the DDP framework presented next can incorporate the previous extensions straightforwardly.

Finally, a brief discussion about the application of DDP for implicitly defined systems from a Lie theoretic viewpoint is given in [boutselis2020]. Here, we present a more complete and deep treatment, with extensive comparisons. Furthermore, our vector-based formulation is much more familiar and common for robotic systems applications.

Ii-B Through-contact Motion Planning

Applications of DDP to motion planning and control for legged robots have been very impressive. From simple, approximate models up until whole-body models, DDP provides a means for fast and even real-time computations.

In [tassa2012], DDP was used to control a humanoid model. A diverse set of behaviours was generated by simply changing weights in the cost function through a GUI interface. An approximate solution for the contact dynamics was used, with a contact model similar to the one that is used here. In this work, the implicit formulation that we present next allows the consideration of contacts in DDP without requiring approximations to the contact model itself.

For quadruped robots, a diverse set of motions both in simulation and in hardware was shown in [neunert2017a]. To take into account contacts, a nonlinear spring-damper model was used. Even though tuning for each contact is done independently, spring-damper models can be difficult to tune in practice and require very small time steps. It is common for the optimizer to explore states where the current model parameters are not valid, while the small time steps translate into a large problem. Here, in the forward pass the model takes into account all possible contacts in a centralized manner (through the coupling with the contact-space inertia matrix), while independently solve for each contact at the backward pass (by leveraging our implicit DDP formulation and the model’s invertibility). Thus, performance is similar to complementarity formulations with large time steps, while we are capable to compute straightforwardly gradients in the backward pass.

To eliminate the unrealistic effects of spring-damper models, a hard contact model was used in [carius2018]. Unfortunately, contact impulses require the numerical solution of a quadratically constrained quadratic program (QCQP), typical in time-stepping approaches with unilateral and friction cone constraints, and formulates the problem in a bilevel fashion. This complicates the derivative computation due to the numerical nature of the solution. We resolve this issue by leveraging the invertibility of the contact model: in the forward pass, the QCQP is solved with the associated constraints; in the backward pass, a closed-form computation is used that avoids the bilevel formulation. As a result, this does not pose issues with differentiation and leads to a faster and simpler implementation, without the need for backpropagation.

A multiple shooting variant was presented in [mastalli2020], extending the work in [giftthaler2018a]. It allows easier initialization since both state and control sequences can be used. Unfortunately, the intermediate iterates of the algorithm are infeasible, meaning that early stopping with a feasible trajectory, as in DDP, is not possible. This is a necessary property in our case, since the through-contact motion planning approach that we present is running in a receding horizon fashion. Furthermore, the contact schedule is predefined in [mastalli2020], while here contacts are activated according to the natural dynamics of the system [posa2014]. Finally, friction cone constraints are neglected or can be taken into account through penalization in the cost function, which can be in practice difficult to tune and can lead to unrealistic solutions. Due to the imposition of contacts as equality constraints, attractive forces can arise at the solution, violating the unilateral constraint. Our framework here utilizes full unilateral and friction cone contact constraints without any approximation or penalization.

Iii Preliminaries

Iii-a Summary of Differential Dynamic Programming

DDP is concerned with the optimization of a performance criterion for an unconstrained discrete-time dynamical system [mayne1966, tassa2012]. This can be expressed as


Here, is an additive cost at time step and is the final cost, and are the state and control, is the length of the horizon, while denotes the quantity at the next time step, e.g. the next state in our context.

According to the principle of optimality, (1a) can be expressed via the value function, which is the total cost at a given state once we apply the optimal control sequence. The principle of optimality makes the computation of the value function iterative, and at a state is given by

Since finding the global minimum is challenging, DDP performs a quadratic approximation of the value function and subsequently improves the control sequence locally. If we define the -function as


a quadratic approximation about the current point is


while and are state and input perturbations. The terms in (3) are computed by expanding and matching same terms in creftype 2 as


Backward pass

The optimal control change is given by minimizing the unconstrained quadratic equation (3) as


The quadratic approximation of the value function at the current time step in (3) becomes


with boundary values and .

Forward pass

Once the feedforward and feedback terms and for each time step are computed, we perform a forward pass to compute the updated control sequence as


for . In practice, regularization and line search are necessary, as explained in [tassa2012].

Iii-B Simulation With Contacts

We summarize a typical simulation pipeline here in the presence of contacts [horak2019]. Contact resolution is usually done in the velocity–impulse level but our DDP is formulated at the acceleration–force level, which will be elaborated later.

The dynamics of a mechanical system are given by


where the mass matrix, the vector of nonlinear forces, a selection matrix that maps actuated joint torques to generalized coordinates, while denotes the Jacobian of the -th contact and the corresponding force. We simplify notation by dropping explicit dependence on quantities.

In time-stepping approaches, e.g. [todorov2014, horak2019], creftype 8 is discretized using an Euler approximation to obtain

where is the time step size and corresponds to the concatenation of the contact impulses at time step . These are projected in contact space

which can also be expressed as


with , , , and .

Different contact models pose different conditions on what constraints accompany creftype 9. In this work, the contact model defined in [todorov2014] is used because it is convex and analytically invertible. It penalizes movement in contact space by solving the following QCQP during the forward dynamics


where are the tangential and normal components, is a positive definite matrix that makes the solution unique and invertible, and is a Baumgarte stabilization reference.

The inverse dynamics is well-defined and for a diagonal we obtain an independent problem per contact


Iv Implicit DDP

Our point of departure from the original DPP algorithm is the dynamics in (III-A). Instead of the explicit dynamics, we assume dynamics of the form


This will allow us to apply DDP for systems expressed via inverse dynamics, implicit or variational integrators, etc. Our focus will be contact dynamics, but we return to this later.

The goal is to compute the derivatives for the quadratic approximation of the -function (4). Terms related to the running cost are trivial and will be omitted. Thus, we focus on the first and second-order sensitivity of the next step value function. A treatment of sensitivity analysis in the context of Newton methods can be found in [zimmermann2019].

Iv-a First-Order Sensitivity Analysis

The first-order sensitivity of the value function in (2) is

Here, is the sensitivity of the next step value function with respect to the current state, while is the sensitivity of the next step value function with respect to the next state; connected by the previous equation. Based on (12) we have


where it is assumed that for any and , can be computed so that (12) is satisfied. Combining the previous two equations gives

In practice, a faster computation can be achieved using the adjoint method [strang2007] by computing first the quantity

and then


If we confine ourselves in a first-order analysis only this is computationally advantageous [strang2007], but the computation of in (13) is required for the second-order expansion. By a similar reasoning, is computed as


which concludes our first-order analysis.

We now have all the ingredients for the first-order approximation of the -function. For example, the term in (4) is given by

Iv-B Second-Order Sensitivity Analysis

The second-order approximation of the value function is

The term constitutes a third-order tensor. We use matrix notation for the contractions but assume that their computation is clear from the context. It is computed as

By combining the last two equations we have that


For the remaining two terms in (4), a similar reasoning can be used to compute them as


This concludes the second-order sensitivity analysis. We can now compute all terms in (4). The rest of the DDP algorithm is implemented without changes.

It is worth pointing out that for the explicit dynamics (III-A) we have that , , and . Thus, as in (4). The same verification can be performed for the rest of the quantities.

Iv-C Gauss-Newton Approximation

Especially for robot models with many degrees of freedom, computing the tensor terms (16b) can be prohibitive expensive. Fortunately, it is possible to do a Gauss-Newton approximation of the Hessian—equivalent to iLQR—by ignoring them. Thus, the second-order sensitivity terms of the value function become


V Acceleration-level Contact Dynamics

Input: , , , , and .
Output: and .
1 Compute and based on creftypeplural 21 and 19.
2 Solve creftype 20 for the contact forces .
Solve creftype 12 together with (7b) for the next state .
Algorithm 1 Forward pass with contacts.
Input: , , , , and .
Output: and .
1 Compute , , and from creftypeplural 23, 21 and 19.
2 Compute creftype 12.
3 Differentiate steps 1 and 2 to compute , , , , , , , , , , .
4 Compute the value function terms in (14a) and (16b).
5 Compute the -function terms in creftype 4.
Compute the gains and in creftype 5, and the current value function terms in (6a) for the next iteration.
Algorithm 2 Backward pass with contacts.

We describe here a contact resolution framework in the acceleration level, rather than the commonly used velocity level. This way, we avoid the necessary first-order discretization of the dynamics. Thus, during integration of the state, an arbitrary order integrator can be used. Other assumptions are not required about the robot’s model (such as the assumption about a constant Jacobian in (9) that is inherent in the velocity-impulse formulations), without increasing the computation complexity. As such, we consider it a superior choice. It is also the default choice in MuJoCo [todorov2020], which is a state-of-the-art robotics simulator.

Starting from the continuous time dynamics (8), we multiply both sides by and add , which gives


We can interpret this equation as follows: is the unconstrained acceleration in contact space in the absence of any contacts, which is corrected by the term to result in the actual acceleration that satisfies the contact constraints.

As already explained, the contact model that we utilize was proposed in [todorov2014]. It computes the necessary contact forces by solving the following convex optimization problem that tries to minimize accelerations in contact space


which is the equivalent to (10) for accelerations.

While the bias accelerations can be in a general Baumgarte stabilization form, a choice that works reasonably good across models is


with the gap distance, positive when bodies are separate. The first term is used to cancel the same term in and and simplify computations. The second and third term are obtained by a Taylor expansion of the gap distance function and ignoring third and higher order terms.

In the forward pass, the above optimization problem is solved for the contact forces using a standard Projected Gauss–Siedel solver [horak2019]. Though in principle this can be implemented in the backward pass, the computation of the gradients becomes more complicated since we have to differentiate a numeric solution. Even with automatic differentiation, the quality of the gradients can suffer. Instead, a diagonal approximation of the system can be assumed and an approximate solution to the contact forces can be computed [tassa2012]. The implicit formulation avoids this issue and the exact solution for the contact forces is given in closed form.

Fig. 2: Aggregate results for the total trajectory cost of each variant.

By utilizing the implicit framework and the invertibility of the model, problem (11) is expressed in acceleration space


For the computation of as given by (19), the joint acceleration is required. In the classical DDP algorithm this is not available, since we only have access to the current state and , and the acceleration is computed after the contact forces. In the implicit form, since we have additionally available the next state , the computation of the acceleration is trivial. Thus, we can compute each contact force in closed form as


projects contact forces to the cone with coefficient  [horak2019].

After the computation of the contact forces, we can enforce the implicit dynamics creftype 12 either using a forward or inverse dynamics formulation. Given the available information, the computation of inverse dynamics is cheaper and numerically superior [hollerbach1980, ferrolho2021]. Furthermore, this decoupling between forward and backward pass allows us to avoid the rootfinding problem during the forward that would be necessary in a full implicit implementation. Having to solve the rootfinding problem in the forward pass increases the computation time of the implicit formulation. We summarize the DDP computations subject to contacts in Algos. 2 and 1.

Vi Results

Fig. 3: Aggregate results for the total number of iterations of each variant.

Vi-a Implementation Details

For the computation of the rigid-body dynamics, the Julia library RigidBodyDynamics.jl is used [koolen2019]. Computation of first-, second- and third-order tensor is done using forward-mode automatic differentiation [revels2016]. 1

We begin by performing multiple comparisons between implicit and explicit DDP formulations for a double pendulum swing-up task. Next, we present two problems that require multi-contact motion planning: A single leg that is required to a) stand up from the ground, and b) balance from an initial random state.

Vi-B Aggregate Double Pendulum Swing-up

For the double pendulum swing-up task, we generate random trials (that is, with random initial state) and we specify an objective that includes a desired upright posture at the end of a s horizon, while penalizing joint torques at intermediate states. Additionally, joint limits are modelled using unilateral forces at the joints. Only the unilateral constraint is imposed (forces push the joint away from the limit at violations), while friction is not required.

We compare four variants of methods presented in this work:

  • Implicit iLQR with backward Euler dynamics.

  • Implicit DDP with backward Euler dynamics.

  • Explicit iLQR with forward Euler dynamics.

  • Explicit DDP with forward Euler dynamics.

For every random initialization, the four variants are executed until convergence (or until an upper iteration limit is reached) and the number of iterations and total cost of the trajectory is logged. Aggregate box plot results for the cost and the number of iterations are shown in Figs. 3 and 2, respectively.

From the comparison, the implicit formulations result in considerably less iterations than the explicit counterpart. Both median, minimum and maximum values, and the rest of the statistical properties in Fig. 3 are improved with an implicit formulation regarding the number of iterations. As expected, the trade-off for this is the larger in general cost of the resulting trajectory in Fig. 2. This can be partially explained from the fact that since the explicit formulations perform on average more iterations, they are capable to fine-tune the resulting trajectory more. But given the considerable less iterations for the implicit formulations, this aspect is more important in terms of the overall performance.

A possible reason behind this is the integrator’s properties. Implicit Euler is an A-stable method suitable even for stiff systems. As such, it usually exhibits energy decrease—instead of the common increase in explicit methods—that makes the whole formulation more stable.

Vi-C Single Double Pendulum Swing-up

(i), (ii) with iLQR
(i), (ii) with DDP
(iii) with iLQR
(iii) with DDP
Fig. 4: Cost per iteration for the different formulations.

Cost per iteration and timings

We evaluate the cost per iteration for one double pendulum swing-up and compare 6 different formulations (each with a DDP and iLQR variant):

  1. Forward Euler dynamics in the forward and backward passes.

  2. Forward Euler dynamics in the forward pass, and forward Euler inverse dynamics in the backward pass.

  3. Backward Euler dynamics in the forward and backward passes.

We use the same cost function and initialize at the stable equilibrium, while the total duration of the motion is s, with a time step of ms, and steps. The results are shown in Fig. 4. Formulation (i) corresponds to a classical iLQR/DDP with explicit dynamics. Formulation (ii) is enabled by the presented framework. The computation of the Jacobian and tensor terms is based on the automatic differentiation of the inverse dynamics. Since (ii) is equivalent to (i), the solutions by the two approaches are exactly the same and are plotted together in Fig. 4. Differences are found for the computation time, as reported next. Formulation (iii) is implicit in both passes and enabled again by the presented framework.

In terms of computation, formulations (i) and (ii) with iLQR require iterations, while with DDP require iterations. In terms of timings, the mean time of each iteration for (i) with iLQR is ms and with DDP ms. For (ii) with iLQR is ms and with DDP ms. While the differences are not significant for such a low-dimensional model, these can become starker for robot models with larger degrees of freedom. For (iii), iterations for iLQR and iterations with DDP. The mean computation time of each iterations with iLQR is ms and with DDP ms. The increased computation is due to the solution of a nonlinear system of equations in the forward pass.

Effect of time step size

We focus now on the effect of the time step size to the solution of the problem. We solve the same problem as before for multiple time step selections and report the number of iterations required until convergence. Since formulations (i) and (ii) are equivalent, we focus the comparison on (i) and (iii). We solve them using iLQR but similar conclusions could be drawn if DDP was used.

The results are shown on Table I. For small time steps, the two formulations are essentially equivalent and, thus, require the same number of iterations. As the time step increases, the influence of the integrator’s damping in (iii) becomes more apparent. This results in a desirable decrease to the number of iterations for convergence. The motions are included again in the accompanying video. For larger time steps, the accuracy of both first-order integrators worsens significantly.

Time step
(i) / (ii)
TABLE I: Effect of time step on No. of iterations until convergence

Vi-D Multi-contact Stand-up

Aerial phase
Heel balancing
Tip balancing
Fig. 5: Time-lapse snapshots of contact-rich motions: (A) Standing-up from ground by a dynamic manoeuvre using large momentum with aerial phases. (B) Robust balancing by toe tipping and jumping that withstands an external push.

Next, we consider a planar 3 degree-of-freedom single leg of a humanoid robot and the task now is to stand-up upright from the ground. The model can make multiple contacts with the terrain using all the bodies of its structure, but self-collisions are inactive. We pre-define a number of possible contact points but we do not prescribe the contact activation pattern. Adding a contact detection mechanism and avoiding the pre-specification of contacts is another possibility, as typically done in simulation engines.

The cost function of the problem is defined as

A penalization of the velocity and joint torque is applied throughout the trajectory, while a goal state is defined in the final cost term. The motion duration is s with a time step of ms; this is a relatively large time step for contacts, but our aim here is to output an approximate contact-rich motion plan. Given this plan as input, it is possible to post-process it to increase the quality.

The friction coefficient is selected as . Parameter is initialized with a value of for all components. While in principle it can take arbitrary values, we can test the validity from a numerical viewpoint as follows [todorov2020]: We run the forward and backward pass separately and compare the computed forces. The two solutions should match according to the desired numerical precision.

The main difficulty is that the problem exhibits a number of contact possibilities. Thus, mode enumeration can be very challenging. Notice also how delicate heel balance emerges while reaching the upright configuration. Our trajectory optimization framework is capable to output a locally optimal motion plan. Even though a zero torque initial solution is used here, its quality greatly affects the quality of the computed motion. Finally, by changing the terms in the cost function, it is possible to obtain different solutions, e.g. more conservative but with higher torque cost.

The resulting motion can be found in the accompanying video. There is an initial explosive and dynamic motion at about s. Such a motion would be in practice difficult to track. Yet being able to compute such a complex motion from high-level input only demonstrates the power of DDP-based approaches. There are a couple of ways to mitigate that: an obvious approach is to increase the torque, position, and velocity penalization accordingly. Another option is to include terms that penalize the rate of the commanded torques. Finally, a more principled approach is to penalize high frequency components of the signals involved [grandia2019].

Vi-E Multi-contact Balancing

Using the same model as before, the state now is randomly initialized in the air. The task is to keep the initial posture with zero velocity, i.e. to balance. In contrast to the previous case, this problem is formulated in a receding horizon fashion. A fixed number of iterations for DDP is pre-specified; this makes real-time iterations of the algorithm possible. The horizon length is s, with the simulation running at Hz, while our framework runs at Hz. The structure of the cost function remains the same as before, albeit the weight regarding the final velocity is increased to bias more towards a static final configuration.

The balancing motion is shown at the bottom of Fig. 5. The computed motion naturally performs a series of jumps to dissipate kinetic energy and come to a complete stop. The underactuated foot tilting emerged as the outcome of optimization without programming explicit controllers as in [li2017]. Compared to the case in the previous section, the receding horizon formulation is capable of producing better motions in general. This is because the constant updates allows it to escape iterations with very small cost decrease, which can be common in the fixed horizon optimization of the previous case. If a bad initialization is specified or the horizon and frequency are not chosen properly, the receding horizon formulation can be trapped too. The selection of these parameters depends on the desired task and initial state.

Finally, a semi-log plot of the total trajectory cost at the beginning and at the end of each DDP step is shown in Fig. 6. We notice that in about runs a successful balancing motion is computed. Afterwards, each run rapidly converges to this motion. The reason why the cost is increased at the beginning of each run is because the horizon moves; the predicted trajectory for the new segment at the end of the previous horizon is that the robot will essentially fall, which incurs a large cost. Additionally, during the initial runs, the motion is highly unstable and a suitable balancing motion is not discovered yet. Thus, the total trajectory cost varies greatly between consecutive runs.

Cost at start
Cost at end
Fig. 6: Trajectory cost at each run of the receding horizon formulation.

Vii Conclusion

This work presents an extension to DDP that is able to handle implicit dynamical systems with particular focus on through-contact motion planning. This allows extending the original DDP to a larger class of dynamics models, e.g. such as models based on inverse dynamics. We described how to use the implicit formulation for accurate contact resolution in the DDP framework without requiring approximations of contact dynamics. The proposed method is exact and straightforward to implement, utilizing a closed-form solution for quality gradient computations. Further, we demonstrated properties of the approach in a number of cases: comparisons of implicit and explicit dynamics representations for a double pendulum, and two case studies for a single leg model that required challenging multi-contact motion plans.

While the original DDP provides both feedforward and feedback gains that guarantee a level of robustness against small perturbations, we noticed that the computed motion plans can fail if the conditions of the problem change slightly. Though one can introduce robustness as part of the trajectory optimization modelling, we believe that running the whole framework in a receding horizon fashion is more appropriate and promising. Thus, the motion plans should be updated online to withstand unexpected perturbations.

It is worth noting that DDP simulates the dynamics of the system and activates a contact point if it finds one. Thus, contacts are taken into account according to the system’s natural dynamics [posa2014], which may lead to characteristically abrupt motions [tassa2012]. Being a shooting method for unconstrained systems, DDP is limited in terms of active search for potential contacts. Further improvements can be made by combinatorial planning and exploration, where transcription-based methods demonstrated better capabilities and flexibility [chatzinikolaidis2020, patel2019], although requiring additional and non-negligible computation cost in practice.



  1. The accompanying code is available at and the video at
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description