Trajectory Optimization for Contactrich Motions using Implicit Differential Dynamic Programming
Abstract
This paper presents a novel approach using sensitivity analysis for generalizing Differential Dynamic Programming (DDP) to systems characterized by implicit dynamics, such as those modelled via inverse dynamics and variational or implicit integrators. It leads to a more general formulation of DDP, enabling for example the use of the faster recursive NewtonEuler inverse dynamics. We leverage the implicit formulation for precise and exact contact modelling in DDP, where we focus on two contributions: (1) Contact dynamics in acceleration level that enables highorder integration schemes; (2) Formulation using an invertible contact model in the forward pass and a closedform solution in the backward pass to improve the numerical resolution of contacts. The performance of the proposed framework is validated (1) by comparing implicit versus explicit DDP for the swingup of a double pendulum, and (2) by planning motions for two tasks using a single leg model making multibody contacts with the environment: standing up from ground, where a priori contact enumeration is challenging, and maintaining balance under an external perturbation.
leftmargin=* \addbibresourceref.bib \AtBeginBibliography
I Introduction
The longstanding research goal of creating robots capable of physically interacting with our environment remains elusive. Typical tasks, such as moving around the environment (locomotion) and modifying the surroundings (manipulation), ultimately require a complex sequence of physical contacts between the robot and the external world. Achieving such capabilities requires effective solutions for producing contactrich motions.
To date, we still have limited technologies to replicate animal or humanlevel interaction skills on robots. This observation forces us to rethink the root of these limitations, which is more at an algorithmic and theoretical level rather than in hardware; it is nowadays possible yet difficult to validate a large range of physical capabilities in highfidelity physics simulation. The scope here is on producing contactrich motions for robot locomotion, leaving the applicability and adaptation on robot manipulation for future work.
Ia Machine Learning
Machine learning techniques can be integrated with optimization, where costpreference learning alleviates the burden of manual design of cost functions for optimizationbased planning and control. This achieved roughterrain quadrupedal locomotion for LittleDog robot [zucker2011]. Going beyond such a hierarchy of complex integration of learning, footstep planning, trajectory optimization and reflex control, reinforcement learning for continuous control provides an alternative in an endtoend fashion. A reinforcement learning approach trained by an adaptive terrain curriculum demonstrated robust singleskill trotting that traversed a variety of indoor and outdoor unstructured environments [lee2020]. An architecture of multiexpert reinforcement learning is able to extend capabilities to multiskill and multimodal locomotion with coherent fall recovery, trotting, and all dynamic transitions inbetween different modes [yang2020]. These neural network based feedback policies were trained in simulation and then deployed on real robots, but still demonstrated robustness under scenarios that were never encountered during training.
However, these neural network policies act as reactive feedback control that responds to the proprioceptive state feedback. It is hard to incorporate future target objectives for longterm temporal planning. Even though they are computationally fast to run in real time, it is difficult to guarantee the longterm stability and optimality of motions, i.e. whether the robot will fall in the future or whether a successful sequence of motions is more optimal in terms of energy efficiency and sufficient stability margins against uncertainties. Moreover, for safety critical applications, such an approach is not able to provide verifiable validity before execution. To overcome these limitations, a more mathematically principled approach enables to take into account knowledge about the constraints of the robot and environments, and to provide verifiable longterm stability and feasibility.
IB Modelbased Mathematical Optimization
Trajectory optimization (TO) has attracted increasing research interest for motion planning and control of highly dynamical, underactuated robots [kelly2017, xin2020]. Similar to learning approaches, it also has the potential of generating complex motions in a highlevel manner: A user can design and specify a desired task using physical terms with associated weights via a cost function [todorov2018], which can also be automatically tuned [yuan2019]. This approach is quite flexible, encompassing a widerange of cases; for example, additional contact points can be included in the optimization to increase the robustness against perturbations for locomanipulation tasks [wolfslag2020].
This is particularly interesting for robotic systems that need to compute throughcontact motion plans, which involve multiple contact interactions. Physical contacts are traditionally difficult to model and incorporate in motion planning frameworks. Most approaches are hybrid, in the sense that the contact schedule pattern [chatzinikolaidis2018, wang2020] or corresponding timings are provided a priori, while contacts are desirable with the endeffectors only. This leads to difficulties in practical implementations because selection of locations and timings is in general nontrivial, while restricting contacts to endeffectors only limits the motion repertoire.
DDP—a prominent shooting TO methodology [mayne1966]—is among the most promising approaches in terms of efficiency for throughcontact motion planning. This is demonstrated by a multitude of previous works that used DDP as backbone: From impressive results in simulation [tassa2012], to realtime applications for highdimensional legged robots [neunert2017a, carius2019]. However, properly modelling contacts has proven a considerable challenge; most DDP implementations resort to approximations and simplifications that require welltuned contact parameters. A fundamental reason is that contact phenomena are canonically described implicitly.
The original DDP algorithm and its subsequent studies assume that the discretetime systems considered are explicitly defined. While this is valid in broad terms, it excludes implicitly defined dynamical systems [hairer1993]. These are typically more challenging because they require the solution of nonlinear equations. However, they offers computational advantages, e.g. providing stability even for stiff differential equations. Further, handling implicitly defined systems allows more principled contact modelling in DDP.
IC Contributions
In this work, we focus on the optimization paradigm and provide theoretical and algorithmic contributions as:

An extension to the DDP algorithm that handles explicitly and implicitly defined systems in a unified manner.

Based on this, we propose an approach leveraging an invertible model [todorov2014] for exact contact resolution in DDP.

Results demonstrating the possibility of exploiting properties of implicit integrators in DDP settings.
We benchmark our extension by applying it on implicitly and explicitly defined models, and on two cases of multicontact wholebody motion planning for a planar single leg robot that makes multibody contacts: Standingup from ground and balancing from an initial perturbation in a receding horizon fashion (Fig. 1). Our approach is equally applicable to models with large degrees of freedom and arbitrary contact configurations, such as using multiple legs.
The remaining sections are organized as: Sec. II discusses prior work and extensions of the DDP algorithm, and applications of DDP for throughcontact motion planning. Sec. III summarizes DDP and how contacts are typically resolved in simulation. In Sec. IV, we present our extension of DDP and, in Sec. V, how to utilize it for throughcontact planning. Sec. VI provides comparisons between explicit and implicit systems in the context of DDP, and two motion planning studies for a single leg standing up and balancing in multicontact settings. We summarize and conclude in Sec. VII.
Ii Prior Work on Differential Dynamic Programming
Iia Differential Dynamic Programming
DDP was originally introduced in [mayne1966]. Its main advantage with respect to the Dynamic Programming algorithm [bellman1962] is that it does not suffer from the curse of dimensionality by sacrificing global optimality. Subsequently, a number of improvements and extensions of DDP have been introduced. Recently, there was a resurgence of interest due to its potential for efficient planning for highdimensional systems.
DDP is a secondorder algorithm that exhibits quadratic convergence similar to Newton’s methods [liao1992]. Thus, it requires secondorder information, which can be computationally challenging for highdimensional models. To resolve this, the iLQR variant performs a GaussNewton approximation of the Hessian based on firstorder information only, albeit with superlinear convergence [todorov2005].
The original DDP algorithm is concerned with unconstrained discrete dynamical systems only. Control bounds can be considered via a projected Newton quadratic programming (QP) solver [tassa2014]. More general nonlinear inequality constraints via an activeset method [xie2017]. In robotics, it is common to consider multiple tasks in a hierarchical fashion, which is possible to do for DDP too [geisert2017]. In legged locomotion, the discontinuous nature of contact phenomena has led to the development of tailored approaches. For example, a predefined gait pattern and centroidal dynamics model was considered in [budhiraja2018], and more general hybrid systems in [li2020]. We underline that the DDP framework presented next can incorporate the previous extensions straightforwardly.
Finally, a brief discussion about the application of DDP for implicitly defined systems from a Lie theoretic viewpoint is given in [boutselis2020]. Here, we present a more complete and deep treatment, with extensive comparisons. Furthermore, our vectorbased formulation is much more familiar and common for robotic systems applications.
IiB Throughcontact Motion Planning
Applications of DDP to motion planning and control for legged robots have been very impressive. From simple, approximate models up until wholebody models, DDP provides a means for fast and even realtime computations.
In [tassa2012], DDP was used to control a humanoid model. A diverse set of behaviours was generated by simply changing weights in the cost function through a GUI interface. An approximate solution for the contact dynamics was used, with a contact model similar to the one that is used here. In this work, the implicit formulation that we present next allows the consideration of contacts in DDP without requiring approximations to the contact model itself.
For quadruped robots, a diverse set of motions both in simulation and in hardware was shown in [neunert2017a]. To take into account contacts, a nonlinear springdamper model was used. Even though tuning for each contact is done independently, springdamper models can be difficult to tune in practice and require very small time steps. It is common for the optimizer to explore states where the current model parameters are not valid, while the small time steps translate into a large problem. Here, in the forward pass the model takes into account all possible contacts in a centralized manner (through the coupling with the contactspace inertia matrix), while independently solve for each contact at the backward pass (by leveraging our implicit DDP formulation and the model’s invertibility). Thus, performance is similar to complementarity formulations with large time steps, while we are capable to compute straightforwardly gradients in the backward pass.
To eliminate the unrealistic effects of springdamper models, a hard contact model was used in [carius2018]. Unfortunately, contact impulses require the numerical solution of a quadratically constrained quadratic program (QCQP), typical in timestepping approaches with unilateral and friction cone constraints, and formulates the problem in a bilevel fashion. This complicates the derivative computation due to the numerical nature of the solution. We resolve this issue by leveraging the invertibility of the contact model: in the forward pass, the QCQP is solved with the associated constraints; in the backward pass, a closedform computation is used that avoids the bilevel formulation. As a result, this does not pose issues with differentiation and leads to a faster and simpler implementation, without the need for backpropagation.
A multiple shooting variant was presented in [mastalli2020], extending the work in [giftthaler2018a]. It allows easier initialization since both state and control sequences can be used. Unfortunately, the intermediate iterates of the algorithm are infeasible, meaning that early stopping with a feasible trajectory, as in DDP, is not possible. This is a necessary property in our case, since the throughcontact motion planning approach that we present is running in a receding horizon fashion. Furthermore, the contact schedule is predefined in [mastalli2020], while here contacts are activated according to the natural dynamics of the system [posa2014]. Finally, friction cone constraints are neglected or can be taken into account through penalization in the cost function, which can be in practice difficult to tune and can lead to unrealistic solutions. Due to the imposition of contacts as equality constraints, attractive forces can arise at the solution, violating the unilateral constraint. Our framework here utilizes full unilateral and friction cone contact constraints without any approximation or penalization.
Iii Preliminaries
Iiia Summary of Differential Dynamic Programming
DDP is concerned with the optimization of a performance criterion for an unconstrained discretetime dynamical system [mayne1966, tassa2012]. This can be expressed as
(1a)  
s.t. 
Here, is an additive cost at time step and is the final cost, and are the state and control, is the length of the horizon, while denotes the quantity at the next time step, e.g. the next state in our context.
According to the principle of optimality, (1a) can be expressed via the value function, which is the total cost at a given state once we apply the optimal control sequence. The principle of optimality makes the computation of the value function iterative, and at a state is given by
Since finding the global minimum is challenging, DDP performs a quadratic approximation of the value function and subsequently improves the control sequence locally. If we define the function as
(2) 
a quadratic approximation about the current point is
(3)  
while and are state and input perturbations. The terms in (3) are computed by expanding and matching same terms in creftype 2 as
(4) 
Backward pass
Forward pass
Once the feedforward and feedback terms and for each time step are computed, we perform a forward pass to compute the updated control sequence as
(7a)  
(7b)  
(7c) 
for . In practice, regularization and line search are necessary, as explained in [tassa2012].
IiiB Simulation With Contacts
We summarize a typical simulation pipeline here in the presence of contacts [horak2019]. Contact resolution is usually done in the velocity–impulse level but our DDP is formulated at the acceleration–force level, which will be elaborated later.
The dynamics of a mechanical system are given by
(8) 
where the mass matrix, the vector of nonlinear forces, a selection matrix that maps actuated joint torques to generalized coordinates, while denotes the Jacobian of the th contact and the corresponding force. We simplify notation by dropping explicit dependence on quantities.
In timestepping approaches, e.g. [todorov2014, horak2019], creftype 8 is discretized using an Euler approximation to obtain
where is the time step size and corresponds to the concatenation of the contact impulses at time step . These are projected in contact space
which can also be expressed as
(9) 
with , , , and .
Different contact models pose different conditions on what constraints accompany creftype 9. In this work, the contact model defined in [todorov2014] is used because it is convex and analytically invertible. It penalizes movement in contact space by solving the following QCQP during the forward dynamics
(10) 
where are the tangential and normal components, is a positive definite matrix that makes the solution unique and invertible, and is a Baumgarte stabilization reference.
The inverse dynamics is welldefined and for a diagonal we obtain an independent problem per contact
(11) 
Iv Implicit DDP
Our point of departure from the original DPP algorithm is the dynamics in (IIIA). Instead of the explicit dynamics, we assume dynamics of the form
(12) 
This will allow us to apply DDP for systems expressed via inverse dynamics, implicit or variational integrators, etc. Our focus will be contact dynamics, but we return to this later.
The goal is to compute the derivatives for the quadratic approximation of the function (4). Terms related to the running cost are trivial and will be omitted. Thus, we focus on the first and secondorder sensitivity of the next step value function. A treatment of sensitivity analysis in the context of Newton methods can be found in [zimmermann2019].
Iva FirstOrder Sensitivity Analysis
The firstorder sensitivity of the value function in (2) is
Here, is the sensitivity of the next step value function with respect to the current state, while is the sensitivity of the next step value function with respect to the next state; connected by the previous equation. Based on (12) we have
(13) 
where it is assumed that for any and , can be computed so that (12) is satisfied. Combining the previous two equations gives
In practice, a faster computation can be achieved using the adjoint method [strang2007] by computing first the quantity
and then
(14a) 
If we confine ourselves in a firstorder analysis only this is computationally advantageous [strang2007], but the computation of in (13) is required for the secondorder expansion. By a similar reasoning, is computed as
(15a) 
which concludes our firstorder analysis.
We now have all the ingredients for the firstorder approximation of the function. For example, the term in (4) is given by
IvB SecondOrder Sensitivity Analysis
The secondorder approximation of the value function is
The term constitutes a thirdorder tensor. We use matrix notation for the contractions but assume that their computation is clear from the context. It is computed as
By combining the last two equations we have that
(16b)  
IvC GaussNewton Approximation
Especially for robot models with many degrees of freedom, computing the tensor terms (16b) can be prohibitive expensive. Fortunately, it is possible to do a GaussNewton approximation of the Hessian—equivalent to iLQR—by ignoring them. Thus, the secondorder sensitivity terms of the value function become
(18a)  
(18b)  
(18c) 
V Accelerationlevel Contact Dynamics
We describe here a contact resolution framework in the acceleration level, rather than the commonly used velocity level. This way, we avoid the necessary firstorder discretization of the dynamics. Thus, during integration of the state, an arbitrary order integrator can be used. Other assumptions are not required about the robot’s model (such as the assumption about a constant Jacobian in (9) that is inherent in the velocityimpulse formulations), without increasing the computation complexity. As such, we consider it a superior choice. It is also the default choice in MuJoCo [todorov2020], which is a stateoftheart robotics simulator.
Starting from the continuous time dynamics (8), we multiply both sides by and add , which gives
(19) 
We can interpret this equation as follows: is the unconstrained acceleration in contact space in the absence of any contacts, which is corrected by the term to result in the actual acceleration that satisfies the contact constraints.
As already explained, the contact model that we utilize was proposed in [todorov2014]. It computes the necessary contact forces by solving the following convex optimization problem that tries to minimize accelerations in contact space
(20) 
which is the equivalent to (10) for accelerations.
While the bias accelerations can be in a general Baumgarte stabilization form, a choice that works reasonably good across models is
(21) 
with the gap distance, positive when bodies are separate. The first term is used to cancel the same term in and and simplify computations. The second and third term are obtained by a Taylor expansion of the gap distance function and ignoring third and higher order terms.
In the forward pass, the above optimization problem is solved for the contact forces using a standard Projected Gauss–Siedel solver [horak2019]. Though in principle this can be implemented in the backward pass, the computation of the gradients becomes more complicated since we have to differentiate a numeric solution. Even with automatic differentiation, the quality of the gradients can suffer. Instead, a diagonal approximation of the system can be assumed and an approximate solution to the contact forces can be computed [tassa2012]. The implicit formulation avoids this issue and the exact solution for the contact forces is given in closed form.
By utilizing the implicit framework and the invertibility of the model, problem (11) is expressed in acceleration space
(22) 
For the computation of as given by (19), the joint acceleration is required. In the classical DDP algorithm this is not available, since we only have access to the current state and , and the acceleration is computed after the contact forces. In the implicit form, since we have additionally available the next state , the computation of the acceleration is trivial. Thus, we can compute each contact force in closed form as
(23) 
projects contact forces to the cone with coefficient [horak2019].
After the computation of the contact forces, we can enforce the implicit dynamics creftype 12 either using a forward or inverse dynamics formulation. Given the available information, the computation of inverse dynamics is cheaper and numerically superior [hollerbach1980, ferrolho2021]. Furthermore, this decoupling between forward and backward pass allows us to avoid the rootfinding problem during the forward that would be necessary in a full implicit implementation. Having to solve the rootfinding problem in the forward pass increases the computation time of the implicit formulation. We summarize the DDP computations subject to contacts in Algos. 2 and 1.
Vi Results
Via Implementation Details
For the computation of the rigidbody dynamics, the Julia library RigidBodyDynamics.jl is used [koolen2019].
Computation of first, second and thirdorder tensor is done using forwardmode automatic differentiation [revels2016].
We begin by performing multiple comparisons between implicit and explicit DDP formulations for a double pendulum swingup task. Next, we present two problems that require multicontact motion planning: A single leg that is required to a) stand up from the ground, and b) balance from an initial random state.
ViB Aggregate Double Pendulum Swingup
For the double pendulum swingup task, we generate random trials (that is, with random initial state) and we specify an objective that includes a desired upright posture at the end of a s horizon, while penalizing joint torques at intermediate states. Additionally, joint limits are modelled using unilateral forces at the joints. Only the unilateral constraint is imposed (forces push the joint away from the limit at violations), while friction is not required.
We compare four variants of methods presented in this work:

Implicit iLQR with backward Euler dynamics.

Implicit DDP with backward Euler dynamics.

Explicit iLQR with forward Euler dynamics.

Explicit DDP with forward Euler dynamics.
For every random initialization, the four variants are executed until convergence (or until an upper iteration limit is reached) and the number of iterations and total cost of the trajectory is logged. Aggregate box plot results for the cost and the number of iterations are shown in Figs. 3 and 2, respectively.
From the comparison, the implicit formulations result in considerably less iterations than the explicit counterpart. Both median, minimum and maximum values, and the rest of the statistical properties in Fig. 3 are improved with an implicit formulation regarding the number of iterations. As expected, the tradeoff for this is the larger in general cost of the resulting trajectory in Fig. 2. This can be partially explained from the fact that since the explicit formulations perform on average more iterations, they are capable to finetune the resulting trajectory more. But given the considerable less iterations for the implicit formulations, this aspect is more important in terms of the overall performance.
A possible reason behind this is the integrator’s properties. Implicit Euler is an Astable method suitable even for stiff systems. As such, it usually exhibits energy decrease—instead of the common increase in explicit methods—that makes the whole formulation more stable.
ViC Single Double Pendulum Swingup
Cost per iteration and timings
We evaluate the cost per iteration for one double pendulum swingup and compare 6 different formulations (each with a DDP and iLQR variant):

Forward Euler dynamics in the forward and backward passes.

Forward Euler dynamics in the forward pass, and forward Euler inverse dynamics in the backward pass.

Backward Euler dynamics in the forward and backward passes.
We use the same cost function and initialize at the stable equilibrium, while the total duration of the motion is s, with a time step of ms, and steps. The results are shown in Fig. 4. Formulation (i) corresponds to a classical iLQR/DDP with explicit dynamics. Formulation (ii) is enabled by the presented framework. The computation of the Jacobian and tensor terms is based on the automatic differentiation of the inverse dynamics. Since (ii) is equivalent to (i), the solutions by the two approaches are exactly the same and are plotted together in Fig. 4. Differences are found for the computation time, as reported next. Formulation (iii) is implicit in both passes and enabled again by the presented framework.
In terms of computation, formulations (i) and (ii) with iLQR require iterations, while with DDP require iterations. In terms of timings, the mean time of each iteration for (i) with iLQR is ms and with DDP ms. For (ii) with iLQR is ms and with DDP ms. While the differences are not significant for such a lowdimensional model, these can become starker for robot models with larger degrees of freedom. For (iii), iterations for iLQR and iterations with DDP. The mean computation time of each iterations with iLQR is ms and with DDP ms. The increased computation is due to the solution of a nonlinear system of equations in the forward pass.
Effect of time step size
We focus now on the effect of the time step size to the solution of the problem. We solve the same problem as before for multiple time step selections and report the number of iterations required until convergence. Since formulations (i) and (ii) are equivalent, we focus the comparison on (i) and (iii). We solve them using iLQR but similar conclusions could be drawn if DDP was used.
The results are shown on Table I. For small time steps, the two formulations are essentially equivalent and, thus, require the same number of iterations. As the time step increases, the influence of the integrator’s damping in (iii) becomes more apparent. This results in a desirable decrease to the number of iterations for convergence. The motions are included again in the accompanying video. For larger time steps, the accuracy of both firstorder integrators worsens significantly.
Time step  

(i) / (ii)  
(iii) 
ViD Multicontact Standup
Next, we consider a planar 3 degreeoffreedom single leg of a humanoid robot and the task now is to standup upright from the ground. The model can make multiple contacts with the terrain using all the bodies of its structure, but selfcollisions are inactive. We predefine a number of possible contact points but we do not prescribe the contact activation pattern. Adding a contact detection mechanism and avoiding the prespecification of contacts is another possibility, as typically done in simulation engines.
The cost function of the problem is defined as
A penalization of the velocity and joint torque is applied throughout the trajectory, while a goal state is defined in the final cost term. The motion duration is s with a time step of ms; this is a relatively large time step for contacts, but our aim here is to output an approximate contactrich motion plan. Given this plan as input, it is possible to postprocess it to increase the quality.
The friction coefficient is selected as . Parameter is initialized with a value of for all components. While in principle it can take arbitrary values, we can test the validity from a numerical viewpoint as follows [todorov2020]: We run the forward and backward pass separately and compare the computed forces. The two solutions should match according to the desired numerical precision.
The main difficulty is that the problem exhibits a number of contact possibilities. Thus, mode enumeration can be very challenging. Notice also how delicate heel balance emerges while reaching the upright configuration. Our trajectory optimization framework is capable to output a locally optimal motion plan. Even though a zero torque initial solution is used here, its quality greatly affects the quality of the computed motion. Finally, by changing the terms in the cost function, it is possible to obtain different solutions, e.g. more conservative but with higher torque cost.
The resulting motion can be found in the accompanying video. There is an initial explosive and dynamic motion at about s. Such a motion would be in practice difficult to track. Yet being able to compute such a complex motion from highlevel input only demonstrates the power of DDPbased approaches. There are a couple of ways to mitigate that: an obvious approach is to increase the torque, position, and velocity penalization accordingly. Another option is to include terms that penalize the rate of the commanded torques. Finally, a more principled approach is to penalize high frequency components of the signals involved [grandia2019].
ViE Multicontact Balancing
Using the same model as before, the state now is randomly initialized in the air. The task is to keep the initial posture with zero velocity, i.e. to balance. In contrast to the previous case, this problem is formulated in a receding horizon fashion. A fixed number of iterations for DDP is prespecified; this makes realtime iterations of the algorithm possible. The horizon length is s, with the simulation running at Hz, while our framework runs at Hz. The structure of the cost function remains the same as before, albeit the weight regarding the final velocity is increased to bias more towards a static final configuration.
The balancing motion is shown at the bottom of Fig. 5. The computed motion naturally performs a series of jumps to dissipate kinetic energy and come to a complete stop. The underactuated foot tilting emerged as the outcome of optimization without programming explicit controllers as in [li2017]. Compared to the case in the previous section, the receding horizon formulation is capable of producing better motions in general. This is because the constant updates allows it to escape iterations with very small cost decrease, which can be common in the fixed horizon optimization of the previous case. If a bad initialization is specified or the horizon and frequency are not chosen properly, the receding horizon formulation can be trapped too. The selection of these parameters depends on the desired task and initial state.
Finally, a semilog plot of the total trajectory cost at the beginning and at the end of each DDP step is shown in Fig. 6. We notice that in about runs a successful balancing motion is computed. Afterwards, each run rapidly converges to this motion. The reason why the cost is increased at the beginning of each run is because the horizon moves; the predicted trajectory for the new segment at the end of the previous horizon is that the robot will essentially fall, which incurs a large cost. Additionally, during the initial runs, the motion is highly unstable and a suitable balancing motion is not discovered yet. Thus, the total trajectory cost varies greatly between consecutive runs.
Vii Conclusion
This work presents an extension to DDP that is able to handle implicit dynamical systems with particular focus on throughcontact motion planning. This allows extending the original DDP to a larger class of dynamics models, e.g. such as models based on inverse dynamics. We described how to use the implicit formulation for accurate contact resolution in the DDP framework without requiring approximations of contact dynamics. The proposed method is exact and straightforward to implement, utilizing a closedform solution for quality gradient computations. Further, we demonstrated properties of the approach in a number of cases: comparisons of implicit and explicit dynamics representations for a double pendulum, and two case studies for a single leg model that required challenging multicontact motion plans.
While the original DDP provides both feedforward and feedback gains that guarantee a level of robustness against small perturbations, we noticed that the computed motion plans can fail if the conditions of the problem change slightly. Though one can introduce robustness as part of the trajectory optimization modelling, we believe that running the whole framework in a receding horizon fashion is more appropriate and promising. Thus, the motion plans should be updated online to withstand unexpected perturbations.
It is worth noting that DDP simulates the dynamics of the system and activates a contact point if it finds one. Thus, contacts are taken into account according to the system’s natural dynamics [posa2014], which may lead to characteristically abrupt motions [tassa2012]. Being a shooting method for unconstrained systems, DDP is limited in terms of active search for potential contacts. Further improvements can be made by combinatorial planning and exploration, where transcriptionbased methods demonstrated better capabilities and flexibility [chatzinikolaidis2020, patel2019], although requiring additional and nonnegligible computation cost in practice.
Footnotes
 The accompanying code is available at github.com/ichatzinikolaidis/iDDP and the video at youtu.be/e_TMjmM4NmU.