Constrained Differential Dynamic Programming:
A primal-dual augmented Lagrangian approach
Wilson Jallet (a,b,*), Antoine Bambade (b,c), Nicolas Mansard (a), and Justin Carpentier (b)
Abstract— Trajectory optimization is an efficient approach
for solving optimal control problems for complex robotic
systems. It relies on two key components: first the tran-
scription into a sparse nonlinear program, and second the
corresponding solver to iteratively compute its solution. On
one hand, differential dynamic programming (DDP) provides
an efficient approach to transcribe the optimal control problem
into a finite-dimensional problem while optimally exploiting
the sparsity induced by time. On the other hand, augmented
Lagrangian methods make it possible to formulate efficient
algorithms with advanced constraint-satisfaction strategies. In
this paper, we propose to combine these two approaches into
an efficient optimal control algorithm accepting both equality
and inequality constraints. Based on the augmented Lagrangian
literature, we first derive a generic primal-dual augmented
Lagrangian strategy for nonlinear problems with equality
and inequality constraints. We then apply it to the dynamic
programming principle to solve the value-greedy optimization
problems inherent to the backward pass of DDP, which we
combine with a dedicated globalization strategy, resulting in
a Newton-like algorithm for solving constrained trajectory
optimization problems. Contrary to previous attempts at formulating
an augmented Lagrangian version of DDP, our approach
exhibits adequate convergence properties without any switch in
strategies. We empirically demonstrate its merits on several
case studies from the robotics literature.
I. INTRODUCTION
In this paper, we are interested in solving constrained
continuous-time optimal control problems (OCP) of the
form:
  \min_{x,u}  \int_0^T \ell(t, x(t), u(t)) \, dt + \ell_T(x(T))      (1a)
  s.t.  f(t, x(t), u(t), \dot{x}(t)) = 0,   t \in [0, T)             (1b)
        x(0) = \bar{x}_0                                             (1c)
        h(t, x(t), u(t)) \le 0                                       (1d)
        h_T(x(T)) \le 0,                                             (1e)
where \ell and \ell_T are the running and terminal costs respectively,
(1b) accounts for the system dynamics written as a
differential-algebraic equation (DAE) (and includes the ODE
case \dot{x} = f(t, x(t), u(t))). We denote by \mathcal{X} and \mathcal{U} the state and
control spaces, T > 0 the time horizon, \bar{x}_0 \in \mathcal{X} the initial
condition, and h(\cdot) and h_T(\cdot) the path and terminal constraints.

(a) LAAS-CNRS, 7 Avenue du Colonel Roche, F-31400 Toulouse, France
(b) Inria, Département d'informatique de l'ENS, École normale supérieure, CNRS, PSL Research University, Paris, France
(c) ENPC, France. *Corresponding author: wjallet@laas.fr

This work was supported in part by HPC resources from GENCI-IDRIS
(Grant AD011011342), the French government under the management of
Agence Nationale de la Recherche as part of the "Investissements d'avenir"
program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute) and ANR-19-P3IA-000
(ANITI 3IA Institute), the Louis Vuitton ENS Chair on Artificial
Intelligence, and the European project MEMMO (Grant 780684).
For numerical resolution, the continuous-time OCP (1)
must be transcribed into a finite-dimensional optimization
problem (i.e., with a finite number of variables, which
the continuous-time trajectories are not) [1]. Several tran-
scriptions are possible [2], [3], [4]. Differential Dynamic
Programming (DDP) is a particular optimal control algorithm which
entails a direct transcription known as single shooting [5].
Popularized in robotics in the late 2000s [6], it has the
advantage over other transcriptions of providing a simple
formulation, optimally exploiting the sparsity of the resulting
nonlinear programs while providing feedback gains at no
extra cost. The corresponding transcription, extended to any
constraints, reads:
  \min_{x,u}  \sum_{k=0}^{N-1} \ell_k(x_k, u_k) + \ell_N(x_N)        (2a)
  s.t.  f_k(x_k, u_k, x_{k+1}) = 0,   k \in \llbracket 0, N-1 \rrbracket   (2b)
        x_0 = \bar{x}_0                                              (2c)
        h_k(x_k, u_k) \le 0                                          (2d)
        h_N(x_N) \le 0,                                              (2e)
where h_k, h_N, f_k are appropriate functions discretizing the
dynamics and path constraints, depending on the numerical
discretization scheme employed. The \ell_k are approximations
of the cost integrals \int_{t_k}^{t_{k+1}} \ell(t, x(t), u(t)) \, dt. We use
the shorthands x \overset{def}{=} (x_0, \ldots, x_N) and u \overset{def}{=} (u_0, \ldots, u_{N-1})
for the discretized state and control trajectories.
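To make the single-shooting structure of transcription (2) concrete, the following minimal sketch rolls out a discrete dynamics model and evaluates the discretized cost and inequality-constraint values. The double-integrator dynamics, quadratic cost, and control bounds are hypothetical choices for illustration only, not taken from the paper; explicit Euler integration is used as a special case of the implicit constraint f_k(x_k, u_k, x_{k+1}) = 0.

```python
import numpy as np

def rollout(x0, us, dt):
    """Roll out double-integrator dynamics p' = v, v' = u with explicit Euler.

    This realizes the single-shooting idea: the state trajectory x is
    entirely determined by the initial state x0 and the controls u.
    """
    xs = [np.asarray(x0, dtype=float)]
    for u in us:
        p, v = xs[-1]
        xs.append(np.array([p + dt * v, v + dt * u]))
    return xs

def evaluate(x0, us, dt, u_max=1.0):
    """Total discretized cost (2a) and constraint values h_k(x_k, u_k) <= 0 (2d)."""
    xs = rollout(x0, us, dt)
    # Running costs l_k approximate the integral of a quadratic running cost.
    cost = sum(dt * (x @ x + u * u) for x, u in zip(xs[:-1], us))
    cost += xs[-1] @ xs[-1]               # terminal cost l_N(x_N)
    # Control bounds |u_k| <= u_max written in the form h_k <= 0.
    cons = [abs(u) - u_max for u in us]
    return cost, cons

cost, cons = evaluate([1.0, 0.0], [0.0, -0.5, 0.2], dt=0.1)
```

A constrained solver such as the one proposed in the paper would then drive the cost down while keeping the entries of `cons` nonpositive; here the chosen controls already satisfy the bounds.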
While the nominal DDP algorithm cannot handle path
constraints, implicit integrators, or multiple-shooting
stabilization, several improvements have been proposed over
the years to equip it with these capabilities. In this paper, we
focus on the handling of equality and inequality constraints,
and we first review previous work on this topic.
A first subcase of interest only considers OCP with control
bounds, which can be handled by a projected quasi-Newton
approach [7]. Several other projection-based formulations
have then been proposed to extend DDP [8], [9], none of
which have been shown to be robust enough to be widely
adopted in robotics. To account for inequality constraints,
interior-point methods [10], [11] have also been recently
investigated; however, these do not allow for easy warm-starting [3],
making them unsuitable for online optimization and
application to model-predictive control (MPC) [12], [13].
In the past few years, augmented Lagrangian approaches
have emerged as a suitable solution for solving constrained
arXiv:2210.15409v2 [cs.RO] 28 Oct 2022