Differentiable Constrained Imitation Learning for Robot Motion
Planning and Control
Christopher Diehl, Janis Adamek, Martin Krüger, Frank Hoffmann and Torsten Bertram
Abstract— Motion planning and control are crucial compo-
nents of robotics applications like automated driving. Here,
spatio-temporal hard constraints like system dynamics and
safety boundaries (e.g., obstacles) restrict the robot’s motions.
Direct methods from optimal control solve a constrained
optimization problem. However, in many applications finding
a proper cost function is inherently difficult because of the
weighting of partially conflicting objectives. On the other hand,
Imitation Learning (IL) methods such as Behavior Cloning (BC)
provide an intuitive framework for learning decision-making
from offline demonstrations and constitute a promising avenue
for planning and control in complex robot applications. Prior
work primarily relied on soft constraint approaches, which
use additional auxiliary loss terms describing the constraints.
However, catastrophic safety-critical failures might occur in
out-of-distribution (OOD) scenarios. This work integrates the
flexibility of IL with hard constraint handling in optimal
control. Our approach constitutes a general framework for
constrained robotic motion planning and control, as well as
traffic agent simulation; here, we focus on mobile robot and
automated driving applications. Hard constraints are integrated
into the learning problem in a differentiable manner, via
explicit completion and gradient-based correction. Simulated
experiments of mobile robot navigation and automated driving
provide evidence for the performance of the proposed method.
I. INTRODUCTION
The motion of robots in the real world is constrained
by the kinematics and dynamics of the robot as well as
the geometric structure of the environment. For example, to
navigate safely and smoothly, a self-driving vehicle (SDV)
must consider various factors such as its control limits, stop
signs, and obstacles forming a driving corridor. A core chal-
lenge is incorporating these constraints into robot planning
and control. That is also essential for automated driving
traffic simulation to enhance the realism of the simulated
agents. For instance, traffic agents must follow common road
rules. On the one hand, optimal control approaches solve
a finite horizon optimal control problem by optimizing a
cost function under explicitly defined constraints. A common
approach, like in direct methods [1], is to derive a nonlinear
program from a continuous optimal control formulation [2],
[3], [4] and then solve the problem with numerical optimiza-
tion. However, designing a general cost function remains
an unsolved problem for inherently complex tasks such as
automated driving [5], [6], [7]. Here, aspects like comfort
This research was funded by the Federal Ministry for Economic Affairs
and Climate Actions on the basis of a decision by the German Bundestag
and the European Union in the project "KISSaF - AI-based Situation
Interpretation for Automated Driving".
The authors are with the Institute of Control Theory and Systems
Engineering, TU Dortmund University, D-44227, Germany.
Fig. 1: A schematic overview of the proposed framework: A robot, like an SDV, perceives its environment and builds a high-dimensional environment model e_i and a low-dimensional state representation x_i. Constraints C_i (grey rectangle: equality constraints, blue ellipse: inequality constraints) further bound the robot's motion. A neural network N_θ processes e_i and outputs an initial sequence of control values u_N. These are completed to the initial solution ȳ, also containing the predicted states, by unrolling a robot dynamics model. Afterward, ȳ is corrected with gradient steps (red arrows), such that the estimated solution ŷ lies in the space defined by the equality (grey) and inequality constraints (blue) of C_i. During training, the framework computes a distance measure between ŷ and the ground truth y_GT and backpropagates the soft loss L_soft. During testing, the approach delivers a solution that imitates the expert behavior while obeying a set of nonlinear constraints.
and safety must be weighed against each other. On the other
hand, robot behavior can be learned from demonstrations,
which is the task of IL. One example is BC, a simple
offline learning method, requiring no on-policy environment
interactions. Here, constraints are implicitly learned from
data. Further, constraints can be integrated by auxiliary loss
functions. However, there are no guarantees for constraint
satisfaction, and robot policies fail under distribution shifts
[8], causing unexpected unsafe actions.
That raises the question: Can we combine offline IL
methods like BC with the constraint incorporation of optimal
control methods?
Donti et al. [9] present a method for incorporating hard
constraints into the training of neural networks. The problem
is formulated as a nonlinear program and evaluated with
a simple network architecture. Our approach extends this
work to the robotic IL setting. The nonlinear
program is constructed via direct transcription. Our proposed
approach, summarized in Fig. 1, leverages two differentiable
procedures to account for equality and inequality constraints
and is agnostic to the used network architecture. First,
the network predicts a sequence of control vectors, which are explicitly completed to a sequence of states w.r.t. the system dynamics, represented as equality constraints. Then, a gradient-based correction accounts for the inequality constraints while satisfying the equality constraints.
arXiv:2210.11796v2 [cs.RO] 28 Aug 2023
Contributions. To summarize, the paper makes the fol-
lowing contributions: (i) It proposes a general Differentiable
Constraint Imitation Learning (DCIL) framework for incor-
porating constraints, which is agnostic to the particular neural
network architecture. (ii) It demonstrates the approach’s
effectiveness in one mobile robot and one automated driving
environment during closed-loop evaluation. The approach
outperforms multiple state-of-the-art baselines considering
a variety of metrics.
II. RELATED WORK
The proposed approach is situated within the broader
scope of integrating constraints into learning-based approaches
and IL in the robotics and automated driving literature. This
section classifies related work into two major categories.
Modification of the Training Loss. The first class of ap-
proaches incorporates constraints by modifying the training
loss. A simple approach adds the constraints as weighted
penalties to the imitation loss. [10] proposes an application
for automated driving. The work shows that additional loss
functions penalizing constraint violations improve the closed-
loop performance. [11] modifies the training process with a
primal-dual formulation and converts the constrained opti-
mization problem into an alternating min-max optimization
with Lagrangian variables. [12] uses an energy-based for-
mulation. During training, the loss pushes down the energy
of positive samples (close to the expert demonstration) and
pulls up the energy values of negative samples, which violate
constraints (e.g., colliding trajectories). While these methods
are more robust to errors in constraint specifications, they
often fail in OOD scenarios as errors made by the learned
model still compound over time. That can lead to unexpected
behavior like leaving the driving corridor [8].
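The weighted-penalty idea behind the first class of approaches can be sketched as follows (an illustrative sketch, not the exact formulation of [10]; the L2 imitation term and hinge-style penalty are assumptions):

```python
import numpy as np

def soft_constrained_loss(y_pred, y_gt, violations, weights):
    """Soft-constraint BC loss: an imitation term plus weighted penalties
    for constraint violations. Violations are clipped at zero, so satisfied
    constraints contribute nothing to the loss."""
    imitation = np.mean((y_pred - y_gt) ** 2)           # e.g., L2 imitation loss
    penalty = sum(w * np.maximum(v, 0.0).sum()          # hinge penalty per constraint
                  for w, v in zip(weights, violations))
    return imitation + penalty
```

The penalty weights must be tuned per task, and nothing prevents the trained policy from violating the constraints at test time, which is exactly the weakness discussed above.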
Projection onto Feasible Sets. The second group of ap-
proaches projects the neural network’s output onto a solution
that is compliant with the constraints. Instead of predicting
a future sequence of states, a neural network predicts a
sequence of controls [13]. Unrolling a dynamics model
generates a feasible state trajectory consistent with the robot
system dynamics. However, the approach does not account
for general nonlinear inequality constraints. [14] presents an
inverse reinforcement learning approach. First, a set of safe
trajectories is sampled, and learning is only performed on
the safe samples. SafetyNet [15] trains an IL planner and
proposes a sampling-based fallback layer performing sanity
checks. [16] proposes a similar approach using quadratic
optimization. Other works incorporate quadratic programs
[17] or convex optimization programs [18] as an implicit
layer into neural network architectures. The implicit layer
then serves as the final layer, projecting the output onto a set
of feasible solutions. [19] directly modifies the network
architecture by encoding convex polytopes. Sampling, quadratic
optimization, and convexity requirements severely restrict the
solution space.
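As a minimal illustration of the projection idea (not taken from any of the cited works): box control limits admit a closed-form Euclidean projection, whereas general nonlinear inequality constraints do not and require an optimization or correction procedure instead.

```python
import numpy as np

def project_to_box(u, u_min, u_max):
    """Closed-form Euclidean projection of predicted controls onto box
    limits u_min <= u <= u_max, the simplest feasible-set projection.
    A general nonlinear inequality g(u) <= 0 has no such closed form."""
    return np.clip(u, u_min, u_max)
```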
Most closely related to our approach is the work of
[9]. The authors present a hybrid approach, which accounts
for nonconvex, nonlinear constraints. Their experiments are
limited to numerical examples with simple network architectures. We
extend this work to the real-world-oriented robot IL set-
ting with more complex architectures for high-dimensional
feature spaces. Further, we use an explicit completion by
unrolling a robot dynamics model.
Recently, concurrent works have proposed approaches that
also incorporate nonlinear constraints using Signal Temporal
Logic [20] and differentiable control barrier functions [21],
which underscores the importance of handling nonlinear constraints. In
contrast, our approach relies on a differentiable completion
and gradient-based correction procedure, and the training
is guided by auxiliary losses. [20] evaluates on simple toy
examples, whereas our analysis considers a more realistic
environment. [21] evaluates in real-world experiments but
only uses a circular robot footprint and object representation,
whereas this work evaluates using different constraints.
Moreover, our approach is able to resolve incorrect con-
straints that render the problem infeasible.
III. PROBLEM FORMULATION
Assume robot dynamics described by nonlinear, time-invariant differential equations with time $t \in \mathbb{R}$, state $\mathbf{x} \in \mathcal{X}$, and controls $\mathbf{u} \in \mathcal{U} \subseteq \mathbb{R}^{n_u}$:
$$\dot{\mathbf{x}}(t) = f\left(\mathbf{x}(t), \mathbf{u}(t)\right). \quad (1)$$
The state space $\mathcal{X}$ of dimension $n_x$ is the union of an arbitrary number of real spaces and non-Euclidean rotation groups $SO(2)$. In addition to the low-dimensional state representation $\mathbf{x}$, assume access to a high-dimensional environment representation $\mathbf{e} \in E \subseteq \mathbb{R}^{n_e}$ (e.g., a bird's-eye-view (BEV) image of the scene). Further, the system is bounded by a set of nonlinear constraints $C$ (e.g., control bounds, rules, or safety constraints).
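As a concrete instance of such dynamics (an illustrative example, not prescribed by the paper): a unicycle mobile robot with planar position and an $SO(2)$ heading, discretized with an explicit Euler step.

```python
import numpy as np

def unicycle(state, control):
    """Example continuous dynamics f(x, u) in the sense of eq. (1).
    State: planar position (px, py) and heading theta in SO(2).
    Controls: forward velocity v and yaw rate omega."""
    px, py, theta = state
    v, omega = control
    return np.array([v * np.cos(theta), v * np.sin(theta), omega])

def euler_step(state, control, dt=0.1):
    """One explicit-Euler discretization step, wrapping the heading
    to (-pi, pi] so it stays on SO(2)."""
    nxt = state + dt * unicycle(state, control)
    nxt[2] = (nxt[2] + np.pi) % (2 * np.pi) - np.pi
    return nxt
```

Unrolling such a step function from a control sequence is exactly the completion procedure used later in the framework.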
A (sub-)optimal expert, pursuing a policy $\pi_{\text{exp}}$, controls the robot and generates a dataset $\mathcal{D} = \{(\mathbf{x}_i, \mathbf{u}_i, \mathbf{e}_i, C_i)\}_{i=0}^{I}$ with $I \in \mathbb{N}^+$ samples. A future trajectory of length $H \in \mathbb{N}^+$, containing the states and controls belonging to sample $i$, is given by $\mathbf{y}_{GT} = [\mathbf{x}_i^T, \mathbf{u}_i^T, \dots, \mathbf{x}_{i+H}^T, \mathbf{u}_{i+H-1}^T]^T$. During training, the objective is to find the optimal parameters $\theta^* \in \mathbb{R}^{n_\theta}$ under a maximum likelihood estimation:
$$\theta^* = \arg\min_{\theta} \; \mathbb{E}\left[ d\left(\mathbf{y}_{GT}, \hat{\mathbf{y}}\right) \right], \quad (2)$$
subject to equation (1) and the constraints $C$. The function $d$ denotes a distance measure and $\hat{\mathbf{y}} = \pi_\theta(\mathbf{x}_i, \mathbf{e}_i)$ is the output of the function $\pi_\theta$ parameterized by $\theta$. The function $\pi_\theta$ is described by a neural network $N_\theta$ together with the completion $f_{\text{compl}}$ and correction $f_{\text{corr}}$ procedures. During inference, given the environment representation, the robot's goal is to predict a sequence of states and controls compliant with the constraints. In the spirit of a model predictive control (MPC) framework, the first control vector is applied, or an underlying tracking controller regulates the robot along the reference.
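The receding-horizon execution described above can be sketched as a driver loop (hypothetical interfaces: `policy`, `env_obs`, and `step` are placeholder callables, not the paper's API):

```python
def receding_horizon_control(x0, policy, env_obs, step, n_rounds=50):
    """MPC-style execution: at every step, query the constrained policy for
    a control sequence, apply only the first control, then replan from the
    newly reached state."""
    x = x0
    trajectory = [x]
    for _ in range(n_rounds):
        u_seq = policy(x, env_obs(x))   # predicted, constraint-compliant controls
        x = step(x, u_seq[0])           # apply only the first control vector
        trajectory.append(x)
    return trajectory
```

Replanning at every step is what allows the learned policy to react to distribution shift between the predicted and the actually reached states.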