are explicitly completed to a sequence of states w.r.t. the
system dynamics represented as equality constraints. Then, a
gradient-based correction accounts for inequality constraints
while satisfying the equality constraints.
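As a minimal sketch of these two steps (assuming explicit Euler integration and a finite-difference gradient; the names `complete`, `correct`, and `violation` are illustrative, not the paper's implementation):

```python
import numpy as np

def complete(u_seq, x0, f, dt):
    # Completion: unroll the dynamics f over the predicted controls so the
    # state sequence satisfies the equality constraints (the dynamics) by construction.
    xs = [x0]
    for u in u_seq:
        xs.append(xs[-1] + dt * f(xs[-1], u))  # explicit Euler step (an assumption)
    return np.stack(xs)

def fd_grad(fun, u, eps=1e-5):
    # Finite-difference gradient; a differentiable framework would use autodiff here.
    g = np.zeros_like(u)
    for idx in np.ndindex(u.shape):
        d = np.zeros_like(u)
        d[idx] = eps
        g[idx] = (fun(u + d) - fun(u - d)) / (2.0 * eps)
    return g

def correct(u_seq, x0, f, dt, violation, steps=20, lr=0.05):
    # Correction: gradient descent on the scalar inequality-constraint violation,
    # performed in control space so every iterate stays dynamically feasible.
    u = u_seq.astype(float).copy()
    for _ in range(steps):
        u -= lr * fd_grad(lambda v: violation(complete(v, x0, f, dt)), u)
    return u
```

The corrected controls are then unrolled once more to obtain the final constraint-compliant trajectory.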
Contributions. To summarize, the paper makes the following contributions: (i) It proposes a general Differentiable Constraint Imitation Learning (DCIL) framework for incorporating constraints, which is agnostic to the particular neural network architecture. (ii) It demonstrates the approach's effectiveness in one mobile robot and one automated driving environment during closed-loop evaluation, where the approach outperforms multiple state-of-the-art baselines across a variety of metrics.
II. RELATED WORK
The proposed approach is situated within the broader body of work on integrating constraints into learning-based methods and IL in the robotics and automated driving literature. This section classifies related work into two major categories.
Modification of the Training Loss. The first class of approaches incorporates constraints by modifying the training loss. A simple approach adds the constraints as weighted penalties to the imitation loss (a generic form is sketched below). [10] proposes an application for automated driving and shows that additional loss functions penalizing constraint violations improve the closed-loop performance. [11] modifies the training process with a primal-dual formulation and converts the constrained optimization problem into an alternating min-max optimization with Lagrangian variables.
with Lagrangian variables. [12] uses an energy-based for-
mulation. During training, the loss pushes down the energy
of positive samples (close to the expert demonstration) and
pulls up the energy-values on negative samples, which violate
constraints (e.g., colliding trajectories). While these methods
are more robust to errors in constraint-specifications, they
often fail in OOD scenarios as errors made by the learned
model still compound over time. That can lead to unexpected
behavior like leaving the driving corridor [8].
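For illustration, a generic form of such a penalty-augmented loss (a sketch in our notation, not the exact loss of [10] or [11]) is
\[
\mathcal{L}(\theta) = \mathcal{L}_{\text{IL}}(\theta) + \sum_{j} w_j \max\big(0,\, g_j(\hat{y})\big)^2,
\]
where $\mathcal{L}_{\text{IL}}$ denotes the imitation loss, $g_j(\hat{y}) \le 0$ the inequality constraints, and $w_j > 0$ hand-tuned penalty weights.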
Projection onto Feasible Sets. The second group of approaches projects the neural network's output onto a solution that is compliant with the constraints. Instead of predicting a future sequence of states, a neural network predicts a sequence of controls [13]; unrolling a dynamics model then generates a state trajectory that is feasible with respect to the robot's system dynamics. However, the approach does not account for general nonlinear inequality constraints. [14] presents an inverse reinforcement learning approach in which a set of safe trajectories is sampled first, and learning is only performed on the safe samples. SafetyNet [15] trains an IL planner and proposes a sampling-based fallback layer performing sanity checks. [16] proposes a similar approach using quadratic optimization. Other works incorporate quadratic programs [17] or convex optimization programs [18] as an implicit layer in neural network architectures (a generic form is sketched below); this layer forms the network's last layer and projects the output onto a set of feasible solutions. [19] directly modifies the network architecture by encoding convex polytopes. However, sampling, quadratic optimization, and convexity severely restrict the solution space.
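In their generic form (a sketch; the concrete formulations in [16]-[18] differ), such projection layers solve
\[
\hat{y}_{\text{proj}} = \arg\min_{y} \; \|y - \hat{y}\|_2^2 \quad \text{s.t.} \quad A y \le b,
\]
a quadratic program whose solution can be differentiated with respect to the network output $\hat{y}$; the convexity that makes the layer tractable is also what restricts the solution space.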
Most closely related to our approach is the work of [9]. The authors present a hybrid approach that accounts for nonconvex, nonlinear constraints; their experiments deal with numerical examples and simple network architectures. We extend this work to the real-world-oriented robot IL setting with more complex architectures for high-dimensional feature spaces. Further, we use an explicit completion by unrolling a robot dynamics model.
Recently, concurrent works have proposed approaches that also incorporate nonlinear constraints using Signal Temporal Logic [20] and differentiable control barrier functions [21], which emphasizes the importance of handling nonlinear constraints. In contrast, our approach relies on a differentiable completion and gradient-based correction procedure, and the training is guided by auxiliary losses. [20] evaluates on simple toy examples, whereas our analysis considers a more realistic environment. [21] evaluates in real-world experiments but only uses a circular robot footprint and object representation, whereas this work evaluates using different constraints. Moreover, our approach is able to resolve incorrect constraints that render the problem infeasible.
III. PROBLEM FORMULATION
Assume robot dynamics described by nonlinear, time-invariant differential equations with time $t \in \mathbb{R}$, state $x \in \mathcal{X}$, and controls $u \in \mathcal{U} \subset \mathbb{R}^{n_u}$:
\[
\dot{x}(t) = f\big(x(t), u(t)\big). \tag{1}
\]
The state space $\mathcal{X}$ of dimension $n_x$ is the union of an arbitrary number of real spaces and non-Euclidean rotation groups $SO(2)$. In addition to the low-dimensional state representation $x$, assume access to a high-dimensional environment representation $e \in \mathcal{E} \subset \mathbb{R}^{n_e}$ (e.g., a bird's-eye-view (BEV) image of the scene). Further, the system is bounded by a set of nonlinear constraints $\mathcal{C}$ (e.g., by control bounds, rules, or safety constraints).
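As an illustrative instance of (1) and $\mathcal{C}$ (an assumption for exposition, not necessarily the model used in our experiments), consider a unicycle with state $x = (p_x, p_y, \psi) \in \mathbb{R}^2 \times SO(2)$ and controls $u = (v, \omega)$:
\[
\dot{p}_x = v \cos\psi, \qquad \dot{p}_y = v \sin\psi, \qquad \dot{\psi} = \omega,
\]
with control bounds such as $|v| \le v_{\max}$ and $|\omega| \le \omega_{\max}$ as simple members of $\mathcal{C}$.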
A (sub-)optimal expert, pursuing a policy $\pi_{\text{exp}}$, controls the robot and generates a dataset $\mathcal{D} = \{(x_i, u_i, e_i, \mathcal{C}_i)\}_{i=0}^{I}$ with $I \in \mathbb{N}^+$ samples. A future trajectory of length $H \in \mathbb{N}^+$ containing states and controls belonging to sample $i$ is given by $y_{\text{GT}} = \big[x_i^\top, u_i^\top, \dots, x_{i+H}^\top, u_{i+H-1}^\top\big]^\top$. During training, the objective is to find the optimal parameters $\theta \in \mathbb{R}^{n_\theta}$ via maximum likelihood estimation:
\[
\theta^{*} = \arg\min_{\theta} \; \mathbb{E}\big[\, d(y_{\text{GT}}, \hat{y}) \,\big], \tag{2}
\]
subject to equation (1) and the constraints $\mathcal{C}$. The function $d$ denotes a distance measure, and $\hat{y} = \pi_\theta(x_i, e_i)$ is the output of the function $\pi_\theta$ parameterized by $\theta$. The function $\pi_\theta$ is described by a neural network $N_\theta$ and the completion $f_{\text{compl}}$ and correction $f_{\text{corr}}$ procedures. During inference, given the environment representation, the robot's goal is to predict a sequence of states and controls compliant with the constraints. In the spirit of a model predictive control (MPC) framework, the first control vector is applied, or an underlying tracking controller regulates the robot along the reference.
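A minimal sketch of this receding-horizon usage (the helpers `env.observe`, `policy`, and `apply_first_control` are hypothetical placeholders for the trained $\pi_\theta$ and the plant or simulator, not part of our framework):

```python
def run_receding_horizon(x0, env, policy, apply_first_control, n_steps=100):
    # MPC-style deployment: replan at every step, apply only the first control.
    x = x0
    for _ in range(n_steps):
        e = env.observe()                 # high-dimensional representation, e.g., a BEV image
        states, controls = policy(x, e)   # constraint-compliant plan over horizon H
        x = apply_first_control(x, controls[0])  # alternatively, track the full reference
    return x
```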