CW-ERM: Improving Autonomous Driving Planning with Closed-loop Weighted Empirical Risk Minimization

Eesha Kumar1,*, Yiming Zhang2, Stefano Pini1, Simon Stent1, Ana Ferreira2, Sergey Zagoruyko1, Christian S. Perone1,*
Abstract— The imitation learning of self-driving vehicle policies through behavioral cloning is often carried out in an open-loop fashion, ignoring the effect of actions on future states. Training such policies purely with Empirical Risk Minimization (ERM) can be detrimental to real-world performance, as it biases policy networks towards matching only open-loop behavior, showing poor results when evaluated in closed-loop. In this work, we develop an efficient and simple-to-implement principle called Closed-loop Weighted Empirical Risk Minimization (CW-ERM), in which a closed-loop evaluation procedure is first used to identify training data samples that are important for practical driving performance, and these samples are then used to help debias the policy network. We evaluate CW-ERM on a challenging urban driving dataset and show that this procedure yields a significant reduction in collisions as well as improvements in other non-differentiable closed-loop metrics.
I. INTRODUCTION
Learning effective planning policies for self-driving vehi-
cles (SDVs) from data such as human demonstrations remains
one of the major challenges in robotics and machine learning.
Since early works such as ALVINN [1], Imitation Learning
has seen major recent developments using modern Deep
Neural Networks (DNNs) [2]–[7]. Imitation Learning (IL),
and especially Behavioral Cloning (BC), however, still face
fundamental challenges [8], including causal confusion [9]
(later identified as a feedback-driven covariate shift [10]) and
dataset biases [8], to name a few.
There is one particular limitation of IL policies trained with BC that is, however, often overlooked: the mismatch between training-time and inference-time execution of the policy actions. Most of the time, BC policies are trained in an open-loop fashion, predicting the next action given the immediate previous action and optionally conditioned on recent past actions [2]–[5], [7]. When executed in the real world, however, these policies affect their own future states. Small prediction errors can thus drive covariate shift and push the network into an out-of-distribution regime.
In this work, we address the mismatch between training and inference through the development of a simple training principle. Using a closed-loop simulator, we first identify and then reweight samples that are important for the closed-loop performance of the planner. We call this approach CW-ERM (Closed-loop Weighted Empirical Risk Minimization), since we use Weighted ERM [11] to correct the training distribution in favour of closed-loop performance. We extensively evaluate this principle on real-world urban driving data and show that it can achieve significant improvements on planner metrics that matter for real-world performance (e.g. collisions).

1 Author is with Woven Planet United Kingdom Limited, 114-116 Curtain Road, London, United Kingdom, EC2A 3AH. firstname.lastname@woven-planet.global
2 Author is with Woven Planet North America, Inc., 900 Arastradero Rd, Palo Alto, CA, USA 94304. firstname.lastname@woven-planet.global
* Equal contribution.
Our contributions are therefore the following:
• we motivate and propose Closed-loop Weighted Empirical Risk Minimization (CW-ERM), a technique that leverages closed-loop evaluation metrics acquired from policy rollouts in a simulator to debias the policy network and reduce the distributional differences between training (open-loop) and inference time (closed-loop);
• we evaluate CW-ERM experimentally on a challenging urban driving dataset in a closed-loop fashion to show that our method, although simple to implement, yields significant improvements in closed-loop performance without requiring complex and computationally expensive closed-loop training methods;
• we also show an important connection of our method to a family of methods that addresses covariate shift through density ratio estimation.
In Section II we detail the proposed CW-ERM method, and in Section IV we present experiments comparing CW-ERM against ERM.
II. METHODOLOGY
A. Problem Setup
The traditional formulation of supervised learning for imitation learning, also called behavioral cloning (BC), can be formulated as finding the policy $\hat{\pi}_{\mathrm{BC}}$:

$$\hat{\pi}_{\mathrm{BC}} = \operatorname*{argmin}_{\pi \in \Pi} \; \mathbb{E}_{s \sim d_{\pi^*},\, a \sim \pi^*(s)}\big[\ell(s, a, \pi)\big] \qquad (1)$$

where the state $s$ is sampled from the expert state distribution $d_{\pi^*}$ induced when following the expert policy $\pi^*$, and actions $a$ are sampled from the expert policy $\pi^*(s)$. The loss $\ell$ is also known as the surrogate loss that will find the policy $\hat{\pi}_{\mathrm{BC}}$ that best mimics the unknown expert policy $\pi^*(s)$. In practice, we only observe a finite set of state-action pairs $\{(s_i, a_i)\}_{i=1}^{m}$, so the optimization is only approximate and we then follow the Empirical Risk Minimization (ERM) principle to find the policy $\hat{\pi}$ from the policy class $\Pi$.
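As an illustration only, the ERM objective in Eq. (1) corresponds to a standard supervised training loop over recorded state-action pairs. The sketch below is a minimal PyTorch-style example; the toy planner architecture, the placeholder tensors and the L2 surrogate loss are assumptions made for illustration and are not taken from the paper.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical example: states are flat scene feature vectors,
# actions are (x, y, yaw) targets imitating the expert driver.
states = torch.randn(1024, 64)   # placeholder for rasterized/vectorized scene features
actions = torch.randn(1024, 3)   # placeholder for expert actions
dataset = TensorDataset(states, actions)

policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 3))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # surrogate loss l(s, a, pi); the paper does not prescribe this choice

# Open-loop behavioral cloning with plain ERM: every (s_i, a_i) pair is weighted equally.
for epoch in range(10):
    for s, a in DataLoader(dataset, batch_size=64, shuffle=True):
        loss = loss_fn(policy(s), a)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```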
If we let $\mathbb{E}_{s \sim d_{\pi^*},\, a \sim \pi^*(s)}\big[\ell(s, a, \pi)\big] = \epsilon$, then it follows that $J(\pi) \leq J(\pi^*) + T^2\epsilon$, as shown by the proof in [13], where $J$ is the total cost and $T$ is the task horizon. As we can see, the total cost can grow quadratically in $T$.
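As a purely illustrative instance of this bound (the numbers below are not from the paper), a seemingly small per-step surrogate error can translate into a large closed-loop regret over a long horizon:

$$\epsilon = 0.01,\;\; T = 100 \;\;\Rightarrow\;\; J(\pi) \leq J(\pi^*) + T^2\epsilon = J(\pi^*) + 100.$$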
[Figure 1 (diagram): traditional open-loop ERM training of an identification policy on training scenes (steps 1-2), closed-loop simulation (step 3), error set construction (step 4), upsampling of training scenes (step 5), and training of the final policy with CW-ERM (step 6).]

Fig. 1: High-level overview of our proposed Closed-loop Weighted Empirical Risk Minimization (CW-ERM) method. In steps (1-2) we train an identification policy $\hat{\pi}_{\mathrm{ERM}}$ using traditional ERM [12] on a set of training data samples or driving "scenes". In step (3), we perform closed-loop simulation of the policy $\hat{\pi}_{\mathrm{ERM}}$ and collect metrics to construct the error set in step (4). With the error set in hand, we upsample scenes in the training set as shown in step (5). We train the final policy $\hat{\pi}_{\text{CW-ERM}}$ using CW-ERM as shown in step (6) with the upsampled set $D_{\mathrm{up}}$.
When the policy $\hat{\pi}_{\mathrm{BC}}$ is deployed in the real world, it will eventually make mistakes and then induce a state distribution $d_{\hat{\pi}_{\mathrm{BC}}}$ different than the one it was trained on ($d_{\pi^*}$). During closed-loop evaluation of driving policies, non-imitative metrics such as collisions and comfort are also evaluated. However, they are often ignored in the surrogate loss or only implicitly learned by imitating the expert, due to the difficulty of overcoming differentiability requirements: smooth approximations of these metrics still differ from the non-differentiable counterparts that are actually used. Such policies can show good results in open-loop training, but perform poorly in closed-loop evaluation or when deployed in a real SDV due to the differences between $d_{\hat{\pi}_{\mathrm{BC}}}$ and $d_{\pi^*}$, where the estimator is no longer consistent.
B. Closed-loop Weighted Empirical Risk Minimization
In our method, called “Closed-loop Weighted Empirical
Risk Minimization” (CW-ERM), we seek to debias a policy
network from the open-loop performance towards closed-
loop performance, making the model rely on features that
are robust to closed-loop evaluation. Our method consists of
three stages: the training of an identification policy, the use
of that policy in closed-loop simulation to identify samples,
and the training of a final policy network on a reweighted
data distribution. More explicitly:
Stage 1 (identification policy): train a traditional BC policy network in open-loop using ERM, yielding $\hat{\pi}_{\mathrm{ERM}}$.
Stage 2 (closed-loop simulation): perform rollouts of the $\hat{\pi}_{\mathrm{ERM}}$ policy in a closed-loop simulator, collect closed-loop metrics and then identify the error set below:

$$\mathcal{E}_{\hat{\pi}_{\mathrm{ERM}}} = \{(s_i, a_i) \;\; \text{s.t.} \;\; C(s_i, a_i) > 0\}, \qquad (2)$$

where $s_i$ is a training data sample, or "scene", with a fixed number of timesteps from the training set, $a_i$ is the action performed during the rollout, and $C(\cdot)$ is a cost such as the number of collisions found during closed-loop rollouts.
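A minimal sketch of how the error set in Eq. (2) could be assembled from closed-loop rollouts. The `simulate_closed_loop` callable and the metric name `num_collisions` are hypothetical stand-ins for whatever simulator and closed-loop metrics are available; the paper only requires some cost $C(\cdot)$, such as the collision count per scene.

```python
def build_error_set(policy, scenes, simulate_closed_loop):
    """Collect scenes whose closed-loop rollout incurs a positive cost C(s, a) > 0.

    `simulate_closed_loop(policy, scene)` is assumed to roll out the policy in a
    simulator and return a dict of non-differentiable metrics for that scene.
    """
    error_set = []
    for scene in scenes:
        metrics = simulate_closed_loop(policy, scene)
        # Cost C(.): here simply the number of collisions, one of the metrics
        # the paper mentions; other closed-loop metrics could be added.
        cost = metrics.get("num_collisions", 0)
        if cost > 0:
            error_set.append(scene)
    return error_set
```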
Stage 3 (final policy): train a new policy using weighted ERM where the scenes belonging to the error set $\mathcal{E}_{\hat{\pi}_{\mathrm{ERM}}}$ are upweighted by a factor $w(\cdot)$, yielding the policy $\hat{\pi}_{\text{CW-ERM}}$:

$$\operatorname*{argmin}_{\pi \in \Pi} \; \mathbb{E}_{s \sim d_{\pi^*},\, a \sim \pi^*(s)}\big[w(\mathcal{E}_{\hat{\pi}_{\mathrm{ERM}}}, s)\, \ell(s, a, \pi)\big] \qquad (3)$$
As we can see, the CW-ERM objective in Equation 3 is very similar to the original BC objective trained with ERM in Equation 1, the key difference being a weighting term based on the error set from the closed-loop simulation in Stage 2. In practice, although statistically equivalent, we upsample scenes by a fixed factor rather than reweighting them, as this is known to be more stable and robust [14].
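The weighting in Eq. (3) can be realized by duplicating error-set scenes a fixed number of times, which is the statistically equivalent upsampling variant described above. The sketch below is only an illustration; the upsampling factor `k` is a hyperparameter whose value is not specified in this excerpt.

```python
def upsample_scenes(scenes, error_set, k=4):
    """Build the upsampled training set D_up.

    Scenes in the error set appear k times (w(.) = k), all others once (w(.) = 1).
    `scenes` and `error_set` are assumed to be lists of hashable scene identifiers.
    """
    error_ids = set(error_set)
    upsampled = []
    for scene in scenes:
        copies = k if scene in error_ids else 1
        upsampled.extend([scene] * copies)
    return upsampled
```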
By training a policy with CW-ERM, which upsamples scenes that performed poorly in closed-loop evaluation, we expect the policy network to become more robust to the covariate shift it encounters at inference time while the policy is unrolled.
We describe the complete CW-ERM training procedure in
Algorithm 1 and in Figure 1 we show a high-level overview
of our method.
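Since the extracted text does not include Algorithm 1, the following is only a hedged paraphrase of the three stages as described above, chaining the hypothetical helpers from the earlier sketches (`build_error_set`, `upsample_scenes`); it is not a reproduction of the paper's algorithm.

```python
def cw_erm(scenes, train_policy, simulate_closed_loop, k=4):
    """Closed-loop Weighted ERM, expressed with hypothetical helpers.

    train_policy(scenes) -> policy        # plain ERM training on a list of scenes
    simulate_closed_loop(policy, scene)   # closed-loop rollout returning metrics
    """
    # Stage 1: identification policy trained with ordinary open-loop ERM.
    pi_erm = train_policy(scenes)

    # Stage 2: closed-loop simulation and error-set construction (Eq. 2).
    error_set = build_error_set(pi_erm, scenes, simulate_closed_loop)

    # Stage 3: retrain on the upsampled set D_up (equivalent to weighted ERM, Eq. 3).
    d_up = upsample_scenes(scenes, error_set, k=k)
    pi_cw_erm = train_policy(d_up)
    return pi_cw_erm
```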
C. Relationship to covariate shift adaptation with density ratio estimation

One important connection of our method is with covariate shift correction using density ratio estimation [11]. To correct for the covariate shift, the negative log-likelihood is often weighted by the density ratio $r(s)$:

$$\operatorname*{argmin}_{\pi \in \Pi} \; \mathbb{E}_{s \sim d_{\pi^*},\, a \sim \pi^*(s)}\big[r(s)\, \ell(s, a, \pi)\big] \qquad (4)$$
where $r(s)$ is defined as the density ratio between the test and training distributions:

$$r(s) = \frac{p_{\mathrm{test}}(s)}{p_{\mathrm{train}}(s)} \qquad (5)$$
In practice, $r(s)$ is difficult to compute and is thus estimated. The density ratio will be higher when a sample is more likely under the test distribution than under the training distribution.
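To make the connection concrete, a per-sample importance weight can simply multiply the surrogate loss before averaging, as in Eq. (4). The snippet below is an illustrative sketch in which the density-ratio values `r` are assumed to come from some external estimator; how they are estimated is outside the scope of this excerpt.

```python
def weighted_erm_loss(pred_actions, expert_actions, r):
    """Importance-weighted surrogate loss: mean_i r(s_i) * l(s_i, a_i, pi).

    All arguments are assumed to be torch tensors; `r` holds per-sample
    density-ratio estimates p_test(s_i) / p_train(s_i).
    """
    per_sample = ((pred_actions - expert_actions) ** 2).mean(dim=-1)  # L2 surrogate loss
    return (r * per_sample).mean()
```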