RAP: Risk-Aware Prediction for Robust Planning
Haruki Nishimura, Jean Mercat, Blake Wulfe, Rowan McAllister, Adrien Gaidon
Toyota Research Institute, USA
firstname.lastname@tri.global
Abstract: Robust planning in interactive scenarios requires predicting the uncertain future to make risk-aware decisions. Unfortunately, due to long-tail safety-critical events, the risk is often under-estimated by finite-sampling approximations of probabilistic motion forecasts. This can lead to overconfident and unsafe robot behavior, even with robust planners. Instead of assuming full prediction coverage that robust planners require, we propose to make prediction itself risk-aware. We introduce a new prediction objective to learn a risk-biased distribution over trajectories, so that risk evaluation simplifies to an expected cost estimation under this biased distribution. This reduces the sample complexity of the risk estimation during online planning, which is needed for safe real-time performance. Evaluation results in a didactic simulation environment and on a real-world dataset demonstrate the effectiveness of our approach. The code² and a demo³ are available.
Keywords: Risk Measures, Forecasting, Safety, Human-Robot Interaction
1 Introduction
In safety-critical and interactive control tasks such as autonomous driving, the robot must successfully account for uncertainty of the future motion of surrounding humans. To achieve this, many contemporary approaches decompose the decision-making pipeline into prediction and planning modules [1–5] for maintainability, debuggability, and interpretability. A prediction module, often learned from data, first produces likely future trajectories of surrounding agents, which are then consumed by a planning module for computing safe robot actions. Recent works [6, 7] further propose to couple prediction with risk-sensitive planning for enhanced safety, wherein the planner computes and minimizes a risk measure [8] of its planned trajectory based on probabilistic forecasts of human motion from the data-driven predictor. A risk measure is a functional that maps a cost distribution to a deterministic real number, which lies between the expected cost and the worst-case cost [9].
Although combining data-driven forecasting and risk-sensitive planning has been shown to be effective, this approach has several limitations. First, accurate risk evaluation of candidate robot plans remains challenging, due to inaccurate characterization of uncertainty in human behavior [10] and finite sampling from the predictor. Some existing methods that promote diversity of predictions (e.g., [11, 12]) may alleviate this issue, but they are not explicitly designed for the reliable risk estimation needed for robust planning. Second, endowing an existing planner with risk-sensitivity often requires non-trivial modifications to its internal optimization algorithm [13–15]. This modification can be problematic if, for example, an autonomy stack already has a dedicated and complex (risk-neutral) planner in use and cannot easily modify its internal optimization algorithms.
To address the above limitations, we propose to consider risk within the predictor rather than in the planner. We present a risk-biased trajectory forecasting framework, which provides a general approach to making a generative trajectory forecasting model risk-aware. Our novel method augments a pre-trained generative model with an additional encoding process. This modification changes the output of the prediction so that it deliberately over-estimates the probability of dangerous trajectories. This "pessimistic" forecasting model gives the planner distributional robustness (e.g., [16]) against potential inaccuracies of the human behavior model.

The first two authors contributed equally to this work.
² https://github.com/TRI-ML/RAP
³ https://huggingface.co/spaces/TRI-ML/risk_biased_prediction
6th Conference on Robot Learning (CoRL 2022), Auckland, New Zealand.
arXiv:2210.01368v2 [cs.LG] 12 Jan 2023
We achieve the pessimistic risk-biased distribution using a novel prediction loss. This shifts the computational burden of drawing many prediction samples that capture rare events from online deployment to offline prediction training: the planner can still obtain an accurate estimate of the risk measure in real time during deployment, with fewer prediction samples required from the biased distribution. Furthermore, our approach also eliminates the need for modifications to the planner's optimization algorithm. Thus, one can achieve enhanced safety by simply replacing a conventional probabilistic motion forecaster with the proposed risk-biased model, while still using the same existing risk-neutral planner. This capability is intended for use in robotic applications where misestimation of risk could lead to injury, including autonomous vehicles and home robots that must operate safely in close proximity to humans.
Specifically, our contributions in this work are as follows:
- We propose a risk-biased trajectory forecasting framework, which makes forecasts more useful for the downstream task and leads to plans that are robust to distribution shifts.
- Our risk-biased model off-loads the heavy computation of risk estimation from online planning, providing risk-awareness to a generic risk-neutral planner.
- We extensively evaluate our proposed approach in simulation with a planner in the loop and offline with complex real-world data.
2 Related Work
Trajectory forecasting from data. Early trajectory forecasting approaches defined hand-crafted dynamics models [17, 18], and incorporated rules that induce obstacle-avoidance behavior [19] or mimic the overall traffic flow [20, 21]. More recently, data-driven, learning-based methods have gained popularity for their ability to better capture the complexity of human behavior [22]; they typically use neural networks defining multi-modal trajectory distributions [12, 23–38].

Significant effort is directed toward increasing the coverage, or diversity, of motion forecasting models [11, 12, 33–41] in order to ensure that no critical events are missed. Diversity can be explicitly encouraged using a best-of-many loss [25], by replacing a mean-squared loss with a Huber loss [40], by choosing trajectory samples that maximize the distribution coverage [34], or by setting diverse anchors or target points [36–38]. Another strategy to increase mode coverage takes advantage of the latent distribution of CVAEs [5, 11, 41] or GANs [12]. Cui et al. [5] argue that besides coverage, sample efficiency is also an important factor; the authors trained a road-scene motion forecasting model to produce predictions of other agents that induce diverse reactions from the given robot planner. Similarly, McAllister et al. [42] train a model with a weighted loss, giving a low weight to the predictions that do not affect the planner. Huang et al. [27] train a forecasting model that allows a simple optimization procedure to select the safest among a set of plans generated by a planner. While prior work considered task-awareness or planner-awareness, to the best of our knowledge, we are the first to use risk as a proxy to make forecasts more useful for the downstream task.
Subjective probability and prospect theory. Our pessimistic risk-biased prediction can be interpreted as a model of subjective probability (e.g., [43]), which is closely related to risk-awareness [44]. For instance, prospect theory [45] studies how humans make risk-aware decisions and introduces the notion of probability weighting [46]. Under this model, the distribution is "warped" so that the probabilities of unlikely events are always over-weighted. Recent robotics literature has leveraged prospect theory to better model risk-awareness in human decision making, for example, in collaborative human-robot manipulation [47] and driver behavior modeling [48]. Prospect theory is a descriptive model of human decision making, which differs from our goal of designing risk-aware robots. Moreover, our model only overestimates the probability of events that incur high cost for the robot, unlike probability weighting, which overestimates any unlikely outcome.
Risk-sensitive planning and control. Risk-sensitive planning and control date back to the 1970s, as exemplified by risk-sensitive Linear-Exponential-Quadratic-Gaussian control [49, 50] and risk-sensitive Markov Decision Processes (MDPs) [51]. More recent methods include risk-sensitive nonlinear MPC [6, 52], Q-learning [44, 53], and actor-critic [54, 55] methods, for various types of risk measures; refer to a recent survey [56] for further details. Unlike those methods, in which the policy directly optimizes a risk measure, we propose to instead bias the prediction so that risk-sensitivity can be achieved by a risk-neutral planner that simply optimizes the expected value of the cost.
3 Background
3.1 Generative Probabilistic Trajectory Forecasting

Let $x$ and $y$ be the past and the future trajectories of an agent, and let $Y|x$ denote the random variable of the future trajectory conditioned on the observed past trajectory $x$. We would like to fit the distribution $p(Y|x)$ given a dataset $\mathcal{D}$ of i.i.d. samples of $(x, y)$ pairs. To fit $p(Y|x)$, we maximize the likelihood of future trajectories w.r.t. the model parameters $\theta, \phi$:
$$\operatorname*{maximize}_{\theta, \phi} \; \prod_{(x,y) \in \mathcal{D}} L(\theta, \phi; y|x),$$
where $L(\theta, \phi; y|x)$ is the likelihood of the sample $y$ knowing $x$. One method to fit this distribution is to learn a conditional variational auto-encoder (CVAE) [57]. We focus on this approach because it produces a structured latent representation. The CVAE conditions its likelihood estimation on a latent random variable $Z|x,y$ with a posterior $q_{\phi_2}(z|x,y)$, or $Z|x$ with an inferred prior $q_{\phi_1}(z|x)$ used in the joint likelihood $p_\theta(y, z|x)$. The marginal likelihood of the future trajectory (or "model evidence") can be rewritten as:
$$L(\theta, \phi; y|x) = \int p_\theta(y, z|x)\,dz = \int p_\theta(y, z|x)\,\frac{q_{\phi_2}(z|x,y)}{q_{\phi_2}(z|x,y)}\,dz = \mathbb{E}_{q_{\phi_2}(z|x,y)}\!\left[\frac{p_\theta(y, z|x)}{q_{\phi_2}(z|x,y)}\right]. \quad (1)$$
Using Jensen's inequality, the logarithm of (1) is lower bounded by
$$\mathcal{L}(\theta, \phi; x, y) = \mathbb{E}_{q_{\phi_2}(z|x,y)}\!\left[\ln p_\theta(y|x,z)\right] - \mathrm{KL}\!\left(q_{\phi_2}(z|x,y)\,\|\,q_{\phi_1}(z|x)\right), \quad (2)$$
called the evidence lower bound (ELBO). We model $q_\phi$ and $p_\theta$ using neural networks. The encoders assume a Gaussian prior with independent elements to produce the inferred prior $f_{\phi_1}: x \mapsto (\mu_{|x}, \mathrm{diag}(\Sigma_{|x}))$ and the posterior $f_{\phi_2}: (x, y) \mapsto (\mu_{|x,y}, \mathrm{diag}(\Sigma_{|x,y}))$. The decoder makes the forecast $g_\theta: (x, z) \mapsto y$. Every term in (2) can be either computed or estimated with Monte-Carlo sampling as established in [57, 58].
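As a concrete illustration of these quantities (a minimal NumPy sketch, not the paper's implementation), the KL term in (2) between two diagonal Gaussians is available in closed form, while the reconstruction term can be estimated by Monte-Carlo sampling from the posterior. The unit-variance Gaussian decoder likelihood and the `decoder` callable below are simplifying assumptions.

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) ) for diagonal Gaussians."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def elbo_estimate(y, mu_post, var_post, mu_prior, var_prior, decoder,
                  n_samples=64, rng=None):
    """Monte-Carlo ELBO: E_q[ln p(y|x,z)] - KL(q(z|x,y) || q(z|x)).
    Assumes decoder(z) returns the mean of a unit-variance Gaussian over y."""
    rng = rng or np.random.default_rng(0)
    z = mu_post + np.sqrt(var_post) * rng.standard_normal((n_samples, mu_post.size))
    y_hat = np.array([decoder(zk) for zk in z])
    # Log-likelihood of y under N(decoder(z), I), up to an additive constant.
    recon = np.mean(-0.5 * np.sum((y_hat - y) ** 2, axis=-1))
    return recon - gaussian_kl(mu_post, var_post, mu_prior, var_prior)
```

In the paper's model, `mu_post, var_post` would come from $f_{\phi_2}(x, y)$, `mu_prior, var_prior` from $f_{\phi_1}(x)$, and `decoder` would be $g_\theta(x, \cdot)$.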
3.2 Risk Measures

A risk measure is defined as a functional that maps a cost distribution to a real number. In other words, given a random cost variable $C$ with distribution $p$, a risk measure of $p$ yields a deterministic number $r$ called the risk. In practice, we often consider a class of risk measures that lie between the expected value $\mathbb{E}_p[C]$ and the highest value $\sup(C)$. The former corresponds to the risk-neutral evaluation of $C$, while the latter gives the worst-case assessment. Such risk measures often take a user-specified risk-sensitivity level $\sigma \in \mathbb{R}$ as an additional argument, which determines where the risk value $r$ is positioned between $\mathbb{E}_p[C]$ and $\sup(C)$. Formally, let us define a risk measure as $\mathcal{R}_p: (C, \sigma) \mapsto r \in [\mathbb{E}_p[C], \sup(C)]$. Examples of such risk measures include the entropic risk [50]:
$$\mathcal{R}^{\mathrm{entropic}}_p(C, \sigma) = \frac{1}{\sigma}\log \mathbb{E}_p\!\left[\exp(\sigma C)\right],$$
as well as CVaR [59]:
$$\mathcal{R}^{\mathrm{CVaR}}_p(C, \sigma) = \inf_{t \in \mathbb{R}} \left\{ t + \frac{1}{1-\sigma}\,\mathbb{E}_p\!\left[\max(0, C - t)\right] \right\}. \quad (3)$$
The rest of the paper assumes CVaR (3) as the underlying risk measure, but note that the proposed approach is not necessarily bound to this particular choice. For CVaR, the risk value $r$ given risk-sensitivity level $\sigma \in (0, 1)$ can be interpreted as the expected value of the right $(1-\sigma)$-tail of the cost distribution [60]. Thus, $\mathcal{R}_p(C, \sigma)$ tends to $\mathbb{E}_p[C]$ as $\sigma \to 0$ and to $\sup(C)$ as $\sigma \to 1$.
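The tail interpretation above suggests a simple Monte-Carlo estimate of CVaR from cost samples. The sketch below is a basic quantile plug-in estimator in the spirit of (3), given here only for illustration; it is not necessarily the specific estimator the paper uses.

```python
import numpy as np

def cvar_estimate(costs, sigma):
    """Monte-Carlo CVaR estimate at risk level sigma in (0, 1): plug the
    empirical sigma-quantile (Value-at-Risk) into the variational form (3),
    i.e. the mean of the right (1 - sigma)-tail of the cost samples."""
    costs = np.sort(np.asarray(costs, dtype=float))
    t = np.quantile(costs, sigma)          # empirical Value-at-Risk
    tail = np.maximum(0.0, costs - t)
    return t + tail.mean() / (1.0 - sigma)
```

Consistent with the limits stated above, the estimate approaches the sample mean as `sigma` tends to 0 and the sample maximum as `sigma` tends to 1.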
Another intriguing property of CVaR is its fundamental relation to distributional robustness. CVaR belongs to a class of risk measures called coherent measures of risk [61] with the following dual characterization ([61], Theorem 4a):
$$\mathcal{R}_p(C, \sigma) = \sup_{q \in \mathcal{Q}} \mathbb{E}_q[C], \quad (4)$$
where $\mathcal{Q}$ is a uniquely-determined, non-empty and closed convex subset of the set of all density functions. This suggests that CVaR is equivalent to a worst-case expectation of the cost $C$ when the underlying probability distribution $q$ is chosen adversarially from $\mathcal{Q}$. Therefore, an autonomous robot optimizing CVaR (or coherent measures of risk in general) obtains distributional robustness, in that the objective accounts for robustness to potential inaccuracies in the underlying probabilistic model. In this context, the set $\mathcal{Q}$ is often referred to as an ambiguity set in the literature [62, 63].
4 Problem Formulation

Suppose that a robot incurs cost $C$ under a planned policy $\pi$ or trajectory. This cost is given by a function $J_\pi$ such that $C = J_\pi(Y)$, with $Y$ being the human future trajectory random variable, which the robot predicts probabilistically. We assume that $J_\pi$ is known and differentiable in $y$ for each $\pi$. One can design such a cost function so that $J_\pi(y)$ is high when the robot collides with the particular trajectory $Y = y$ of a human. Supplementary material E defines the cost function used in this work.

We begin with a pre-trained generative model, as defined in Section 3.1, that gives a predictive distribution $p(Y|x) = \int p(Y|x,z)\,p(z)\,dz$ through an inferred latent distribution $p(Z|x)$. This latent is mapped to the trajectory space by a generator or decoder $y = g(z, x)$. Under this unbiased model, the risk is given by $r = \mathcal{R}_p(J_\pi(g(Z, x)), \sigma)$ using the risk measure introduced in Section 3.2.

Given the unbiased model and the risk measure, we are interested in finding another distribution $q_\psi(Z)$ in the latent space with learnable parameters $\psi$, under which simply taking the risk-neutral expectation of the cost will yield the same risk value as given above. This can be achieved by enforcing the following equality constraint on the biased distribution $q_\psi(Z)$:
$$\mathbb{E}_{q_\psi}\!\left[J_\pi(g(Z, x))\right] = \mathcal{R}_p(J_\pi(g(Z, x)), \sigma). \quad (5)$$
We show that such a distribution exists in Section A.1 of the supplementary material. Comparing both sides of (5), we note that such a $q$ should depend on the risk-sensitivity level $\sigma$. We propose to optimize the parameters $\psi$ of the risk-biased distribution $q_\psi(Z|x, \sigma)$. In general, many distributions $q$ can satisfy (5). We propose to pick the particular $q$ that additionally minimizes the KL divergence from the prior $p$, to prevent the biased distribution from becoming too different from the original unbiased distribution. This leads to the following constrained optimization problem:
$$\operatorname*{minimize}_{\psi} \; \mathrm{KL}\!\left(q_\psi(Z|\sigma)\,\|\,p(Z)\right) \quad \text{subject to} \quad \mathbb{E}_{q_\psi}\!\left[J_\pi(g(Z, x))\right] = \mathcal{R}_p(J_\pi(g(Z, x)), \sigma). \quad (6)$$
In general, we cannot guarantee uniqueness of the solution to the optimization problem (6). However, in supplementary material A, we provide further analysis of (6) along with a sufficient assumption under which the solution would be unique (Proposition A.3).
Connection to importance sampling. Importance sampling has been employed in rare-event simulation for accelerated safety verification of autonomous systems [64–66], which yields a pessimistic sampling distribution similar to our risk-biased model. However, a crucial difference of our approach is that it estimates a more general risk measure instead of an expected value: given a desired risk-sensitivity level, unweighted samples from the proposal $q$ directly yield the risk estimate (5). This removes the need to compute importance weights.
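To make the contrast concrete, here is a toy 1-D sketch (ours, not the paper's; the cost function and distributions are hypothetical): estimating an expected cost under $p = \mathcal{N}(0, 1)$ from a pessimistic proposal $q = \mathcal{N}(2, 1)$ requires likelihood-ratio weights, whereas a biased predictor satisfying (5) yields its risk estimate as a plain unweighted mean over its samples.

```python
import numpy as np

rng = np.random.default_rng(0)
cost = lambda y: np.maximum(0.0, y)   # hypothetical cost: only y > 0 is dangerous

def is_expected_cost(n, shift=2.0):
    """Importance sampling: samples from a pessimistic proposal q = N(shift, 1)
    must be re-weighted by p(y)/q(y) to estimate an expectation under p = N(0, 1)."""
    y = rng.normal(loc=shift, size=n)
    log_w = -0.5 * y**2 + 0.5 * (y - shift)**2   # log p(y)/q(y)
    return np.mean(np.exp(log_w) * cost(y))

def biased_risk(samples_from_q_psi):
    """Risk-biased prediction: if q_psi satisfies (5), the risk estimate is an
    unweighted mean of costs over biased samples -- no weights required."""
    return np.mean(cost(samples_from_q_psi))
```

In the full pipeline, `samples_from_q_psi` would be trajectories decoded from the biased latent distribution, and `cost` would be $J_\pi$.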
Connection to distributional robustness. When a coherent measure of risk such as CVaR is chosen as the underlying risk measure, the right-hand side of (5) is always equivalent to a worst-case expectation under a distribution $q$ chosen out of an ambiguity set $\mathcal{Q}$ (4). In general, it is difficult to verify whether the optimal distribution $q_\psi$ is in $\mathcal{Q}$, since the specifics of $\mathcal{Q}$ depend on the choice of the risk measure as well as the risk-sensitivity level $\sigma$. Nevertheless, it holds true that any feasible distribution $q_\psi$ for (6) yields the same worst-case expected cost as the most adversarial distribution from $\mathcal{Q}$. Therefore, a planner relying on $q_\psi$ instead of $p$ possesses distributional robustness. We demonstrate this crucial capability via an empirical evaluation in Section 6.3.
5 Implementation Details

Section B of the supplementary material defines a usual (unbiased) CVAE trajectory forecasting model that learns two encoders, defining the Gaussian latent variables $Z|x$ and $Z|x,y$, and one decoder predicting $Y|x,z$. We propose to solve problem (6) by learning a third neural network encoder to define a biased latent distribution that, in combination with the pre-trained decoder, produces biased forecasts. This biased encoder takes the past trajectory $x$, a risk level $\sigma$, and the robot future trajectory $y_{\mathrm{robot}}$. It outputs the parameters of a Normal distribution, $\mu^{(b)}$ and $\log(\mathrm{diag}(\Sigma^{(b)}))$.

Algorithm 1: Proposed Risk-Biasing Loss Estimation
Input: trajectory $(x, y) \sim \mathcal{D}$, risk level $\sigma \sim p(\sigma)$, KL-loss weight $\beta$, risk weight $\alpha$, robot motion $y_{\mathrm{robot}}$
1: for $k \in \{1, \ldots, K_1\}$ do
2:   Sample latent $z_{k|x} \sim \mathcal{N}(\mu_{|x}, \Sigma_{|x})$ with prior parameters $(\mu_{|x}, \Sigma_{|x}) = f_{\phi_1}(x)$
3:   Decode risk-neutral predictions $y_k = g_\theta(x, z_{k|x})$
4: Compute risk $r$ using $\{y_1, \ldots, y_{K_1}\}$ and $J_{y_{\mathrm{robot}}}$ with Monte Carlo estimation (e.g., [68])
5: for $k \in \{1, \ldots, K_2\}$ do
6:   Sample biased latent $\hat{z}^{(b)}_k \sim \mathcal{N}(\mu^{(b)}, \Sigma^{(b)})$ with risk-biased parameters $(\mu^{(b)}, \Sigma^{(b)}) = f_\psi(x, \sigma, y_{\mathrm{robot}})$
7:   Decode risk-biased predictions $\hat{y}_k = g_\theta(x, \hat{z}^{(b)}_k)$
8: Compute expected cost $\hat{r} = \frac{1}{K_2}\sum_{k=1}^{K_2} J_{y_{\mathrm{robot}}}(\hat{y}_k)$
9: Compute risk loss $L_{\mathrm{risk}} = \rho(\hat{r} - r)$ and prior loss $L_{\mathrm{prior}} = \mathrm{KL}\!\left(\mathcal{N}(\mu^{(b)}, \Sigma^{(b)})\,\|\,\mathcal{N}(\mu_{|x}, \Sigma_{|x})\right)$
Output: loss value $\alpha L_{\mathrm{risk}} + \beta L_{\mathrm{prior}}$ to train $\psi$ ($\theta$ and $\phi_1$ are fixed)
In practice, we soften the hard constraint (5) using the penalty method [67], which progressively increases the weight $\alpha$ of the risk loss during training. We also leverage a user-defined sampling distribution $p(\sigma)$ to sample different risk-sensitivity levels during training, so that the risk estimate remains accurate at any reasonable value of $\sigma$ at inference time. Finally, we encourage the model to overestimate the risk rather than underestimate it: we scale by a positive value $s$ and define an asymmetric risk loss that penalizes underestimation of the risk linearly and overestimation logarithmically:
$$\rho(x) = \begin{cases} s|x|, & \text{if } sx \le 1, \\ \log(sx), & \text{otherwise.} \end{cases} \quad (7)$$
We obtain the following loss function, with $\alpha$ and $\beta$ controlling the relative importance of the losses:
$$\mathcal{L}(\psi) = \mathbb{E}_{\sigma \sim p(\sigma)}\!\left[\alpha\,\rho\!\left(\mathbb{E}_{q_\psi}\!\left[J_\pi(g(Z, x))\right] - \mathcal{R}_p(J_\pi(g(Z, x)), \sigma)\right) + \beta\,\mathrm{KL}\!\left(q_\psi(Z|\sigma, x)\,\|\,p(Z|x)\right)\right].$$
The expected values and the risk measure are approximated by Monte Carlo sampling. For computing CVaR, $\mathcal{R}_p(J_\pi(g(Z, x)), \sigma)$, we use the estimator proposed by Hong et al. [68]. Consistency and asymptotic normality of this estimator hold under mild assumptions [68].
Algorithm 1 lays out the procedure for training our proposed risk-aware prediction. It relies on a fully trained CVAE with the encoder $f_{\phi_1}: x \mapsto (\mu_{|x}, \Sigma_{|x})$ and decoder $g_\theta: (x, z) \mapsto y$ that fits the distribution of $Y|x$ from a dataset. We train a new latent-biasing encoder $f_\psi: (x, \sigma, y_{\mathrm{robot}}) \mapsto (\mu^{(b)}, \Sigma^{(b)})$ to bias the latent distribution while keeping the rest of the CVAE fixed. The risk level $\sigma$ is randomly sampled on $[0, 1]$ during training and chosen by the user at test time.
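Lines 8–9 of Algorithm 1 and the asymmetric loss (7) reduce to a few lines of NumPy. The following is a minimal sketch under our reading of (7), assuming scalar risk estimates $r$ and $\hat{r}$ and a precomputed KL term; the scale `s` and the weights `alpha`, `beta` are free hyperparameters.

```python
import numpy as np

def rho(x, s=10.0):
    """Asymmetric risk loss (7): linear in |x| while s*x <= 1 (so
    under-estimation, x = r_hat - r < 0, is penalized linearly),
    logarithmic once s*x > 1 (large over-estimation grows slowly)."""
    x = np.asarray(x, dtype=float)
    return np.where(s * x <= 1.0, s * np.abs(x), np.log(np.maximum(s * x, 1.0)))

def risk_biasing_loss(r, r_hat, kl, alpha=1.0, beta=1.0, s=10.0):
    """Total training loss of Algorithm 1: alpha * L_risk + beta * L_prior,
    with L_risk = rho(r_hat - r) and L_prior the latent KL term."""
    return alpha * rho(r_hat - r, s) + beta * kl
```

During training, `r` would come from the Monte-Carlo CVaR estimate over risk-neutral samples (line 4) and `r_hat` from the mean cost over biased samples (line 8); only the gradient through `r_hat` and `kl` reaches $\psi$.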
6 Experiments
6.1 Biasing forecasts in a didactic scenario
Figure 1: Top-down view of a simulated scene. The robot in red moves left to right down the road as a pedestrian in blue is crossing. The color of the depicted pedestrian trajectory samples indicates their corresponding Time-To-Collision (TTC) cost for the robot. The slow mode in red is more costly than the fast mode in green.
We created the didactic simulation environment in Fig. 1, where a red robot drives at constant speed along a straight road with a stochastic pedestrian. The pedestrian either walks slowly or quickly, yielding a bimodal distribution over their travel distance. We collected a dataset in this environment in which the initial position and orientation of the pedestrian are set at random. We used it to train a risk-biased CVAE model according to the method presented in Sections 4 and 5. Fig. 2b shows