RAP: Risk-Aware Prediction for Robust Planning
Haruki Nishimura, Jean Mercat, Blake Wulfe, Rowan McAllister, Adrien Gaidon
Toyota Research Institute, USA
firstname.lastname@tri.global
Abstract: Robust planning in interactive scenarios requires predicting the uncertain future to make risk-aware decisions. Unfortunately, due to long-tail safety-critical events, the risk is often under-estimated by finite-sampling approximations of probabilistic motion forecasts. This can lead to overconfident and unsafe robot behavior, even with robust planners. Instead of assuming full prediction coverage that robust planners require, we propose to make prediction itself risk-aware. We introduce a new prediction objective to learn a risk-biased distribution over trajectories, so that risk evaluation simplifies to an expected cost estimation under this biased distribution. This reduces the sample complexity of the risk estimation during online planning, which is needed for safe real-time performance. Evaluation results in a didactic simulation environment and on a real-world dataset demonstrate the effectiveness of our approach. The code² and a demo³ are available.
Keywords: Risk Measures, Forecasting, Safety, Human-Robot Interaction
1 Introduction
In safety-critical and interactive control tasks such as autonomous driving, the robot must successfully account for uncertainty of the future motion of surrounding humans. To achieve this, many contemporary approaches decompose the decision-making pipeline into prediction and planning modules [1–5] for maintainability, debuggability, and interpretability. A prediction module, often learned from data, first produces likely future trajectories of surrounding agents, which are then consumed by a planning module for computing safe robot actions. Recent works [6, 7] further propose to couple prediction with risk-sensitive planning for enhanced safety, wherein the planner computes and minimizes a risk measure [8] of its planned trajectory based on probabilistic forecasts of human motion from the data-driven predictor. A risk measure is a functional that maps a cost distribution to a deterministic real number, which lies between the expected cost and the worst-case cost [9].
Although combining data-driven forecasting and risk-sensitive planning has been shown to be effective, this approach has several limitations. First, accurate risk evaluation of candidate robot plans remains challenging, due to inaccurate characterization of uncertainty in human behavior [10] and finite sampling from the predictor. Some existing methods that promote diversity of predictions (e.g., [11, 12]) may alleviate this issue, but they are not explicitly designed for the reliable risk estimation needed for robust planning. Second, endowing an existing planner with risk-sensitivity often requires non-trivial modifications to its internal optimization algorithm [13–15]. This modification can be problematic if, for example, an autonomy stack already has a dedicated and complex (risk-neutral) planner in use and cannot easily modify its internal optimization algorithms.
To address the above limitations, we propose to consider risk within the predictor rather than in the planner. We present a risk-biased trajectory forecasting framework, which provides a general approach to making a generative trajectory forecasting model risk-aware. Our novel method augments a pre-trained generative model with an additional encoding process. This modification changes the output of the prediction so that it deliberately over-estimates the probability of dangerous trajectories. This "pessimistic" forecasting model gives the planner distributional robustness (e.g., [16]) against potential inaccuracies of the human behavior model.

The first two authors contributed equally to this work.
² https://github.com/TRI-ML/RAP
³ https://huggingface.co/spaces/TRI-ML/risk_biased_prediction
6th Conference on Robot Learning (CoRL 2022), Auckland, New Zealand.
arXiv:2210.01368v2 [cs.LG] 12 Jan 2023
We achieve the pessimistic risk-biased distribution using a novel prediction loss. This shifts the computational burden of drawing many prediction samples that capture rare events from online deployment to offline prediction training: the planner can still obtain an accurate estimate of the risk measure in real time during deployment, with fewer prediction samples required from the biased distribution. Furthermore, our approach also eliminates the need for modifications to the planner's optimization algorithm. Thus, one can achieve enhanced safety by simply replacing a conventional probabilistic motion forecaster with the proposed risk-biased model, while still using the same existing risk-neutral planner. This capability is intended for use in robotic applications where misestimation of risk could lead to injury, including autonomous vehicles and home robots that must operate safely in close proximity to humans.
Specifically, our contributions in this work are as follows:
- We propose a risk-biased trajectory forecasting framework, which makes forecasts more useful for the downstream task and leads to plans that are robust to distribution shifts.
- Our risk-biased model off-loads the heavy computation of risk estimation from online planning, providing risk-awareness to a generic risk-neutral planner.
- We extensively evaluate our proposed approach in simulation with a planner in the loop and offline with complex real-world data.
2 Related Work
Trajectory forecasting from data. Early trajectory forecasting approaches defined hand-crafted dynamics models [17, 18], and incorporated rules that induce obstacle-avoidance behavior [19] or mimic the overall traffic flow [20, 21]. More recently, data-driven, learning-based methods have gained popularity for their ability to better capture the complexity of human behavior [22]; they typically use neural networks defining multi-modal trajectory distributions [12, 23–38].

Significant effort is directed toward increasing the coverage, or diversity, of motion forecasting models [11, 12, 33–41] in order to ensure that no critical events are missed. Diversity can be explicitly encouraged using a best-of-many loss [25], by replacing a mean-squared loss with a Huber loss [40], by choosing trajectory samples that maximize the distribution coverage [34], or by setting diverse anchors or target points [36–38]. Another strategy to increase mode coverage takes advantage of the latent distribution of CVAEs [5, 11, 41] or GANs [12]. Cui et al. [5] argue that besides coverage, sample efficiency is also an important factor; the authors trained a road-scene motion forecasting model to produce predictions of other agents that induce diverse reactions from the given robot planner. Similarly, McAllister et al. [42] train a model with a weighted loss, giving a low weight to the predictions that do not affect the planner. Huang et al. [27] train a forecasting model that allows a simple optimization procedure to select the safest among a set of plans generated by a planner. While prior work considered task-awareness or planner-awareness, to the best of our knowledge, we are the first to use risk as a proxy to make forecasts more useful for the downstream task.
Subjective probability and prospect theory. Our pessimistic risk-biased prediction can be interpreted as a model of subjective probability (e.g., [43]), which is closely related to risk-awareness [44]. For instance, prospect theory [45] studies how humans make risk-aware decisions and introduces the notion of probability weighting [46]. Under this model, the distribution is "warped" so that the probabilities of unlikely events are always over-weighted. Recent robotics literature has leveraged prospect theory to better model risk-awareness in human decision making, for example, in collaborative human-robot manipulation [47] and driver behavior modeling [48]. Prospect theory is a descriptive model of human decision making, which differs from our goal of designing risk-aware robots. Moreover, our model only overestimates the probability of events that incur high cost for the robot, unlike probability weighting, which overestimates any unlikely outcome.
Risk-sensitive planning and control. Risk-sensitive planning and control date back to the 1970s, as exemplified by risk-sensitive Linear-Exponential-Quadratic-Gaussian control [49, 50] and risk-sensitive Markov Decision Processes (MDPs) [51]. More recent methods include risk-sensitive nonlinear MPC [6, 52], Q-learning [44, 53], and actor-critic [54, 55] methods, for various types of risk measures; refer to a recent survey [56] for further details. Unlike those methods, in which the policy directly optimizes a risk measure, we propose to instead bias the prediction so that risk-sensitivity can be achieved by a risk-neutral planner that simply optimizes the expected value of the cost.
3 Background
3.1 Generative Probabilistic Trajectory Forecasting

Let $x$ and $y$ be the past and the future trajectories of an agent, and let $Y|x$ denote the random variable of the future trajectory conditioned on the observed past trajectory $x$. We would like to fit the distribution $p(Y|x)$ given a dataset $\mathcal{D}$ of i.i.d. samples of $(x, y)$ pairs. To fit $p(Y|x)$, we maximize the likelihood of future trajectories w.r.t. the model parameters $\theta, \phi$:
$$\operatorname*{maximize}_{\theta, \phi} \; \prod_{(x,y) \in \mathcal{D}} L(\theta, \phi; y|x),$$
where $L(\theta, \phi; y|x)$ is the likelihood of the sample $y$ knowing $x$. One method to fit this distribution is to learn a conditional variational auto-encoder (CVAE) [57]. We focus on this approach because it produces a structured latent representation. The CVAE conditions its likelihood estimation on a latent random variable $Z|x,y$ with a posterior $q_{\phi_2}(z|x,y)$, or $Z|x$ with an inferred prior $q_{\phi_1}(z|x)$ used in the joint likelihood $p_\theta(y, z|x)$. The marginal likelihood of the future trajectory (or "model evidence") can be rewritten as:
$$L(\theta, \phi; y|x) = \int p_\theta(y, z|x)\,dz = \int p_\theta(y, z|x)\,\frac{q_{\phi_2}(z|x,y)}{q_{\phi_2}(z|x,y)}\,dz = \mathbb{E}_{q_{\phi_2}(z|x,y)}\!\left[\frac{p_\theta(y, z|x)}{q_{\phi_2}(z|x,y)}\right]. \quad (1)$$
Using Jensen's inequality, the logarithm of (1) is lower bounded by
$$\mathcal{L}(\theta, \phi; x, y) = \mathbb{E}_{q_{\phi_2}(z|x,y)}\!\left[\ln p_\theta(y|x,z)\right] - \mathrm{KL}\!\left(q_{\phi_2}(z|x,y)\,\|\,q_{\phi_1}(z|x)\right), \quad (2)$$
called the evidence lower bound (ELBO). We model $q_\phi$ and $p_\theta$ using neural networks. The encoders assume a Gaussian prior with independent elements to produce the inferred prior $f_{\phi_1}: x \mapsto (\mu_{|x}, \mathrm{diag}(\Sigma_{|x}))$ and the posterior $f_{\phi_2}: (x, y) \mapsto (\mu_{|x,y}, \mathrm{diag}(\Sigma_{|x,y}))$. The decoder makes the forecast $g_\theta: (x, z) \mapsto y$. Every term in (2) can be either computed or estimated with Monte-Carlo sampling as established in [57, 58].
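As a concrete illustration of these quantities (a minimal NumPy sketch, not the paper's implementation), the KL term in (2) between two diagonal Gaussians is available in closed form, while the reconstruction term can be estimated by Monte-Carlo sampling from the posterior. The unit-variance Gaussian decoder likelihood and the `decoder` callable below are simplifying assumptions.

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) ) for diagonal Gaussians."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def elbo_estimate(y, mu_post, var_post, mu_prior, var_prior, decoder,
                  n_samples=64, rng=None):
    """Monte-Carlo ELBO: E_q[ln p(y|x,z)] - KL(q(z|x,y) || q(z|x)).
    Assumes decoder(z) returns the mean of a unit-variance Gaussian over y."""
    rng = rng or np.random.default_rng(0)
    z = mu_post + np.sqrt(var_post) * rng.standard_normal((n_samples, mu_post.size))
    y_hat = np.array([decoder(zk) for zk in z])
    # Log-likelihood of y under N(decoder(z), I), up to an additive constant.
    recon = np.mean(-0.5 * np.sum((y_hat - y) ** 2, axis=-1))
    return recon - gaussian_kl(mu_post, var_post, mu_prior, var_prior)
```

In the paper's model, `mu_post, var_post` would come from $f_{\phi_2}(x, y)$, `mu_prior, var_prior` from $f_{\phi_1}(x)$, and `decoder` would be $g_\theta(x, \cdot)$.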
3.2 Risk Measures

A risk measure is defined as a functional that maps a cost distribution to a real number. In other words, given a random cost variable $C$ with distribution $p$, a risk measure of $p$ yields a deterministic number $r$ called the risk. In practice, we often consider a class of risk measures that lie between the expected value $\mathbb{E}_p[C]$ and the highest value $\sup(C)$. The former corresponds to the risk-neutral evaluation of $C$, while the latter gives the worst-case assessment. Such risk measures often take a user-specified risk-sensitivity level $\sigma \in \mathbb{R}$ as an additional argument, which determines where the risk value $r$ is positioned between $\mathbb{E}_p[C]$ and $\sup(C)$. Formally, let us define a risk measure as $\mathcal{R}_p: (C, \sigma) \mapsto r \in [\mathbb{E}_p[C], \sup(C)]$. Examples of such risk measures include the entropic risk [50]:
$$\mathcal{R}^{\mathrm{entropic}}_p(C, \sigma) = \frac{1}{\sigma}\log \mathbb{E}_p\!\left[\exp(\sigma C)\right],$$
as well as CVaR [59]:
$$\mathcal{R}^{\mathrm{CVaR}}_p(C, \sigma) = \inf_{t \in \mathbb{R}} \left\{ t + \frac{1}{1-\sigma}\,\mathbb{E}_p\!\left[\max(0, C - t)\right] \right\}. \quad (3)$$
The rest of the paper assumes CVaR (3) as the underlying risk measure, but note that the proposed approach is not necessarily bound to this particular choice. For CVaR, the risk value $r$ given risk-sensitivity level $\sigma \in (0, 1)$ can be interpreted as the expected value of the right $(1-\sigma)$-tail of the cost distribution [60]. Thus, $\mathcal{R}_p(C, \sigma)$ tends to $\mathbb{E}_p[C]$ as $\sigma \to 0$ and to $\sup(C)$ as $\sigma \to 1$.
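The tail interpretation above suggests a simple Monte-Carlo estimate of CVaR from cost samples. The sketch below is a basic quantile plug-in estimator in the spirit of (3), given here only for illustration; it is not necessarily the specific estimator the paper uses.

```python
import numpy as np

def cvar_estimate(costs, sigma):
    """Monte-Carlo CVaR estimate at risk level sigma in (0, 1): plug the
    empirical sigma-quantile (Value-at-Risk) into the variational form (3),
    i.e. the mean of the right (1 - sigma)-tail of the cost samples."""
    costs = np.sort(np.asarray(costs, dtype=float))
    t = np.quantile(costs, sigma)          # empirical Value-at-Risk
    tail = np.maximum(0.0, costs - t)
    return t + tail.mean() / (1.0 - sigma)
```

Consistent with the limits stated above, the estimate approaches the sample mean as `sigma` tends to 0 and the sample maximum as `sigma` tends to 1.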
Another intriguing property of CVaR is its fundamental relation to distributional robustness. CVaR belongs to a class of risk measures called coherent measures of risk [61] with the following dual characterization ([61], Theorem 4a):
$$\mathcal{R}_p(C, \sigma) = \sup_{q \in \mathcal{Q}} \mathbb{E}_q[C], \quad (4)$$
where $\mathcal{Q}$ is a uniquely-determined, non-empty and closed convex subset of the set of all density functions. This suggests that CVaR is equivalent to a worst-case expectation of the cost $C$ when the underlying probability distribution $q$ is chosen adversarially from $\mathcal{Q}$. Therefore, an autonomous robot optimizing CVaR (or coherent measures of risk in general) obtains distributional robustness, in that the objective accounts for robustness to potential inaccuracies in the underlying probabilistic model. In this context, the set $\mathcal{Q}$ is often referred to as an ambiguity set in the literature [62, 63].
4 Problem Formulation

Suppose that a robot incurs cost $C$ under a planned policy $\pi$ or trajectory. This cost is given by a function $J_\pi$ such that $C = J_\pi(Y)$, with $Y$ being the human future trajectory random variable, which the robot predicts probabilistically. We assume that $J_\pi$ is known and differentiable in $y$ for each $\pi$. One can design such a cost function so that $J_\pi(y)$ is high when the robot collides with the particular trajectory $Y = y$ of a human. Supplementary material E defines the cost function used in this work.

We begin with a pre-trained generative model, as defined in Section 3.1, that gives a predictive distribution $p(Y|x) = \int p(Y|x,z)\,p(z)\,dz$ through an inferred latent distribution $p(Z|x)$. This latent is mapped to the trajectory space by a generator or decoder $y = g(z, x)$. Under this unbiased model, the risk is given by $r = \mathcal{R}_p(J_\pi(g(Z, x)), \sigma)$ using the risk measure introduced in Section 3.2.

Given the unbiased model and the risk measure, we are interested in finding another distribution $q_\psi(Z)$ in the latent space with learnable parameters $\psi$, under which simply taking the risk-neutral expectation of the cost will yield the same risk value as given above. This can be achieved by enforcing the following equality constraint on the biased distribution $q_\psi(Z)$:
$$\mathbb{E}_{q_\psi}\!\left[J_\pi(g(Z, x))\right] = \mathcal{R}_p(J_\pi(g(Z, x)), \sigma). \quad (5)$$
We show that such a distribution exists in Section A.1 of the supplementary material. Comparing both sides of (5), we note that such a $q$ should depend on the risk-sensitivity level $\sigma$. We propose to optimize the parameters $\psi$ of the risk-biased distribution $q_\psi(Z|x, \sigma)$. In general, many distributions $q$ can satisfy (5). We propose to pick the particular $q$ that additionally minimizes the KL divergence from the prior $p$, to prevent the biased distribution from becoming too different from the original unbiased distribution. This leads to the following constrained optimization problem:
$$\operatorname*{minimize}_{\psi} \; \mathrm{KL}\!\left(q_\psi(Z|\sigma)\,\|\,p(Z)\right) \quad \text{subject to} \quad \mathbb{E}_{q_\psi}\!\left[J_\pi(g(Z, x))\right] = \mathcal{R}_p(J_\pi(g(Z, x)), \sigma). \quad (6)$$
In general, we cannot guarantee uniqueness of the solution to the optimization problem (6). However, in supplementary material A, we provide further analysis of (6) along with a sufficient assumption under which the solution would be unique (Proposition A.3).
Connection to importance sampling. Importance sampling has been employed in rare-event simulation for accelerated safety verification of autonomous systems [64–66], which yields a pessimistic sampling distribution similar to our risk-biased model. However, a crucial difference of our approach is that it estimates a more general risk measure instead of an expected value: given a desired risk-sensitivity level, unweighted samples from the proposal $q$ directly yield the risk estimate (5). This removes the need to compute importance weights.
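To make the contrast concrete, here is a toy 1-D sketch (ours, not the paper's; the cost function and distributions are hypothetical): estimating an expected cost under $p = \mathcal{N}(0, 1)$ from a pessimistic proposal $q = \mathcal{N}(2, 1)$ requires likelihood-ratio weights, whereas a biased predictor satisfying (5) yields its risk estimate as a plain unweighted mean over its samples.

```python
import numpy as np

rng = np.random.default_rng(0)
cost = lambda y: np.maximum(0.0, y)   # hypothetical cost: only y > 0 is dangerous

def is_expected_cost(n, shift=2.0):
    """Importance sampling: samples from a pessimistic proposal q = N(shift, 1)
    must be re-weighted by p(y)/q(y) to estimate an expectation under p = N(0, 1)."""
    y = rng.normal(loc=shift, size=n)
    log_w = -0.5 * y**2 + 0.5 * (y - shift)**2   # log p(y)/q(y)
    return np.mean(np.exp(log_w) * cost(y))

def biased_risk(samples_from_q_psi):
    """Risk-biased prediction: if q_psi satisfies (5), the risk estimate is an
    unweighted mean of costs over biased samples -- no weights required."""
    return np.mean(cost(samples_from_q_psi))
```

In the full pipeline, `samples_from_q_psi` would be trajectories decoded from the biased latent distribution, and `cost` would be $J_\pi$.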
Connection to distributional robustness. When a coherent measure of risk such as CVaR is chosen as the underlying risk measure, the right-hand side of (5) is always equivalent to a worst-case expectation under a distribution $q$ chosen out of an ambiguity set $\mathcal{Q}$ (4). In general, it is difficult to verify whether the optimal distribution $q_\psi$ is in $\mathcal{Q}$, since the specifics of $\mathcal{Q}$ depend on the choice of the risk measure as well as the risk-sensitivity level $\sigma$. Nevertheless, it holds true that any feasible distribution $q_\psi$ for (6) yields the same worst-case expected cost as the most adversarial distribution from $\mathcal{Q}$. Therefore, a planner relying on $q_\psi$ instead of $p$ possesses distributional robustness. We demonstrate this crucial capability via an empirical evaluation in Section 6.3.
5 Implementation Details

Section B of the supplementary material defines a usual (unbiased) CVAE trajectory forecasting model that learns two encoders, defining the Gaussian latent variables $Z|x$ and $Z|x,y$, and one decoder predicting $Y|x,z$. We propose to solve problem (6) by learning a third neural network encoder to define a biased latent distribution that, in combination with the pre-trained decoder, produces biased forecasts. This biased encoder takes the past trajectory $x$, a risk level $\sigma$, and the robot future trajectory $y_{\mathrm{robot}}$. It outputs the parameters of a Normal distribution, $\mu^{(b)}$ and $\log(\mathrm{diag}(\Sigma^{(b)}))$.

Algorithm 1: Proposed Risk-Biasing Loss Estimation
Input: trajectory $(x, y) \sim \mathcal{D}$, risk level $\sigma \sim p(\sigma)$, KL-loss weight $\beta$, risk weight $\alpha$, robot motion $y_{\mathrm{robot}}$
1: for $k \in \{1, \ldots, K_1\}$ do
2:   Sample latent $z_{k|x} \sim \mathcal{N}(\mu_{|x}, \Sigma_{|x})$ with prior parameters $(\mu_{|x}, \Sigma_{|x}) = f_{\phi_1}(x)$
3:   Decode risk-neutral predictions $y_k = g_\theta(x, z_{k|x})$
4: Compute risk $r$ using $\{y_1, \ldots, y_{K_1}\}$ and $J_{y_{\mathrm{robot}}}$ with Monte Carlo estimation (e.g., [68])
5: for $k \in \{1, \ldots, K_2\}$ do
6:   Sample biased latent $\hat{z}^{(b)}_k \sim \mathcal{N}(\mu^{(b)}, \Sigma^{(b)})$ with risk-biased parameters $(\mu^{(b)}, \Sigma^{(b)}) = f_\psi(x, \sigma, y_{\mathrm{robot}})$
7:   Decode risk-biased predictions $\hat{y}_k = g_\theta(x, \hat{z}^{(b)}_k)$
8: Compute expected cost $\hat{r} = \frac{1}{K_2}\sum_{k=1}^{K_2} J_{y_{\mathrm{robot}}}(\hat{y}_k)$
9: Compute risk loss $L_{\mathrm{risk}} = \rho(\hat{r} - r)$ and prior loss $L_{\mathrm{prior}} = \mathrm{KL}\!\left(\mathcal{N}(\mu^{(b)}, \Sigma^{(b)})\,\|\,\mathcal{N}(\mu_{|x}, \Sigma_{|x})\right)$
Output: loss value $\alpha L_{\mathrm{risk}} + \beta L_{\mathrm{prior}}$ to train $\psi$ ($\theta$ and $\phi_1$ are fixed)
In practice, we soften the hard constraint (5) using the penalty method [67], which progressively increases the weight $\alpha$ of the risk loss during training. We also leverage a user-defined sampling distribution $p(\sigma)$ to sample different risk-sensitivity levels during training, so that the risk estimate remains accurate at any reasonable value of $\sigma$ at inference time. Finally, we encourage the model to overestimate the risk rather than underestimate it: we scale by a positive value $s$ and define an asymmetric risk loss that penalizes underestimation of the risk linearly and overestimation logarithmically:
$$\rho(x) = \begin{cases} s|x|, & \text{if } sx \le 1, \\ \log(sx), & \text{otherwise.} \end{cases} \quad (7)$$
We obtain the following loss function, with $\alpha$ and $\beta$ controlling the relative importance of the losses:
$$\mathcal{L}(\psi) = \mathbb{E}_{\sigma \sim p(\sigma)}\!\left[\alpha\,\rho\!\left(\mathbb{E}_{q_\psi}\!\left[J_\pi(g(Z, x))\right] - \mathcal{R}_p(J_\pi(g(Z, x)), \sigma)\right) + \beta\,\mathrm{KL}\!\left(q_\psi(Z|\sigma, x)\,\|\,p(Z|x)\right)\right].$$
The expected values and the risk measure are approximated by Monte Carlo sampling. For computing CVaR, $\mathcal{R}_p(J_\pi(g(Z, x)), \sigma)$, we use the estimator proposed by Hong et al. [68]. Consistency and asymptotic normality of this estimator hold under mild assumptions [68].
Algorithm 1 lays out the procedure for training our proposed risk-aware prediction. It relies on a fully trained CVAE with the encoder $f_{\phi_1}: x \mapsto (\mu_{|x}, \Sigma_{|x})$ and decoder $g_\theta: (x, z) \mapsto y$ that fits the distribution of $Y|x$ from a dataset. We train a new latent-biasing encoder $f_\psi: (x, \sigma, y_{\mathrm{robot}}) \mapsto (\mu^{(b)}, \Sigma^{(b)})$ to bias the latent distribution while keeping the rest of the CVAE fixed. The risk level $\sigma$ is randomly sampled on $[0, 1]$ during training and chosen by the user at test time.
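Lines 8–9 of Algorithm 1 and the asymmetric loss (7) reduce to a few lines of NumPy. The following is a minimal sketch under our reading of (7), assuming scalar risk estimates $r$ and $\hat{r}$ and a precomputed KL term; the scale `s` and the weights `alpha`, `beta` are free hyperparameters.

```python
import numpy as np

def rho(x, s=10.0):
    """Asymmetric risk loss (7): linear in |x| while s*x <= 1 (so
    under-estimation, x = r_hat - r < 0, is penalized linearly),
    logarithmic once s*x > 1 (large over-estimation grows slowly)."""
    x = np.asarray(x, dtype=float)
    return np.where(s * x <= 1.0, s * np.abs(x), np.log(np.maximum(s * x, 1.0)))

def risk_biasing_loss(r, r_hat, kl, alpha=1.0, beta=1.0, s=10.0):
    """Total training loss of Algorithm 1: alpha * L_risk + beta * L_prior,
    with L_risk = rho(r_hat - r) and L_prior the latent KL term."""
    return alpha * rho(r_hat - r, s) + beta * kl
```

During training, `r` would come from the Monte-Carlo CVaR estimate over risk-neutral samples (line 4) and `r_hat` from the mean cost over biased samples (line 8); only the gradient through `r_hat` and `kl` reaches $\psi$.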
6 Experiments
6.1 Biasing forecasts in a didactic scenario
Figure 1: Top-down view of a simulated scene. The robot in red moves left to right down the road as a pedestrian in blue is crossing. The color of the depicted pedestrian trajectory samples indicates their corresponding Time-To-Collision (TTC) cost for the robot. The slow mode in red is more costly than the fast mode in green.
We created the didactic simulation environment in Fig. 1, where a red robot drives at constant speed along a straight road with a stochastic pedestrian. The pedestrian either walks slowly or quickly, yielding a bimodal distribution over their travel distance. We collected a dataset in this environment in which the initial position and orientation of the pedestrian are set at random. We used it to train a risk-biased CVAE model according to the method presented in Sections 4 and 5. Fig. 2b shows