
(HJ) reachability in Section III, and formally state our
safety concept learning problem in Section IV. Then we
describe the details of our key contributions: (i) We propose
a data-driven approach to learn humans’ collision avoidance
behaviors in the control space to capture “reasonable driving
behaviors” (Section V). Specifically, we learn safe control
sets from demonstrations via a high order control barrier
function (HOCBF) [13] framework. (ii) We develop a con-
strained game-theoretic optimization problem derived from a
HJ reachability formulation to synthesize novel data-driven
safety concepts that are robust to other agents’ behaviors
while respecting the learned collision avoidance behaviors
(Section VI). (iii) We demonstrate our proposed learning
framework using highway driving data and show that the
resulting data-driven safety concept is less conservative than
other common safety concepts (due to the way it captures
constraints on reasonable agent behavior) and thus is useful
as a “responsibility-aware” evaluation metric for the safety
of AV interactions (Section VII).
II. RELATED WORK
We use the term safety concept to help unify existing
safety theory prevalent in various robot planning and control
algorithms, such as velocity obstacles [14], [15], forward
reachable sets [16], [17], contingency planning [18], [19],
backward reachability [8], and other methods that make static
assumptions on agent behavior [20], [21]. The differences
between various safety concepts stem from the assumptions
about the behavior of other interacting agents, ranging from
worst-case assumptions [8] to presuming agents follow fixed
open-loop trajectories (e.g., braking [21], constant velocity
[15]). Indeed, a core challenge lies in selecting behavioral
assumptions that balance conservatism, tractability, inter-
pretability, and compatibility with real-world interactions.
Recent works propose dynamically changing the conser-
vatism of the safety concept based on online estimates of
the confidence of the robot’s human behavior prediction
model. If a human agent is behaving as expected (i.e., high
model confidence), then the worst-case assumptions in the
safety concept can be relaxed, and vice versa [22], [23].
However, the integrity of the adaptive safety concept depends
on the quality of the prediction model; obtaining an
accurate human behavior prediction model is in general quite
challenging and, indeed, is an active research field [24].
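For concreteness, one simple way such confidence-based relaxation could look in practice is sketched below; the linear interval-scaling rule and the name scaled_control_bounds are illustrative assumptions on our part, not the formulation of [22], [23].

import numpy as np

def scaled_control_bounds(u_pred, u_min, u_max, confidence):
    # Illustrative (hypothetical) rule: confidence = 0 keeps the full
    # worst-case interval [u_min, u_max]; confidence = 1 shrinks it to
    # the predicted control u_pred.
    u_pred = np.clip(u_pred, u_min, u_max)
    lo = (1.0 - confidence) * u_min + confidence * u_pred
    hi = (1.0 - confidence) * u_max + confidence * u_pred
    return lo, hi

# Contender acceleration bounds of +/-4 m/s^2, predicted -1 m/s^2, confidence 0.7.
print(scaled_control_bounds(-1.0, -4.0, 4.0, 0.7))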
Another data-driven safe control technique is to use expert
demonstrations and learn a control barrier function (CBF)
[25] to describe unsafe regions in the state space [26], [27],
[28], [29]. The learned CBF is then directly used as the core
safety mechanism in synthesizing a safe policy. However,
CBFs are not well-suited for interactive settings where there
is uncertainty in how other interacting agents may behave. In
our work, we too consider learning CBFs (specifically, high
order CBFs (HOCBFs) [13]), but instead use the learned
HOCBF as an intermediate step towards formulating a more
rigorous notion of safety rooted in robust control theory,
which enjoys interpretability and verifiability benefits.
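To illustrate how a (learned) CBF is typically deployed as the core safety mechanism, consider a minimal sketch for a toy 1D single integrator ẋ = u with barrier h(x) = x (safe set h(x) ≥ 0); the dynamics, barrier, and class-K gain below are illustrative assumptions, not the HOCBF formulation used later in this paper.

# Toy single integrator x_dot = u with safe set {x : h(x) >= 0}, h(x) = x.
# The CBF condition dh/dx * u >= -alpha * h(x) reduces to u >= -alpha * x,
# so the minimally invasive safety filter admits a closed-form solution.
def cbf_safety_filter(x, u_desired, alpha=1.0):
    u_lower = -alpha * x          # CBF constraint on the control
    return max(u_desired, u_lower)

# The filter leaves u_desired untouched far from the safety boundary and
# overrides it (brakes less aggressively) as h(x) = x approaches zero.
for x in (2.0, 0.5, 0.1):
    print(x, cbf_safety_filter(x, u_desired=-1.0))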
III. SAFETY CONCEPT VIA HAMILTON-JACOBI
REACHABILITY
We define a safety concept as a combination of two
functions mapping the world state to (i) a scalar measure of
safety, and (ii) a set of allowable actions for each agent
within which safety is preserved. A family of safety concepts can be
described via a HJ reachability formulation [2].
HJ reachability is a mathematical formalism used for
characterizing the safety properties of dynamical systems [8],
[30]. The outputs of a HJ reachability computation are (i)
a HJ value function, a scalar-valued function that measures
“distance” to collision, and (ii) a set of controls that prevents
the safety measure from decreasing further—precisely the
components needed for a safety concept.
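Purely for intuition, these two components could be packaged behind an interface like the following; the class name and signatures are hypothetical and serve only to make the definition concrete.

from typing import Callable
import numpy as np

class SafetyConcept:
    # Illustrative container for the two components of a safety concept:
    # (i) a scalar safety measure over joint states, and
    # (ii) a map from joint state to the set of safety-preserving ego controls.
    def __init__(self,
                 safety_value: Callable[[np.ndarray], float],
                 safe_controls: Callable[[np.ndarray], set]):
        self.safety_value = safety_value    # e.g., the HJ value function V(x, t)
        self.safe_controls = safe_controls  # e.g., U^A_safe(x) defined later in this section

    def is_safe(self, x: np.ndarray) -> bool:
        # Nonnegative safety value <=> state is outside the unsafe region.
        return self.safety_value(x) >= 0.0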
Consider a target set T, which is the set of collision states
between agents A (ego agent) and B (contender). The HJ
reachability formulation describes a two-player differential
game to determine whether it is possible for the ego agent
to avoid entering T under any family of closed-loop policies
of the contender, as well as the ego agent’s appropriate
control policy for ensuring safety. It is assumed that the con-
tender follows an adversarial policy and has the advantage
with respect to the information pattern. Using the principle
of dynamic programming, the collision avoidance problem
reduces to solving the Hamilton-Jacobi-Isaacs (HJI) partial
differential equation (PDE) [8],

\[
\frac{\partial V(x, t)}{\partial t} + \min\Big\{ 0,\ \max_{u_A \in \mathcal{U}_A} \min_{u_B \in \mathcal{U}_B} \nabla_x V(x, t)^\top f(x, u_A, u_B) \Big\} = 0, \qquad V(x, 0) = \ell(x) \tag{1}
\]
where x ∈ X denotes the joint state of agents A and B,
u_A ∈ U_A and u_B ∈ U_B are the available (bounded)
controls¹ of agents A and B, respectively, and f(·, ·, ·) is the
joint dynamics assumed to be measurable in u_A and u_B for
each x, and uniformly continuous, bounded, and Lipschitz
continuous in x for fixed u_A and u_B.² The boundary
condition is defined by a function ℓ : X → R whose zero
sub-level set encodes the target set, i.e., T = {x | ℓ(x) < 0}.
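To ground the ingredients of (1), the following sketch instantiates a toy example in relative coordinates x = (p, v) (gap and closing speed), with accelerations as controls, ℓ(x) penalizing small gaps, and the inner max-min term of (1) evaluated by brute force over discretized control sets; all modeling choices (dynamics, bounds, margin) are illustrative assumptions, not the highway model used in our experiments.

import numpy as np

# Toy joint dynamics in relative coordinates x = (p, v):
#   p_dot = v,  v_dot = u_A - u_B  (ego and contender accelerations).
def f(x, u_A, u_B):
    p, v = x
    return np.array([v, u_A - u_B])

# Margin function: negative (i.e., in the target set) when the gap p < d_min.
def l(x, d_min=2.0):
    return x[0] - d_min

# Bounded control sets, discretized for brute-force evaluation.
U_A = np.linspace(-3.0, 3.0, 31)   # ego acceleration [m/s^2]
U_B = np.linspace(-3.0, 3.0, 31)   # contender acceleration [m/s^2]

def hamiltonian(x, grad_V):
    # Inner max-min term of (1): max over u_A of min over u_B of grad_V . f.
    return max(min(grad_V @ f(x, uA, uB) for uB in U_B) for uA in U_A)

# Evaluate at a state with a closing gap; the gradient of l stands in for
# grad_x V at t = 0, where V(x, 0) = l(x) and grad l(x) = (1, 0).
x0 = np.array([5.0, -2.0])
print(l(x0), hamiltonian(x0, grad_V=np.array([1.0, 0.0])))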
The solution V(x, t), t ∈ [−T, 0], called the HJ value
function, captures the lowest value of ℓ(·) along the system
trajectory within |t| seconds if the system starts at x and
both agents A and B act optimally, that is,
u*_A(x), u*_B(x) = arg max_{u_A∈U_A} arg min_{u_B∈U_B} ∇_x V(x, t)^⊤ f(x, u_A, u_B).
Thus the HJ value function fulfills the first aspect of a
safety concept. After obtaining the HJ value function,
we can also consider the set of controls that prevent
the HJ value function from decreasing over time. That
is, we can compute the safety-preserving control set,
U^A_safe(x) = {u_A ∈ U_A | min_{u_B∈U_B} dV(x, t)/dt ≥ 0}, thus
fulfilling the second aspect of a safety concept. By varying
the problem parameters, i.e., control sets, behavior type,
¹The control sets U_A and U_B are typically chosen to reflect the physically
feasible limits of the system.
²This assumption ensures that trajectories are generated by a unique
control sequence.