
assumption necessary for the system semantics to be math-
ematically well-defined. We also assume that
X ⊆ Rm
is
compact and that the dynamics function
f
is Lipschitz con-
tinuous, which are common assumptions in control theory.
Probabilistic reach-avoid problem
Let
Xt⊆ X
and
Xu⊆
X
be disjoint Borel-measurable subsets of
Rm
, which we
refer to as the target set and the unsafe set, respectively. Let
p∈[0,1]
be a probability threshold. Our goal is to learn a
control policy which guarantees that, with probability at least
p
, the system reaches the target set
Xt
without reaching the
unsafe set
Xu
. Formally, we want to learn a control policy
π
such that, for any initial state x0∈ X0, we have
Px0hReachAvoid(Xt,Xu)i≥p
with
ReachAvoid(Xt,Xu) = {(xt,ut, ωt)t∈N0| ∃t∈
N0.xt∈ Xt∧(∀t0≤t. xt06∈ Xu)}
the set of trajectories
that reach Xtwithout reaching Xu.
We restrict to the cases when either
p < 1
, or
p= 1
and
Xu=∅
. Our approach is not applicable to the case
p= 1
and
Xu6=∅
due to technical issues that arise in defining
our formal certificate, which we discuss in the following
section. We remark that probabilistic reachability is a special
instance of our problem obtained by setting
Xu=∅
. On
the other hand, we cannot directly obtain the probabilistic
safety problem by assuming any specific form of the target
set
Xt
, however we will show in the following section that
our method implies probabilistic safety with respect to
Xu
if
we provide it with Xt=∅.
Theoretical Results
We now present our framework for formally certifying a
reach-avoid constraint with a given probability threshold. Our
framework is based on the novel notion of reach-avoid super-
martingales (RASMs) that we introduce in this work. Note
that, in this section only, we assume that the policy is fixed.
In the next section, we will present our algorithm for learning
policies that provide formal reach-avoid guarantees in which
RASMs will be an integral ingredient. In what follows, we
consider a discrete-time stochastic dynamical system defined
as in the previous section. For now, we assume that the prob-
ability threshold is strictly smaller than
1
, i.e.
p < 1
. We will
later show that our approach straightforwardly extends to the
case p= 1 and Xu=∅.
Reach-avoid supermartingales
We define a reach-avoid su-
permartingale (RASM) to be a continuous function
V:X →
R
that assigns real values to system states. The name is cho-
sen to emphasize the connection to supermartingale processes
from probability theory [
64
], which we will explore later in
order to prove the effectiveness of RASMs for verifying
reach-avoid properties. The value of
V
is required to be non-
negative over the state space
X
(Nonnegativity condition),
to be bounded from above by
1
over the set of initial states
X0
(Initial condition) and to be bounded from below by
1
1−p
over the set of unsafe states
Xu
(Safety condition). Hence,
in order for a system trajectory to reach an unsafe state and
violate the safety specification, the value of the RASM
V
needs to increase at least
1
1−p
times along the trajectory. Fi-
nally, we require the existence of
> 0
such that the value
of
V
decreases in expected value by at least
after every
one-step evolution of the system from every system state
x∈ X \Xt
for which
V(x)≤1
1−p
(Expected decrease con-
dition). Intuitively, this last condition imposes that the system
has a tendency to strictly decrease the value of
V
until either
the target set
Xt
is reached or a state with
V(x)≥1
1−p
is
reached. However, as the value of
V
needs to increase at least
1
1−p
times in order for the system to reach an unsafe state,
these four conditions will allow us to use RASMs to certify
that the reach-avoid constraint is satisfied with probability at
least p.
Definition 1
(Reach-avoid supermartingales)
.
Let
Xt⊆ X
and
Xu⊆ X
be the target set and the unsafe set, and let
p∈[0,1)
be the probability threshold. A continuous function
V:X → R
is said to be a reach-avoid supermartingale
(RASM) with respect to Xt,Xuand pif it satisfies:
1. Nonnegativity condition. V(x)≥0for each x∈ X .
2. Initial condition. V(x)≤1for each x∈ X0.
3. Safety condition. V(x)≥1
1−pfor each x∈ Xu.
4.
Expected decrease condition. There exists
> 0
such
that, for each
x∈ X \Xt
at which
V(x)≤1
1−p
, we have
V(x)≥Eω∼d[V(f(x, π(x), ω))] + .
Comparison to Lyapunov functions
The defining proper-
ties of RASMs hint a connection to Lyapunov functions for
deterministic control systems. However, the key difference
between Lyapunov functions and our RASMs is that Lya-
punov functions deterministically decrease in value whereas
RASMs decrease in expectation. Deterministic decrease en-
sures that each level set of a Lyapunov function, i.e. a set of
states at which the value of Lyapunov functions is at most
l
for some
l≥0
, is an invariant of the system. However,
it is in general not possible to impose such a condition on
stochastic systems. In contrast, our RASMs only require ex-
pected decrease in the level, and the Initial and the Unsafe
conditions can be viewed as conditions on the maximal initial
level set and the minimal unsafe level set. The choice of a
ratio of these two level values allows us to use existing results
from martingale theory in order to obtain probabilistic avoid-
ance guarantees, while the Expected decrease condition by
> 0
furthermore provides us with probabilistic reachability
guarantees.
Certifying reach-avoid constraints via RASMs
We now
show that the existence of an
-RASM for some
> 0
implies
that the reach-avoid constraint is satisfied with probability at
least p.
Theorem 1.
Let
Xt⊆ X
and
Xu⊆ X
be the target set
and the unsafe set, respectively, and let
p∈[0,1)
be the
probability threshold. Suppose that there exists an RASM
V
with respect to
Xt
,
Xu
and
p
. Then, for every
x0∈ X0
,
Px0[ReachAvoid(Xt,Xu)] ≥p.
The complete proof of Theorem 1 is provided in the Ap-
pendix. However, in what follows we sketch the key ideas
behind our proof, in order to illustrate the applicability of mar-
tingale theory to reasoning about stochastic systems which
we believe to have significant potential for applications be-
yond the scope of this work. To prove the theorem, we first
show that an
-RASM
V
induces a supermartingale [
64
] in