Probabilities Are Not Enough: Formal Controller Synthesis for
Stochastic Dynamical Models with Epistemic Uncertainty
Thom Badings1, Licio Romao2, Alessandro Abate2, Nils Jansen1
1Radboud University, Nijmegen, the Netherlands
2University of Oxford, Oxford, United Kingdom
Abstract
Capturing uncertainty in models of complex dynamical systems is crucial to designing safe controllers. Stochastic noise causes aleatoric uncertainty, whereas imprecise knowledge of model parameters leads to epistemic uncertainty. Several approaches use formal abstractions to synthesize policies that satisfy temporal specifications related to safety and reachability. However, the underlying models exclusively capture aleatoric but not epistemic uncertainty, and thus require that model parameters are known precisely. Our contribution to overcoming this restriction is a novel abstraction-based controller synthesis method for continuous-state models with stochastic noise and uncertain parameters. By sampling techniques and robust analysis, we capture both aleatoric and epistemic uncertainty, with a user-specified confidence level, in the transition probability intervals of a so-called interval Markov decision process (iMDP). We synthesize an optimal policy on this iMDP, which translates (with the specified confidence level) to a feedback controller for the continuous model with the same performance guarantees. Our experimental benchmarks confirm that accounting for epistemic uncertainty leads to controllers that are more robust against variations in parameter values.
1 Introduction
Stochastic models. Stochastic dynamical models capture complex systems where the likelihood of transitions is specified by probabilities (Kumar and Varaiya 2015). Controllers for stochastic models must act safely and reliably with respect to a desired specification. Traditional control design methods use, e.g., Lyapunov functions and optimization to guarantee stability and (asymptotic) convergence. However, alternative methods are needed to give formal guarantees about richer temporal specifications relevant to, for example, safety-critical applications (Fan et al. 2022).
Finite abstractions. A powerful approach to synthesizing certifiably safe controllers leverages probabilistic verification to provide formal guarantees over specifications of safety (always avoid certain states) and reachability (reach a certain set of states). A common example is the reach-avoid specification, where the task is to maximize the probability of reaching desired goal states while avoiding unsafe states (Fisac et al. 2015). Finite abstractions can make continuous models amenable to techniques and tools from formal verification: by discretizing their state and action spaces, abstractions result in, e.g., finite Markov decision processes (MDPs) that soundly capture the continuous dynamics (Abate et al. 2008). Verification guarantees on the finite abstraction can thus carry over to the continuous model. In this paper, we adopt such an abstraction-based approach to controller synthesis.

Figure 1: Aleatoric (stochastic) uncertainty in the wind ($\eta$) causes probability distributions over the outcomes of controls, while epistemic uncertainty in the mass ($m$) of the drone causes state transitions to be nondeterministic.
Probabilities are not enough. The notion of uncertainty is often distinguished into aleatoric (statistical) and epistemic (systematic) uncertainty (Fox and Ülkümen 2011; Sullivan 2015). Epistemic uncertainty is, in particular, modeled by parameters that are not precisely known (Smith 2013). A general premise is that purely probabilistic approaches fail to capture epistemic uncertainty (Hüllermeier and Waegeman 2021). In this work, we aim to reason under both aleatoric and epistemic uncertainty in order to synthesize provably correct controllers for safety-critical applications. Existing abstraction methods fail to achieve this novel, general goal.
Models with epistemic uncertainty. We consider reach-avoid problems for stochastic dynamical models with continuous state and action spaces under epistemic uncertainty described by parameters that lie within a convex uncertainty set. In the simplest case, this uncertainty set is an interval, such as a drone whose mass is only known to lie between 0.75 and 1.25 kg. As shown in Fig. 1, the dynamics of the drone depend on uncertain factors, such as the wind and the drone's mass. For the wind, we may derive a probabilistic model from, e.g., weather data to reason over the likelihood of state dynamics. For the mass, however, we do not have information about the likelihood of each value, so employing a probabilistic model is unrealistic. Thus, we treat epistemic uncertainty in such imprecisely known parameters (in this case, the mass) using a nondeterministic framework instead.
Problem statement. Our goal is to synthesize a controller that (1) is robust against nondeterminism due to parameter uncertainty and (2) reasons over probabilities derived from stochastic noise. In other words, the controller must satisfy a given specification under any possible outcome of the nondeterminism (robustness) and with at least a certain probability regarding the stochastic noise (reasoning over probabilities). We wish to synthesize a controller with a probably approximately correct (PAC)-style guarantee to satisfy a reach-avoid specification with at least a desired threshold probability. Thus, we solve the following problem:
Problem. Given a reach-avoid specification for a stochastic model with uncertain parameters, compute a controller and a lower bound on the probability that the specification is satisfied under any admissible value of the parameters, where this lower bound holds with at least a user-specified confidence probability.
We solve this problem via a discrete-state abstraction of the continuous model. We generate this abstraction by partitioning the continuous state space and defining actions that induce potential transitions between elements of this partition.

Algorithmically, the closest approach to ours is Badings et al. (2022), which uses abstractions to synthesize controllers for stochastic models with aleatoric uncertainty of unknown distribution, but with known parameters. Our setting is more general, as epistemic uncertainty requires fundamental differences in the technical approach, as explained below.
Robustness to capture nondeterminism. The main contribution that allows us to capture nondeterminism is that we reason over sets of potential transitions (as shown by the boxes in Fig. 1), rather than precise transitions as, e.g., in Badings et al. (2022). Intuitively, for a given action, the aleatoric uncertainty creates a probability distribution over sets of possible outcomes. To ensure robustness against epistemic uncertainty, we consider all possible outcomes within these sets. We show that, for our class of models, computing these sets of all possible outcomes is computationally tractable. Building upon this reasoning, we provide the following guarantees to solve the above-mentioned problem.
1) PAC guarantees on abstractions. We show that both probabilities and nondeterminism can be captured in the transition probability intervals of so-called interval Markov decision processes (iMDPs, Givan, Leach, and Dean 2000). We use sampling methods from scenario optimization (Campi, Carè, and Garatti 2021) and concentration inequalities (Boucheron, Lugosi, and Massart 2013) to compute PAC bounds on these intervals. With a predefined confidence probability, the iMDP correctly captures both aleatoric and epistemic uncertainty.
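To make the flavor of such PAC intervals concrete, here is a minimal sketch that bounds a single transition probability from i.i.d. samples using Hoeffding's inequality. This is only an illustrative stand-in: the intervals in this paper come from scenario optimization and tighter concentration results (see Sect. 5), and the function name and numbers below are hypothetical.

```python
import math

def pac_interval(num_hits: int, num_samples: int, beta: float):
    """Two-sided PAC bound via Hoeffding's inequality: with confidence at
    least 1 - beta, the true probability p deviates from the empirical
    frequency by at most eps = sqrt(ln(2/beta) / (2 * num_samples))."""
    freq = num_hits / num_samples
    eps = math.sqrt(math.log(2.0 / beta) / (2.0 * num_samples))
    return max(freq - eps, 0.0), min(freq + eps, 1.0)

# Example: 412 of 1000 sampled successor states landed in a given region.
low, up = pac_interval(412, 1000, beta=0.01)  # interval around 0.412
```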
2) Correct-by-construction control. For the iMDP, we compute a robust optimal policy that maximizes the worst-case probability of satisfying the reach-avoid specification. The iMDP policy is automatically translated to a controller for the original, continuous model on the fly. We show that, by construction, the PAC guarantees on the iMDP carry over to the satisfaction of the specification by the continuous model.
Contributions. We develop the first abstraction-based, formal controller synthesis method that simultaneously captures epistemic and aleatoric uncertainty for models with continuous state and action spaces. We also provide results on the PAC-correctness of the obtained iMDP abstractions and guarantees on the synthesized controllers for a reach-avoid specification. Our numerical experiments in Sect. 6 confirm that accounting for epistemic uncertainty yields controllers that are more robust against deviations in the parameter values.
Related Work
Uncertainty models. Distinguishing aleatoric from epistemic uncertainty is a key challenge towards trustworthy AI (Thiebes, Lins, and Sunyaev 2021), and has been considered in reinforcement learning (Charpentier et al. 2022), Bayesian neural networks (Depeweg et al. 2018; Loquercio, Segù, and Scaramuzza 2020), and systems modeling (Smith 2013). Dynamical models with epistemic parameter uncertainty (but without aleatoric uncertainty) are considered by Yedavalli (2014), and Geromel and Colaneri (2006). Control of models similar to ours is studied by Modares (2022), albeit only for stability specifications.
Model-based approaches. Abstractions of stochastic models are a well-studied research area (Abate et al. 2008; Alur et al. 2000), with applications to stochastic hybrid (Cauchi et al. 2019; Lavaei et al. 2022), switched (Lahijanian, Andersson, and Belta 2015), and partially observable systems (Badings et al. 2021; Haesaert et al. 2018). Various tools exist, e.g., StocHy (Cauchi and Abate 2019), ProbReach (Shmarov and Zuliani 2015), and SReachTools (Vinod, Gleason, and Oishi 2019). However, in contrast to the approach of this paper, none of these papers deals with epistemic uncertainty. Fan et al. (2022) use optimization for reach-avoid control of linear but non-stochastic models with bounded disturbances. Barrier functions are used for cost minimization in stochastic optimal control (Pereira et al. 2020). So-called funnel libraries are used by Majumdar and Tedrake (2017) for robust feedback motion planning under epistemic uncertainty. Finally, Zikelic et al. (2022) learn policies together with formal reach-avoid certificates using neural networks for nonlinear systems with only aleatoric uncertainty.
Data-driven approaches. Models with (partly) unknown dynamics express epistemic uncertainty about the underlying system. Verification of such models based on data has been done using Bayesian inference (Haesaert, den Hof, and Abate 2017), optimization (Kenanian et al. 2019; Vinod, Israel, and Topcu 2022), and Gaussian process regression (Jackson et al. 2020). Moreover, Knuth et al. (2021), and Chou, Ozay, and Berenson (2021) use neural network models for feedback motion planning for nonlinear deterministic systems with probabilistic safety and reachability guarantees. Data-driven abstractions have been developed for monotone (Makdesi, Girard, and Fribourg 2021) and event-triggered systems (Peruffo and Mazo 2023). In contrast to our setting, these approaches consider models with non-stochastic dynamics. A few recent exceptions also consider aleatoric uncertainty (Salamati and Zamani 2022; Lavaei et al. 2023), but these approaches require stricter assumptions (e.g., discrete input sets) than our model-based approach.
Safe learning. While outside the scope of this paper, our approach fits naturally in a model-based safe learning context (Brunke et al. 2022; García and Fernández 2015). In such a setting, our approach may synthesize controllers that guarantee safe interactions with the system, while techniques from, for example, reinforcement learning (RL, Berkenkamp et al. 2017; Zanon and Gros 2021) or stochastic system identification (Tsiamis and Pappas 2019) can reduce the epistemic uncertainty based on state observations. A risk-sensitive RL scheme providing approximate safety guarantees is developed by Geibel and Wysotzki (2005); we instead give formal guarantees at the cost of an expensive abstraction.
2 Problem Statement
The cardinality of a discrete set $\mathcal{X}$ is $|\mathcal{X}|$. A probability space is a triple $(\Omega, \mathcal{F}, \mathbb{P})$ of an arbitrary set $\Omega$, a sigma-algebra $\mathcal{F}$ on $\Omega$, and a probability measure $\mathbb{P}\colon \mathcal{F} \to [0,1]$. The convex hull of a polytopic set $\mathcal{X}$ with vertices $v_1, \dots, v_n$ is $\mathrm{conv}(v_1, \dots, v_n)$. The word controller relates to continuous models; a policy to discrete models.
Stochastic Models with Parameter Uncertainty
To capture parameter uncertainty in a linear time-invariant stochastic system, we consider a model (we extend this model with parameters describing uncertain additive disturbances in Sect. 5) whose continuous state $x_k$ at time $k \in \mathbb{N}$ evolves as

$$x_{k+1} = A(\alpha) x_k + B(\alpha) u_k + \eta_k, \qquad (1)$$

where $u_k \in \mathcal{U}$ is the control input, which is constrained by the control space $\mathcal{U} = \mathrm{conv}(u_1, \dots, u_q) \subset \mathbb{R}^m$, a convex polytope with $q$ vertices, and where the process noise $\eta_k$ is a stochastic process defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Both the dynamics matrix $A(\alpha) \in \mathbb{R}^{n \times n}$ and the control matrix $B(\alpha) \in \mathbb{R}^{n \times m}$ are convex combinations of a finite set of $r \in \mathbb{N}$ known matrices:

$$A(\alpha) = \sum_{i=1}^{r} \alpha_i A_i, \qquad B(\alpha) = \sum_{i=1}^{r} \alpha_i B_i, \qquad (2)$$

where the unknown model parameter $\alpha \in \Gamma$ can be any point in the unit simplex $\Gamma \subset \mathbb{R}^r$:

$$\Gamma = \Big\{ \alpha \in \mathbb{R}^r : \alpha_i \geq 0 \ \forall i \in \{1, \dots, r\},\ \textstyle\sum_{i=1}^{r} \alpha_i = 1 \Big\}.$$
The model in Eq. (1) captures epistemic uncertainty in $A(\alpha)$ and $B(\alpha)$ through the model parameter $\alpha$, as well as aleatoric uncertainty in the discrete-time stochastic process $(\eta_k)_{k \in \mathbb{N}}$.

Assumption 1. The noise $\eta_k$ is independent and identically distributed (i.i.d., which is a common assumption) and has a density with respect to the Lebesgue measure. However, contrary to most definitions, we allow $\mathbb{P}$ to be unknown.

Importantly, being distribution-free, our proposed techniques hold for any distribution of $\eta$ that satisfies Assumption 1.
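As a minimal illustration of Eqs. (1) and (2), the sketch below instantiates $A(\alpha)$ and $B(\alpha)$ as convex combinations of vertex matrices and simulates one step of the dynamics. The dimensions and vertex matrices are assumed for illustration, and the Gaussian noise is only a stand-in for the unknown distribution $\mathbb{P}$ of Assumption 1.

```python
import numpy as np

# Hypothetical vertex matrices A_1, ..., A_r and B_1, ..., B_r; in practice
# they encode interval bounds on physical parameters (see Example 1 below).
A_vertices = [np.array([[1.0, 0.1], [0.0, 0.9]]),
              np.array([[1.0, 0.1], [0.0, 0.8]])]
B_vertices = [np.array([[0.005], [0.1]]),
              np.array([[0.008], [0.13]])]

def instantiate(alpha):
    """A(alpha), B(alpha) as the convex combinations of Eq. (2)."""
    alpha = np.asarray(alpha, dtype=float)
    assert np.all(alpha >= 0) and np.isclose(alpha.sum(), 1.0)  # alpha in Gamma
    A = sum(a * Ai for a, Ai in zip(alpha, A_vertices))
    B = sum(a * Bi for a, Bi in zip(alpha, B_vertices))
    return A, B

def step(x, u, alpha, rng):
    """One step of Eq. (1); Gaussian noise is only a stand-in, since the
    true distribution P is allowed to be unknown (Assumption 1)."""
    A, B = instantiate(alpha)
    return A @ x + B @ u + rng.normal(scale=0.1, size=x.shape)
```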
The matrices $A_i$ and $B_i$ can represent the bounds of intervals over parameters, as illustrated by the following example.

Example 1. Consider again the drone of Fig. 1. The drone's longitudinal position $p_k$ and velocity $v_k$ are modeled as

$$x_{k+1} = \begin{bmatrix} p_{k+1} \\ v_{k+1} \end{bmatrix} = \begin{bmatrix} 1 & \tau \\ 0 & 1 - \frac{0.1\tau}{m} \end{bmatrix} x_k + \begin{bmatrix} \frac{\tau^2}{2m} \\ \frac{\tau}{m} \end{bmatrix} u_k + \eta_k,$$

with $\tau$ the discretization time, and $\mathcal{U} = [-5, 5]$. Assume that the mass $m$ is only known to lie within $[0.75, 1.25]$. Then, we obtain a model as in Eq. (1), with $r = 2$ vertices, where $A_1, B_1$ are obtained for $m := 0.75$, and $A_2, B_2$ for $m := 1.25$.
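Under the assumptions of Example 1, the two vertex systems can be written down directly; the discretization time $\tau = 1$ below is an assumed value.

```python
import numpy as np

tau = 1.0  # discretization time (assumed value for illustration)

def drone_matrices(mass):
    """A and B of Example 1 for a given mass m."""
    A = np.array([[1.0, tau],
                  [0.0, 1.0 - 0.1 * tau / mass]])
    B = np.array([[tau ** 2 / (2 * mass)],
                  [tau / mass]])
    return A, B

# The r = 2 vertices of the uncertainty set, for m = 0.75 and m = 1.25.
A1, B1 = drone_matrices(0.75)
A2, B2 = drone_matrices(1.25)
```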
Reach-Avoid Planning Problem
The goal is to steer the state $x_k$ of Eq. (1) to a desirable state within $K$ time steps while always remaining in a safe region. Formally, let the safe set $\mathcal{Z}$ be a compact subset of $\mathbb{R}^n$, and let $\mathcal{G} \subseteq \mathcal{Z}$ be the goal set (see Fig. 2). The control inputs $u_k$ in Eq. (1) are selected by a time-varying feedback controller:

Definition 1. A time-varying feedback controller $c\colon \mathbb{R}^n \times \mathbb{N} \to \mathcal{U}$ for Eq. (1) is a function that maps a state $x_k \in \mathbb{R}^n$ and a time step $k \in \mathbb{N}$ to a control input $u_k \in \mathcal{U}$.

The space of admissible feedback controllers is denoted by $\mathcal{C}$. The reach-avoid probability $V(x_0, \alpha, c)$ is the probability that Eq. (1), under parameter $\alpha \in \Gamma$ and a controller $c$, satisfies a reach-avoid planning problem with respect to the sets $\mathcal{Z}$ and $\mathcal{G}$, starting from initial state $x_0 \in \mathbb{R}^n$. Formally:

Definition 2. The reach-avoid probability $V\colon \mathbb{R}^n \times \Gamma \times \mathcal{C} \to [0,1]$ for a given controller $c \in \mathcal{C}$ on horizon $K$ is

$$V(x_0, \alpha, c) = \mathbb{P}\big\{ \eta_k \in \Omega : x_k \text{ evolves as per Eq. (1)},\ \exists k' \in \{0, \dots, K\} \text{ such that } x_{k'} \in \mathcal{G}, \text{ and } x_k \in \mathcal{Z},\ u_k = c(x_k, k)\ \forall k \in \{0, \dots, k'\} \big\}.$$
We aim to find a controller for which the reach-avoid probability is above a certain threshold $\lambda \in [0,1]$. However, since parameter $\alpha$ of Eq. (1) is unknown, it is impossible to compute the reach-avoid probability explicitly. We instead take a robust approach, thus stating the formal problem as follows:

Formal problem. Given an initial state $x_0$, find a controller $c \in \mathcal{C}$ together with a (high) probability threshold $\lambda \in [0,1]$, such that $V(x_0, \alpha, c) \geq \lambda$ holds for all $\alpha \in \Gamma$.

Thus, we want to find a controller $c \in \mathcal{C}$ that is robust against any possible instance of parameter $\alpha \in \Gamma$ in Eq. (1). Due to the aleatoric uncertainty in Eq. (1) of unknown distribution, we solve the problem up to a user-specified confidence probability $\beta \in (0,1)$, as we shall see in Sect. 5.
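Although the formal problem is solved via the abstraction developed next, the reach-avoid probability of Definition 2 can be estimated empirically for a fixed $\alpha$ and controller, e.g., to validate a synthesized controller in simulation. A Monte Carlo sketch, reusing the hypothetical step function from above (this estimate carries no formal guarantee by itself):

```python
import numpy as np

def estimate_reach_avoid(x0, alpha, controller, in_goal, in_safe, K,
                         num_runs=10_000, seed=0):
    """Empirical estimate of V(x0, alpha, c) from Definition 2 for a fixed
    alpha; controller(x, k) -> u, and in_goal/in_safe are set predicates."""
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(num_runs):
        x = np.array(x0, dtype=float)
        for k in range(K + 1):
            if in_goal(x):          # reached G at some k' <= K
                successes += 1
                break
            if not in_safe(x):      # left the safe set Z before reaching G
                break
            x = step(x, controller(x, k), alpha, rng)  # one step of Eq. (1)
    return successes / num_runs
```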
Interval Markov Decision Processes
We solve the problem by generating a finite-state abstraction of the model as an interval Markov decision process (iMDP):

Definition 3. An iMDP is a tuple $\mathcal{M}_I = (S, Act, s_I, \mathcal{P})$, where $S$ is a finite set of states, $Act$ is a finite set of actions, $s_I \in S$ is the initial state, and $\mathcal{P}\colon S \times Act \times S \to \mathbb{I} \cup \{[0,0]\}$ is an uncertain partial probabilistic transition function over the intervals $\mathbb{I} = \{[a,b] \mid a, b \in (0,1] \text{ and } a \leq b\}$.

Note that an interval may not have a lower bound of $0$, except for the $[0,0]$ interval. If $\mathcal{P}(s, a, s') = [0,0]$ for all $s' \in S$, action $a \in Act$ is not enabled in state $s \in S$. We can instantiate an iMDP into an MDP with a partial transition function $P\colon S \times Act \rightharpoonup \mathit{Dist}(S)$ by fixing a probability $P(s,a)(s') \in \mathcal{P}(s,a,s')$ for each $s, s' \in S$ and for each $a \in Act$ enabled in $s$, such that $P(s,a)$ is a probability distribution over $S$. For brevity, we write this instantiation as $P \in \mathcal{P}$ and denote the resulting MDP by $\mathcal{M}_I[P]$.

Figure 2: Partition of the safe set $\mathcal{X} \subseteq \mathcal{Z}$ (which excludes obstacles) into regions that define the iMDP states. A state $s_i$ is a goal state, $s_i \in S_G$, if its region is contained in the goal region $\mathcal{G}$.
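For the value-iteration sketch that follows, a minimal container for Definition 3 is useful. The representation below (integer states and actions, a dictionary of interval-valued successor distributions) is our own illustrative choice, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class IMDP:
    """Interval MDP (Definition 3), states and actions encoded as integers.
    trans[(s, a)] maps each successor s' to an interval (low, up); an action
    a is enabled in s iff (s, a) is a key of trans."""
    num_states: int
    initial_state: int
    trans: dict = field(default_factory=dict)

    def enabled(self, s):
        return [a for (s2, a) in self.trans if s2 == s]
```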
A deterministic policy (Baier and Katoen 2008) for an iMDP $\mathcal{M}_I$ is a function $\pi\colon S^* \to Act$, where $S^*$ is a sequence of states (memoryless policies do not suffice for time-bounded specifications), with $\pi \in \Pi_{\mathcal{M}_I}$ the admissible policy space. For a policy $\pi$ and an instantiated MDP $\mathcal{M}_I[P]$ for $P \in \mathcal{P}$, we denote by $Pr^\pi(\mathcal{M}_I[P] \models \varphi_{s_I}^K)$ the probability of satisfying a reach-avoid specification $\varphi_{s_I}^K$, i.e., reaching a goal in $S_G \subseteq S$ within $K \in \mathbb{N}$ steps, while not leaving a safe set $S_Z \subseteq S$. (Note that $\varphi_{s_I}^K$ is a reach-avoid specification for an iMDP, while $V(x_0, \alpha, c)$ is the reach-avoid probability on the dynamical model.) A robust optimal policy $\pi^\star \in \Pi_{\mathcal{M}_I}$ maximizes this probability under the minimizing instantiation $P \in \mathcal{P}$; such min-max (and max-min) problems are common in (distributionally) robust optimization, see Ben-Tal, Ghaoui, and Nemirovski (2009), and Wiesemann, Kuhn, and Sim (2014):

$$\pi^\star = \arg\max_{\pi \in \Pi}\ \min_{P \in \mathcal{P}}\ Pr^\pi\big(\mathcal{M}_I[P] \models \varphi_{s_I}^K\big). \qquad (3)$$

We compute an optimal policy in Eq. (3) using a robust variant of value iteration proposed by Wolff, Topcu, and Murray (2012). Note that deterministic policies suffice to obtain optimal values for Eq. (3); see Puggelli et al. (2013).
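A sketch of such a robust value iteration for Eq. (3). The key step is the adversary's inner minimization over distributions constrained to the intervals, which admits a greedy solution: fix every successor probability at its lower bound, then push the remaining mass toward successors with the lowest values. This is a simplified rendition of the cited algorithm, assuming feasible intervals (lower bounds summing to at most 1, upper bounds to at least 1), and it reuses the hypothetical IMDP container above.

```python
def worst_case_value(intervals, values):
    """min sum_i p_i * v_i  s.t.  low_i <= p_i <= up_i and sum_i p_i = 1.
    Greedy: start every p_i at its lower bound, then spend the remaining
    mass on successors in ascending order of value."""
    p = [lo for lo, _ in intervals]
    remaining = 1.0 - sum(p)
    for i in sorted(range(len(values)), key=values.__getitem__):
        extra = min(intervals[i][1] - intervals[i][0], remaining)
        p[i] += extra
        remaining -= extra
    return sum(pi * vi for pi, vi in zip(p, values))

def robust_value_iteration(imdp, goal, unsafe, K):
    """Time-bounded robust VI: V[s] is the worst-case probability of
    reaching `goal` within the remaining steps while avoiding `unsafe`,
    maximized over actions (cf. Eq. (3))."""
    V = [1.0 if s in goal else 0.0 for s in range(imdp.num_states)]
    policy = [[None] * imdp.num_states for _ in range(K)]
    for k in reversed(range(K)):
        V_new = list(V)
        for s in range(imdp.num_states):
            if s in goal or s in unsafe:
                continue  # absorbing: value stays 1 (goal) or 0 (unsafe)
            best, best_a = 0.0, None
            for a in imdp.enabled(s):
                succ = imdp.trans[(s, a)]  # {s': (low, up)}
                q = worst_case_value(list(succ.values()),
                                     [V[t] for t in succ])
                if q > best:
                    best, best_a = q, a
            V_new[s], policy[k][s] = best, best_a
        V = V_new
    return V, policy
```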
3 Finite-State Abstraction
To solve the formal problem, we construct a finite-state abstraction of Eq. (1) as an iMDP. We define the actions of this iMDP via backward reachability computations on a so-called nominal model that neglects any source of uncertainty. We then compensate for the error caused by this modeling simplification in the iMDP's transition probability intervals.
Nominal Model of the Dynamics
To build our abstraction, we rely on a nominal model that neglects both the aleatoric and epistemic uncertainty in Eq. (1), and is thus deterministic. Concretely, we fix any value $\hat{\alpha} \in \Gamma$ and define the nominal model dynamics as

$$\hat{x}_{k+1} = A(\hat{\alpha}) x_k + B(\hat{\alpha}) u_k. \qquad (4)$$

Due to the linearity of the dynamics, we can now express the successor state $x_{k+1}$ with full uncertainty, from Eq. (1), as

$$x_{k+1} = \hat{x}_{k+1} + \delta(\alpha, x_k, u_k) + \eta_k, \qquad (5)$$

with $\delta(\alpha, x_k, u_k)$ being a new term, called the epistemic error, encompassing the error caused by parameter uncertainty:

$$\delta(\alpha, x_k, u_k) = [A(\alpha) - A(\hat{\alpha})] x_k + [B(\alpha) - B(\hat{\alpha})] u_k. \qquad (6)$$

In other words, the successor state $x_{k+1}$ is the nominal one, plus the epistemic error, plus the stochastic noise. Note that for $\alpha = \hat{\alpha}$ (i.e., the true model parameters equal their nominal values), we obtain $\delta(\alpha, x_k, u_k) = 0$. We also impose the next assumption on the nominal model, which guarantees that we can compute the inverse image of Eq. (4) for a given $\hat{x}_{k+1}$, and is used in the proof of Lemma 3 in Appendix A.

Assumption 2. The matrix $A(\hat{\alpha})$ in Eq. (4) is non-singular.

Assumption 2 is mild, as it only requires the existence of a non-singular matrix in $\mathrm{conv}\{A_1, \dots, A_r\}$.
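Since $\delta(\alpha, x_k, u_k)$ in Eq. (6) is affine in $\alpha$, its image over the simplex $\Gamma$ is the convex hull of its values at the vertices $\alpha = e_i$; this is what makes computing the sets of possible outcomes tractable. A minimal sketch, reusing the hypothetical vertex matrices from earlier:

```python
import numpy as np

def epistemic_error_vertices(x, u, A_vertices, B_vertices, alpha_hat):
    """delta(alpha, x, u) of Eq. (6) is affine in alpha, so over Gamma its
    image is the convex hull of its values at the vertices alpha = e_i;
    this returns those r vertex values."""
    A_hat = sum(a * Ai for a, Ai in zip(alpha_hat, A_vertices))
    B_hat = sum(a * Bi for a, Bi in zip(alpha_hat, B_vertices))
    nominal = A_hat @ x + B_hat @ u
    return [Ai @ x + Bi @ u - nominal
            for Ai, Bi in zip(A_vertices, B_vertices)]
```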
States
We create a partition of a subset $\mathcal{X}$ of the safe set $\mathcal{Z}$ on the continuous state space, see Fig. 2. This partition fully covers the goal set $\mathcal{G}$ but excludes unsafe states, i.e., any $x \notin \mathcal{Z}$.

Definition 4. A finite collection of subsets $(\mathcal{P}_i)_{i=1}^{L}$ is called a partition of $\mathcal{X} \subseteq \mathcal{Z} \subset \mathbb{R}^n$ if the following conditions hold:
1. $\mathcal{X} = \bigcup_{i=1}^{L} \mathcal{P}_i$,
2. $\mathcal{P}_i \cap \mathcal{P}_j = \emptyset$, $\forall i, j \in \{1, \dots, L\}$, $i \neq j$.

We append to the partition a so-called absorbing region $\mathcal{P}_0 = \mathrm{cl}(\mathbb{R}^n \setminus \mathcal{X})$, which is defined as the closure of $\mathbb{R}^n \setminus \mathcal{X}$ and represents any state $x \notin \mathcal{X}$ that is disregarded in subsequent reachability computations. We consider partitions into convex polytopic regions, which will allow us to compute PAC probability intervals in Lemma 1 using results from Romao, Papachristodoulou, and Margellos (2022) on scenario optimization programs with discarded constraints:

Assumption 3. Each region $\mathcal{P}_i$ is a convex polytope given by

$$\mathcal{P}_i = \{x \in \mathbb{R}^n : H_i x \leq h_i\}, \qquad (7)$$

with $H_i \in \mathbb{R}^{p_i \times n}$ and $h_i \in \mathbb{R}^{p_i}$ for some $p_i \in \mathbb{N}$, and the inequality in Eq. (7) is to be interpreted element-wise.

We define an iMDP state for each element of $(\mathcal{P}_i)_{i=0}^{L}$, yielding a set of $L+1$ discrete states $S = \{s_i \mid i = 0, \dots, L\}$. Define $T\colon \mathbb{R}^n \to \{0, 1, \dots, L\}$ as the map from any $x \in \mathcal{X}$ to its corresponding region index $i$. We say that a continuous state $x$ belongs to iMDP state $s_i$ if $T(x) = i$. State $s_0$ is a deadlock, such that the only transition leads back to $s_0$.
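A minimal sketch of a uniform rectangular partition satisfying Definition 4 and Assumption 3, together with the region map $T$; the grid layout and 1-based region numbering below are our own illustrative choices.

```python
import numpy as np

class RectangularPartition:
    """Uniform grid over a hyper-rectangle X = [lows, highs); each cell is
    a convex polytope (Assumption 3), and region 0 plays the role of the
    absorbing region P_0."""
    def __init__(self, lows, highs, cells_per_dim):
        self.lows = np.asarray(lows, dtype=float)
        self.highs = np.asarray(highs, dtype=float)
        self.cells = tuple(cells_per_dim)
        self.width = (self.highs - self.lows) / np.asarray(self.cells)

    def T(self, x):
        """Region index of a continuous state x; 0 if x lies outside X."""
        x = np.asarray(x, dtype=float)
        if np.any(x < self.lows) or np.any(x >= self.highs):
            return 0
        idx = np.floor((x - self.lows) / self.width).astype(int)
        return int(np.ravel_multi_index(idx, self.cells)) + 1  # 1..L

# Example: partition [-2, 2] x [-2, 2] into a 10 x 10 grid (L = 100 regions).
part = RectangularPartition([-2, -2], [2, 2], (10, 10))
```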
Actions
Recall that we define the iMDP actions via backward reachability computations under the nominal model in Eq. (4). Let $\mathcal{T} = \{\mathcal{T}_1, \dots, \mathcal{T}_M\}$ be a finite collection of target sets, each of which is a convex polytope, $\mathcal{T}_\ell = \mathrm{conv}\{t_1, \dots, t_d\} \subset \mathbb{R}^n$. Every target set corresponds to an iMDP action, yielding the set $Act = \{a_\ell \mid \ell = 1, \dots, M\}$ of actions. Action $a_\ell \in Act$ represents a transition to $\hat{x}_{k+1} \in \mathcal{T}_\ell$ that is feasible under the nominal model. The one-step backward reachable set $\mathcal{R}^{-1}_{\hat{\alpha}}(\mathcal{T}_\ell)$, shown in Fig. 3, represents precisely those continuous states from which such a direct transition to $\mathcal{T}_\ell$ exists:

$$\mathcal{R}^{-1}_{\hat{\alpha}}(\mathcal{T}_\ell) = \big\{ x \in \mathbb{R}^n \mid \exists u \in \mathcal{U},\ A(\hat{\alpha}) x + B(\hat{\alpha}) u \in \mathcal{T}_\ell \big\}.$$
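By Assumption 2, this set can be computed as $\mathcal{R}^{-1}_{\hat{\alpha}}(\mathcal{T}_\ell) = A(\hat{\alpha})^{-1}\big(\mathcal{T}_\ell \oplus (-B(\hat{\alpha})\,\mathcal{U})\big)$, and since the Minkowski sum of two polytopes is the convex hull of the pairwise sums of their vertices, its vertices can be enumerated directly. A sketch, assuming the resulting point set is full-dimensional so that scipy's ConvexHull applies:

```python
import numpy as np
from scipy.spatial import ConvexHull

def backward_reachable_set(A_hat, B_hat, target_vertices, input_vertices):
    """Vertices of R^{-1}(T) = A^{-1}(T (+) (-B * U)) under the nominal
    model. Every vertex of the Minkowski sum is a pairwise vertex sum, so
    it suffices to enumerate the candidates A^{-1}(t - B u) over all target
    vertices t and input vertices u, then prune interior points."""
    A_inv = np.linalg.inv(A_hat)  # exists by Assumption 2
    candidates = np.array([A_inv @ (t - B_hat @ u)
                           for t in target_vertices
                           for u in input_vertices])
    hull = ConvexHull(candidates)  # prune candidates interior to the hull
    return candidates[hull.vertices]
```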