Uncertain Evidence in Probabilistic Models and Stochastic Simulators

Andreas Munk¹  Alexander Mead¹  Frank Wood¹²³
Abstract
We consider the problem of performing Bayesian inference in probabilistic models where observations are accompanied by uncertainty, referred to as "uncertain evidence." We explore how to interpret uncertain evidence, and by extension the importance of proper interpretation as it pertains to inference about latent variables. We consider a recently proposed method, "distributional evidence," as well as revisit two older methods: Jeffrey's rule and virtual evidence. We devise guidelines on how to account for uncertain evidence and we provide new insights, particularly regarding consistency. To showcase the impact of different interpretations of the same uncertain evidence, we carry out experiments in which one interpretation is defined as "correct." We then compare inference results from each different interpretation, illustrating the importance of careful consideration of uncertain evidence.
1. Introduction
In classical Bayesian inference, the task is to infer the posterior distribution $p(x|y) \propto p(y,x)$ over the latent variable $x$ given (an observed) $y$. The joint distribution (or model), $p(y,x)$, is assumed known and is typically factorized as $p(y,x) = p(y|x)p(x)$, where $p(y|x)$ and $p(x)$ are the likelihood and prior, respectively. This paper deals with the case where $y$ is not observed exactly; rather, it is associated with uncertainty¹, which we refer to as "uncertain evidence." This is a fairly common scenario, as these uncertainties may stem from: observational errors; distrust in the source providing $y$; or when $y$ is derived (stochastically) from some other data.
¹Department of Computer Science, University of British Columbia, Vancouver, B.C., Canada  ²Inverted AI Ltd., Vancouver, B.C., Canada  ³Mila, CIFAR AI Chair. Correspondence to: Andreas Munk <amunk@cs.ubc.ca>.

Under submission.

¹Ideally one would remodel the system to account for such uncertainties, but this is rarely easy to do.
Table 1: Uncertain observation of the time $t$ in the ball-dropping example.

              Value [s]    ± [s]
    $t$       0.5          0.05
As a running example, consider the experiment of recording the time $t$ it takes for a ball to drop to the ground in order to determine the acceleration due to gravity, $g$. Taking some prior belief about the value of $g$, we may solve this problem using Bayesian inference. That is, we infer $p(g|t) \propto p(g)p(t|g)$, where $p(g)$ is the prior density of $g$ and $p(t|g)$ is the likelihood representing the physical model (or simulation) of the time $t$ given $g$. In this setup, the uncertainty about $t$ given $g$ would be due to neglecting air resistance or ignoring variations in the distance the ball drops as a result of vibrations, etc. Assume next that the observations (or data) are given as in Table 1. It is not immediately obvious how the uncertainty relates to $y$, and there are arguably at least two valid interpretations of the information in Table 1: (1) it describes a distribution of the real time $t$; for example, the real time is normally distributed with mean $0.5\,\mathrm{s}$ and standard deviation $0.05\,\mathrm{s}$. (2) It describes additional uncertainty on the predicted time and the observed value is, indeed, $0.5\,\mathrm{s}$; for example, given the predicted time $t$, the observed time $\hat{t}$ is normally distributed with mean $t$ and standard deviation $0.05\,\mathrm{s}$. Importantly, in either case the uncertainty can be represented with a given external² distribution, $q(\cdot|\cdot)$, which describes a stochastic relationship between $t$ and an auxiliary variable $\zeta$. We consider in cases (1) and (2) the distributions $q(t|\zeta)$ and $q(\zeta|t)$, respectively. In the former case $\zeta$ is left implicit (something gave rise to the uncertainty), and in the latter $\zeta = \hat{t}$ and the observation is $\hat{t} = 0.5\,\mathrm{s}$. These two approaches are fundamentally different operations that may lead to profoundly different inference results.

²In this context, external refers to a distribution provided by some external source.
The topic of observations associated with uncertainty has been studied since at least 1965 (Jeffrey, 1965). Of particular relevance is the work of Jeffrey (1965), Shafer (1981), and Pearl (1988), giving rise to Jeffrey's rule (Jeffrey, 1965; Shafer, 1981) and virtual evidence (Pearl, 1988). In the example above, inference using approaches (1) and (2) corresponds to Jeffrey's rule and virtual evidence, respectively.
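To make the distinction concrete, the sketch below contrasts the two interpretations for the ball-drop example using self-normalized importance sampling. It is our own illustration, not code from the paper: the drop height, the prior over $g$, and the simulator noise are all assumed values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# All modelling choices below are assumptions made for illustration only:
# a drop height h, a Gaussian prior over g, and Gaussian simulator noise on t.
h = 1.25            # hypothetical drop height [m]
sigma_model = 0.02  # assumed stochasticity of the simulator p(t | g) [s]

n = 20_000
g = rng.normal(9.0, 1.0, size=n)   # draws from an assumed prior p(g)
t_mean = np.sqrt(2.0 * h / g)      # deterministic free-fall time for each g

# Interpretation (1), Jeffrey's rule: the real time is distributed as
# q(t | zeta) = N(0.5, 0.05^2); average the per-t posteriors p(g | t).
t_q = rng.normal(0.5, 0.05, size=256)                                     # t ~ q(t | zeta)
W = stats.norm.pdf(t_q[:, None], loc=t_mean[None, :], scale=sigma_model)  # p(t | g)
W /= W.sum(axis=1, keepdims=True)          # self-normalized posterior weights for each t
post_mean_jeffrey = (W @ g).mean()

# Interpretation (2), virtual evidence: t_hat = 0.5 s is observed through an
# additional likelihood q(t_hat | t) = N(t, 0.05^2) attached to the simulated t.
t_sim = rng.normal(t_mean, sigma_model)            # t ~ p(t | g)
w = stats.norm.pdf(0.5, loc=t_sim, scale=0.05)     # virtual-evidence weights
post_mean_virtual = np.sum(w * g) / np.sum(w)

print(post_mean_jeffrey, post_mean_virtual)
```

Under these assumed settings the two posterior means need not coincide, which is precisely the point of the example.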
Since then, other approaches closely related to Jeffrey's rule and virtual evidence have been proposed (e.g., Valtorta et al., 2002; Tolpin et al., 2021; Yao, 2022). While each approach has its own merits and is applicable under (almost) the same circumstances, the original literature and most prior work comparing these methods (e.g., Pearl, 2001; Valtorta et al., 2002; Chan & Darwiche, 2005; Ben Mrad et al., 2013; Tolpin et al., 2021) are reluctant to take a concrete stand on when each is more appropriate.
This paints an obfuscated picture of what to do, practically, when presented with uncertain evidence. This obfuscation becomes problematic when practitioners outside the field of statistics deal with uncertain evidence and look to the literature for ways to address it, especially considering the increased use of Bayesian inference in high-fidelity simulators and probabilistic models (e.g., Papamakarios et al., 2019; Baydin et al., 2019; Lavin et al., 2021; Liang et al., 2021; van de Schoot et al., 2021; Wood et al., 2022; Mishra-Sharma & Cranmer, 2022; Munk et al., 2022). For example, in physics it is not uncommon that likelihoods are given relatively ad hoc forms where some notion of "measurement error" is attached to uncertain observations, while the underlying (stochastic) physical model is usually taken to be understood perfectly. Examples include inferring the Hubble parameter via supernovae brightness (e.g., Riess et al., 2022); pre-merger parameters of black-hole/neutron-star binaries via gravitational waves (e.g., Thrane & Talbot, 2019; Dax et al., 2021); neutron star orbital/spin-down/post-Newtonian parameters via pulsar timings (e.g., Lentati et al., 2014; Vigeland & Vallisneri, 2014); and planetary orbital parameters via radial velocity/transit-time observations (e.g., Schulze-Hartung et al., 2012; Feroz & Hobson, 2014; Liang et al., 2021). In most cases a Gaussian likelihood is assumed for the data, but exactly how the error relates to the data generation process is not specified. If uncertainties about simulator/model observations arise given external data, then usually Jeffrey's rule would apply, but it appears that virtual evidence is more often employed.
The purpose of this paper is to provide novel insights, theoretical contributions, and concrete guidance on how to deal with observations with associated uncertainty as it pertains to Bayesian inference. We show, experimentally, how misinterpretations of uncertain evidence can lead to vastly different inference results, emphasizing the importance of carefully accounting for uncertain evidence.
2. Background
Bayesian inference aims to characterize the posterior distribution of the latent random vector $x$ given the observed random vector $y$. When observing $y$ with certainty, the inference problem is "straightforward" in the sense that $p(x|y) = p(y,x)/p(y)$. However, exact inference is often infeasible as $p(y)$ is usually intractable; but if the joint $p(y,x)$ is calculable, then inference is achievable via approximate methods such as importance sampling (e.g., Hammersley & Handscomb, 1964), Metropolis-Hastings (Metropolis & Ulam, 1949; Metropolis et al., 1953; Hastings, 1970), and Hamiltonian Monte Carlo (Duane et al., 1987; Neal, 1994). Unfortunately, standard Bayesian inference is incompatible with uncertain evidence, where exact values of $y$ are unavailable.
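For concreteness, the following is a minimal self-normalized importance-sampling sketch (our illustration, not from the paper) for a toy conjugate Gaussian model of our own choosing, in which only the joint $p(y,x) = p(y|x)p(x)$ is ever evaluated; the conjugate closed form is printed for comparison.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Toy conjugate model (our own choice): x ~ N(0, 1), y | x ~ N(x, 0.5^2).
y_obs = 0.8

def log_joint(y, x):
    return stats.norm.logpdf(x, 0.0, 1.0) + stats.norm.logpdf(y, x, 0.5)

# Self-normalized importance sampling with the prior as proposal: only the
# joint p(y, x) is evaluated, never the (in general intractable) p(y).
x = rng.normal(0.0, 1.0, size=100_000)
log_w = log_joint(y_obs, x) - stats.norm.logpdf(x, 0.0, 1.0)  # reduces to log p(y | x)
w = np.exp(log_w - log_w.max())
w /= w.sum()

post_mean = np.sum(w * x)
print(post_mean, y_obs / (1.0 + 0.25))   # compare with the conjugate posterior mean
```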
Before discussing ways to treat uncertain evidence, we first introduce the highest-level abstraction representing uncertain evidence. Specifically, we consider $\varepsilon \in \mathcal{E}$, where $\mathcal{E}$ is a set of "statements" specifying the uncertainty about $y$. For example, in the drop-of-a-ball example, $\varepsilon$ would be a statement represented as Table 1. In contrast, $\zeta$ is a lower-level abstraction which is encoded in $\varepsilon$. Dealing with uncertain evidence is a matter of decoding or interpreting $\varepsilon$, possibly identifying $\zeta$ and relating it to $p(y,x)$. The canonical example of interpreting uncertain evidence, as introduced by Jeffrey (1965, p. 165), is "observation by candlelight," which motivated Jeffrey's rule:
Definition 2.1 (Jeffrey's Rule (Jeffrey, 1965)). Given $p(y,x)$, let the interpretation of a given $\varepsilon \in \mathcal{E}$ lead to $y$ being associated with uncertainty, conditioned on auxiliary evidence $\zeta$ (where $\zeta$ may be unknown), and denote the decoded uncertainty by $q(y|\zeta)$. Then the updated (posterior) distribution $p(x|\zeta)$ is:

$p(x|\zeta) = \mathbb{E}_{q(y|\zeta)}[p(x|y)].$   (1)

In particular, one considers the updated joint $p(y,x|\zeta) = p(x|y)q(y|\zeta)$, such that $q(y|\zeta)$ is a marginal of $p(y,x|\zeta)$.
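In the discrete case, Eq. (1) is simply a $q$-weighted average of the per-outcome posteriors, $p(x|\zeta) = \sum_y q(y|\zeta)\, p(x|y)$. The sketch below (our own arbitrary toy joint, not an example from the paper) makes this explicit:

```python
import numpy as np

# Toy discrete model (values chosen arbitrarily for illustration):
# x in {0, 1}, y in {0, 1, 2}, joint p(y, x) given as a table.
p_joint = np.array([[0.10, 0.30],    # p(y=0, x=0), p(y=0, x=1)
                    [0.25, 0.05],
                    [0.15, 0.15]])
p_y = p_joint.sum(axis=1)            # marginal p(y)
p_x_given_y = p_joint / p_y[:, None] # conditional p(x | y)

# Decoded uncertain evidence: a distribution q(y | zeta) over the three outcomes.
q_y = np.array([0.7, 0.2, 0.1])

# Jeffrey's rule, Eq. (1): p(x | zeta) = sum_y q(y | zeta) p(x | y).
p_x_given_zeta = q_y @ p_x_given_y
print(p_x_given_zeta)                # a proper distribution over x; sums to 1
```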
Jeffrey envisioned the existence of the auxiliary variable (or vector) $\zeta$; however, Jeffrey's rule is often defined without it (e.g., Chan & Darwiche, 2005). Nonetheless, we argue that reasoning about an auxiliary variable (or vector) $\zeta$ is the more intuitive perspective, as some evidence must have given rise to $q$. Further, accompanying the introduction of Jeffrey's rule is the preservation of the conditional distribution of $x$ upon applying Jeffrey's rule; see, e.g., Jeffrey (1965), Diaconis & Zabell (1982), Valtorta et al. (2002), and Chan & Darwiche (2005). That is, the evidence $\zeta$ giving rise to $q(y|\zeta)$ must not also alter the conditional distribution of $x$ given $y$. Mathematically, Jeffrey's rule requires that $p(x|y,\zeta) = p(x|y)$. This, for instance, relates to the commutativity of Jeffrey's rule, which is treated in full detail by Diaconis & Zabell (1982) and briefly discussed in Appendix A.
In contrast to Jeffrey's rule is virtual evidence, as proposed by Pearl (1988). Virtual evidence also includes an auxiliary virtual variable (or vector), but does so via the likelihood $q(\zeta|y,x) := q(\zeta|y)$, with the only parents of $\zeta$ being $y$:
Figure 1: Jeffrey's rule compared to virtual evidence in terms of the auxiliary evidence $\zeta$. [Graphical models over $x$, $y$, and $\zeta$: Jeffrey's rule factorizes the extended model as $q(y|\zeta)p(x|y)$, while virtual evidence factorizes it as $p(y|x)q(\zeta|y)$.] Both virtual evidence and Jeffrey's rule are defined in terms of the base model $p(y,x)$.
Definition 2.2 (Virtual evidence (Pearl, 1988)). Given $p(y,x)$, suppose a given $\varepsilon \in \mathcal{E}$ leads to the interpretation that we extend $p(y,x)$ with an auxiliary virtual variable (or vector) $\zeta$ such that: (1) in the discrete case, where the values of $y \in \{y_k\}_{k=1}^{K}$ are mutually exclusive, the uncertain evidence is decoded as likelihood ratios³ $\{\lambda_k\}_{k=1}^{K}$:

$\lambda_1 : \cdots : \lambda_K = q(\zeta|y_1) : \cdots : q(\zeta|y_K).$   (2)

The posterior over $x$ given uncertain evidence is (Chan & Darwiche 2005; a result we also prove in Appendix B),

$p(x|\zeta) = \frac{\sum_{k=1}^{K} \lambda_k\, p(y_k, x)}{\sum_{j=1}^{K} \lambda_j\, p(y_j)}.$   (3)

(2) If $y$ is continuous, decoding $\varepsilon$ leads to the virtual likelihood $q(\zeta|y)$ such that the posterior is proportional to the (virtual) joint,

$p(x|\zeta) \propto \int p(\zeta, y, x)\,\mathrm{d}y = \int q(\zeta|y)\, p(y, x)\,\mathrm{d}y.$   (4)

³The notation for ratios containing several terms, for example A, B, and C, is written as $x : y : z$. This is understood as: "for every $x$ parts of A there are $y$ parts of B and $z$ parts of C."
In practice, in the continuous case one can approximate the posterior using standard approximate inference algorithms requiring only the evaluation of the joint. In the discrete case, Eq. (3), the posterior inference is exact assuming a known $p(y_i)$ for all $i \in \{1, \ldots, K\}$. When comparing Jeffrey's rule and virtual evidence (e.g., Pearl, 1988; Valtorta et al., 2002; Jacobs, 2019) we can do so in terms of $\zeta$ and the corresponding graphical model (Figure 1). This figure is a graphical representation of how Jeffrey's rule and virtual evidence relate $\zeta$ to the existing probabilistic model, $p(y,x)$. In particular, Jeffrey's rule and virtual evidence affect the model in opposite directions: Jeffrey's rule pertains to uncertainty about $y$ given some evidence, while virtual evidence requires reasoning about the likelihood $q(\zeta|y)$.
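As a minimal numerical sketch of the discrete case of Definition 2.2 (toy numbers of our own choosing, not from the paper), Eq. (3) reweights the joint $p(y_k, x)$ directly by the ratios $\lambda_k$:

```python
import numpy as np

# Toy discrete model with arbitrary numbers; rows: y = 0, 1, 2; columns: x = 0, 1.
p_joint = np.array([[0.10, 0.30],
                    [0.25, 0.05],
                    [0.15, 0.15]])
p_y = p_joint.sum(axis=1)            # marginal p(y)

# Decoded uncertain evidence as likelihood ratios lambda_k = q(zeta | y_k),
# which are only defined up to a common scale.
lam = np.array([3.0, 1.0, 0.5])

# Virtual evidence, Eq. (3): p(x | zeta) = sum_k lam_k p(y_k, x) / sum_j lam_j p(y_j).
p_x_given_zeta = (lam @ p_joint) / (lam @ p_y)
print(p_x_given_zeta)                # a proper distribution over x; sums to 1
```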
It is (perhaps) not surprising that one may apply Jeffrey's rule, yet implement it as a special case of virtual evidence by choosing a particular form of likelihood ratios, Equation (2), and vice versa (Pearl, 1988; Chan & Darwiche, 2005). However, this is of purely algorithmic significance, as the two approaches remain fundamentally different.
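Concretely, one standard choice of ratios (cf. Chan & Darwiche, 2005) is $\lambda_k \propto q(y_k|\zeta)/p(y_k)$. The toy check below (our own numbers, a sketch rather than anything prescribed by the paper) confirms that plugging these ratios into Eq. (3) reproduces the Jeffrey's-rule update of Eq. (1), even though the underlying modelling assumptions differ:

```python
import numpy as np

# Same style of toy discrete model as before (arbitrary numbers).
p_joint = np.array([[0.10, 0.30],
                    [0.25, 0.05],
                    [0.15, 0.15]])   # rows: y; columns: x
p_y = p_joint.sum(axis=1)
p_x_given_y = p_joint / p_y[:, None]

q_y = np.array([0.7, 0.2, 0.1])      # decoded q(y | zeta)

# Jeffrey's rule, Eq. (1).
jeffrey = q_y @ p_x_given_y

# Virtual evidence, Eq. (3), with lambda_k proportional to q(y_k | zeta) / p(y_k).
lam = q_y / p_y
virtual = (lam @ p_joint) / (lam @ p_y)

assert np.allclose(jeffrey, virtual)  # the two updates coincide for this choice
print(jeffrey, virtual)
```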
A third approach to uncertain evidence, recently introduced by Tolpin et al. (2021), treats the uncertain evidence on $y$ as an event. This approach, which we refer to as distributional evidence, defines a likelihood on the event $\{y \sim D_q\}$ (read as "the event that the distribution of $y$ is $D_q$ with density $q(y)$") and considers the auxiliary variable $\zeta = \{y \sim D_q\}$:
Definition 2.3 (Distributional evidence (Tolpin et al., 2021)). Let $p(y,x) = p(y|x)p(x)$ be the joint distribution with a known factorization. Assume the interpretation of a given $\varepsilon \in \mathcal{E}$ yields a density $q(y)$ with distribution $D_q$. Define the likelihood $p(y \sim D_q \mid x)$ as:

$p(\zeta|x) = \frac{\exp\!\left(\mathbb{E}_{q(y)}[\ln p(y|x)]\right)}{Z(x)}$   (5)

where $\zeta = \{y \sim D_q\}$ and $Z(x)$ is a normalization constant that generally depends on $x$. Typically we drop explicitly writing $\zeta$ and simply write $p(y \sim D_q \mid x)$. See Tolpin et al. (2021) for sufficient conditions under which $Z(x) < \infty$.
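To make Eq. (5) concrete, the sketch below estimates its unnormalized part, $\exp \mathbb{E}_{q(y)}[\ln p(y|x)]$, by Monte Carlo for an arbitrary Gaussian toy model of our own choosing, and uses it as a self-normalized importance weight under the simplifying assumption that $Z(x)$ may be treated as constant in $x$ so that it cancels; see Tolpin et al. (2021) for when such treatment is justified.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Toy model (our own choice): x ~ N(0, 1), y | x ~ N(x, 0.5^2).
# Decoded evidence: q(y) = N(0.8, 0.1^2), i.e. zeta = {y ~ D_q}.
x = rng.normal(0.0, 1.0, size=20_000)      # prior draws, used as the proposal
y_q = rng.normal(0.8, 0.1, size=256)       # Monte Carlo draws from q(y)

# E_{q(y)}[ ln p(y | x) ], estimated by averaging log-likelihoods over the q-draws.
expected_loglik = stats.norm.logpdf(y_q[:, None], loc=x[None, :], scale=0.5).mean(axis=0)

# Self-normalized importance weights; the unknown Z(x) is assumed constant in x here.
w = np.exp(expected_loglik - expected_loglik.max())
w /= w.sum()
print(np.sum(w * x))    # approximate posterior mean of x under distributional evidence
```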
3. Which Approach?
The lack of a general consensus on how best to approach uncertain evidence means that it is difficult to know what to do, in practical terms, when faced with uncertain evidence. In isolation, each approach discussed in the previous section appears well supported, even when applied to the same model (e.g., Ben Mrad et al., 2013). However, the underlying arguments remain somewhat circumstantial. Prior work tends to create contexts tailored to each approach, and it is unclear how relatable or generalizable those contexts are. As such, much prior work is not particularly instructive when deducing which approach to adopt for new applications that do not fit those prior contexts. We argue that the apparent philosophical discourse fundamentally stems from a disagreement about the model $M \in \mathcal{M}$ in which we seek to do inference given uncertain evidence $\varepsilon \in \mathcal{E}$. This can be framed as an inference problem where we seek to find (or directly define) $p(M|\varepsilon)$. The significance of this perspective is that reasoning about the triplet $M \in \mathcal{M}$, $\varepsilon \in \mathcal{E}$, and $p(M|\varepsilon)$ makes for a better foundation, one that encourages discussion of, and makes clear, the underlying assumptions.
How, then, should we define $p(M|\varepsilon)$? In the general case, reaching consensus is close to impossible, as it requires fully specifying $\mathcal{M}$ and $\mathcal{E}$ (all possible models and conceivable evidences). However, while universal consensus is arguably unattainable, "local" consensus might be. Here, locality refers to defining $p(M|\varepsilon)$ on constrained and application-dependent subsets $\tilde{\mathcal{E}} \subset \mathcal{E}$ and $\tilde{\mathcal{M}} \subset \mathcal{M}$. This perspective was considered by Grove & Halpern (1997), yet does not seem to have resurfaced in this context since. Grove & Halpern (1997) define $\tilde{\mathcal{M}}$ in terms of a prior $p(M)$ and implicitly