Uncertain Evidence in Probabilistic Models and Stochastic Simulators

Andreas Munk¹  Alexander Mead¹  Frank Wood¹²³
Abstract
We consider the problem of performing Bayesian inference in probabilistic models where observations are accompanied by uncertainty, referred to as "uncertain evidence." We explore how to interpret uncertain evidence, and by extension the importance of proper interpretation as it pertains to inference about latent variables. We consider a recently proposed method, "distributional evidence," as well as revisit two older methods: Jeffrey's rule and virtual evidence. We devise guidelines on how to account for uncertain evidence and we provide new insights, particularly regarding consistency. To showcase the impact of different interpretations of the same uncertain evidence, we carry out experiments in which one interpretation is defined as "correct." We then compare inference results from each different interpretation, illustrating the importance of careful consideration of uncertain evidence.
1. Introduction
In classical Bayesian inference, the task is to infer the posterior distribution $p(x|y) \propto p(y,x)$ over the latent variable $x$ given (an observed) $y$. The joint distribution (or model), $p(y,x)$, is assumed known and is typically factorized as $p(y,x) = p(y|x)p(x)$, where $p(y|x)$ and $p(x)$ are the likelihood and prior, respectively. This paper deals with the case where $y$ is not observed exactly; rather, it is associated with uncertainty¹, which we refer to as "uncertain evidence." This is a fairly common scenario, as these uncertainties may stem from: observational errors; distrust in the source providing $y$; or when $y$ is derived (stochastically) from some other data.
¹Department of Computer Science, University of British Columbia, Vancouver, B.C., Canada  ²Inverted AI Ltd., Vancouver, B.C., Canada  ³Mila, CIFAR AI Chair. Correspondence to: Andreas Munk <amunk@cs.ubc.ca>.

Under submission.

¹Ideally one would remodel the system to account for such uncertainties, but this is rarely easy to do.
Table 1: Uncertain observation of the time $t$ in the ball-dropping example.

              Value [s]    ± [s]
    $t$       0.5          0.05
As a running example, consider the experiment of recording the time $t$ it takes for a ball to drop to the ground in order to determine the acceleration due to gravity, $g$. Taking some prior belief about the value of $g$, we may solve this problem using Bayesian inference. That is, we infer $p(g|t) \propto p(g)p(t|g)$, where $p(g)$ is the prior density of $g$ and $p(t|g)$ is the likelihood representing the physical model (or simulation) of the time $t$ given $g$. In this setup, the uncertainty about $t$ given $g$ would be due to neglecting air resistance or ignoring variations in the distance the ball drops as a result of vibrations, etc. Assume next that the observations (or data) are given as in Table 1. It is not immediately obvious how the uncertainty relates to $y$, and there are arguably at least two valid interpretations of the information in Table 1: (1) it describes a distribution of the real time $t$; for example, the real time is normally distributed with mean $0.5\,\mathrm{s}$ and standard deviation $0.05\,\mathrm{s}$. (2) It describes additional uncertainty on the predicted time and the observed value is, indeed, $0.5\,\mathrm{s}$; for example, given the predicted time $t$, the observed time $\hat{t}$ is normally distributed with mean $t$ and standard deviation $0.05\,\mathrm{s}$. Importantly, in either case the uncertainty can be represented with a given external² distribution, $q(\cdot|\cdot)$, which describes a stochastic relationship between $t$ and an auxiliary variable $\zeta$. We consider in cases (1) and (2) the distributions $q(t|\zeta)$ and $q(\zeta|t)$, respectively. In the former case $\zeta$ is left implicit (something gave rise to the uncertainty), and in the latter $\zeta = \hat{t}$ and the observation is $\hat{t} = 0.5\,\mathrm{s}$. These two approaches are fundamentally different operations that may lead to profoundly different inference results.

²In this context, external refers to a distribution provided by some external source.
The topic of observations associated with uncertainty has been studied since at least 1965 (Jeffrey, 1965). Of particular relevance is the work of Jeffrey (1965), Shafer (1981), and Pearl (1988), giving rise to Jeffrey's rule (Jeffrey, 1965; Shafer, 1981) and virtual evidence (Pearl, 1988). In the example above, inference using approaches (1) and (2) corresponds to Jeffrey's rule and virtual evidence, respectively.
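To make the distinction concrete, the sketch below contrasts the two interpretations for the ball-drop example using self-normalized importance sampling. It is our own illustration, not code from the paper: the drop height, the prior over $g$, and the simulator noise are all assumed values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# All modelling choices below are assumptions made for illustration only:
# a drop height h, a Gaussian prior over g, and Gaussian simulator noise on t.
h = 1.25            # hypothetical drop height [m]
sigma_model = 0.02  # assumed stochasticity of the simulator p(t | g) [s]

n = 20_000
g = rng.normal(9.0, 1.0, size=n)   # draws from an assumed prior p(g)
t_mean = np.sqrt(2.0 * h / g)      # deterministic free-fall time for each g

# Interpretation (1), Jeffrey's rule: the real time is distributed as
# q(t | zeta) = N(0.5, 0.05^2); average the per-t posteriors p(g | t).
t_q = rng.normal(0.5, 0.05, size=256)                                     # t ~ q(t | zeta)
W = stats.norm.pdf(t_q[:, None], loc=t_mean[None, :], scale=sigma_model)  # p(t | g)
W /= W.sum(axis=1, keepdims=True)          # self-normalized posterior weights for each t
post_mean_jeffrey = (W @ g).mean()

# Interpretation (2), virtual evidence: t_hat = 0.5 s is observed through an
# additional likelihood q(t_hat | t) = N(t, 0.05^2) attached to the simulated t.
t_sim = rng.normal(t_mean, sigma_model)            # t ~ p(t | g)
w = stats.norm.pdf(0.5, loc=t_sim, scale=0.05)     # virtual-evidence weights
post_mean_virtual = np.sum(w * g) / np.sum(w)

print(post_mean_jeffrey, post_mean_virtual)
```

Under these assumed settings the two posterior means need not coincide, which is precisely the point of the example.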
Since then, other approaches closely related to Jeffrey's rule and virtual evidence have been proposed (e.g., Valtorta et al., 2002; Tolpin et al., 2021; Yao, 2022). While each approach has its own merits and is applicable under (almost) the same circumstances, the original literature and most prior work comparing these methods (e.g., Pearl, 2001; Valtorta et al., 2002; Chan & Darwiche, 2005; Ben Mrad et al., 2013; Tolpin et al., 2021) are reluctant to take a concrete stand on when each is more appropriate.
This paints an obfuscated picture of what to do, practically, when presented with uncertain evidence. This obfuscation becomes problematic when practitioners outside the field of statistics deal with uncertain evidence and look to the literature for ways to address it, especially considering the increased use of Bayesian inference in high-fidelity simulators and probabilistic models (e.g., Papamakarios et al., 2019; Baydin et al., 2019; Lavin et al., 2021; Liang et al., 2021; van de Schoot et al., 2021; Wood et al., 2022; Mishra-Sharma & Cranmer, 2022; Munk et al., 2022). For example, in physics it is not uncommon that likelihoods are given relatively ad hoc forms where some notion of "measurement error" is attached to uncertain observations, while the underlying (stochastic) physical model is usually taken to be understood perfectly. Examples include inferring the Hubble parameter via supernovae brightness (e.g., Riess et al., 2022); pre-merger parameters of black-hole/neutron-star binaries via gravitational waves (e.g., Thrane & Talbot, 2019; Dax et al., 2021); neutron star orbital/spin-down/post-Newtonian parameters via pulsar timings (e.g., Lentati et al., 2014; Vigeland & Vallisneri, 2014); and planetary orbital parameters via radial velocity/transit-time observations (e.g., Schulze-Hartung et al., 2012; Feroz & Hobson, 2014; Liang et al., 2021). In most cases a Gaussian likelihood is assumed for the data, but exactly how the error relates to the data generation process is not specified. If uncertainties about simulator/model observations arise given external data, then usually Jeffrey's rule would apply, but it appears that virtual evidence is more often employed.
The purpose of this paper is to provide novel insights, theoretical contributions, and concrete guidance on how to deal with observations with associated uncertainty as it pertains to Bayesian inference. We show, experimentally, how misinterpretations of uncertain evidence can lead to vastly different inference results, emphasizing the importance of carefully accounting for uncertain evidence.
2. Background
Bayesian inference aims to characterize the posterior distribution of the latent random vector $x$ given the observed random vector $y$. When observing $y$ with certainty, the inference problem is "straightforward" in the sense that $p(x|y) = p(y,x)/p(y)$. However, exact inference is often infeasible as $p(y)$ is usually intractable; but if the joint $p(y,x)$ is calculable, then inference is achievable via approximate methods such as importance sampling (e.g., Hammersley & Handscomb, 1964), Metropolis-Hastings (Metropolis & Ulam, 1949; Metropolis et al., 1953; Hastings, 1970), and Hamiltonian Monte Carlo (Duane et al., 1987; Neal, 1994). Unfortunately, standard Bayesian inference is incompatible with uncertain evidence, where exact values of $y$ are unavailable.
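For concreteness, the following is a minimal self-normalized importance-sampling sketch (our illustration, not from the paper) for a toy conjugate Gaussian model of our own choosing, in which only the joint $p(y,x) = p(y|x)p(x)$ is ever evaluated; the conjugate closed form is printed for comparison.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Toy conjugate model (our own choice): x ~ N(0, 1), y | x ~ N(x, 0.5^2).
y_obs = 0.8

def log_joint(y, x):
    return stats.norm.logpdf(x, 0.0, 1.0) + stats.norm.logpdf(y, x, 0.5)

# Self-normalized importance sampling with the prior as proposal: only the
# joint p(y, x) is evaluated, never the (in general intractable) p(y).
x = rng.normal(0.0, 1.0, size=100_000)
log_w = log_joint(y_obs, x) - stats.norm.logpdf(x, 0.0, 1.0)  # reduces to log p(y | x)
w = np.exp(log_w - log_w.max())
w /= w.sum()

post_mean = np.sum(w * x)
print(post_mean, y_obs / (1.0 + 0.25))   # compare with the conjugate posterior mean
```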
Before discussing ways to treat uncertain evidence, we first introduce the highest-level abstraction representing uncertain evidence. Specifically, we consider $\varepsilon \in \mathcal{E}$, where $\mathcal{E}$ is a set of "statements" specifying the uncertainty about $y$. For example, in the drop-of-a-ball example, $\varepsilon$ would be a statement represented as Table 1. In contrast, $\zeta$ is a lower-level abstraction which is encoded in $\varepsilon$. Dealing with uncertain evidence is a matter of decoding or interpreting $\varepsilon$, possibly identifying $\zeta$ and relating it to $p(y,x)$. The canonical example of interpreting uncertain evidence, as introduced by Jeffrey (1965, p. 165), is "observation by candlelight," which motivated Jeffrey's rule:
Definition 2.1 (Jeffrey's Rule (Jeffrey, 1965)). Given $p(y,x)$, let the interpretation of a given $\varepsilon \in \mathcal{E}$ lead to $y$ being associated with uncertainty, conditioned on auxiliary evidence $\zeta$ (where $\zeta$ may be unknown), and denote the decoded uncertainty by $q(y|\zeta)$. Then the updated (posterior) distribution $p(x|\zeta)$ is:

$p(x|\zeta) = \mathbb{E}_{q(y|\zeta)}[p(x|y)].$   (1)

In particular, one considers the updated joint $p(y,x|\zeta) = p(x|y)q(y|\zeta)$, such that $q(y|\zeta)$ is a marginal of $p(y,x|\zeta)$.
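In the discrete case, Eq. (1) is simply a $q$-weighted average of the per-outcome posteriors, $p(x|\zeta) = \sum_y q(y|\zeta)\, p(x|y)$. The sketch below (our own arbitrary toy joint, not an example from the paper) makes this explicit:

```python
import numpy as np

# Toy discrete model (values chosen arbitrarily for illustration):
# x in {0, 1}, y in {0, 1, 2}, joint p(y, x) given as a table.
p_joint = np.array([[0.10, 0.30],    # p(y=0, x=0), p(y=0, x=1)
                    [0.25, 0.05],
                    [0.15, 0.15]])
p_y = p_joint.sum(axis=1)            # marginal p(y)
p_x_given_y = p_joint / p_y[:, None] # conditional p(x | y)

# Decoded uncertain evidence: a distribution q(y | zeta) over the three outcomes.
q_y = np.array([0.7, 0.2, 0.1])

# Jeffrey's rule, Eq. (1): p(x | zeta) = sum_y q(y | zeta) p(x | y).
p_x_given_zeta = q_y @ p_x_given_y
print(p_x_given_zeta)                # a proper distribution over x; sums to 1
```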
Jeffrey envisioned the existence of the auxiliary variable (or vector) $\zeta$; however, Jeffrey's rule is often defined without it (e.g., Chan & Darwiche, 2005). Nonetheless, we argue that reasoning about an auxiliary variable (or vector) $\zeta$ is the more intuitive perspective, as some evidence must have given rise to $q$. Further, accompanying the introduction of Jeffrey's rule is the preservation of the conditional distribution of $x$ upon applying Jeffrey's rule; see, e.g., Jeffrey (1965), Diaconis & Zabell (1982), Valtorta et al. (2002), and Chan & Darwiche (2005). That is, the evidence $\zeta$ giving rise to $q(y|\zeta)$ must not also alter the conditional distribution of $x$ given $y$. Mathematically, Jeffrey's rule requires that $p(x|y,\zeta) = p(x|y)$. This, for instance, relates to the commutativity of Jeffrey's rule, which is treated in full detail by Diaconis & Zabell (1982) and briefly discussed in Appendix A.
In contrast to Jeffrey's rule is virtual evidence, as proposed by Pearl (1988). Virtual evidence also includes an auxiliary virtual variable (or vector), but does so via the likelihood $q(\zeta|y,x) := q(\zeta|y)$, with the only parents of $\zeta$ being $y$:
Figure 1: Jeffrey's rule compared to virtual evidence in terms of the auxiliary evidence $\zeta$. [Graphical models over $x$, $y$, and $\zeta$: Jeffrey's rule factorizes the extended model as $q(y|\zeta)p(x|y)$, while virtual evidence factorizes it as $p(y|x)q(\zeta|y)$.] Both virtual evidence and Jeffrey's rule are defined in terms of the base model $p(y,x)$.
Definition 2.2 (Virtual evidence (Pearl, 1988)). Given $p(y,x)$, suppose a given $\varepsilon \in \mathcal{E}$ leads to the interpretation that we extend $p(y,x)$ with an auxiliary virtual variable (or vector) $\zeta$ such that: (1) in the discrete case, where the values of $y \in \{y_k\}_{k=1}^{K}$ are mutually exclusive, the uncertain evidence is decoded as likelihood ratios³ $\{\lambda_k\}_{k=1}^{K}$:

$\lambda_1 : \cdots : \lambda_K = q(\zeta|y_1) : \cdots : q(\zeta|y_K).$   (2)

The posterior over $x$ given uncertain evidence is (Chan & Darwiche 2005; a result we also prove in Appendix B),

$p(x|\zeta) = \frac{\sum_{k=1}^{K} \lambda_k\, p(y_k, x)}{\sum_{j=1}^{K} \lambda_j\, p(y_j)}.$   (3)

(2) If $y$ is continuous, decoding $\varepsilon$ leads to the virtual likelihood $q(\zeta|y)$ such that the posterior is proportional to the (virtual) joint,

$p(x|\zeta) \propto \int p(\zeta, y, x)\,\mathrm{d}y = \int q(\zeta|y)\, p(y, x)\,\mathrm{d}y.$   (4)

³The notation for ratios containing several terms, for example A, B, and C, is written as $x : y : z$. This is understood as: "for every $x$ parts of A there are $y$ parts of B and $z$ parts of C."
In practice, in the continuous case one can approximate the posterior using standard approximate inference algorithms requiring only the evaluation of the joint. In the discrete case, Eq. (3), the posterior inference is exact assuming a known $p(y_i)$ for all $i \in \{1, \ldots, K\}$. When comparing Jeffrey's rule and virtual evidence (e.g., Pearl, 1988; Valtorta et al., 2002; Jacobs, 2019) we can do so in terms of $\zeta$ and the corresponding graphical model (Figure 1). This figure is a graphical representation of how Jeffrey's rule and virtual evidence relate $\zeta$ to the existing probabilistic model, $p(y,x)$. In particular, Jeffrey's rule and virtual evidence affect the model in opposite directions: Jeffrey's rule pertains to uncertainty about $y$ given some evidence, while virtual evidence requires reasoning about the likelihood $q(\zeta|y)$.
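As a minimal numerical sketch of the discrete case of Definition 2.2 (toy numbers of our own choosing, not from the paper), Eq. (3) reweights the joint $p(y_k, x)$ directly by the ratios $\lambda_k$:

```python
import numpy as np

# Toy discrete model with arbitrary numbers; rows: y = 0, 1, 2; columns: x = 0, 1.
p_joint = np.array([[0.10, 0.30],
                    [0.25, 0.05],
                    [0.15, 0.15]])
p_y = p_joint.sum(axis=1)            # marginal p(y)

# Decoded uncertain evidence as likelihood ratios lambda_k = q(zeta | y_k),
# which are only defined up to a common scale.
lam = np.array([3.0, 1.0, 0.5])

# Virtual evidence, Eq. (3): p(x | zeta) = sum_k lam_k p(y_k, x) / sum_j lam_j p(y_j).
p_x_given_zeta = (lam @ p_joint) / (lam @ p_y)
print(p_x_given_zeta)                # a proper distribution over x; sums to 1
```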
It is (perhaps) not surprising that one may apply Jeffrey's rule, yet implement it as a special case of virtual evidence by choosing a particular form of likelihood ratios, Equation (2), and vice versa (Pearl, 1988; Chan & Darwiche, 2005). However, this is of purely algorithmic significance, as the two approaches remain fundamentally different.
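Concretely, one standard choice of ratios (cf. Chan & Darwiche, 2005) is $\lambda_k \propto q(y_k|\zeta)/p(y_k)$. The toy check below (our own numbers, a sketch rather than anything prescribed by the paper) confirms that plugging these ratios into Eq. (3) reproduces the Jeffrey's-rule update of Eq. (1), even though the underlying modelling assumptions differ:

```python
import numpy as np

# Same style of toy discrete model as before (arbitrary numbers).
p_joint = np.array([[0.10, 0.30],
                    [0.25, 0.05],
                    [0.15, 0.15]])   # rows: y; columns: x
p_y = p_joint.sum(axis=1)
p_x_given_y = p_joint / p_y[:, None]

q_y = np.array([0.7, 0.2, 0.1])      # decoded q(y | zeta)

# Jeffrey's rule, Eq. (1).
jeffrey = q_y @ p_x_given_y

# Virtual evidence, Eq. (3), with lambda_k proportional to q(y_k | zeta) / p(y_k).
lam = q_y / p_y
virtual = (lam @ p_joint) / (lam @ p_y)

assert np.allclose(jeffrey, virtual)  # the two updates coincide for this choice
print(jeffrey, virtual)
```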
A third approach to uncertain evidence, recently introduced by Tolpin et al. (2021), treats the uncertain evidence on $y$ as an event. This approach, which we refer to as distributional evidence, defines a likelihood on the event $\{y \sim D_q\}$ (read as "the event that the distribution of $y$ is $D_q$ with density $q(y)$") and considers the auxiliary variable $\zeta = \{y \sim D_q\}$:
Definition 2.3 (Distributional evidence (Tolpin et al., 2021)). Let $p(y,x) = p(y|x)p(x)$ be the joint distribution with a known factorization. Assume the interpretation of a given $\varepsilon \in \mathcal{E}$ yields a density $q(y)$ with distribution $D_q$. Define the likelihood $p(y \sim D_q \mid x)$ as:

$p(\zeta|x) = \frac{\exp\!\left(\mathbb{E}_{q(y)}[\ln p(y|x)]\right)}{Z(x)}$   (5)

where $\zeta = \{y \sim D_q\}$ and $Z(x)$ is a normalization constant that generally depends on $x$. Typically we drop explicitly writing $\zeta$ and simply write $p(y \sim D_q \mid x)$. See Tolpin et al. (2021) for sufficient conditions under which $Z(x) < \infty$.
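To make Eq. (5) concrete, the sketch below estimates its unnormalized part, $\exp \mathbb{E}_{q(y)}[\ln p(y|x)]$, by Monte Carlo for an arbitrary Gaussian toy model of our own choosing, and uses it as a self-normalized importance weight under the simplifying assumption that $Z(x)$ may be treated as constant in $x$ so that it cancels; see Tolpin et al. (2021) for when such treatment is justified.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Toy model (our own choice): x ~ N(0, 1), y | x ~ N(x, 0.5^2).
# Decoded evidence: q(y) = N(0.8, 0.1^2), i.e. zeta = {y ~ D_q}.
x = rng.normal(0.0, 1.0, size=20_000)      # prior draws, used as the proposal
y_q = rng.normal(0.8, 0.1, size=256)       # Monte Carlo draws from q(y)

# E_{q(y)}[ ln p(y | x) ], estimated by averaging log-likelihoods over the q-draws.
expected_loglik = stats.norm.logpdf(y_q[:, None], loc=x[None, :], scale=0.5).mean(axis=0)

# Self-normalized importance weights; the unknown Z(x) is assumed constant in x here.
w = np.exp(expected_loglik - expected_loglik.max())
w /= w.sum()
print(np.sum(w * x))    # approximate posterior mean of x under distributional evidence
```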
3. Which Approach?
The lack of a general consensus on how best to approach uncertain evidence means that it is difficult to know what to do, in practical terms, when faced with uncertain evidence. In isolation, each approach discussed in the previous section appears well supported, even when applied to the same model (e.g., Ben Mrad et al., 2013). However, the underlying arguments remain somewhat circumstantial. Prior work tends to create contexts tailored to each approach, and it is unclear how relatable or generalizable those contexts are. As such, much prior work is not particularly instructive when deducing which approach to adopt for new applications that do not fit those prior contexts. We argue that the apparent philosophical discourse fundamentally stems from a disagreement about the model $M \in \mathcal{M}$ in which we seek to do inference given uncertain evidence $\varepsilon \in \mathcal{E}$. This can be framed as an inference problem where we seek to find (or directly define) $p(M|\varepsilon)$. The significance of this perspective is that reasoning about the triplet $M \in \mathcal{M}$, $\varepsilon \in \mathcal{E}$, and $p(M|\varepsilon)$ makes for a better foundation, one that encourages discussion of, and makes clear, the underlying assumptions.
How, then, should we define $p(M|\varepsilon)$? In the general case, reaching consensus is close to impossible, as it requires fully specifying $\mathcal{M}$ and $\mathcal{E}$ (all possible models and conceivable evidences). However, while universal consensus is arguably unattainable, "local" consensus might be. Here, locality refers to defining $p(M|\varepsilon)$ on constrained and application-dependent subsets $\tilde{\mathcal{E}} \subset \mathcal{E}$ and $\tilde{\mathcal{M}} \subset \mathcal{M}$. This perspective was considered by Grove & Halpern (1997), yet does not seem to have resurfaced in this context since. Grove & Halpern (1997) define $\tilde{\mathcal{M}}$ in terms of a prior $p(M)$ and implicitly