
Since then, other approaches closely related to Jeffrey's rule and virtual evidence have been proposed (e.g., Valtorta et al., 2002; Tolpin et al., 2021; Yao, 2022). While each approach has its own merits and is applicable under (almost) the same circumstances, the original literature and most prior work comparing these methods (e.g., Pearl, 2001; Valtorta et al., 2002; Chan & Darwiche, 2005; Ben Mrad et al., 2013; Tolpin et al., 2021) are reluctant to take a concrete stand on when each is more appropriate.
This paints an unclear picture of what to do, practically, when presented with uncertain evidence. This lack of clarity becomes problematic when practitioners outside the field of statistics deal with uncertain evidence and look to the literature for ways to address it, especially considering the increased use of Bayesian inference in high-fidelity simulators and probabilistic models (e.g., Papamakarios et al., 2019; Baydin et al., 2019; Lavin et al., 2021; Liang et al., 2021; van de Schoot et al., 2021; Wood et al., 2022; Mishra-Sharma & Cranmer, 2022; Munk et al., 2022). For example, in physics it is not uncommon for likelihoods to be given relatively ad hoc forms in which some notion of "measurement error" is attached to uncertain observations, while the underlying (stochastic) physical model is taken to be understood perfectly. This occurs, for instance, when inferring: the Hubble parameter via supernova brightness (e.g., Riess et al., 2022); pre-merger parameters of black-hole/neutron-star binaries via gravitational waves (e.g., Thrane & Talbot, 2019; Dax et al., 2021); neutron-star orbital/spin-down/post-Newtonian parameters via pulsar timings (e.g., Lentati et al., 2014; Vigeland & Vallisneri, 2014); and planetary orbital parameters via radial-velocity/transit-time observations (e.g., Schulze-Hartung et al., 2012; Feroz & Hobson, 2014; Liang et al., 2021). In most cases a Gaussian likelihood is assumed for the data, but exactly how the error relates to the data-generating process is not specified. If uncertainties about simulator/model observations arise from external data, then Jeffrey's rule would usually apply, yet virtual evidence appears to be employed more often.
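To make this ambiguity concrete, the toy Python sketch below contrasts two readings of the same reported measurement error: noise inside the data-generating process versus uncertainty attached to the observation itself. The simulator f, sigma_meas, and y_obs are assumed placeholders for illustration only, not taken from any of the cited analyses.

```python
# Illustrative sketch only: f, sigma_meas, and y_obs are assumed placeholders.
import numpy as np

def f(x):
    """Assumed deterministic simulator mapping a parameter x to a prediction."""
    return 2.0 * x + 1.0

sigma_meas = 0.3   # reported "measurement error"
y_obs = 3.1        # the uncertain observation

# Reading 1: the error is part of the generative model, so each parameter x is
# scored by the likelihood N(y_obs; f(x), sigma_meas^2) (virtual-evidence style).
def log_likelihood(x):
    return -0.5 * ((y_obs - f(x)) / sigma_meas) ** 2

# Reading 2: the model is taken as exactly known and the observation itself is
# uncertain, i.e. the true y is distributed as q(y) = N(y_obs, sigma_meas^2);
# inference then averages exact-observation posteriors p(x|y) over y ~ q(y)
# (Jeffrey's-rule style).
def sample_uncertain_y(rng, n):
    return rng.normal(y_obs, sigma_meas, size=n)
```

Whether these two readings lead to the same posterior is precisely the kind of question the remainder of this paper addresses.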
The purpose of this paper is to provide novel insights, theoretical contributions, and concrete guidance on how to deal with observations that carry associated uncertainty in the context of Bayesian inference. We show, experimentally, how misinterpretations of uncertain evidence can lead to vastly different inference results, emphasizing the importance of carefully accounting for uncertain evidence.
2. Background
Bayesian inference aims to characterize the posterior distribution of the latent random vector x given the observed random vector y. When observing y with certainty, the inference problem is "straightforward" in the sense that p(x|y) = p(y,x)/p(y). However, exact inference is often infeasible as p(y) is usually intractable; if the joint p(y,x) is calculable, inference is still achievable via approximate methods such as importance sampling (e.g., Hammersley & Handscomb, 1964), Metropolis-Hastings (Metropolis & Ulam, 1949; Metropolis et al., 1953; Hastings, 1970), and Hamiltonian Monte Carlo (Duane et al., 1987; Neal, 1994). Unfortunately, standard Bayesian inference is incompatible with uncertain evidence, where exact values of y are unavailable.
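As a concrete illustration of the certain-evidence case, here is a minimal self-normalized importance sampling sketch; the conjugate Gaussian model, the proposal, and all numerical values are illustrative assumptions rather than anything specified in this paper.

```python
# Minimal sketch: self-normalized importance sampling for p(x|y) when only the
# joint p(y, x) is calculable. The Gaussian model and proposal are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def log_joint(y, x):
    """log p(y, x) for an assumed model: x ~ N(0, 1), y | x ~ N(x, 0.5^2)."""
    log_prior = -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)
    log_lik = -0.5 * ((y - x) / 0.5) ** 2 - np.log(0.5 * np.sqrt(2.0 * np.pi))
    return log_prior + log_lik

y_obs = 1.2                               # evidence observed with certainty
xs = rng.normal(0.0, 2.0, size=10_000)    # proposal q(x) = N(0, 2^2)
log_q = -0.5 * (xs / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))

log_w = log_joint(y_obs, xs) - log_q      # unnormalized log-weights p(y, x) / q(x)
w = np.exp(log_w - log_w.max())
w /= w.sum()                              # self-normalization absorbs the intractable p(y)

print("E[x | y] ≈", np.sum(w * xs))       # approximates the closed-form value 0.96
```

The proposal only needs to cover the posterior's support; a poorly matched proposal still yields a consistent estimator, just with higher weight variance.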
Before discussing ways to treat uncertain evidence, we first introduce the highest-level abstraction representing uncertain evidence. Specifically, we consider ε ∈ E, where E is a set of "statements" specifying the uncertainty about y. For example, in the drop-of-a-ball example, ε would be a statement represented as in Table 1. In contrast, ζ is a lower-level abstraction which is encoded in ε. Dealing with uncertain evidence is a matter of decoding or interpreting ε, possibly identifying ζ and relating it to p(y,x). The canonical example of interpreting uncertain evidence, as introduced by Jeffrey (1965, p. 165), is "observation by candlelight," which motivated Jeffrey's rule:
Definition 2.1 (Jeffrey's Rule (Jeffrey, 1965)). Given p(y,x), let the interpretation of a given ε ∈ E lead to y being associated with uncertainty, conditioned on auxiliary evidence ζ (where ζ may be unknown), and denote the decoded uncertainty by q(y|ζ). Then the updated (posterior) distribution p(x|ζ) is

    p(x|ζ) = E_{q(y|ζ)}[p(x|y)].    (1)

In particular, one considers the updated joint p(y,x|ζ) = p(x|y)q(y|ζ), such that q(y|ζ) is a marginal of p(y,x|ζ).
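As a minimal sketch of how Eq. (1) can be realized by sampling, consider an assumed conjugate Gaussian model with a closed-form p(x|y) and an illustrative Gaussian decoding q(y|ζ); none of the quantities below come from the paper.

```python
# Minimal sketch of Jeffrey's rule (Eq. 1) for an assumed conjugate Gaussian model;
# the prior, likelihood, and q(y|zeta) parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

prior_mu, prior_sd = 0.0, 1.0   # assumed prior p(x) = N(0, 1)
sigma = 0.5                     # assumed likelihood p(y|x) = N(y; x, 0.5^2)

def posterior_params(y):
    """Closed-form exact-observation posterior p(x|y) for the conjugate model."""
    var = 1.0 / (1.0 / prior_sd**2 + 1.0 / sigma**2)
    mean = var * (prior_mu / prior_sd**2 + y / sigma**2)
    return mean, np.sqrt(var)

# Decoded uncertainty q(y|zeta): an assumed Gaussian over the unobserved true y.
q_mean, q_sd = 1.2, 0.3

# Jeffrey's rule: p(x|zeta) = E_{q(y|zeta)}[p(x|y)].
# Drawing y ~ q(y|zeta) and then x ~ p(x|y) yields samples distributed as p(x|zeta).
ys = rng.normal(q_mean, q_sd, size=100_000)
means, sds = posterior_params(ys)
xs = rng.normal(means, sds)

print("E[x|zeta] ≈", xs.mean(), "  sd[x|zeta] ≈", xs.std())
```

Because each draw follows y ~ q(y|ζ) and then x ~ p(x|y), the x-samples come from the updated joint p(y,x|ζ) = p(x|y)q(y|ζ) marginalized over y, which is exactly the mixture in Eq. (1).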
Jeffrey envisioned the existence of the auxiliary variable (or vector) ζ; however, Jeffrey's rule is often defined without it (e.g., Chan & Darwiche, 2005). Nonetheless, we argue that reasoning about an auxiliary variable (or vector) ζ is the more intuitive perspective, as some evidence must have given rise to q. Further, the introduction of Jeffrey's rule comes with the requirement that the conditional distribution of x given y be preserved upon applying the rule; see, e.g., (Jeffrey, 1965; Diaconis & Zabell, 1982; Valtorta et al., 2002; Chan & Darwiche, 2005). That is, the evidence ζ giving rise to q(y|ζ) must not also alter the conditional distribution of x given y. Mathematically, Jeffrey's rule requires that p(x|y, ζ) = p(x|y). This, for instance, relates to the commutativity of Jeffrey's rule, which is treated in full detail by Diaconis & Zabell (1982) and briefly discussed in Appendix A.
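To spell out this requirement, note that marginalizing the updated joint over y and using p(x|y, ζ) = p(x|y) recovers Eq. (1):

```latex
\begin{align*}
p(x \mid \zeta) &= \int p(x \mid y, \zeta)\, q(y \mid \zeta)\, \mathrm{d}y \\
                &= \int p(x \mid y)\, q(y \mid \zeta)\, \mathrm{d}y
                   && \text{(using } p(x \mid y, \zeta) = p(x \mid y)\text{)} \\
                &= \mathbb{E}_{q(y \mid \zeta)}\!\left[ p(x \mid y) \right].
\end{align*}
```

If ζ also altered the conditional distribution of x given y, the second equality would fail and Eq. (1) would no longer characterize the updated posterior.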
In contrast to Jeffrey's rule is virtual evidence, as proposed by Pearl (1988). Virtual evidence also includes an auxiliary virtual variable (or vector), but does so via the likelihood q(ζ|y,x) := q(ζ|y), with the only parents of ζ being y: