Submitted to Statistical Science
A Geometric Perspective on Bayesian and
Generalized Fiducial Inference
Yang Liu, Jan Hannig and Alexander C. Murph
Abstract. Post-data statistical inference concerns making probability state-
ments about model parameters conditional on observed data. When a priori
knowledge about parameters is available, post-data inference can be conve-
niently made from Bayesian posteriors. In the absence of prior information,
we may still rely on objective Bayes or generalized fiducial inference (GFI).
Inspired by approximate Bayesian computation, we propose a novel charac-
terization of post-data inference with the aid of differential geometry. Under
suitable smoothness conditions, we establish that Bayesian posteriors and
generalized fiducial distributions (GFDs) can be respectively characterized
by absolutely continuous distributions supported on the same differentiable
manifold: The manifold is uniquely determined by the observed data and the
data generating equation of the fitted model. Our geometric analysis not only
sheds light on the connection and distinction between Bayesian inference and
GFI, but also allows us to sample from posteriors and GFDs using manifold
Markov chain Monte Carlo algorithms. A repeated measures analysis of vari-
ance example is presented to illustrate the sampling procedure.
Key words and phrases: approximate Bayesian computation, Bayesian infer-
ence, differentiable manifold, generalized fiducial inference, Markov chain
Monte Carlo.
1. INTRODUCTION
A post-data probability represents the degree of be-
lief or plausibility that a certain assertion about model
parameters is true given the observed data, which dif-
fers from a classical frequentist (i.e., pre-data) probabil-
ity that is attached to the generative process of the ob-
served data (Dempster, 1964; Martin and Liu, 2015a).
Post-data statistical inferences are most commonly made
from a Bayesian posterior that is jointly determined by the
prior distribution of model parameters and the likelihood
function of the model (e.g., Gelman et al., 2013, Section
1.3). When little a priori information about parameters
can be garnered, we may still resort to default or weakly
informative priors to make Bayesian inference (Kass and
Wasserman, 1996; Berger, 2006; Berger, Bernardo and
Sun, 2015).

Yang Liu is an Associate Professor in the Department of
Human Development and Quantitative Methodology at the
University of Maryland, College Park, MD, USA (e-mail:
yliu87@umd.edu). Jan Hannig is a Professor in the
Department of Statistics and Operations Research at the
University of North Carolina, Chapel Hill, NC, USA (e-mail:
jan.hannig@unc.edu). Alexander C. Murph is a postdoctoral
researcher in the Computer, Computational, and Statistical
Sciences Division (CCS-6) at the Los Alamos National
Laboratory, Los Alamos, NM, USA (e-mail: murph@lanl.gov).
Alternatively, we can avoid prior specification alto-
gether and obtain a post-data probability distribution of
parameters by inverting the data generating process. This
idea originated from Fisher’s fiducial argument (Fisher,
1925, 1930, 1933, 1935) and motivated the development
of Dempster–Shafer theory (Dempster, 1966, 1968, 2008),
inferential models (Martin and Liu, 2013, 2015b,a,c),
generalized fiducial inference (GFI; Cisewski and Hannig,
2012; Hannig, 2009, 2013; Hannig et al., 2016; Lai,
Hannig and Lee, 2015; Liu and Hannig, 2016, 2017;
Murph, Hannig and Williams, 2022a; Shi et al., 2021), and
so forth. Among all the descendants of Fisher’s fiducial
inference, only GFI is considered in the present paper; the
associated post-data distribution of parameters is referred
to as the generalized fiducial distribution (GFD).
A statistical model specifies how data are generated
through a data generating equation (DGE), which is
a function of parameters and random components with
completely known distributions (e.g., uniform or standard
Gaussian variates).¹ The DGE plays a key role in
approximating post-data inference by simulation (Cranmer,
Brehmer and Louppe, 2020).

¹ A data generating equation may be referred to as a data generating
algorithm (DGA; Murph, Hannig and Williams, 2022a) when the
generative process rather than the formal mathematical expression is
of interest.

When a proper prior can be
specified, we may simulate parameters and random com-
ponents independently, obtain imputed data through the
DGE, and retain the samples if and only if the imputed
and observed data are sufficiently close. Such an accept-
reject scheme is often referred to as approximate Bayesian
computation (ABC; e.g., Beaumont, 2019; Beaumont,
Zhang and Balding, 2002; Beaumont et al., 2009; Blum,
2010; Fearnhead and Prangle, 2012; Marin et al., 2012;
Sisson and Fan, 2011; Sisson, Fan and Beaumont, 2018):
The retained samples of parameters approximately follow
the posterior distribution and hence can be utilized to es-
timate posterior expectations. If no prior distribution is
available, we can still sample random components but not
parameters. To circumvent the latter, GFI pairs each
realization of the random components with the optimal
parameter values such that the resulting imputed data are
as close to the observed data as possible in some sense.
Even this best match to the observed data may not be good
enough: such values are deemed incompatible with the
observed data and therefore have to be discarded, leading
to a rejection step similar to that of ABC. It turns out
that the resulting marginal samples of parameters
approximately follow the GFD (Hannig et al., 2016).
It is then natural to ponder what the limits of the trun-
cated distributions are when we request the imputed data
to be infinitesimally close to the observed data in ap-
proximate post-data inference. As the main result of the
present work, we completely characterize the weak limit
for both approximate Bayesian inference and GFI when
the truncation set contracts to a twice continuously dif-
ferentiable submanifold of the joint space of parameters
and random components. We are able to express the ab-
solutely continuous densities of the limiting distributions
with respect to the intrinsic measure of the submanifold,
and show that Bayesian posteriors and GFDs in the usual
sense are the corresponding marginals on the parameter
space (Propositions 1 and 2). As a contribution to the
literature on GFI, we derive an explicit formula for the
fiducial density in Proposition 2 that is more general than
Theorem 1 of Hannig et al. (2016). Meanwhile,
our work should be distinguished from Murph, Hannig
and Williams (2022b), which also studied the geometry
of GFI but focused on the case when the parameter space
itself is a manifold. On the theoretical side, our geomet-
ric formulation applies to a broad class of parametric sta-
tistical models for continuous data and facilitates insight-
ful comparisons between Bayesian inference and GFI. On
the practical side, the geometric characterization suggests
an alternative sampling scheme for approximate post-data
inference: We apply manifold Markov chain Monte Carlo
(MCMC) algorithms (e.g., Brubaker, Salzmann and
Urtasun, 2012; Zappa, Holmes-Cerfon and Goodman, 2018)
to sample from the limiting distributions on the data gen-
erating manifold and only retain the parameter marginals.
For certain problems (e.g., GFI for mixed-effects mod-
els), manifold MCMC sampling may scale up better than
existing computational procedures.
The rest of the paper is organized as follows. We re-
visit in Section 2 the formal definitions of ABC and GFI;
a graphical illustration is provided using a Gaussian lo-
cation example. In Section 3, we first present a general
result (Theorem 1): When an ambient distribution is trun-
cated to a sequence of increasingly finer approximations
to a smooth manifold, the weak limit is absolutely con-
tinuous with respect to the manifold’s intrinsic measure.
We then apply the general result to derive representations
for Bayesian posteriors and GFDs (Propositions 1 and 2)
and comment on their discrepancies. We review in Section
4 an MCMC algorithm that (approximately) samples
from distributions on differentiable manifolds. A repeated
measures analysis of variance (ANOVA) example is then
presented to illustrate the sampling procedure (Section
5). Limitations and possible extensions of the proposed
method are discussed at the end (Section 6).
2. APPROXIMATE INFERENCE BY SIMULATION
2.1 Data Generating Equation
Let 𝒴, Υ, and Θ denote the spaces of data, random
components, and parameters associated with a fixed family
of parametric models: In particular, 𝒴 ⊆ R^n, Υ ⊆ R^m,
and Θ ⊆ R^q, where n, m, and q are positive integers.
Following Hannig et al. (2016), we characterize the model
of interest by its DGE

(1)    Y = G(U, θ),

in which the random components U ∈ Υ follow a
completely known distribution (typically uniform or
standard Gaussian), θ ∈ Θ denotes the parameters, and
Y ∈ 𝒴 denotes the random data. (1) can be conceived as a
formalization of the data generating code: Given true
parameters θ and an instance of random components
U = u, a unique data set Y = y can be imputed by
evaluating the DGE, i.e., y = G(u, θ).
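To make (1) concrete, the following minimal Python sketch (an editorial illustration, not code from the paper) implements a DGE for a hypothetical i.i.d. Gaussian location-scale model, with U a vector of uniform variates and θ = (µ, σ):

```python
import numpy as np
from scipy.stats import norm

def G(u, theta):
    """DGE for an i.i.d. Gaussian sample: Y_i = mu + sigma * Phi^{-1}(U_i)."""
    mu, sigma = theta
    return mu + sigma * norm.ppf(u)

rng = np.random.default_rng(0)
u = rng.uniform(size=5)      # one realization of the random components U
y = G(u, (1.0, 2.0))         # imputed data y = G(u, theta) for theta = (1, 2)
```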
Now suppose that we have observed Y = y. Post-data
inference aims to assign probabilities to assertions about
parameters θ conditional on the observed data y (Martin
and Liu, 2015c). In the conventional Bayesian framework,
we presume that θ follows a proper prior distribution and
make probabilistic statements based on the conditional
distribution of θ given y. When it is difficult to specify an
informative prior, one may still rely on objective priors
that reflect paucity of knowledge or information (Kass and
Wasserman, 1996; Berger, 2006; Berger, Bernardo and
Sun, 2015). We next revisit the definition of a Bayesian
posterior through the lens of ABC, as well as Hannig et
al.’s (2016) definition of the GFD: The latter replaces the
prior sampling of parameters in ABC with an optimization
problem in the parameter space, a natural workaround
when no prior information is available.
2.2 Approximate Bayesian Computation
Let ρ denote the density of U, and π the prior density
of θ; we restrict attention to density functions with respect
to the Lebesgue measure and assume that random number
generation from ρ and π is feasible. Given the observed
data y and a pre-specified tolerance level ε > 0, ABC is a
computational procedure that repeatedly executes the
following steps:
i) sample U ∼ ρ;
ii) sample θ ∼ π independent of U;
iii) accept the draws if ∥G(U, θ) − y∥ ≤ ε and otherwise
reject.
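A minimal sketch of this accept-reject loop, assuming a generic DGE `G` and hypothetical sampler callables `sample_u` and `sample_theta` for ρ and π:

```python
import numpy as np

def abc(G, sample_u, sample_theta, y, eps, n_keep):
    """Accept-reject ABC: retain (U, theta) iff ||G(U, theta) - y|| <= eps."""
    kept = []
    while len(kept) < n_keep:
        u = sample_u()              # step i): U ~ rho
        theta = sample_theta()      # step ii): theta ~ pi, independent of U
        if np.linalg.norm(G(u, theta) - y) <= eps:   # step iii)
            kept.append((u, theta))
    return kept                     # draws from the truncated density (2)
```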
The above accept-reject sampling scheme constructs a
truncated distribution on Υ × Θ with the following density:

(2)    π_ε(u, θ; y) ∝ π(θ) ρ(u) I_{∥G(u,θ)−y∥ ≤ ε}(u, θ),

in which ∥·∥ denotes the ℓ2-norm on the data space 𝒴,
and I_A denotes the indicator function for a set A.
Integrating out u results in

(3)    π_ε(θ; y) ∝ π(θ) P{∥G(U, θ) − y∥ ≤ ε | θ}.
Suppose that Y has an absolutely continuous density
f(y | θ) with respect to the Lebesgue measure on 𝒴, and
that y is in the interior of 𝒴. (3) approximates the
posterior

π(θ | y) ∝ π(θ) f(y | θ),

because

f(y | θ) = lim_{ε↓0} P{∥G(U, θ) − y∥ ≤ ε | θ} / λ_𝒴{w ∈ 𝒴 : ∥w − y∥ ≤ ε}

pointwise in θ, where λ_𝒴 denotes the Lebesgue measure
on 𝒴, and thus P{∥G(U, θ) − y∥ ≤ ε | θ} is approximately
proportional to f(y | θ) when ε is small.
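This limit is easy to check by simulation. A small sketch (illustrative values, not from the paper) for the scalar model Y = Φ⁻¹(U) + µ, for which f(y | µ) = ϕ(y − µ):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
y, mu = -0.5, 0.3
for eps in (0.5, 0.1, 0.02):
    u = rng.uniform(size=2_000_000)
    hit = np.abs(norm.ppf(u) + mu - y) <= eps   # ||G(U, mu) - y|| <= eps
    print(eps, hit.mean() / (2 * eps))          # divide by Lebesgue volume 2*eps
print(norm.pdf(y - mu))                         # limit phi(y - mu), about 0.2897
```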
A more general definition of ABC is available in the
literature. The accept-reject step in our introduction
corresponds to the use of a bounded uniform kernel
supported on ℓ2-balls centered at the observed data; other
probabilistic kernels can be used, and the corresponding
limiting results have been established. Readers are referred
to Beaumont (2019), Marin et al. (2012), and Sisson, Fan
and Beaumont (2018) for more comprehensive surveys of
ABC.
2.3 Generalized Fiducial Inference
When prior information about θ is absent, we can no
longer sample θ ∼ π in Step ii) of the ABC recipe.
Nevertheless, we are still able to determine whether the
imputed random component U can possibly reproduce the
observed data y (up to the pre-specified tolerance ε). Let

(4)    θ̂(y, U) = arg min_{ϑ ∈ Θ} ∥G(U, ϑ) − y∥.

The rationale of GFI is to pair each U with the parameter
values θ̂(y, U) such that G(U, θ̂(y, U)) gives the closest
approximation to y.² ABC can then be modified into a
Monte Carlo recipe for (approximate) GFI once we
replace the prior sampling step by setting θ to θ̂(y, U) and
leave everything else intact. This modified procedure
simulates from a truncated distribution on Υ with density

(5)    ψ_ε(u) ∝ ρ(u) I_{∥G(u, θ̂(y,u)) − y∥ ≤ ε}(u),

which further induces a distribution on Θ via the map
θ̂(y, ·).

² θ̂(y, u) is assumed to uniquely exist for each u (cf. iii) in
Assumption 2).
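A minimal sketch of this modified recipe, again with hypothetical helper names; the inner optimization solves (4) numerically, although a closed-form θ̂ should be used whenever available:

```python
import numpy as np
from scipy.optimize import minimize

def fiducial(G, sample_u, y, theta0, eps, n_keep):
    """Simulation-based GFI: pair each U with theta_hat(y, U) from (4)."""
    kept = []
    while len(kept) < n_keep:
        u = sample_u()                                  # U ~ rho, as in ABC
        misfit = lambda th: np.linalg.norm(G(u, th) - y)
        theta_hat = minimize(misfit, theta0).x          # theta_hat(y, u), eq. (4)
        if misfit(theta_hat) <= eps:                    # rejection step of (5)
            kept.append((u, theta_hat))
    return kept
```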
Hannig et al. (2016) went one step further and defined
the GFD as the weak limit of θ̂(y, U), wherein U follows
(5), as ε ↓ 0. Assuming n = m and several regularity
conditions on the DGE (Assumptions A.1–A.4), Hannig
et al. (2016) showed that the density of the GFD can be
expressed as

(6)    ψ(θ; y) ∝ f(y | θ) · det(∇_θ G(û(y, θ), θ)⊤ ∇_θ G(û(y, θ), θ))^{1/2},

in which û(y, θ) ∈ Υ satisfies y = G(û(y, θ), θ), and
∇_θ G(u, θ) denotes the n × q Jacobian matrix of G(u, θ)
with respect to θ.³

³ The assumed regularity conditions guarantee that û(y, θ) uniquely
exists, and that the Jacobian matrix is defined and of full column
rank.
(6) conveys an empirical Bayesian interpretation of
GFI: the determinant term on the right-hand side of (6)
can be conceived as a (possibly improper) data-dependent
prior. Therefore, GFI in general does not comply with the
likelihood principle (e.g., Berger, 1985, Section 1.6.4).
For instance, Hannig et al. (2016) showed that substituting
the ℓ∞- and ℓ1-norms for the ℓ2-norm in (5) may lead
to fiducial densities different from (6) when n > q. More
discussion of the likelihood principle can be found in
Section 3.3.
2.4 An Illustrative Example
Consider the Gaussian location model Y ∼ N(µ, 1)
with mean parameter µ ∈ R. For ease of graphical display,
we focus on the transformed parameter θ = Φ(µ) ∈ (0, 1),
where Φ(·) denotes the distribution function of N(0, 1).
We express the corresponding DGE as

(7)    Y = Φ⁻¹(U) + Φ⁻¹(θ),

in which U ∼ Unif(0, 1), and Φ⁻¹ is the inverse of Φ (i.e.,
the standard Gaussian quantile function). The observed
data value is fixed at y = −0.5.
For Bayesian inference, suppose that θ follows a
Unif(0, 1) prior, which implies a N(0, 1) prior for the
mean µ. It is straightforward to verify that the posterior
density is

(8)    π(θ | y) = ϕ(Φ⁻¹(θ))⁻¹ · √2 ϕ(√2 (Φ⁻¹(θ) − y/2)),

where ϕ(·) stands for the standard Gaussian density.
Following the ABC recipe, we simulated U and θ
independently from Unif(0, 1), shown as evenly scattered
dots over Υ × Θ = (0, 1)² in the left panel of Figure 1.
With a tolerance ε = 0.05, only (u, θ)⊤ pairs that satisfy
|Φ⁻¹(u) + Φ⁻¹(θ) − (−0.5)| ≤ 0.05 (dark gray dots)
survive the accept-reject step. The empirical θ-marginal
distribution of the retained draws closely resembles (8).
Meanwhile, the fiducial density (6) reduces to⁴

(9)    ψ(θ; y) = ϕ(Φ⁻¹(θ))⁻¹ ϕ(y − Φ⁻¹(θ)).

For all u ∈ (0, 1), θ̂(−0.5, u) = Φ(−0.5 − Φ⁻¹(u))
ensures that |Φ⁻¹(u) + Φ⁻¹(θ̂(y, u)) − (−0.5)| = 0.
Therefore, all the imputed u’s are retained regardless of
the value of ε in the simulation-based fiducial recipe. We
associate each u with θ = θ̂(−0.5, u) and plot (u, θ)⊤ in
the right panel of Figure 1. The u-marginal distribution
remains uniform, and (9) is well approximated by the
histogram of θ.

⁴ The normalizing constant is 1.
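In code, the fiducial recipe for this example is a single transformation of uniform draws; a sketch (not the authors’ implementation) that can be checked against (9):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
y = -0.5
u = rng.uniform(size=100_000)
theta = norm.cdf(y - norm.ppf(u))   # theta_hat(y, u); the misfit is exactly 0

# closed-form fiducial density (9) on a grid, for comparison with a histogram
grid = np.linspace(0.01, 0.99, 99)
psi = norm.pdf(y - norm.ppf(grid)) / norm.pdf(norm.ppf(grid))
```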
We learn from this illustration that, on the joint space
of u and θ, simulation-based Bayesian and fiducial
inference produce distributions that concentrate on

(10)    G(y) = {(u, θ)⊤ ∈ (0, 1)² : Φ⁻¹(u) + Φ⁻¹(θ) = y}

as ε ↓ 0. G(y) collects all the (u, θ)⊤ pairs that satisfy the
DGE, i.e., (7) with Y = y and U = u, and is geometrically
identified as a one-dimensional smooth submanifold
embedded in (0, 1)² (shown as the black solid curve in
Figure 1). Similar characterizations can be established for
a broader class of statistical models for continuous data,
which we explicate in the next section.

FIG 1. Graphical illustration of the Gaussian location example with
y = −0.5. Left: approximate Bayesian computation. Samples of random
components (u) and parameters (θ) are represented as light gray dots
in the unit square. (u, θ)⊤ pairs that are sufficiently close to the curve
y = Φ⁻¹(u) + Φ⁻¹(θ) are kept and highlighted in dark gray (acceptance
rate = 2.64%). The empirical marginal distributions of the retained
samples are displayed as histograms, with the theoretical posterior
superimposed on the θ-marginal. Right: fiducial inference. 100% of the
imputed u’s are accepted, and each u is paired with θ = Φ(y − Φ⁻¹(u)).
The empirical marginal distributions of the retained samples are
displayed as histograms, with the theoretical fiducial density
superimposed on the θ-marginal. Dots fall exactly on the curve but are
slightly jittered for clearer visualization.
3. GEOMETRY OF POST-DATA INFERENCE
We have seen in our previous discussion that both
the accept-reject ABC and the simulation-based fiducial
recipe involve restricting ambient distributions to regions
whose sizes are controlled by ε (see (2) and (5) for
details). We pay heed to the special case in which the
regions of truncation contract to a twice continuously
differentiable submanifold as ε ↓ 0.
3.1 General Constraints
Our first result (Theorem 1) is completely general: It
concerns the weak convergence of a sequence of truncated
distributions to a limit that is supported on an implicitly
defined submanifold. The proof can be found in Appendix
A in the supplementary document.
Let h : 𝒳 → R^n be a constraint function, where 𝒳 is
an open subset of R^d and d > n. The level set of h at 0
is denoted M = {x ∈ 𝒳 : h(x) = 0}, and the ε-fattening
of M, where ε > 0, is denoted M_ε = {x ∈ 𝒳 : ∥h(x)∥ ≤ ε}.
Write a : 𝒳 → [0, ∞) for an ambient density function.⁵
Further let P_ε be the truncation of a to M_ε, characterized
by the density a(x) I_{M_ε}(x) / ∫_{M_ε} a(x) dx.

⁵ Although the function a is not necessarily integrable over the
entire ambient space 𝒳, it is referred to as a density function here:
integrable and non-integrable a’s are respectively termed proper and
improper densities.
ASSUMPTION 1. Suppose that

i) h is a twice continuously differentiable submersion,
and thus M is a twice continuously differentiable
submanifold of 𝒳, which is equipped with a Riemannian
measure λ_M;⁶
ii) a is continuous, λ_M{supp(a) ∩ M} > 0, and
0 < ∫_{M_ε} a(x) dx < ∞ for all ε > 0;
iii) the collection of probability measures {P_ε : ε > 0}
is tight.

⁶ A submersion is a differentiable map whose differential is
surjective at every x. The Riemannian measure of the submanifold M
is induced by the Euclidean metric on the ambient space 𝒳 (Lee, 2013,
Chapter 13).
THEOREM 1. Under Assumption 1, P_ε ⇝ P_0 as ε ↓ 0,
where P_0 has the following absolutely continuous density
with respect to λ_M:

(11)    f(x) = a(x) det(∇h(x) ∇h(x)⊤)^{−1/2} / ∫_M a(w) det(∇h(w) ∇h(w)⊤)^{−1/2} λ_M(dw)

for x ∈ M.
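Theorem 1 can be sanity-checked on a toy constraint that is not from the paper: take h(x) = x₁² + x₂² − 1 with ambient space 𝒳 = R², and let a be a bivariate Gaussian density centered at (1, 0). On M, ∥∇h(x)∥ = 2 is constant, so (11) predicts a limiting density proportional to a restricted to the unit circle, i.e., a von Mises(κ = 1) law for the angle. A Monte Carlo sketch:

```python
import numpy as np
from scipy.stats import vonmises

rng = np.random.default_rng(3)
x = rng.standard_normal((4_000_000, 2)) + np.array([1.0, 0.0])  # draws from a
h = x[:, 0] ** 2 + x[:, 1] ** 2 - 1.0                           # constraint h(x)
kept = x[np.abs(h) <= 0.01]                 # truncation P_eps with a small eps
angle = np.arctan2(kept[:, 1], kept[:, 0])
dens, edges = np.histogram(angle, bins=12, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.c_[dens, vonmises.pdf(mid, 1.0)])  # two columns should nearly agree
```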
REMARK 1. When a random variable X follows a
proper density a in the ambient space 𝒳, (11) can also be
deduced as a conditional density of X given h(X) = 0
(Diaconis, Holmes and Shahshahani, 2013, Proposition
2) using the co-area formula (e.g., Chavel, 2006, Section
III.8; Federer, 1996, Section 3.2.12; Lelievre, Rousset and
Stoltz, 2010, Lemma 3.2). In this alternative derivation,
the denominator of (11) is interpreted as the marginal
density of h(X) at 0, which must be finite and positive
(see Diaconis, Holmes and Shahshahani, 2013, p. 112).
Specifically, positivity follows from i) and ii) in Assumption
1, and finiteness is a consequence of tightness, i.e.,
Assumption 1 iii). Details can be found in the proof of
Theorem 1.
REMARK 2. Theorem 1 is inspired by Theorem 3.1 of
Hwang (1980). Hwang’s result was proved for a sequence
of Gibbs measures that concentrate on the minimum of
an energy function. The collection of minimum-energy
states, or equivalently the limiting manifold, is required to
be compact, which is restrictive but often suffices for
optimization purposes in statistical physics. In contrast, our
result applies to sequentially restricting a known ambient
distribution to finer approximations of the data generating
manifold, i.e., sublevel sets of h, which are often not
compact for parametric statistical models.
Assumption 1 iii), i.e., the tightness of the measures
{P_ε}, automatically holds if M_ε is compact for sufficiently
small ε. When all sublevel sets of h are non-compact,
however, tightness is determined by the tail behavior of
the M_ε-restricted probability measures {P_ε}. Notably, a
being a proper ambient density alone does not guarantee
tightness. To illustrate this, we present Example 1 with
a two-dimensional ambient space. It demonstrates that
{P_ε} can still be tight when a is improper but the sublevel
set of h tapers off quickly along the first coordinate of x
(i.e., x₁), and that {P_ε} may fail to be tight when a is
proper but the sublevel set of h expands rapidly as x₁
grows.
EXAMPLE 1. Let x = (x₁, x₂)⊤ ∈ (0, ∞)² and consider
the constraint function

(12)    h(x) = x₂ / g(x₁),

in which g is positive on (0, ∞). The resulting ε-fattened
level set is

(13)    M_ε = {x ∈ (0, ∞)² : x₂ ≤ ε g(x₁)}.

As ε ↓ 0, M_ε ↓ M = {x ∈ (0, ∞)² : x₂ = 0}.
We first set a(x) ≡ 1 and g(x₁) = exp(−x₁²/2) (left
panel of Figure 2). Even though a(x) is not integrable on
the ambient space (0, ∞)², g(x₁) is integrable on (0, ∞).
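A quick numeric companion to this first scenario (assuming the reconstruction g(x₁) = exp(−x₁²/2) above): integrating the constant ambient density over the strip 0 < x₂ ≤ ε g(x₁) shows that the x₁-marginal of P_ε has density g(x₁)/∫₀^∞ g(t) dt, free of ε, so the tail mass beyond any K is uniformly small and {P_ε} is tight.

```python
import numpy as np
from scipy.integrate import quad

# x1-marginal of P_eps when a = 1 and M_eps = {x2 <= eps * g(x1)}:
# integrating x2 over (0, eps * g(x1)) gives eps * g(x1), so the marginal
# density is g(x1) / integral of g, with eps cancelling.
g = lambda x1: np.exp(-x1 ** 2 / 2)
total, _ = quad(g, 0, np.inf)
for K in (1.0, 2.0, 4.0):
    tail, _ = quad(g, K, np.inf)
    print(K, tail / total)   # uniform-in-eps tail mass; vanishes as K grows
```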