
Brehmer and Louppe, 2020). When a proper prior can be specified, we may simulate parameters and random components independently, obtain imputed data through the DGE, and retain the samples if and only if the imputed and observed data are sufficiently close. Such an accept-reject scheme is often referred to as approximate Bayesian computation (ABC; e.g., Beaumont, 2019; Beaumont, Zhang and Balding, 2002; Beaumont et al., 2009; Blum, 2010; Fearnhead and Prangle, 2012; Marin et al., 2012; Sisson and Fan, 2011; Sisson, Fan and Beaumont, 2018): The retained samples of parameters approximately follow the posterior distribution and hence can be utilized to estimate posterior expectations.
If no prior distribution is available, we can still sample random components but not parameters. To circumvent this, GFI pairs each realization of the random components with the parameter values that render the resulting imputed data as close to the observed data as possible in some sense. Even such a best match to the observed data may not be good enough: In that case, the values are deemed incompatible with the observed data and are discarded, leading to a rejection step similar to ABC. It turns out that the resulting marginal samples of parameters approximately follow the GFD (Hannig et al., 2016).
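To make the two accept-reject schemes concrete, the sketch below contrasts them on a toy Gaussian location model with n i.i.d. observations, where the DGE is y_i = θ + u_i with u_i standard normal. The prior, the tolerance eps, and all variable names are illustrative assumptions, not the paper's running example.

```python
# A minimal sketch contrasting ABC with GFI's accept-reject scheme on a toy
# Gaussian location model: the DGE is y_i = theta + u_i with u_i ~ N(0, 1).
# The prior, tolerance eps, and all names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, n_draws, eps = 3, 100_000, 1.0
y_obs = rng.normal(1.0, 1.0, size=n)          # synthetic "observed" data

# ABC: requires a proper prior on theta.
theta = rng.normal(0.0, 10.0, size=n_draws)   # simulate parameters from the prior
u = rng.standard_normal((n_draws, n))         # simulate random components
y_sim = theta[:, None] + u                    # impute data through the DGE
keep = np.linalg.norm(y_sim - y_obs, axis=1) < eps
abc_draws = theta[keep]                       # approximate posterior sample

# GFI: no prior; pair each u with the best-matching theta, then reject poor fits.
u = rng.standard_normal((n_draws, n))
theta_star = (y_obs - u).mean(axis=1)         # argmin over theta of ||theta + u - y_obs||
resid = np.linalg.norm(theta_star[:, None] + u - y_obs, axis=1)
gfd_draws = theta_star[resid < eps]           # approximate GFD sample

print(abc_draws.mean(), gfd_draws.mean())
```

Shrinking eps sharpens both approximations at the cost of the acceptance rate; the weak limits as eps tends to zero are precisely what is characterized below.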
It is then natural to ponder what the limits of the truncated distributions are when we require the imputed data to be infinitesimally close to the observed data in approximate post-data inference. As the main result of the present work, we completely characterize the weak limit for both approximate Bayesian inference and GFI when the truncation set contracts to a twice continuously differentiable submanifold of the joint space of parameters and random components. We express the absolutely continuous densities of the limiting distributions with respect to the intrinsic measure of the submanifold, and show that Bayesian posteriors and GFDs in the usual sense are the corresponding marginals on the parameter space (Propositions 1 and 2). As a contribution to the literature on GFI, we derive an explicit formula for the fiducial density in Proposition 2 that is more general than Theorem 1 of Hannig et al. (2016). Meanwhile, our work should be distinguished from Murph, Hannig and Williams (2022b), which also studied the geometry of GFI but focused on the case where the parameter space itself is a manifold. On the theoretical side, our geometric formulation applies to a broad class of parametric statistical models for continuous data and facilitates insightful comparisons between Bayesian inference and GFI. On the practical side, the geometric characterization suggests an alternative sampling scheme for approximate post-data inference: We apply manifold Markov chain Monte Carlo
(MCMC) algorithms (e.g., Brubaker, Salzmann and Urtasun, 2012; Zappa, Holmes-Cerfon and Goodman, 2018) to sample from the limiting distributions on the data generating manifold and only retain the parameter marginals. For certain problems (e.g., GFI for mixed-effects models), manifold MCMC sampling may scale up better than existing computational procedures.
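To fix ideas, and writing Y = G(U, θ) for the DGE formally introduced in Section 2, the data generating manifold underlying this scheme can be pictured as the zero-residual set in the display below; the display is our shorthand under assumed notation, not a verbatim excerpt from the propositions.

```latex
\mathcal{M}_y \;=\; \bigl\{ (u, \theta) \in \Upsilon \times \Theta : G(u, \theta) = y \bigr\}.
```

A manifold MCMC chain explores this set while targeting the limiting density with respect to its intrinsic measure; discarding the u-coordinates of each state then yields approximate draws from the Bayesian posterior or the GFD.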
The rest of the paper is organized as follows. We revisit in Section 2 the formal definitions of ABC and GFI; a graphical illustration is provided using a Gaussian location example. In Section 3, we first present a general result (Theorem 1): When an ambient distribution is truncated to a sequence of increasingly finer approximations to a smooth manifold, the weak limit is absolutely continuous with respect to the manifold's intrinsic measure. We then apply the general result to derive representations for Bayesian posteriors and GFDs (Propositions 1 and 2) and comment on their discrepancies. We review in Section 4 an MCMC algorithm that (approximately) samples from distributions on differentiable manifolds. A repeated measures analysis of variance (ANOVA) example is then presented to illustrate the sampling procedure (Section 5). Limitations and possible extensions of the proposed method are discussed at the end (Section 6).
2. APPROXIMATE INFERENCE BY SIMULATION
2.1 Data Generating Equation
Let Y, Υ, and Θ denote the spaces of data, random components, and parameters associated with a fixed family of parametric models: In particular, Y ⊆ R^n, Υ ⊆ R^m, and Θ ⊆ R^q, where n, m, and q are positive integers. Following Hannig et al. (2016), we characterize the model of interest by its DGE

(1) Y = G(U, θ),

in which the random components U ∈ Υ follow a completely known distribution (typically uniform or standard Gaussian), θ ∈ Θ denotes the parameters, and Y ∈ Y denotes the random data. Equation (1) can be conceived as a formalization of the data generating code: Given true parameters θ and an instance of random components U = u, a unique data set Y = y can be imputed by evaluating the DGE, i.e., y = G(u, θ).
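As a concrete instance of a DGE in code, the sketch below implements G for an assumed Gaussian location-scale model with θ = (μ, σ) and U ~ N(0, I_n); the model and names are illustrative, not the paper's running example. Evaluating G at a fixed (u, θ) deterministically imputes one data set, exactly as described above.

```python
# A minimal sketch of a DGE, assuming a Gaussian location-scale model:
# G(u, theta) = mu + sigma * u with theta = (mu, sigma) and U ~ N(0, I_n).
import numpy as np

def G(u: np.ndarray, theta: tuple[float, float]) -> np.ndarray:
    """Evaluate the DGE y = G(u, theta): deterministic given (u, theta)."""
    mu, sigma = theta
    return mu + sigma * u

rng = np.random.default_rng(1)
u = rng.standard_normal(10)   # one instance of the random components U = u
y = G(u, (2.0, 0.5))          # the unique data set imputed from (u, theta)
```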
Now suppose that we have observed Y = y. Post-data inference aims to assign probabilities to assertions about the parameters θ conditional on the observed data y (Martin and Liu, 2015c). In the conventional Bayesian framework, we presume that θ follows a proper prior distribution and make probabilistic statements based on the conditional distribution of θ given y. When it is difficult to specify an informative prior, one may still rely on objective priors that reflect paucity of knowledge or information (Kass and Wasserman, 1996; Berger, 2006; Berger,