Generalized Bayes Approach to Inverse Problems
with Model Misspecification
Youngsoo Baek$^1$, Wilkins Aquino$^2$, Sayan Mukherjee$^{1,3,4,5}$
$^1$ Department of Statistical Science, Duke University, NC
$^2$ Department of Mechanical Engineering and Materials Science, Duke University, NC
$^3$ Department of Mathematics, Computer Science, Biostatistics & Bioinformatics, Duke University, NC
$^4$ Center for Scalable Data Analytics and Artificial Intelligence, Universität Leipzig
$^5$ Max Planck Institute for Mathematics in the Sciences, Leipzig
Abstract. We propose a general framework for obtaining probabilistic solutions to PDE-
based inverse problems. Bayesian methods are attractive for uncertainty quantification but
assume knowledge of the likelihood model or data generation process. This assumption is
difficult to justify in many inverse problems, where the specification of the data generation
process is not obvious. We adopt a Gibbs posterior framework that directly posits a
regularized variational problem on the space of probability distributions of the parameter.
We propose a novel model comparison framework that evaluates the optimality of a given loss
based on its “predictive performance”. We provide cross-validation procedures to calibrate
the regularization parameter of the variational objective and compare multiple loss functions.
Some novel theoretical properties of Gibbs posteriors are also presented. We illustrate
the utility of our framework via a simulated example, motivated by dispersion-based wave
models used to characterize arterial vessels in ultrasound vibrometry.
1. Introduction
Quantification of uncertainty in the context of inverse problems is increasingly demanded
by many applications [Stuart, 2010]. Bayesian statistics provides a useful viewpoint for this
demand [Cotter et al., 2009]. In a Bayesian framework, one prescribes a prior distribution
summarizing relative uncertainties about possible solutions to the inverse problem. After
observing noisy data, one updates the probabilities to obtain a posterior distribution of
the possible solutions. A fundamental component of a Bayesian formulation is the data-
generating process or likelihood. Specification of a likelihood is often invoked as a necessary
condition to guarantee theoretical properties of the posterior. However, it is difficult to
specify the data-generating process in nonlinear inverse problems due to two main sources
of model uncertainty: uncertainty in the forward model, i.e., in the underlying system
dynamics, and uncertainty, or lack of knowledge, about the distribution of the noise. The
possibility of model misspecification raises a serious concern about using Bayesian methods.
In this paper, we propose to solve inverse problems using an alternative framework, the
Gibbs posterior or generalized Bayes framework, proposed by Jiang and Tanner [2008],
Bissiri et al. [2016], Dunlop and Yang [2021], and Zou et al. [2019]. The framework
similarly requires a prior distribution and outputs a probability update conditional on the
data. Gibbs posteriors do not rely on knowledge of the likelihood. They are derived as a
solution to a variational problem on the space of probability measures over the space of
solutions, and they require the choice of a loss function that measures the mismatch between
the model and the data.
To use the Gibbs/generalized Bayes approach to solve inverse problems, several
questions need to be addressed. Without knowledge of the underlying data-generating
mechanism, how can we make a good choice of loss? In the variational objective that needs
to be minimized, how do we determine the regularization parameter? The regularization
parameter plays a vital role in balancing the trade-off between fidelity to the observed data
and adherence to prior information. Finally, is the variational problem well-posed?
1.1. Contributions
The main contributions of this paper are the following:
1. A theory of model comparison for Gibbs posteriors that enables the comparison of loss
functions. We define a notion of "predictive performance" for Gibbs posteriors and
study its theoretical properties.
2. We develop a particle filter and importance sampling method to simultaneously sample
from the underlying Gibbs posterior and calibrate the regularization parameter that
balances the loss function and regularization with respect to the prior. Our calibration
procedure minimizes a novel leave-one-out cross-validation (LOOCV) objective. Due
to the distributional nature of the solution, existing cross-validation algorithms are not
immediately applicable.
3. We prove the stability and consistency of Gibbs posteriors. We show the continuity
of the Gibbs posterior as a mapping of the data in various distances for probability
distributions. Our proposed upper bound improves on existing upper bounds that are
vacuous when the perturbation to the data is large. We also study the asymptotic
behavior of Gibbs posteriors in the large-sample limit. The technical aspects of the
consistency proof rely on tools from the robust Bayes estimation literature. We also
study the asymptotics of a predictive distribution associated with the Gibbs posterior
and used for model selection.
1.2. Prior work
The relation between the regularized least-squares problem proposed by Tikhonov and
Arsenin [1977] and the maximum a posteriori (MAP) estimation problem in Bayesian
statistics has been known for some time. Bayesian methods for inverse problems have been
successfully adopted in diverse domains, nicely summarized by Kaipio and Somersalo [2005].
Recent literature [Cotter et al., 2009, Stuart, 2010, Cotter et al., 2013] has extended the
Bayesian framework with Gaussian likelihood to infinite-dimensional settings. The Gibbs
posterior framework [Bissiri et al., 2016, Jiang and Tanner, 2008, Martin et al., 2017] is not
new, and its application to inverse problems was studied by Zou et al. [2019] and Dunlop and
Yang [2021]. Similar concepts have been studied by Grünwald and Langford [2007], Grünwald
and van Ommen [2017], Miller and Dunson [2019], Bhattacharya et al. [2019], among others,
for improving the robustness of Bayesian inference under model misspecification. The novel
model selection theory we develop in this paper can be viewed as an analog of the theory
of Bayesian model selection and Bayesian cross-validation under model misspecification
[Bernardo and Smith, 2009]. Computationally, we rely on sequential Monte Carlo and
particle filter algorithms. These algorithms have gained recent attention for potential use in
Bayesian inverse problems: Kantas et al. [2014] and Beskos et al. [2015] have used particle
filters to solve parabolic and elliptic inverse problems, and Zou et al. [2019] have proposed a
combination of particle filters and reduced-order models for improved computational efficiency.
A vast amount of literature exists on quantifying uncertainty in inverse problems.
We place our method in context with previous ideas. Our variational formulation shares
similarities with variational Bayes methods used in nonlinear inverse problems [Franck
and Koutsourelakis, 2016] and stochastic design [Koutsourelakis, 2016]. In these works,
an objective involving a complex posterior distribution is minimized under the constraint
that the approximating distribution is easy to sample from. In contrast, we use the
variational problem to define the distribution of interest. Second, when the likelihood is
intractable, approximate Bayesian computation (ABC) has been proposed as a viable
approximation method [Lyne et al., 2015, Zeng et al., 2019]. However, these
procedures can be computationally costly and do not address model misspecification.
Finally, several methods have been proposed for solving stochastic inverse problems and
nonparametric probability measure estimation. Gradient-based optimization methods have
been used to solve stochastic inverse problems [Narayanan and Zabaras, 2004, Borggaard
and Van Wyk, 2015, Warner et al., 2015]. When the unknown parameter itself is a probability
distribution, Banks et al. [2015], Banks and Thompson [2015] have proposed minimizing
a discretized objective stated in terms of the Prohorov metric. An exciting avenue we do
not pursue in this work is to investigate possible connections between these non-Bayesian
approaches and the Bayes/Gibbs posterior frameworks.
1.3. Outline of the Paper
Section 2 presents the foundations for the Gibbs posterior framework in the setup of inverse
problems with model uncertainty. In Section 3, we offer results on stability and asymptotic
properties of Gibbs posteriors. We also present our novel contributions to model selection for
different loss functions that are not intrinsically comparable to each other. In Section 4, we
describe a Monte Carlo algorithm that simultaneously learns the regularization parameter
based on the LOOCV criterion and samples from the underlying Gibbs posterior. The
algorithm is novel and relies on recent advances in particle filtering and importance sampling
for Bayesian LOOCV. Section 5 presents numerical experiments illustrating the benefits of
our approach. We conclude the paper with a discussion of future directions in Section 6. All
proofs are collected in the Appendix.
2. Gibbs Posterior with Model Selection
We review the foundations for the Gibbs posterior framework and describe properties of
the Gibbs posterior proposed by Bissiri et al. [2016]. Section 2.4 describes the problem
of model comparison and our original contribution of predictive model selection theory for
Gibbs posteriors.
2.1. Notations
We fix some notation used throughout the text. We denote by $\|\cdot\|$ the norm in a Euclidean
space $\mathbb{R}^m$. We write $\Delta(X)$ for the space of all probability distributions on $X \subseteq \mathbb{R}^m$, assuming
standard Borel $\sigma$-algebras. For two probability measures $\mu, \nu \in \Delta(X)$, $d_{TV}(\mu, \nu)$, $d_H(\mu, \nu)$,
and $D_{KL}(\mu \| \nu)$ denote the total variation metric, Hellinger metric, and Kullback-Leibler (KL)
divergence [Gibbs and Su, 2002], respectively. For two probability measures $\mu, \nu$ (possibly
on different spaces $X_1$ and $X_2$), we denote by $\mu \otimes \nu$ their product measure. For a probability
measure $\mu \in \Delta(X)$, $L^q(X; \mu)$ is the space of all functions $f: X \to \mathbb{R}$ that are $L^q$-integrable
with respect to $\mu$, where $q \in [1, \infty]$.
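As a concrete reference point, the following sketch (not from the paper; the distributions and their support are illustrative) evaluates the three discrepancies above for two discrete distributions on a common finite support.

```python
# A minimal sketch, assuming discrete distributions mu, nu on a shared
# finite support; the probability values below are illustrative.
import numpy as np

mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.4, 0.4, 0.2])

d_tv = 0.5 * np.abs(mu - nu).sum()                             # total variation
d_h = np.sqrt(0.5 * ((np.sqrt(mu) - np.sqrt(nu)) ** 2).sum())  # Hellinger
d_kl = (mu * np.log(mu / nu)).sum()                            # KL divergence (requires mu << nu)

print(d_tv, d_h, d_kl)
```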
2.2. Parametric Inverse Problems with Model Uncertainty
Throughout this paper, we assume observing $n$ i.i.d. variables that take values in $Y \subseteq \mathbb{R}^d$
with an unknown probability distribution $P$:
$$y_i \overset{\text{iid}}{\sim} P \approx P_{\mathcal{F}(\theta_0)}. \tag{1}$$
Here, the parameter $\theta_0$ is a physically meaningful parameter that characterizes the observed
system. The parameter-to-observation map $\mathcal{F}(\theta)$ is often defined in relation to the
parameterized PDE model
$$\mathcal{M}(u(\theta); \theta) = 0, \quad u(\theta) \in \mathcal{U}, \quad \mathcal{M}: \mathcal{U} \to \mathcal{V}^*, \tag{2}$$
where $\mathcal{U}, \mathcal{V}$ are Hilbert spaces with $\mathcal{V}^*$ being the dual space of $\mathcal{V}$. We assume that for every $\theta$
there exists a unique $u(\theta)$ satisfying (2). The parameter-to-observation map is defined as
$\mathcal{F}(\theta) := \mathcal{D}u(\theta)$, where $\mathcal{D}$ is the observation operator.
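To make the abstract setup concrete, the sketch below instantiates $\mathcal{M}$, $u(\theta)$, and $\mathcal{D}$ for a toy one-dimensional elliptic problem, $-\theta u'' = f$ on $(0,1)$ with zero Dirichlet boundary conditions, discretized by centered finite differences. The model, grid size, source term, and sensor locations are illustrative choices, not taken from the paper.

```python
# A minimal sketch of F(theta) = D u(theta) for a hypothetical 1-D elliptic
# model -theta * u'' = f on (0, 1), u(0) = u(1) = 0. The source term, grid,
# and sensors are illustrative assumptions.
import numpy as np

def forward_map(theta, m=99, sensors=(24, 49, 74)):
    """Solve the discretized PDE for u(theta), then observe at sensor nodes."""
    h = 1.0 / (m + 1)
    x = np.linspace(h, 1.0 - h, m)        # interior grid nodes
    f = np.sin(np.pi * x)                 # fixed, known source term
    # Tridiagonal stiffness matrix of -theta * u'' with Dirichlet conditions.
    A = (theta / h**2) * (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1))
    u = np.linalg.solve(A, f)             # unique solution u(theta) for theta > 0
    return u[list(sensors)]               # observation operator D: point samples

y_clean = forward_map(theta=2.0)          # noiseless observations at 3 sensors
```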
In the classical Bayesian framework, the parameterization of the sampling distribution
(1) by the forward model (2) is known. Examples include the additive white noise model
[Knapik et al., 2011] and the Poisson likelihood [Barmherzig and Sun, 2022]. However, in
practice, this need not be true, because either the hypothesized parameterization is incorrect
or the model uncertainties are so large that such a parameterization is difficult. Both
errors in the forward model and errors in the noise distribution contribute to an incorrect
parameterization of the likelihood. There are many ways in which such a mismatch can arise.
A concrete example in an ultrasound vibrometry application is reviewed in Section 5.2. Here,
we only mention that both the philosophical and the asymptotic justifications of Bayesian
inference are more tenuous under model misspecification. While the modeler may use a
"surrogate likelihood" to define a misspecified Bayes posterior and argue that one can obtain
good approximations when the surrogate is "close" to $P_\theta$, defining this "closeness" in
nonlinear inverse problems is not trivial.
In the next Section, we review a variational formulation that bypasses these difficulties.
Instead of trying to define the correctly parameterized $Q_\theta = P_\theta$, the variational perspective
defines a discrepancy between the posited forward model and the observed data. The relative
weights given to possible parameters $\theta$ are higher if they yield a smaller discrepancy, or loss.
2.3. Variational Framework for Gibbs Posteriors
Let $L: \Theta \times \mathbb{R}^d \to \mathbb{R}$ be a loss function and let $\rho_0 \in \Delta(\Theta)$. We propose to solve the variational
problem of Bissiri et al. [2016]:
$$\hat\rho^W_n(d\theta) := \operatorname*{arg\,min}_{\rho \in \Delta(\Theta)} \left[ R^W(\rho) = \int_\Theta \frac{1}{n} \sum_{i=1}^n L(\theta, y_i)\, \rho(d\theta) + \frac{1}{nW} D_{KL}(\rho \,\|\, \rho_0) \right]. \tag{3}$$
Here, $\rho_0$ is the distribution quantifying our prior and $W > 0$ is a regularization parameter
that we assume is given for now. If $\rho$ is not absolutely continuous with respect to $\rho_0$, the
divergence is defined to be $+\infty$. Often we will abbreviate the average loss over all data by
$R_n(\theta) := \frac{1}{n}\sum_{i=1}^n L(\theta, y_i)$.
To ensure the existence of a solution, we will make assumptions on the structure of the
problem, motivated by the assumptions of Cotter et al. [2009] and Stuart [2010].
Assumption 1. Let the loss $L(\theta, y)$ have the form $l(\mathcal{F}(\theta), y)$ and satisfy the following.
(i) $L(\theta, y)$ is uniformly bounded from below:
$$\inf_{\theta, y} L(\theta, y) \geq B > -\infty.$$
We assume $B = 0$ without loss of generality.
(ii) For every $\theta, y$ there exists $K \equiv K(\|\theta\|, \|y\|) \in L^1(\Theta \times Y; \rho_0 \otimes P)$ such that
$L(\theta, y) \leq K(\|\theta\|, \|y\|)$.
(iii) For every $r > 0$ there exists $C_1(r, y) > 0$ such that whenever $\|\theta_1\|, \|\theta_2\| < r$,
$$|L(\theta_1, y) - L(\theta_2, y)| \leq C_1(r, y)\, \|\theta_1 - \theta_2\|,$$
with $C_1(y) \equiv C_1(r, y) \in L^1(Y; P)$.
(iv) For every $r > 0$ there exists $C_2(r, \theta) > 0$ such that whenever $\|y_1\|, \|y_2\| < r$,
$$|L(\theta, y_1) - L(\theta, y_2)| \leq C_2(r, \theta)\, \|y_1 - y_2\|,$$
with $\exp(C_2(\theta)) \equiv \exp(C_2(r, \theta)) \in L^1(\Theta; \rho_0)$.
Remark 1. Note that because $L$ is defined to be a mapping on $\Theta \times Y$, the regularity
assumptions implicitly place restrictions on the forward model $\mathcal{F}$. We will use the squared
$\ell_2$ loss as an example for understanding the regularity conditions:
$$L(\theta, y) \equiv l(\mathcal{F}(\theta), y) = \|y - \mathcal{F}(\theta)\|^2.$$
The properties of $\mathcal{F}$ dictate whether the loss $L$ satisfies the assumptions. Various PDE-based
models used in the literature, combined with the popular Gaussian prior distribution, satisfy
assumptions (i) and (iv) for the squared $\ell_2$ loss; see, e.g., Section 3 of Stuart [2010]. On the
other hand, the integrability conditions (ii) and (iii) depend on the unknown $P$. One can check
how mild or severe these conditions turn out to be for a specific loss function by hypothesizing
models like (20), without specifying a likelihood.
We will also make a mild smoothness assumption on the density of the prior distribution
$\rho_0$. The Gaussian prior satisfies this assumption.
Assumption 2. $\rho_0$ has a positive density everywhere. Furthermore, for every $r > 0$ there
exists $C_3(r) > 0$ such that whenever $\|\theta_1\|, \|\theta_2\| < r$,
$$|\log \rho_0(\theta_1) - \log \rho_0(\theta_2)| \leq C_3(r)\, \|\theta_1 - \theta_2\|.$$
There exists a unique solution to (3) in $\Delta(\Theta)$, which has the following density for fixed
$W > 0$:
$$\hat\rho^W_n(d\theta) := \frac{\exp\{-nW R_n(\theta)\}\, \rho_0(d\theta)}{Z^W_n}, \tag{4}$$
where the normalizing constant, or "partition function," $Z^W_n$ is defined as
$$Z^W_n \equiv \int \exp\{-nW R_n(\theta)\}\, \rho_0(d\theta). \tag{5}$$
To derive this formula, the objective functional can be rewritten as
$$R^W(\rho) = \frac{1}{nW} \left[ D_{KL}(\rho \,\|\, \hat\rho^W_n) - \log Z^W_n \right]. \tag{6}$$
The first term is non-negative and uniquely attains zero at $\rho \equiv \hat\rho^W_n$. The second term does
not depend on $\rho$, so the minimum of the functional is achieved at $R^W(\hat\rho^W_n) = -\frac{1}{nW} \log Z^W_n$.
The possible technical issues are the measurability of $\exp(-nW R_n(\theta))$ and the finiteness of
$\log Z^W_n$, both of which follow from Assumption 1.
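Since (4) requires only the unnormalized weights $\exp\{-nW R_n(\theta)\}$ against the prior, one simple way to approximate Gibbs posterior expectations is self-normalized importance sampling with $\rho_0$ as the proposal, which sidesteps computing $Z^W_n$ explicitly. The sketch below is a minimal illustration on a hypothetical scalar problem with squared-error loss and an identity forward map; it is not the particle filter sampler developed later in the paper (Section 4).

```python
# A minimal sketch of (4)-(5): approximate Gibbs posterior expectations by
# self-normalized importance sampling with the prior rho_0 as the proposal.
# The toy data, prior, and loss are hypothetical, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(1.5, 0.5, size=20)             # toy observed data, n = 20
n, W = len(y), 1.0

def R_n(theta):
    """Average squared-error loss with an identity forward map F."""
    return np.mean((y - theta) ** 2)

thetas = rng.normal(0.0, 2.0, size=5000)      # draws theta_j ~ rho_0 = N(0, 4)
log_w = -n * W * np.array([R_n(t) for t in thetas])
w = np.exp(log_w - log_w.max())               # stabilize before normalizing
w /= w.sum()                                  # self-normalization replaces 1/Z_n^W

post_mean = (w * thetas).sum()                # Gibbs posterior mean estimate
ess = 1.0 / (w ** 2).sum()                    # effective sample size diagnostic
print(post_mean, ess)
```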
Remark 2. When we fix $W = 1$ and choose the loss to be the negative log-likelihood,
$L(\theta, y) = -\log p(y \,|\, \mathcal{F}(\theta))$, the Gibbs posterior coincides with a Bayes posterior update using
$p(y \,|\, \mathcal{F}(\theta))$ as its likelihood component. Thus, our Gibbs posterior solution strictly generalizes
the Bayes posterior.
We close the Section with some intuition about the role of $W$ in the Gibbs posterior. In
the limit $W \to 0$, $\hat\rho^W_n \to \rho_0$, so there is no update of information from the prior. Smaller
$W$ thus weighs prior information more heavily. In the limit $W \to \infty$, $\hat\rho^W_n$ concentrates
on a set of $\theta$'s minimizing the loss over the observed data. Larger $W$ thus weighs
information from the data more heavily and leads to increased sensitivity to perturbations of
the data by noise. Intuition suggests that a prior $\rho_0$ that is strictly positive on $\Theta$ should reflect
large uncertainty, and that $W$ must be carefully chosen based on the amount of information in
the data relative to the prior. Inspection of (3) suggests that we are implicitly implementing a
discrepancy principle [Nair, 2009], since the divergence penalty has less influence when the
sample size is large.
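These limits are easy to verify numerically in one dimension by evaluating the unnormalized density (4) on a grid. The sketch below (a hypothetical scalar example with squared-error loss and a Gaussian prior, not taken from the paper) shows the posterior spread interpolating between the prior spread at small $W$ and concentration near the minimizer of $R_n$ at large $W$.

```python
# A grid-based illustration of the limits above: W -> 0 recovers the prior,
# large W concentrates near the minimizer of R_n. All quantities here are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(1.5, 0.5, size=20)                  # toy data, n = 20
n = len(y)

def R_n(theta):
    return np.mean((y - theta) ** 2)               # squared-error loss

grid = np.linspace(-4.0, 4.0, 801)
dx = grid[1] - grid[0]
log_prior = -0.5 * (grid / 2.0) ** 2               # rho_0 = N(0, 4), up to a constant

for W in (0.01, 1.0, 100.0):
    log_dens = -n * W * np.array([R_n(t) for t in grid]) + log_prior
    dens = np.exp(log_dens - log_dens.max())
    dens /= dens.sum() * dx                        # normalize on the grid
    mean = (grid * dens).sum() * dx
    sd = np.sqrt(((grid - mean) ** 2 * dens).sum() * dx)
    print(f"W={W:>6}: mean={mean:.3f}, sd={sd:.3f}")   # sd shrinks as W grows
```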
2.4. Extension to Model Selection
2.4.1. Predictive Model Selection. Solving the variational problem (3) still requires a pre-
specified choice of loss $L$. It may appear that this requirement is as restrictive as positing the
generating process, since a better choice of loss hinges on knowledge of, or assumptions on, $P$.
Our first proposal is to define a valid way to compare two different losses without requiring
knowledge of the data-generating mechanism. The key idea is to compare them based on their
ability to make accurate predictions, measuring their discrepancy on a future observation.
As mentioned, this principle is not new and has been used to improve the robustness of
Bayesian model prediction and model checking. The novelty lies in the definition of a
predictive density without assuming the likelihood; this density will serve as a natural
discrepancy measure between the new observation and the prediction.
Consider a common prior distribution $\rho_0$ and multiple competing losses $L_1, \ldots, L_k$,
defined on subsets $\Theta_1, \ldots, \Theta_k$ of $\Theta$. Given the corresponding set of Gibbs posteriors
$\hat\rho^{W_1}_{n,1}, \ldots, \hat\rho^{W_k}_{n,k}$, we propose the following predictive model comparison principle: map each