Distance-to-Set Priors and Constrained Bayesian Inference

Rick Presman
Department of Statistical Science, Duke University

Jason Xu
Department of Statistical Science, Duke University

October 25, 2022
Abstract

Constrained learning is prevalent in many statistical tasks. Recent work proposes distance-to-set penalties to derive estimators under general constraints that can be specified as sets, but focuses on obtaining point estimates that do not come with corresponding measures of uncertainty. To remedy this, we approach distance-to-set regularization from a Bayesian lens. We consider a class of smooth distance-to-set priors, showing that they yield well-defined posteriors toward quantifying uncertainty for constrained learning problems. We discuss relationships and advantages over prior work on Bayesian constraint relaxation. Moreover, we prove that our approach is optimal in an information-geometric sense for finite penalty parameters $\rho$, and enjoys favorable statistical properties when $\rho \to \infty$. The method is designed to perform effectively within gradient-based MCMC samplers, as illustrated on a suite of simulated and real data applications.
1 Introduction
Constrained learning is ubiquitous in statistical tasks when seeking to impose desired structure on solutions. Concretely, consider the task of estimating a parameter $x \in \mathbb{R}^d$ by minimizing some loss function $f(x)$, where $x$ needs to satisfy a set of constraints encoded by a set $\mathcal{C}$. Then we seek:

$$\min_x f(x) \quad \text{s.t.} \quad x \in \mathcal{C}. \tag{1}$$
A simple but powerful observation that will make this amenable to effective algorithms is to equivalently express the restriction in terms of the Euclidean distance between the point $x$ and the constraint set as $\operatorname{dist}(x, \mathcal{C}) = 0$. In many instances, it is enough to only approximately satisfy the constraints. A recent framework that accomplishes this kind of constraint relaxation is known as distance-to-set penalization [Chi et al., 2014, Xu et al., 2017]: for $\rho \in (0, \infty)$,

$$x \in \operatorname*{argmin}_x \left[ f(x) + \frac{\rho}{2} \operatorname{dist}(x, \mathcal{C})^2 \right].$$
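As a concrete illustration (ours, not the paper's; the box constraint, quadratic loss, and all names below are assumptions made for the sketch), the penalized objective is easy to evaluate whenever projection onto $\mathcal{C}$ has a closed form:

```python
import numpy as np

def project_box(x, lo=-1.0, hi=1.0):
    # Euclidean projection onto the box C = [lo, hi]^d: clip coordinatewise.
    return np.clip(x, lo, hi)

def penalized_loss(x, f, project, rho):
    # f(x) + (rho / 2) * dist(x, C)^2, where dist(x, C) = ||x - P_C(x)||_2.
    residual = x - project(x)
    return f(x) + 0.5 * rho * residual @ residual

# Example: quadratic loss whose unconstrained minimizer lies outside C.
f = lambda x: 0.5 * np.sum((x - 2.0) ** 2)
x = np.array([1.5, 0.3])
print(penalized_loss(x, f, project_box, rho=10.0))
```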
Solutions to this problem can be obtained using a majorization-minimization (MM) scheme
known as the proximal distance algorithm [Keys et al., 2019], and it is so called because
the iterative updates are defined via proximal operators [Parikh and Boyd, 2014]. However, despite its effectiveness at delivering point estimates, deriving corresponding measures of uncertainty is very difficult, so a general theory of inference remains out of reach. Toward filling this methodological gap, we recast the optimization problem in a constrained Bayesian setting via an analog of these penalties that we term distance-to-set priors.
Our approach draws previously unexplored connections between this optimization framework
and the broader constrained Bayesian inference literature [Ghosh, 1992, Gramacy et al.,
2016], an area that continues to grow with exciting recent ideas. We focus on a tradition
of sampling through gradient-based samplers such as Hamiltonian Monte Carlo, or HMC
[Neal et al., 2011, Betancourt and Girolami, 2015]. Lan et al. [2014] utilize a spherical
HMC, mapping constraints that can be written as norms onto the hypersphere. Related
recent work uses a Riemannian HMC under a manifold setup [Kook et al., 2022], extending a
line of work pioneered by Byrne and Girolami [2013]. Duan et al. [2020] replace a support
constraint with a term that decays exponentially outside of the support, while Sen et al. [2018]
project sample draws from unconstrained posteriors to the constraint set to approximate the
original posterior. Recently, Xu et al. [2021] propose using priors based on proximal mappings related to the constraint sets. Concurrent work by Zhou et al. [2022] proposes using the Moreau-Yosida envelope more generally, employing a class of epigraph priors for regularized and constrained problems suited to proximal MCMC [Pereyra, 2016].
Distance-to-set priors extend this line of inquiry, providing an effective, practical way to
consider constrained inference problems. The framework is more general than many of the
previous methods in that it essentially only requires that the constraint can be written as
a set, and that projection onto that set is feasible. These priors are then easy to evaluate,
and work well within gradient-based samplers due to our smooth formulation. This improves
computational stability under posterior sampling algorithms such as HMC, as we investigate
in an empirical study. Moreover, from a theoretical perspective, this class of priors admits posteriors that converge in distribution, along with their maximum a posteriori (MAP) estimates, to those of the original constrained problem as we increase the parameter governing the degree of constraint enforcement. Finally, we draw a connection between Bayesian constraint relaxation and information geometry, revealing how distance-to-set priors are optimal in a certain sense, while simultaneously yielding a way to select the regularization parameter $\rho$ systematically.
2 Theory and Methods
We begin by briefly reviewing distance-to-set penalties and some of their key properties.
Distance-to-Set Penalties

Let $\mathcal{C} \subset \mathbb{R}^n$ be convex, and let $f : \mathcal{C} \to \mathbb{R}$ be a convex function. Many constrained programming problems of the form (1) may be intractable in their original form, but can be converted to a sequence of simpler subproblems. To make progress, denote the Euclidean distance from any $x \in \mathbb{R}^n$ to $\mathcal{C}$ by $\operatorname{dist}(x, \mathcal{C}) := \inf_{y \in \mathcal{C}} \|x - y\|_2$: then the condition $x \in \mathcal{C}$ can be equivalently written as $\operatorname{dist}(x, \mathcal{C}) = 0$. Note that while the distance operator is not necessarily smooth, its square is differentiable as long as the projection of $x$ onto $\mathcal{C}$, denoted $P_{\mathcal{C}}(x) := \operatorname*{arg\,min}_{y \in \mathcal{C}} \|x - y\|_2$, is single-valued [Lange, 2016]. Thus, we may reformulate the problem by instead considering the smooth unconstrained optimization task:

$$\min_x \left[ f(x) + \frac{\rho}{2} \operatorname{dist}(x, \mathcal{C})^2 \right],$$
where $\rho > 0$ is a penalty parameter. To solve the resulting problem, Lange [2016] proposes a method termed the proximal distance algorithm, which makes use of the MM principle to create surrogate functions based on distance majorization [Chi et al., 2014]. Its namesake derives from the fact that the minimization of the surrogate functions

$$g_\rho(x \mid x_k) = f(x) + \frac{\rho}{2} \| x - P_{\mathcal{C}}(x_k) \|^2$$

is related to the proximal operator of $f$ [Parikh and Boyd, 2014]: recall that for a function $f$, the proximal mapping with parameter $\lambda$ is defined as

$$\operatorname{prox}_{\lambda f}(y) \in \operatorname*{argmin}_x \left[ f(x) + \frac{1}{2\lambda} \| x - y \|_2^2 \right],$$
which relates to our problem with $y$ the projection at iterate $k$ and $\lambda = \rho^{-1}$. Under this formulation, to recover the solution to the original optimization problem, it is necessary for $\rho \to \infty$ at some appropriate rate [Wright et al., 1999]. Conversely, fixing a finite $\rho$ results in a solution where $x$ is close to $\mathcal{C}$, but not strictly inside of the set. Both cases may be of interest depending on the modeling context. We will discuss primarily the latter in this paper, but also establish theoretical relationships to the former. As our primary setting is statistical, we may think of $f(x)$ as a convex loss function.
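As a sketch of the proximal distance iteration just described (our own minimal example; the quadratic loss, box set, and fixed $\rho$ are assumptions, and practical implementations anneal $\rho$ upward rather than fixing it), each update projects the current iterate onto $\mathcal{C}$ and then applies the proximal map of $f$ with $\lambda = \rho^{-1}$:

```python
import numpy as np

def prox_quadratic(y, target, lam):
    # Proximal map of f(x) = 0.5 * ||x - target||^2:
    # argmin_x [f(x) + ||x - y||^2 / (2 * lam)] = (lam * target + y) / (lam + 1).
    return (lam * target + y) / (lam + 1.0)

def proximal_distance(x0, target, project, rho, n_iter=200):
    # MM update: x_{k+1} = prox_{(1/rho) f}(P_C(x_k)), which minimizes the
    # surrogate g_rho(x | x_k) = f(x) + (rho / 2) * ||x - P_C(x_k)||^2.
    x = x0.copy()
    for _ in range(n_iter):
        x = prox_quadratic(project(x), target, lam=1.0 / rho)
    return x

project = lambda x: np.clip(x, -1.0, 1.0)  # C = [-1, 1]^2
x_hat = proximal_distance(np.zeros(2), target=np.array([2.0, 0.5]),
                          project=project, rho=50.0)
print(x_hat)  # first coordinate ~1.0196: near C but, for finite rho, just outside
```

The fixed point matches the stationarity condition $(x - 2) + \rho(x - 1) = 0$ in the first coordinate, illustrating the finite-$\rho$ behavior discussed above.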
2.1 Distance-to-Set Priors
The proximal distance algorithm mentioned above provides a method for obtaining point
estimates under distance-to-set penalization. However, to the best of our knowledge, the
current literature does not provide results pertaining to uncertainty quantification for these
estimators. Toward understanding their uncertainty properties, our first contribution is
to link these ideas to a Bayesian constraint relaxation framework. Identifying a penalized
estimation problem with a Bayesian problem has been done at least as early as the seminal
LASSO paper [Tibshirani, 1996]. Consider data
$y \mid \theta \in \mathbb{R}^n$ that has likelihood $L(\theta \mid y)$ and is parameterized by some parameter $\theta$ with prior $\pi(\theta)$ that is absolutely continuous with respect to Lebesgue measure, with support $\mathbb{R}^d$ but constrained to $\Theta \subset \mathbb{R}^d$. Since $\theta$ is constrained to $\Theta$, Bayes' Theorem gives the posterior for $\theta$:

$$\begin{aligned}
\pi(\theta \mid y) &\propto L(\theta \mid y)\, \pi(\theta)\, \mathbb{1}_{\{\theta \in \Theta\}} \\
&\propto L(\theta \mid y)\, \pi(\theta)\, \mathbb{1}_{\{\operatorname{dist}(\theta, \Theta) = 0\}},
\end{aligned}$$
where the second line follows from the discussion of distance-to-set penalties. Since sampling from a posterior sharply constrained on $\Theta$ may be difficult, we can replace the indicator representing the constraint with $\exp\left(-\frac{\rho}{2}\operatorname{dist}(\theta, \Theta)^2\right)$. An illustration is given in Figure 1. This term is equal to the indicator on $\Theta$ and rapidly decays to zero as the distance from $\theta$ to $\Theta$ grows larger. We choose to square the distance-to-set operator to align with the distance-to-set optimization and to improve sampling performance, which we discuss in Section 2.4.
[Figure 1]

Figure 1: Schematic of distance-to-set prior for the set $\Theta = [-1, 1]$. The blue dashed line represents the indicator $\mathbb{1}_\Theta$, and the red solid line represents the relaxation of $\mathbb{1}_\Theta$ that we consider in this paper.
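As a quick numerical check of the decay shown in Figure 1 (the code and the value of $\rho$ are our own illustration), the relaxed indicator for $\Theta = [-1, 1]$ can be evaluated directly:

```python
import numpy as np

def relaxed_indicator(theta, lo=-1.0, hi=1.0, rho=4.0):
    # exp(-(rho / 2) * dist(theta, [lo, hi])^2); equals 1 exactly on [lo, hi].
    dist = np.maximum.reduce([lo - theta, theta - hi, np.zeros_like(theta)])
    return np.exp(-0.5 * rho * dist ** 2)

theta = np.array([0.0, 1.0, 1.5, 2.0])
print(relaxed_indicator(theta))  # [1.0, 1.0, ~0.607, ~0.135]
```

Larger $\rho$ sharpens the decay toward the hard indicator, recovering the constrained posterior in the limit.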
We propose to use distance-to-set priors, defined as follows:

$$\tilde{\pi}(\theta) := \pi(\theta) \exp\left(-\frac{\rho}{2} \operatorname{dist}(\theta, \Theta)^2\right),$$

where $\rho > 0$ is a hyperparameter in our treatment. To bridge this with the optimization setting, we can define $f(\theta) := -\log\left(L(\theta \mid y)\, \pi(\theta)\right)$. However, it should be noted that although every likelihood function gives us a loss by taking the negative log, the converse is not always true. Thus, one could generalize our approach by considering more general loss functions and incorporating them into the Bayesian framework via Gibbs posteriors [Bissiri et al., 2016]; see also Jacob et al. [2017]. Ignoring the constraint for a moment, we obtain the unconstrained posterior:

$$\pi(\theta \mid y) \propto L(\theta \mid y)\, \pi(\theta). \tag{2}$$
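To see how these pieces feed a gradient-based sampler, here is a minimal sketch (ours; the Gaussian likelihood, flat base prior $\pi$, and box constraint are assumptions chosen for concreteness) of the log relaxed posterior and its gradient, the two quantities an HMC implementation consumes. The penalty gradient uses the identity $\nabla_\theta \left[\tfrac{1}{2}\operatorname{dist}(\theta, \Theta)^2\right] = \theta - P_\Theta(\theta)$, valid wherever the projection is single-valued:

```python
import numpy as np

def log_relaxed_posterior(theta, y, project, rho, sigma=1.0):
    # log L(theta | y) + log pi(theta) - (rho / 2) * dist(theta, Theta)^2, up to
    # additive constants, for y_i ~ N(theta, sigma^2 I) and a flat base prior pi.
    loglik = -0.5 * np.sum((y - theta) ** 2) / sigma**2
    residual = theta - project(theta)
    return loglik - 0.5 * rho * residual @ residual

def grad_log_relaxed_posterior(theta, y, project, rho, sigma=1.0):
    # The penalty contributes -rho * (theta - P_Theta(theta)), since the gradient
    # of (1/2) dist(theta, Theta)^2 is theta - P_Theta(theta).
    grad_loglik = np.sum(y - theta, axis=0) / sigma**2
    return grad_loglik - rho * (theta - project(theta))

# Both functions plug directly into a leapfrog integrator for HMC.
project = lambda t: np.clip(t, -1.0, 1.0)           # Theta = [-1, 1]^2
rng = np.random.default_rng(0)
y = rng.normal(loc=1.2, scale=1.0, size=(100, 2))   # true mean just outside Theta
print(grad_log_relaxed_posterior(np.ones(2) * 1.1, y, project, rho=100.0))
```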