Distance-to-Set Priors and Constrained Bayesian Inference

Rick Presman
Department of Statistical Science, Duke University

Jason Xu
Department of Statistical Science, Duke University

October 25, 2022
Abstract

Constrained learning is prevalent in many statistical tasks. Recent work proposes distance-to-set penalties to derive estimators under general constraints that can be specified as sets, but focuses on obtaining point estimates that do not come with corresponding measures of uncertainty. To remedy this, we approach distance-to-set regularization from a Bayesian lens. We consider a class of smooth distance-to-set priors, showing that they yield well-defined posteriors toward quantifying uncertainty for constrained learning problems. We discuss relationships and advantages over prior work on Bayesian constraint relaxation. Moreover, we prove that our approach is optimal in an information-geometric sense for finite penalty parameters $\rho$, and enjoys favorable statistical properties when $\rho \to \infty$. The method is designed to perform effectively within gradient-based MCMC samplers, as illustrated on a suite of simulated and real data applications.
1 Introduction
Constrained learning is ubiquitous in statistical tasks when seeking to impose desired structure on solutions. Concretely, consider the task of estimating a parameter $x \in \mathbb{R}^d$ by minimizing some loss function $f(x)$, where $x$ needs to satisfy a set of constraints encoded by a set $\mathcal{C}$. Then we seek:

$$\min_x f(x) \quad \text{s.t.} \quad x \in \mathcal{C}. \tag{1}$$
A simple but powerful observation that will make this amenable to effective algorithms is to equivalently express the restriction in terms of the Euclidean distance between the point $x$ and the constraint set as $\operatorname{dist}(x, \mathcal{C}) = 0$. In many instances, it is enough to only approximately satisfy the constraints. A recent framework that accomplishes this kind of constraint relaxation is known as distance-to-set penalization [Chi et al., 2014, Xu et al., 2017]: for $\rho \in (0, \infty)$,

$$x \in \operatorname*{argmin}_x \left[ f(x) + \frac{\rho}{2} \operatorname{dist}(x, \mathcal{C})^2 \right].$$
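As a concrete illustration (ours, not the paper's; the box constraint, quadratic loss, and all names below are assumptions made for the sketch), the penalized objective is easy to evaluate whenever projection onto $\mathcal{C}$ has a closed form:

```python
import numpy as np

def project_box(x, lo=-1.0, hi=1.0):
    # Euclidean projection onto the box C = [lo, hi]^d: clip coordinatewise.
    return np.clip(x, lo, hi)

def penalized_loss(x, f, project, rho):
    # f(x) + (rho / 2) * dist(x, C)^2, where dist(x, C) = ||x - P_C(x)||_2.
    residual = x - project(x)
    return f(x) + 0.5 * rho * residual @ residual

# Example: quadratic loss whose unconstrained minimizer lies outside C.
f = lambda x: 0.5 * np.sum((x - 2.0) ** 2)
x = np.array([1.5, 0.3])
print(penalized_loss(x, f, project_box, rho=10.0))
```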
Solutions to this problem can be obtained using a majorization-minimization (MM) scheme
known as the proximal distance algorithm [Keys et al., 2019], and it is so called because
the iterative updates are defined via proximal operators [Parikh and Boyd, 2014]. However, despite its effectiveness at delivering point estimates, deriving corresponding measures of uncertainty is very difficult, so a general theory of inference remains out of reach. Toward filling this methodological gap, we recast the optimization problem in a constrained Bayesian setting via an analog of these penalties that we term distance-to-set priors.
Our approach draws previously unexplored connections between this optimization framework
and the broader constrained Bayesian inference literature [Ghosh, 1992, Gramacy et al.,
2016], an area that continues to grow with exciting recent ideas. We focus on a tradition
of sampling through gradient-based samplers such as Hamiltonian Monte Carlo, or HMC
[Neal et al., 2011, Betancourt and Girolami, 2015]. Lan et al. [2014] utilize a spherical
HMC, mapping constraints that can be written as norms onto the hypersphere. Related
recent work uses a Riemannian HMC under a manifold setup [Kook et al., 2022], extending a
line of work pioneered by Byrne and Girolami [2013]. Duan et al. [2020] replace a support
constraint with a term that decays exponentially outside of the support, while Sen et al. [2018]
project sample draws from unconstrained posteriors to the constraint set to approximate the
original posterior. Recently, Xu et al. [2021] propose using priors based on proximal mappings related to the constraint sets. Concurrent work by Zhou et al. [2022] proposes using the Moreau-Yosida envelope more generally, employing a class of epigraph priors for regularized and constrained problems suited to proximal MCMC [Pereyra, 2016].
Distance-to-set priors extend this line of inquiry, providing an effective, practical way to
consider constrained inference problems. The framework is more general than many of the
previous methods in that it essentially only requires that the constraint can be written as
a set, and that projection onto that set is feasible. These priors are then easy to evaluate,
and work well within gradient-based samplers due to our smooth formulation. This improves
computational stability under posterior sampling algorithms such as HMC, as we investigate
in an empirical study. Moreover, from a theoretical perspective, this class of priors admits posteriors that converge in distribution, along with their maximum a posteriori (MAP) estimates, to those of the original constrained problem as we increase the parameter governing the degree of constraint enforcement. Finally, we draw a connection between Bayesian constraint relaxation and information geometry, revealing how distance-to-set priors are optimal in a certain sense, while simultaneously yielding a way to select the regularization parameter $\rho$ systematically.
2 Theory and Methods
We begin by briefly reviewing distance-to-set penalties and some of their key properties.
Distance-to-Set Penalties

Let $\mathcal{C} \subset \mathbb{R}^n$ be convex, and let $f : \mathcal{C} \to \mathbb{R}$ be a convex function. Many constrained programming problems of the form (1) may be intractable in their original form, but can be converted to a sequence of simpler subproblems. To make progress, denote the Euclidean distance from any $x \in \mathbb{R}^n$ to $\mathcal{C}$ by $\operatorname{dist}(x, \mathcal{C}) := \inf_{y \in \mathcal{C}} \|x - y\|_2$: then the condition $x \in \mathcal{C}$ can be equivalently written as $\operatorname{dist}(x, \mathcal{C}) = 0$. Note that while the distance operator is not necessarily smooth, its square is differentiable as long as the projection of $x$ onto $\mathcal{C}$, denoted $P_{\mathcal{C}}(x) := \operatorname*{arg\,min}_{y \in \mathcal{C}} \|x - y\|_2$, is single-valued [Lange, 2016]. Thus, we may reformulate the problem by instead considering the smooth unconstrained optimization task:

$$\min_x \left[ f(x) + \frac{\rho}{2} \operatorname{dist}(x, \mathcal{C})^2 \right],$$
where $\rho > 0$ is a penalty parameter. To solve the resulting problem, Lange [2016] proposes a method termed the proximal distance algorithm, which makes use of the MM principle to create surrogate functions based on distance majorization [Chi et al., 2014]. Its namesake derives from the fact that the minimization of the surrogate functions

$$g_\rho(x \mid x_k) = f(x) + \frac{\rho}{2} \| x - P_{\mathcal{C}}(x_k) \|^2$$

is related to the proximal operator of $f$ [Parikh and Boyd, 2014]: recall that for a function $f$, the proximal mapping with parameter $\lambda$ is defined as

$$\operatorname{prox}_{\lambda f}(y) \in \operatorname*{argmin}_x \left[ f(x) + \frac{1}{2\lambda} \| x - y \|_2^2 \right],$$
which relates to our problem with $y$ the projection at iterate $k$ and $\lambda = \rho^{-1}$. Under this formulation, to recover the solution to the original optimization problem, it is necessary for $\rho \to \infty$ at some appropriate rate [Wright et al., 1999]. Conversely, fixing a finite $\rho$ results in a solution where $x$ is close to $\mathcal{C}$, but not strictly inside of the set. Both cases may be of interest depending on the modeling context. We will discuss primarily the latter in this paper, but also establish theoretical relationships to the former. As our primary setting is statistical, we may think of $f(x)$ as a convex loss function.
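As a sketch of the proximal distance iteration just described (our own minimal example; the quadratic loss, box set, and fixed $\rho$ are assumptions, and practical implementations anneal $\rho$ upward rather than fixing it), each update projects the current iterate onto $\mathcal{C}$ and then applies the proximal map of $f$ with $\lambda = \rho^{-1}$:

```python
import numpy as np

def prox_quadratic(y, target, lam):
    # Proximal map of f(x) = 0.5 * ||x - target||^2:
    # argmin_x [f(x) + ||x - y||^2 / (2 * lam)] = (lam * target + y) / (lam + 1).
    return (lam * target + y) / (lam + 1.0)

def proximal_distance(x0, target, project, rho, n_iter=200):
    # MM update: x_{k+1} = prox_{(1/rho) f}(P_C(x_k)), which minimizes the
    # surrogate g_rho(x | x_k) = f(x) + (rho / 2) * ||x - P_C(x_k)||^2.
    x = x0.copy()
    for _ in range(n_iter):
        x = prox_quadratic(project(x), target, lam=1.0 / rho)
    return x

project = lambda x: np.clip(x, -1.0, 1.0)  # C = [-1, 1]^2
x_hat = proximal_distance(np.zeros(2), target=np.array([2.0, 0.5]),
                          project=project, rho=50.0)
print(x_hat)  # first coordinate ~1.0196: near C but, for finite rho, just outside
```

The fixed point matches the stationarity condition $(x - 2) + \rho(x - 1) = 0$ in the first coordinate, illustrating the finite-$\rho$ behavior discussed above.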
2.1 Distance-to-Set Priors
The proximal distance algorithm mentioned above provides a method for obtaining point
estimates under distance-to-set penalization. However, to the best of our knowledge, the
current literature does not provide results pertaining to uncertainty quantification for these
estimators. Toward understanding their uncertainty properties, our first contribution is
to link these ideas to a Bayesian constraint relaxation framework. Identifying a penalized
estimation problem with a Bayesian problem has been done at least as early as the seminal
LASSO paper [Tibshirani, 1996]. Consider data
$y \mid \theta \in \mathbb{R}^n$ that has likelihood $L(\theta \mid y)$ and is parameterized by some parameter $\theta$ with prior $\pi(\theta)$ that is absolutely continuous with respect to Lebesgue measure, with support $\mathbb{R}^d$ but constrained to $\Theta \subset \mathbb{R}^d$. Since $\theta$ is constrained to $\Theta$, Bayes' Theorem gives the posterior for $\theta$:

$$\begin{aligned}
\pi(\theta \mid y) &\propto L(\theta \mid y)\, \pi(\theta)\, \mathbb{1}_{\{\theta \in \Theta\}} \\
&\propto L(\theta \mid y)\, \pi(\theta)\, \mathbb{1}_{\{\operatorname{dist}(\theta, \Theta) = 0\}},
\end{aligned}$$
where the second line follows from the discussion of distance-to-set penalties. Since sampling from a posterior sharply constrained on $\Theta$ may be difficult, we can replace the indicator representing the constraint with $\exp\left(-\frac{\rho}{2}\operatorname{dist}(\theta, \Theta)^2\right)$. An illustration is given in Figure 1. This term is equal to the indicator on $\Theta$ and rapidly decays to zero as the distance from $\theta$ to $\Theta$ grows larger. We choose to square the distance-to-set operator to align with the distance-to-set optimization and to improve sampling performance, which we discuss in Section 2.4.
[Figure 1]

Figure 1: Schematic of distance-to-set prior for the set $\Theta = [-1, 1]$. The blue dashed line represents the indicator $\mathbb{1}_\Theta$, and the red solid line represents the relaxation of $\mathbb{1}_\Theta$ that we consider in this paper.
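As a quick numerical check of the decay shown in Figure 1 (the code and the value of $\rho$ are our own illustration), the relaxed indicator for $\Theta = [-1, 1]$ can be evaluated directly:

```python
import numpy as np

def relaxed_indicator(theta, lo=-1.0, hi=1.0, rho=4.0):
    # exp(-(rho / 2) * dist(theta, [lo, hi])^2); equals 1 exactly on [lo, hi].
    dist = np.maximum.reduce([lo - theta, theta - hi, np.zeros_like(theta)])
    return np.exp(-0.5 * rho * dist ** 2)

theta = np.array([0.0, 1.0, 1.5, 2.0])
print(relaxed_indicator(theta))  # [1.0, 1.0, ~0.607, ~0.135]
```

Larger $\rho$ sharpens the decay toward the hard indicator, recovering the constrained posterior in the limit.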
We propose to use distance-to-set priors, defined as follows:

$$\tilde{\pi}(\theta) := \pi(\theta) \exp\left(-\frac{\rho}{2} \operatorname{dist}(\theta, \Theta)^2\right),$$

where $\rho > 0$ is a hyperparameter in our treatment. To bridge this with the optimization setting, we can define $f(\theta) := -\log\left(L(\theta \mid y)\, \pi(\theta)\right)$. However, it should be noted that although every likelihood function gives us a loss by taking the negative log, the converse is not always true. Thus, one could generalize our approach by considering more general loss functions and incorporating them into the Bayesian framework via Gibbs posteriors [Bissiri et al., 2016]; see also Jacob et al. [2017]. Ignoring the constraint for a moment, we obtain the unconstrained posterior:

$$\pi(\theta \mid y) \propto L(\theta \mid y)\, \pi(\theta). \tag{2}$$
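To see how these pieces feed a gradient-based sampler, here is a minimal sketch (ours; the Gaussian likelihood, flat base prior $\pi$, and box constraint are assumptions chosen for concreteness) of the log relaxed posterior and its gradient, the two quantities an HMC implementation consumes. The penalty gradient uses the identity $\nabla_\theta \left[\tfrac{1}{2}\operatorname{dist}(\theta, \Theta)^2\right] = \theta - P_\Theta(\theta)$, valid wherever the projection is single-valued:

```python
import numpy as np

def log_relaxed_posterior(theta, y, project, rho, sigma=1.0):
    # log L(theta | y) + log pi(theta) - (rho / 2) * dist(theta, Theta)^2, up to
    # additive constants, for y_i ~ N(theta, sigma^2 I) and a flat base prior pi.
    loglik = -0.5 * np.sum((y - theta) ** 2) / sigma**2
    residual = theta - project(theta)
    return loglik - 0.5 * rho * residual @ residual

def grad_log_relaxed_posterior(theta, y, project, rho, sigma=1.0):
    # The penalty contributes -rho * (theta - P_Theta(theta)), since the gradient
    # of (1/2) dist(theta, Theta)^2 is theta - P_Theta(theta).
    grad_loglik = np.sum(y - theta, axis=0) / sigma**2
    return grad_loglik - rho * (theta - project(theta))

# Both functions plug directly into a leapfrog integrator for HMC.
project = lambda t: np.clip(t, -1.0, 1.0)           # Theta = [-1, 1]^2
rng = np.random.default_rng(0)
y = rng.normal(loc=1.2, scale=1.0, size=(100, 2))   # true mean just outside Theta
print(grad_log_relaxed_posterior(np.ones(2) * 1.1, y, project, rho=100.0))
```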