Empirical Bayes Selection for Value
Maximization
Dominic Coey and Kenneth Hung
Meta
{coey, kenhung}@meta.com
January 3, 2023
Abstract
We study the problem of selecting the best m units from a set of n as m/n → α ∈ (0, 1), where noisy, heteroskedastic measurements of the units' true values are available and the decision-maker wishes to maximize the average true value of the units selected. Given a parametric prior distribution, the empirical Bayes decision rule incurs O_p(n^{-1}) regret relative to the Bayesian oracle that knows the true prior. More generally, if the error in the estimated prior is of order O_p(r_n), regret is O_p(r_n^2). In this sense selecting the best units is easier than estimating their values. We show this regret bound is sharp in the parametric case, by giving an example in which it is attained. Using priors calibrated from a dataset of over four thousand internet experiments, we find that empirical Bayes methods perform well in practice for detecting the best treatments given only a modest number of experiments.
1 Introduction
In many important scientific and economic applications, decision-makers are presented with data on the performance of n units, from which they must select a strict subset for further investigation or treatment. Examples include identifying the best teachers, hospitals, or athletes [6,17,11]; genes associated with particular outcomes [23]; drug candidates [62]; or, in the application we consider in this paper, the treatments in internet experiments with the largest effects on a metric of interest. Each unit is associated with an unobserved true value, which is measured with heteroskedastic noise. The constraint that only m units can be selected arises naturally when the decision-maker has limited resources to devote to the chosen units, and must restrict attention to the most promising candidates.
arXiv:2210.03905v2 [stat.ME] 3 Jan 2023
A desirable feature of a selection procedure is that the aggregate value of its selections is close to the maximum attainable value. Understanding how different selection procedures perform in this respect enables decision-makers to assess how likely they are to succeed in the preceding applications. The empirical Bayes approach to this question involves estimating the unknown prior distribution from which the true values are drawn, and selecting units with the highest estimated posterior means. We show that if the prior distribution is known to lie within some parametric class, empirical Bayes incurs regret of order O_p(n^{-1}) relative to the oracle Bayes procedure in which the prior distribution is known. This is faster than the usual n^{-1/2} rate of convergence in parametric models. In this sense selection is easier than estimation: picking the set of units with the highest values is easier than pinning down the precise values of those units. This generalizes directly to the nonparametric case: regret converges to zero at the square of the rate at which estimation error in the prior converges to zero.
The basic intuition for this result is as follows. First, mistakes, whether of inclusion or exclusion, are only likely to happen for those units whose true values are sufficiently close to a critical threshold. Units comfortably above (or below) that threshold will be correctly selected (or omitted) with high probability (w.h.p.). Second, even those mistakes cannot be too costly, as items incorrectly included or excluded are likely to be marginal — almost good enough to be selected, or almost bad enough to be omitted. The regret, which is the product of two terms corresponding to these factors, will therefore be second-order small. We show that our O_p(n^{-1}) bound is sharp in the parametric case, by constructing an example in which regret is at least Cn^{-1} with non-vanishing probability for some positive constant C.
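As a rough illustration of this rate claim, consider the following toy simulation sketch. It is our own construction, not the paper's code: the Gaussian prior N(0, τ²), the uniform noise distribution, and the method-of-moments estimator of τ² are all illustrative assumptions. It selects the top 10% of units by posterior mean under both the true and the estimated prior, and evaluates the regret defined formally in Section 2.

```python
import numpy as np

def eb_selection_regret(n, tau2=1.0, rng=None):
    """Regret of empirical vs. oracle Bayes top-10% selection under a
    (hypothetical) Gaussian prior mu_i ~ N(0, tau2)."""
    if rng is None:
        rng = np.random.default_rng(0)
    m = n // 10
    sigma = rng.uniform(0.5, 2.0, size=n)        # heteroskedastic noise levels
    mu = rng.normal(0.0, np.sqrt(tau2), size=n)  # true values
    x = rng.normal(mu, sigma)                    # noisy measurements

    # Method-of-moments estimate of tau2, using E[X_i^2] = tau2 + sigma_i^2.
    tau2_hat = max(float(np.mean(x**2 - sigma**2)), 1e-8)

    # Conjugate posterior means: E[mu_i | x_i] = tau2 / (tau2 + sigma_i^2) * x_i.
    theta = tau2 / (tau2 + sigma**2) * x              # oracle
    theta_hat = tau2_hat / (tau2_hat + sigma**2) * x  # empirical Bayes

    def top(t):  # indices of the m largest entries (ties broken arbitrarily)
        return np.argsort(t)[-m:]

    j_bayes, j_eb = top(theta), top(theta_hat)
    # Regret: average oracle posterior mean of the oracle selection,
    # minus that of the empirical Bayes selection.
    return (theta[j_bayes].sum() - theta[j_eb].sum()) / n

rng = np.random.default_rng(1)
for n in (500, 5000, 50000):
    avg = np.mean([eb_selection_regret(n, rng=rng) for _ in range(20)])
    print(n, avg)  # regret should shrink roughly like 1/n
```

By construction the regret is nonnegative, since the oracle selection maximizes the sum of the θ_i, and in this sketch it shrinks much faster than the O_p(n^{-1/2}) error in the estimate of τ² itself, consistent with the squared-rate phenomenon described above.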
We apply this methodology to internet experimentation, where units correspond to experiments, and values to treatment effects. Heteroskedasticity arises because experiments vary in sample size. Technology companies may wish to identify a subset of best- or worst-performing experiments for further investigation, in the former case, as candidates to launch to production, or in the latter case, as candidates to stop early. When the follow-up investigation incurs some cost, it may only be feasible to select a strict subset of experiments for more analysis. In this application, as in others, the cost of mistakes depends on their magnitude, that is, on the difference between the aggregate value of the units selected and the units that should have been selected.
The oracle Bayes decision rule assumes knowledge of the prior distribution of treatment effects.¹ Given the objective of maximizing the expected aggregate value of the selection, the optimal procedure is to select the m units with treatment effects with the highest posterior means. The empirical Bayes approach differs from this only in using the estimated prior instead of the true prior. We compare the performance of the empirical and oracle Bayes procedures in selecting the top 10% of experiments from a sample of 4677. We estimate the prior distribution of true effects with a scale mixture of mean-zero Gaussians, fitted with ebnm [61]. Treating the estimated prior as the truth, we simulate new experiments' true and estimated treatment effects, and evaluate the regret of the empirical Bayes approach on this semi-synthetic data. Consistent with our theoretical results, we find that regret is O_p(n^{-1}). By comparison, identifying the set of the top 10% of experiments with all misclassifications being equally penalized regardless of their magnitude, or estimating the treatment effects of the selected experiments, or estimating the prior distribution itself, are all categorically harder problems, each of which only exhibits convergence at the usual parametric rate.

¹We refer to this as the oracle Bayes decision rule rather than simply the Bayes decision rule throughout, to emphasize that the prior is unknown to the decision-maker.
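For a scale mixture of mean-zero Gaussians, the posterior mean has a convenient closed form: given X | µ ~ N(µ, σ²) and µ ~ Σ_k π_k N(0, τ_k²), each mixture component shrinks X by the factor τ_k²/(τ_k² + σ²), and these shrunken values are averaged with posterior component weights proportional to π_k N(X; 0, τ_k² + σ²). The sketch below implements this standard formula in Python (ebnm itself is an R package; the weights and scales shown are illustrative, not our fitted values):

```python
import numpy as np

def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2 * np.pi * var)

def posterior_mean_smn(x, sigma2, weights, tau2s):
    """Posterior mean of mu given X = x, where mu ~ sum_k pi_k N(0, tau2_k)
    and X | mu ~ N(mu, sigma2), via conjugacy within each component."""
    weights, tau2s = np.asarray(weights, float), np.asarray(tau2s, float)
    # Posterior weight of component k: proportional to pi_k * N(x; 0, tau2_k + sigma2).
    marg = weights * normal_pdf(x, tau2s + sigma2)
    resp = marg / marg.sum()
    # Within component k, E[mu | x] = tau2_k / (tau2_k + sigma2) * x.
    return float(np.sum(resp * tau2s / (tau2s + sigma2) * x))

# Illustrative prior: a near-null component plus a diffuse component.
w, t2 = [0.7, 0.3], [0.01, 4.0]
print(posterior_mean_smn(3.0, sigma2=1.0, weights=w, tau2s=t2))
```

The heavy shrinkage of the near-null component and the mild shrinkage of the diffuse component produce the soft-thresholding-like behavior characteristic of such priors: small observations are pulled strongly toward zero, while large ones are shrunk only modestly.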
Our work builds on several large and active strands of the statistics and econometrics literature. Foundational work introducing and developing the empirical Bayes approach to statistics includes [51,41,52,21]. Applications of the selection problem have proliferated, as the general problem of discerning between units which perform well or poorly on the basis of noisy, heteroskedastic measurements describes many real-world settings of interest. Previous work has studied identifying the best teachers [39,38,35,10,25], the best medical facilities [56,27,17,36], the best baseball players [22,6]; differentially expressed genes [23,54]; promising drug candidates [62]; geographic areas associated with the greatest intergenerational mobility [4] or mortality [46], and employers exhibiting the most evidence of discrimination [42]. Internet experiments are particularly well-suited to empirical Bayes methods [15,26,2,12,3,30] as datasets are often large enough for accurate estimation of flexibly-specified priors, and the experiment-level sampling error is typically close to normally distributed. For these applications, the aggregate value of the selected units will often be an important component of the decision-maker's utility function. Our results provide theoretical and empirical support for selection based on such methods.
The literature on post-selection inference, including [14,13,33,24,37,1,31], also studies selection problems, but differs from the present work in that its chief focus is estimating the values, differences or ranks of the selected units, rather than analyzing the regret associated with the selection. [14,13] provide estimates for the value of a selected unit. [33,24,37,1,31] largely aim at frequentist inference. While the notion of regret we consider averages over possible draws from the distribution of units' true values, an alternative line of inquiry beyond the scope of this paper would be to characterize admissible and minimax decision rules for the frequentist analog of the regret we define, considering the units' values as fixed constants.
Closely related to our paper are [29,49]. [29] take an empirical Bayes approach to selecting the best units while controlling the marginal false discovery rate; [49] assert frequentist control over the familywise error rate, which amounts to a zero-one loss based on the correctness of the ranks. Both consider loss functions different from ours. In their frameworks, the loss function contains a discontinuity near the oracle Bayes decision rule, e.g. [29, Equation 3.1]. Mistakenly selecting or omitting any unit incurs a discrete cost, whereas in ours the cost of mistakenly selecting or omitting a marginal unit near the selection threshold is small; see also [29, Section 3.2]. We view these as complementary perspectives. While in some decision problems mistakes may be undesirable per se, the aggregate performance of the selected units is typically still a relevant consideration. In the context of teacher evaluation, for example, the policy-maker may rightly be concerned with guarantees over the number of teachers who are incorrectly fired [48], but may also wish to understand how well their selection procedure is performing from the students' perspective, in terms of the aggregate "value-added" of the teachers who keep their jobs. In other contexts, as in internet experimentation or drug discovery, the aggregate value of the selection is the primary concern, and it is harder to justify caring about the number of mistakes per se.
Our selection problem may remind readers of the multi-armed bandit literature: e.g. [53,9] study the problem of identifying the best arm or best m arms with a certain probability. However, to target the probability of correct selection is to consider discontinuous loss functions similar to the ones in [29,49]. We also note that our selection problem is non-sequential, which leads to new challenges, as poor choices of parameters in the prior cannot be overcome with additional samples in the long run.
Finally, our work is particularly tied to the framework introduced in [59], which generalizes the restrictive compound decision framework from [34] to a class of simultaneous decision problems that are permutation invariant. However, the optimal frequentist solution requires knowledge of the empirical c.d.f. (e.c.d.f.), or equivalently the order statistics, of the µ_i's. Thus for practical reasons, we take a more Bayesian approach that also allows analysis of the performance.
Our main contribution relative to this existing literature is to provide the first regret bounds for empirical Bayes selection, and further to show that these bounds are sharp in the parametric case. Our empirical work on internet experimentation complements this by verifying that regret is quantitatively modest in practice, given a reasonable number of experiments of moderate precision. Together, our theoretical and empirical results suggest optimism for empirical Bayes approaches to selection when the decision-maker is primarily concerned with maximizing the aggregate value of the selected units, as opposed to correctly classifying the top units or estimating their values.²

²Convergence and rate results are available for empirical Bayes as applied to compound decision problems, e.g. [34,58,63,32,50], but the compound decision framework is rather restrictive [59], and does not encompass the value maximization problem studied here of selecting the best m of n units.
2 The Top-m Selection Problem
2.1 Setup
There are n units, each of which is associated with an unobserved true value µ_i ∈ R and an observed noise standard deviation σ_i > 0.³ The µ_i and σ_i are distributed independently from each other and independently across experiments with unknown marginal distributions G_0 and H_0, i.e.

(\mu_i, \sigma_i) \sim G_0 \times H_0.

We consider a potentially nonparametric family of nondegenerate prior distributions G(η), where the parameter η belongs to a metric space (M, d), and the family includes the truth G_0, i.e. G_0 = G(η_0) for some η_0 ∈ M. For each unit i, the decision-maker observes a measurement X_i ∈ R, which is distributed as

X_i \mid \mu_i, \sigma_i \sim \mathcal{N}(\mu_i, \sigma_i^2).

The decision-maker must choose m units for some m < n. Their average utility given the index set of choices J ⊂ {1, 2, …, n} is

U(J) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(i \in J)\, \mu_i.

Let

f_{\eta,\sigma_i}(X_i) = \frac{\int_{\mathbb{R}} \mu \, \frac{1}{\sigma_i} \phi\!\left(\frac{X_i - \mu}{\sigma_i}\right) dG(\eta)}{\int_{\mathbb{R}} \frac{1}{\sigma_i} \phi\!\left(\frac{X_i - \mu}{\sigma_i}\right) dG(\eta)}    (1)

denote the posterior mean of µ_i given X_i, σ_i, assuming that the prior distribution of µ_i is G(η), where φ(·) is the probability density function (p.d.f.) of a standard Gaussian. The true posterior mean is f_{η_0,σ_i}(X_i). An estimator η̂ = η̂(X_1, …, X_n, σ_1, …, σ_n) of η_0 is available, where η̂ converges to η_0 at some rate r_n in d(·, ·). It is used to form the empirical Bayes prior, G(η̂), and consequently to construct posterior mean estimates, f_{η̂,σ_i}(X_i). For simplicity we denote f_{η_0,σ_i}(X_i) and f_{η̂,σ_i}(X_i), the oracle and empirical Bayes posterior means for unit i, as θ_i and θ̂_i respectively. Note we can view θ_i as being drawn i.i.d. from a distribution, which we denote P.
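When G(η) has finite support, the ratio of integrals in Equation (1) reduces to a ratio of weighted sums over the atoms of G, which makes the posterior mean easy to evaluate directly. The sketch below uses names of our choosing; a continuous prior would be handled analogously by numerical quadrature.

```python
import numpy as np

def posterior_mean(x, sigma, support, probs):
    """Equation (1) for a prior G with finite support: both integrals
    become weighted sums over the atoms (mu_j, p_j) of G."""
    support, probs = np.asarray(support, float), np.asarray(probs, float)
    # Likelihood of x at each atom mu_j: (1/sigma) * phi((x - mu_j) / sigma).
    lik = np.exp(-0.5 * ((x - support) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    w = probs * lik  # integrand weights, common to numerator and denominator
    return float(np.sum(w * support) / np.sum(w))

# A symmetric two-point prior on {-1, +1}: an observation at x = 0 carries
# no information about the sign, so the posterior mean is 0.
print(posterior_mean(0.0, 1.0, [-1.0, 1.0], [0.5, 0.5]))
```

Note that the posterior mean always lies within the convex hull of the support of G, so observations far outside it are shrunk back toward the prior, which is the mechanism behind the shrinkage-based selection studied here.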
Given the observed data, an oracle Bayesian decision-maker maximizes expected utility (where the expectation is with respect to the posterior distribution over the unknown true values) by selecting the m units with the highest values of θ_i, breaking ties randomly. The empirical Bayes decision-maker mimics this rule, by selecting the m units with the highest values of θ̂_i, breaking ties randomly. Letting J_EB and J_Bayes be the empirical Bayes and oracle Bayes choice sets, the regret from empirical relative to oracle Bayes is

R = \mathbb{E}[U(J_{\mathrm{Bayes}}) \mid X_1, \dots, X_n, \sigma_1, \dots, \sigma_n] - \mathbb{E}[U(J_{\mathrm{EB}}) \mid X_1, \dots, X_n, \sigma_1, \dots, \sigma_n]
  = \frac{1}{n} \sum_{i=1}^{n} \left( \mathbf{1}(i \in J_{\mathrm{Bayes}}) - \mathbf{1}(i \in J_{\mathrm{EB}}) \right) \theta_i.    (2)
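Given vectors of oracle and empirical Bayes posterior means, Equation (2) can be computed directly. In the sketch below (function names are ours) ties are broken arbitrarily rather than randomly, which does not affect the regret when the posterior means are distinct.

```python
import numpy as np

def top_m(values, m):
    """Index set of the m largest entries (ties broken arbitrarily)."""
    return set(np.argsort(values)[-m:])

def regret(theta, theta_hat, m):
    """Equation (2): regret of the empirical Bayes selection relative to
    oracle Bayes, both evaluated in oracle posterior means theta."""
    theta = np.asarray(theta, float)
    j_bayes = top_m(theta, m)
    j_eb = top_m(np.asarray(theta_hat, float), m)
    return (theta[list(j_bayes)].sum() - theta[list(j_eb)].sum()) / len(theta)

# If the estimated posterior means swap one marginal unit into the selection,
# the regret is the gap in oracle posterior means between the swapped units,
# divided by n: here (2.0 - 0.0) / 4.
print(regret([3.0, 2.0, 1.0, 0.0], [3.0, 0.0, 1.0, 2.0], m=2))  # → 0.5
```

Because J_Bayes maximizes the sum of θ_i over all size-m index sets, this quantity is always nonnegative, and it is zero whenever the two rules select the same set, even if the posterior mean estimates themselves are inaccurate.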
³We assume σ_i to be known, in line with past applications of empirical Bayes methods, e.g. [60,30,16].