prior distribution of true effects with a scale mixture of mean-zero Gaussians,
fitted with ebnm [61]. Treating the estimated prior as the truth, we simulate
new experiments’ true and estimated treatment effects, and evaluate the re-
gret of the empirical Bayes approach on this semi-synthetic data. Consistent
with our theoretical results, we find that the regret is $O_p(n^{-1})$. By comparison, identifying the set of the top 10% of experiments, with every misclassification penalized equally regardless of its magnitude; estimating the treatment effects of the selected experiments; and estimating the prior distribution itself are all categorically harder problems, each converging only at the usual parametric rate.
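As a concrete illustration of this exercise, the sketch below simulates the semi-synthetic setup under simplifying assumptions that are not taken from the paper: an illustrative three-component scale mixture of mean-zero Gaussians stands in for the ebnm fit, standard errors are homoskedastic, the mixture weights are re-estimated from each simulated dataset by EM on a fixed grid of scales, and regret is measured as the gap in mean true effect between the top-10% sets chosen with oracle versus estimated posterior means. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" prior: a scale mixture of mean-zero Gaussians.
true_w = np.array([0.6, 0.3, 0.1])     # mixture weights
scales = np.array([0.01, 0.05, 0.20])  # component standard deviations
se = 0.05                              # homoskedastic sampling standard error
alpha = 0.10                           # fraction of experiments to select


def post_means(x, w):
    """Posterior mean of each true effect under a mean-zero Gaussian scale mixture."""
    var = scales**2 + se**2                                # marginal variance per component
    logd = -0.5 * x[:, None]**2 / var - 0.5 * np.log(var) + np.log(w)
    resp = np.exp(logd - logd.max(axis=1, keepdims=True))  # component responsibilities
    resp /= resp.sum(axis=1, keepdims=True)
    return (resp * (scales**2 / var)).sum(axis=1) * x      # mixture of shrinkage factors


def fit_weights(x, iters=200):
    """Empirical Bayes step: estimate mixture weights by EM with the scale grid fixed."""
    w = np.full(len(scales), 1.0 / len(scales))
    var = scales**2 + se**2
    dens = np.exp(-0.5 * x[:, None]**2 / var) / np.sqrt(var)
    for _ in range(iters):
        resp = dens * w
        resp /= resp.sum(axis=1, keepdims=True)
        w = resp.mean(axis=0)
    return w


def regret(n):
    """One semi-synthetic replication: oracle minus empirical Bayes selection value."""
    comp = rng.choice(len(true_w), size=n, p=true_w)
    theta = rng.normal(0.0, scales[comp])      # true treatment effects
    x = rng.normal(theta, se)                  # estimated treatment effects
    k = int(np.ceil(alpha * n))
    eb_top = np.argsort(post_means(x, fit_weights(x)))[-k:]  # feasible EB selection
    oracle_top = np.argsort(post_means(x, true_w))[-k:]      # oracle Bayes selection
    return theta[oracle_top].mean() - theta[eb_top].mean()


for n in (1_000, 10_000, 100_000):
    print(f"n = {n:>7}: mean regret ~ {np.mean([regret(n) for _ in range(20)]):.2e}")
```

Repeating this for increasing n and averaging across replications gives a rough, informal check of the $O_p(n^{-1})$ decay described above.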
Our work builds on several large and active strands of the statistics and
econometrics literature. Foundational work introducing and developing the
empirical Bayes approach to statistics includes [51,41,52,21]. Applications
of the selection problem have proliferated, as the general problem of discerning which units perform well or poorly on the basis of noisy, heteroskedastic measurements describes many real-world settings of interest. Previous work has studied identifying the best teachers [39,38,35,10,25]; the best medical facilities [56,27,17,36]; the best baseball players [22,6]; differentially expressed genes [23,54]; promising drug candidates [62]; geographic areas associated with the greatest intergenerational mobility [4] or mortality [46]; and employers exhibiting the most evidence of discrimination [42]. Internet experiments are particularly well-suited to empirical Bayes methods [15,26,2,12,3,30], as datasets are often large enough for accurate estimation of flexibly specified priors, and the experiment-level sampling error is typically close to normally distributed. For these applications, the aggregate value of the selected units will often be an important component of the decision-maker's utility function. Our results provide theoretical and empirical support for selection based on such methods.
The literature on post-selection inference, including [14,13,33,24,37,1,31],
also studies selection problems, but differs from the present work in that its
chief focus is estimating the values, differences or ranks of the selected units,
rather than analyzing the regret associated with the selection. [14,13] provide
estimates of the value of a selected unit. [33,24,37,1,31] largely aim at
frequentist inferences. While the notion of regret we consider averages over
possible draws from the distribution of units’ true values, an alternative line
of inquiry beyond the scope of this paper would be to characterize admissible
and minimax decision rules for the frequentist analog of the regret we define,
considering the units’ values as fixed constants.
Closely related to our paper are [29,49]. [29] take an empirical Bayes ap-
proach to selecting the best units while controlling the marginal false discov-
ery rate; [49] assert frequentist control over the familywise error rate, which
amounts to a zero-one loss based on the correctness of the ranks. Both consider loss functions that differ from ours: in their frameworks, the loss function contains a discontinuity near the oracle Bayes decision rule (e.g., [29, Equation 3.1]), so mistakenly selecting or omitting any unit incurs a discrete cost, whereas in ours the cost of mistakenly selecting or omitting a marginal unit near the