Adaptive A/B Tests and Simultaneous Treatment
Parameter Optimization
Yuhang Wu, Zeyu Zheng‡, Guangyu Zhang‡, Zuohua Zhang‡, Chu Wang‡
Department of Industrial Engineering and Operations Research, University of California, Berkeley, CA
‡Amazon.com Inc, Seattle, Washington
A/B tests, also known as online randomized controlled experiments, are massively used by data-driven
enterprises to estimate the expected effect difference between a treatment plan (B) and a control plan (A).
Building a valid central limit theorem (CLT) for A/B tests estimators is a crucial yet standard task, in
order to provide valid statistical inference such as constructing asymptotically valid confidence intervals.
We find that this task of building valid CLT encounters challenges in emerging applications for data-driven
enterprises, where the treatment plan is not a single plan, but instead encompasses an infinite continuum of
plans indexed by a continuous treatment parameter. As such, the experimenter not only needs to provide
valid statistical inference, but also desires to effectively and adaptively find the optimal choice of value for the
treatment parameter to use for the treatment plan. However, we find that classical stochastic optimization
algorithms, despite their fast convergence rates under strong convexity assumptions, may not come with
a CLT with a vanishing bias. This non-vanishing bias can harm the construction of asymptotically valid
confidence intervals. We fix this issue by providing a new stochastic optimization algorithm that on one hand
maintains the same fast convergence rate and on the other hand permits the establishment of a valid CLT
with vanishing bias. We discuss practical implementations of the proposed algorithm and conduct numerical
experiments to illustrate the theoretical findings.
1. Introduction
A/B tests, also referred to as A/B testing or online controlled experiments, play an important role
in data-driven enterprises and online platforms to test innovative ideas, evaluate an upgrade or
update in products, and support decisions; see Kohavi et al. (2020) for a comprehensive review of
A/B tests. Conventionally, the term “A” refers to the control plan, which often describes the current
system in use. The term “B” refers to the treatment plan, which often describes a new system
that has not been tested online yet. For online platforms, the procedure of an A/B test typically
involves an experiment phase and an inference phase. The experiment phase involves randomly
assigning a portion of the sequentially arriving samples (customers) to the control plan and the
rest to the treatment plan. The inference phase includes deriving an estimator for the average
treatment effect (the difference between the expected treatment outcome and the expected control
outcome), and establishing a central limit theorem to characterize the asymptotic distribution of
the estimator. Gupta et al. (2019) document that A/B tests are massively used at Airbnb, Amazon,
Booking.com, eBay, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, among
other companies; see also Tang et al. (2010), Kohavi et al. (2013), Johari et al. (2017).
For a range of A/B tests implemented in practice, the treatment plan is not a single plan, but
is instead an infinite continuum of plans indexed by a continuous parameter. Different values of
this parameter can lead to different expected outcomes of the treatment plan. For example, when
the treatment plan is a newly designed system that selects what advertisement (ad) to display to
a customer, such a system may involve a “relevance” threshold parameter θ. That is, the system
chooses to block ads that have a relevance score lower than the threshold parameter as part of
the ad selection-and-display process, even if those irrelevant ads may generate decent revenue.
When the parameter θ takes continuous values, the treatment plan itself effectively has a continuum
of variants, labeled by each choice of value for θ. We call such a parameter, which may
affect the treatment plan’s expected outcome, a treatment parameter.
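To make the role of the treatment parameter concrete, here is a minimal Python sketch of such a threshold-based ad-selection plan. The field names, scores, and selection rule are hypothetical illustrations, not a production system described in the paper.

```python
def treatment_plan(candidate_ads, theta):
    """Hypothetical ad-selection treatment plan indexed by a relevance threshold theta.

    Ads whose relevance score falls below theta are blocked, even if they would
    generate decent revenue; among the remaining ads, the one with the highest
    expected revenue is displayed (or no ad at all, if every candidate is blocked).
    """
    eligible = [ad for ad in candidate_ads if ad["relevance"] >= theta]
    if not eligible:
        return None  # no ad displayed to this customer
    return max(eligible, key=lambda ad: ad["expected_revenue"])

# Each value of theta labels a different variant of the same treatment plan.
ads = [{"relevance": 0.3, "expected_revenue": 1.2},
       {"relevance": 0.8, "expected_revenue": 0.9}]
print(treatment_plan(ads, theta=0.5))  # blocks the low-relevance, high-revenue ad
```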
We consider in this work the problem of adaptive A/B tests with simultaneous treatment param-
eter optimization. That is, in addition to the standard task of A/B tests to statistically assert
whether the treatment plan is better than the control, there is a simultaneous additional task
of recommending an optimal choice of value for the treatment parameter by the end of the experiment.
The term “adaptive” means that the experimenter has the ability to sequentially decide, based on
adaptively collected information, what choice of value of the treatment parameter to use for the
next sample (e.g., customer) that arrives at the experiment.
Technically, the simultaneous additional task of recommending an optimal choice of value for
the treatment parameter can be formulated as a zeroth-order stochastic optimization problem
with a continuous variable. The term “zeroth-order” means that only noisy function value
evaluations are available, but no gradient information of the function is directly available.
(One can always choose to use the finite difference method to “indirectly” construct estimators
or approximations of the function gradient based on the noisy function value evaluations.) Our
considered adaptive A/B tests applications fit this zeroth-order description, because
an experimenter can only observe a single random outcome from giving a customer a treatment
plan with a certain treatment parameter. For example, such a random outcome can be whether the
customer clicks on the ad or not. No gradient information or other information is directly available.
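As a rough illustration of this zeroth-order setting, the sketch below builds a classical two-draw central finite-difference gradient approximation from such single-outcome observations. The click-probability model is entirely hypothetical and serves only to give the oracle a smooth expected outcome in θ.

```python
import numpy as np

def noisy_outcome(theta, rng):
    """Black-box oracle: one random outcome from assigning one customer the
    treatment plan with parameter theta. The click-probability model below is a
    hypothetical stand-in, chosen only so that the expected outcome is a smooth
    function of theta with an interior maximizer."""
    p = 0.3 + 0.2 * np.exp(-(theta - 1.0) ** 2)
    return rng.binomial(1, p)

def fd_gradient_estimate(theta, c, rng):
    """Two-draw central finite-difference approximation of the gradient, built
    only from noisy zeroth-order evaluations: one customer at theta + c and an
    independent customer at theta - c (no common random numbers available)."""
    y_plus = noisy_outcome(theta + c, rng)
    y_minus = noisy_outcome(theta - c, rng)
    return (y_plus - y_minus) / (2.0 * c)

rng = np.random.default_rng(0)
print(fd_gradient_estimate(theta=0.5, c=0.1, rng=rng))
```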
Our main results can be summarized as follows.
1. We argue that even though classical zeroth-order simulation optimization or stochastic opti-
mization algorithms can be used to asymptotically find the optimizer at fast rates under a strong
convexity assumption on the objective function, they do not come with a central limit theorem
that can be used for valid statistical inference, due to a non-vanishing bias.
2. We fix this gap by providing a new optimization algorithm that on one hand maintains the
same fast convergence rate and on the other hand permits the establishment of a valid central
limit theorem with vanishing bias that can be used to construct asymptotically valid confidence
intervals. The new algorithm strategically draws four noisy function values to construct the gradient
estimator and to estimate the objective function value, instead of using the classical finite-difference
gradient estimator based on two draws of noisy function values; a sketch of the classical two-draw
recursion is given after this list.
3. We prove under a strong convexity assumption that our proposed algorithm enjoys a con-
vergence rate of O(T^{-1/3+ε}) for any arbitrarily small ε > 0 for the estimator of the optimizer, and
simultaneously enjoys an almost optimal central limit theorem (CLT) for the estimated optimal
objective function value, a CLT that is almost as good as if the optimal treatment parameter were
known in advance. We conduct numerical experiments to illustrate the theoretical findings.
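For reference, the sketch below spells out the classical two-draw Kiefer-Wolfowitz-type recursion that the comparisons above refer to, using a hypothetical click-outcome oracle and textbook gain sequences. The proposed four-draw construction is specified later in the paper and is intentionally not reproduced here.

```python
import numpy as np

def kiefer_wolfowitz(noisy_outcome, theta0, T, a=1.0, c=1.0, rng=None):
    """Classical two-draw Kiefer-Wolfowitz-type recursion (maximization).

    At each step t, two customers are spent at theta_t + c_t and theta_t - c_t to
    form a finite-difference gradient estimate, followed by a gradient step. The
    gain sequences a_t = a / t and c_t = c / t**0.25 are standard textbook choices,
    not the tuning used by the algorithm proposed in this paper."""
    rng = np.random.default_rng() if rng is None else rng
    theta = theta0
    for t in range(1, T + 1):
        a_t, c_t = a / t, c / t ** 0.25
        g_hat = (noisy_outcome(theta + c_t, rng) - noisy_outcome(theta - c_t, rng)) / (2.0 * c_t)
        theta = theta + a_t * g_hat
    return theta

# Toy usage with a hypothetical Bernoulli click oracle whose click probability
# is maximized at theta = 1.
oracle = lambda th, rng: rng.binomial(1, 0.3 + 0.2 * np.exp(-(th - 1.0) ** 2))
print(kiefer_wolfowitz(oracle, theta0=0.0, T=10_000, rng=np.random.default_rng(0)))
```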
1.1. Connections to Literature
Our work is closely related to work on simulation optimization with continuous variable. There is
a rich literature on this topic, including L’Ecuyer and Glynn (1994), Hu et al. (2008), Fu et al.
(2008), Andradóttir and Prudius (2010), Pasupathy and Ghosh (2013), Hashemi et al. (2014), Zhou
et al. (2014), Zhou and Bhatnagar (2017), Fan and Hu (2018), Kiatsupaibul et al. (2018), Hunter
et al. (2019), Zhang and Hu (2021), Hong and Zhang (2021). The setting considered by our work,
motivated by the A/B tests applications, naturally forms a simulation optimization problem where
only noisy function value evaluations are available. In our setting, each sample comes not from a
computer simulation but from feeding one of the plans to a customer. Therefore, in our setting,
it is not possible to use, for example, infinitesimal perturbation analysis to conveniently obtain a
gradient estimator. This is because the outcome observed from each customer can be viewed as an
outcome observed from a black-box noisy function oracle. There is not much we can do to look into
the black box and obtain more information. Another feature of our setting is that it is difficult,
if not impossible, to apply common random numbers. Typically, one way to use common random
numbers is to keep all randomness the same while adjusting the choice of the decision variable
to obtain random function outputs. However, due to the nature of A/B tests with customers and
the causality requirement, we are generally not able to keep all randomness fixed. With customers,
using common random numbers would be like creating parallel worlds and applying different plans
to the same customer in different parallel worlds to observe their outcomes. Creating such “parallel
worlds” is far easier in a computer simulation than with real customers.
What is discussed above may be summarized as follows: the simulation/stochastic optimization
problem in the A/B tests setting is one of the more restrictive cases of the general zeroth-order
simulation/stochastic optimization problems, with only noisy function value evaluations available
and without access to common random numbers. When constructing a gradient estimator
using noisy function value evaluations, one is not able to use common random numbers for these
noisy function value evaluations to reduce the variance of the gradient estimator. This fact leads
the algorithm design in our work to differ from the literature; we use four noisy function
value draws to construct the finite-difference gradient estimator, instead of the classical two noisy
function value draws. The necessity, as we will discuss in this work, is that the classical two noisy
function value draws do not lead to valid statistical inference in the A/B test setting when no
common randomness can be used.
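The following toy computation, under a deliberately simple additive-noise model that is not the paper's setting, illustrates why the loss of common random numbers inflates the variance of a finite-difference gradient estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, c = 0.5, 0.05

def f(theta, u):
    """Hypothetical noisy objective: a smooth mean plus additive noise driven by u."""
    return -(theta - 1.0) ** 2 + u

# With common random numbers: the same noise u enters both evaluations, so it
# cancels in the finite difference (feasible in a computer simulation).
u = rng.normal(size=100_000)
g_crn = (f(theta + c, u) - f(theta - c, u)) / (2 * c)

# Without common random numbers: independent noise for each evaluation, as in an
# A/B test where the two draws come from two different customers.
u1, u2 = rng.normal(size=100_000), rng.normal(size=100_000)
g_indep = (f(theta + c, u1) - f(theta - c, u2)) / (2 * c)

print(g_crn.var(), g_indep.var())  # the independent-draw variance is far larger
```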
Our work is also connected to the stochastic approximation literature; see Robbins and Monro
(1951), Kiefer and Wolfowitz (1952), Spall (1992), L’Ecuyer and Yin (1998), Kushner and Yin
(2003), Kim et al. (2015), He et al. (2021), Hu et al. (2022) among others. Stochastic approximation
problems generally aim to adaptively find the root of an equation in some region of the variable.
Stochastic approximation and simulation/stochastic optimization are tightly connected. One
connection is that some simulation/stochastic optimization problems can be formulated as finding
the root of the equation that sets the gradient to zero. Our work is especially connected to L’Ecuyer
and Yin (1998), who consider constructing central limit theorems for stochastic approximation
problems with a given budget. Even though the setting is different, our central limit theorem proof
is partially inspired by L’Ecuyer and Yin (1998).
Our work is connected to the literature of stochastic zeroth-order optimization; see Ghadimi and
Lan (2013), Duchi et al. (2015), Nesterov and Spokoiny (2017), Liu et al. (2018), Balasubramanian
and Ghadimi (2018), Ajalloeian and Stich (2020) among others. The commonality with our work is
that we also need to use zeroth-order noisy function evaluations to construct a gradient estimator
and perform gradient search for optimizing the treatment parameter. Our work additionally provides
statistical inference results (because they are needed to assess whether the treatment plan is signif-
icantly better than the control under the recommended value for the treatment parameter) on top
of the optimization results. Our work is also connected to Xie et al. (2016), Wu et al. (2017),
Letham et al. (2019), Wang et al. (2020), Balandat et al. (2020); they consider the problem of
adaptively optimizing a continuous parameter of the treatment plan under a Bayesian framework.
Our work is different in the following respects. Our algorithm takes a frequentist view, constructs
gradient information using noisy function value observations, and performs gradient search, which is
different from the Bayesian approach. Our work also develops a central limit theorem (CLT) that can
be directly used for large-sample statistical inference, while the above-mentioned work does not
emphasize the CLT.
Organization. In Section 2, we describe the problem setting for adaptive A/B tests and simul-
taneous treatment parameter optimization. Section 3 provides a new algorithm design. In Section
4, we provide theoretical guarantees and a central limit theorem for the proposed algorithm. Section
5 presents numerical experiments. Section 6 concludes.
2. Problem Setting
2.1. Adaptive A/B tests
We first describe the problem setting of a simple adaptive A/B test with one treatment plan
and one control plan, without treatment parameter optimization. We then state the problem
setting for adaptive A/B tests and simultaneous treatment parameter optimization in Section 2.2.
To put the discussion in context, we presume the control plan to be the currently in-use system
that decides what type of advertisement (ad) to display for a customer; whereas the treatment
plan is a newly designed system that decides what type of advertisement (ad) to display for a
customer. The term “adaptive” means that (i) the experimenter has timely observations of the
outcomes of treatment/control and (ii) the experimenter can adaptively adjust the assignment of
treatment/control. Such adaptivity is generally feasible for online platforms.
To elaborate, the experimenter sets a sampling budget N, where N is a positive integer. Cus-
tomers arrive independently and sequentially. When the i-th customer arrives, the experimenter
chooses either the treatment plan or the control plan to assign to the customer, through the use of
some pre-specified randomization scheme. In the adaptive settings, the experimenter can immedi-
ately observe the outcome of the treatment/control, for example, whether the customer clicks on a
displayed ad or not. Denote the data collected in the experiment by {I_i, y_i}_{i=1,2,···,N}, where I_i takes
value 1 (or 0) representing the i-th customer receiving the treatment (or control), and y_i denotes
the outcome observed for the i-th customer. When deciding whether to assign the treatment plan
or the control plan to the (i+1)-th customer, the experimenter has access to the observations of
{I_j, y_j}_{j=1,2,···,i}.
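The experiment loop just described can be sketched as follows; the assignment rule and outcome models here are hypothetical placeholders for the experimenter's randomization scheme and the two plans.

```python
import numpy as np

def run_adaptive_ab_test(N, assign_rule, treatment_outcome, control_outcome, rng):
    """Sketch of the adaptive experiment loop: for each arriving customer i, choose
    treatment (I_i = 1) or control (I_i = 0) based on the history observed so far,
    then immediately observe the outcome y_i. The callables assign_rule,
    treatment_outcome, and control_outcome are hypothetical placeholders for the
    experimenter's randomization scheme and the two plans."""
    history = []
    for _ in range(N):
        I_i = assign_rule(history, rng)            # may depend on {I_j, y_j} observed so far
        y_i = treatment_outcome(rng) if I_i == 1 else control_outcome(rng)
        history.append((I_i, y_i))                 # outcome is observed immediately
    return history

rng = np.random.default_rng(0)
data = run_adaptive_ab_test(
    N=1000,
    assign_rule=lambda hist, rng: int(rng.random() < 0.5),   # simple 50/50 randomization
    treatment_outcome=lambda rng: rng.binomial(1, 0.12),     # hypothetical click rates
    control_outcome=lambda rng: rng.binomial(1, 0.10),
    rng=rng,
)
```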
Once the experiment data {I_i, y_i}_{i=1,2,···,N} are collected, a standard estimator of the average treat-
ment effect (ATE) (i.e., the difference between the treatment’s expected outcome and the control’s
expected outcome) is given by
\[
\frac{\sum_{i=1}^{N} \mathbf{1}(I_i = 1)\, y_i}{\sum_{i=1}^{N} \mathbf{1}(I_i = 1)}
\;-\;
\frac{\sum_{i=1}^{N} \mathbf{1}(I_i = 0)\, y_i}{\sum_{i=1}^{N} \mathbf{1}(I_i = 0)},
\]
where the notation 1(·) represents the indicator function. A core statistical inference task is to
build a valid central limit theorem (CLT) and understand the asymptotic distribution for the ATE
estimator when the sample size N is large.
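A minimal sketch of this ATE estimator, together with a naive normal-approximation interval, is given below; the toy data stand in for the history collected by the loop sketched above. The plug-in standard error implicitly treats the assignments as if they were fixed in advance, and establishing a CLT that remains valid under adaptive assignment is precisely the kind of guarantee this paper pursues.

```python
import numpy as np

def ate_estimator(history):
    """Standard ATE estimator from the collected data {(I_i, y_i)}: the mean outcome
    among treated customers minus the mean outcome among control customers."""
    I = np.array([I_i for I_i, _ in history])
    y = np.array([y_i for _, y_i in history])
    return y[I == 1].mean() - y[I == 0].mean()

def ate_confidence_interval(history, z=1.96):
    """Naive normal-approximation interval around the ATE estimator. The plug-in
    standard error below ignores the adaptivity of the assignments; a CLT with the
    correct asymptotic variance is what justifies (or corrects) such intervals."""
    I = np.array([I_i for I_i, _ in history])
    y = np.array([y_i for _, y_i in history])
    y1, y0 = y[I == 1], y[I == 0]
    ate = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return ate - z * se, ate + z * se

# Toy data: (I_i, y_i) pairs from a simple 50/50 randomization with hypothetical
# click rates; in practice this would be the history collected during the experiment.
rng = np.random.default_rng(0)
data = [(int(I), rng.binomial(1, 0.12 if I else 0.10)) for I in rng.integers(0, 2, size=1000)]
print(ate_estimator(data), ate_confidence_interval(data))
```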
2.2. Adaptive A/B tests with simultaneous treatment parameter optimization
This subsection describes the problem setting for adaptive A/B tests and simultaneous treatment
parameter optimization. In some practical scenarios, inside the treatment plan is a tunable parameter.
When the parameter θ takes continuous values, the treatment plan itself effectively has a continuum
of variants, labeled by each choice of value for θ.