Adaptive A/B Tests and Simultaneous Treatment
Parameter Optimization
Yuhang Wu, Zeyu Zheng‡, Guangyu Zhang‡, Zuohua Zhang‡, Chu Wang‡
Department of Industrial Engineering and Operations Research, University of California, Berkeley, CA
‡Amazon.com Inc, Seattle, Washington
A/B tests, also known as online randomized controlled experiments, are massively used by data-driven
enterprises to estimate the expected effect difference between a treatment plan (B) and a control plan (A).
Building a valid central limit theorem (CLT) for A/B tests estimators is a crucial yet standard task, in
order to provide valid statistical inference such as constructing asymptotically valid confidence intervals.
We find that this task of building valid CLT encounters challenges in emerging applications for data-driven
enterprises, where the treatment plan is not a single plan, but instead encompasses an infinite continuum of
plans indexed by a continuous treatment parameter. As such, the experimenter not only needs to provide
valid statistical inference, but also desires to effectively and adaptively find the optimal choice of value for the
treatment parameter to use for the treatment plan. However, we find that classical stochastic optimization
algorithms, despite their fast convergence rates under strong convexity assumptions, may not come with
a CLT with a vanishing bias. This non-vanishing bias can harm the construction of asymptotically valid
confidence intervals. We fix this issue by providing a new stochastic optimization algorithm that on one hand
maintains the same fast convergence rate and on the other hand permits the establishment of a valid CLT
with vanishing bias. We discuss practical implementations of the proposed algorithm and conduct numerical
experiments to illustrate the theoretical findings.
1. Introduction
A/B tests, also referred to as A/B testing or online controlled experiments, play an important role
in data-driven enterprises and online platforms to test innovative ideas, evaluate an upgrade or
update in products, and support decisions; see Kohavi et al. (2020) for a comprehensive review of
A/B tests. Conventionally, the term “A” refers to the control plan, which often describes the current
system in use. The term “B” refers to the treatment plan, which often describes a new system
that has not been tested online yet. For online platforms, the procedure of an A/B test typically
involves an experiment phase and an inference phase. The experiment phase involves randomly
assigning a portion of the sequentially arriving samples (customers) to the control plan and the
rest to the treatment plan. The inference phase includes deriving an estimator for the average
treatment effect (the difference between the expected treatment outcome and the expected control
outcome), and establishing a central limit theorem to characterize the asymptotic distribution of
the estimator. Gupta et al. (2019) document that A/B tests are massively used at Airbnb, Amazon,
Booking.com, eBay, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, among
other companies; see also Tang et al. (2010), Kohavi et al. (2013), Johari et al. (2017).
For a range of A/B tests implemented in practice, the treatment plan is not a single plan, but
is instead an infinite continuum of plans indexed by a continuous parameter. Different values of
this parameter can lead to different expected outcomes of the treatment plan. For example, when
the treatment plan is a newly designed system that selects what advertisement (ad) to display to
a customer, such a system may involve a “relevance” threshold parameter θ. That is, the system
chooses to block ads that have a relevance score lower than the threshold parameter as part of
the ad selection-and-display process, even if those irrelevant ads may generate decent revenue.
When the parameter θ takes continuous values, the treatment plan itself effectively has a continuum
of variants, labeled by each choice of value for θ. We call such a parameter, which may
affect the treatment plan’s expected outcome, a treatment parameter.
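To make the role of the treatment parameter concrete, here is a minimal Python sketch of such a threshold-based ad-selection plan. The field names, scores, and selection rule are hypothetical illustrations, not a production system described in the paper.

```python
def treatment_plan(candidate_ads, theta):
    """Hypothetical ad-selection treatment plan indexed by a relevance threshold theta.

    Ads whose relevance score falls below theta are blocked, even if they would
    generate decent revenue; among the remaining ads, the one with the highest
    expected revenue is displayed (or no ad at all, if every candidate is blocked).
    """
    eligible = [ad for ad in candidate_ads if ad["relevance"] >= theta]
    if not eligible:
        return None  # no ad displayed to this customer
    return max(eligible, key=lambda ad: ad["expected_revenue"])

# Each value of theta labels a different variant of the same treatment plan.
ads = [{"relevance": 0.3, "expected_revenue": 1.2},
       {"relevance": 0.8, "expected_revenue": 0.9}]
print(treatment_plan(ads, theta=0.5))  # blocks the low-relevance, high-revenue ad
```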
We consider in this work the problem of adaptive A/B tests with simultaneous treatment param-
eter optimization. That is, in addition to the standard task of A/B tests to statistically assert
whether the treatment plan is better than the control, there is a simultaneous additional task
of recommending an optimal choice of value for the treatment parameter by the end of the experiment.
The term “adaptive” means that the experimenter has the ability to sequentially decide, based on
adaptively collected information, what choice of value of the treatment parameter to use for the
next sample (e.g., customer) that arrives at the experiment.
Technically, the simultaneous additional task of recommending an optimal choice of value for
the treatment parameter can be formulated as a zeroth-order stochastic optimization problem
with a continuous variable. The term “zeroth-order” means that only noisy function value
evaluations are available, but no gradient information of the function is directly available.
(One can always choose to use the finite difference method to “indirectly” construct estimators
or approximations of the function gradient based on the noisy function value evaluations.) Our
considered adaptive A/B tests applications fit this zeroth-order description, because
an experimenter can only observe a single random outcome from giving a customer a treatment
plan with a certain treatment parameter. For example, such a random outcome can be whether the
customer clicks on the ad or not. No gradient information or other information is directly available.
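As a rough illustration of this zeroth-order setting, the sketch below builds a classical two-draw central finite-difference gradient approximation from such single-outcome observations. The click-probability model is entirely hypothetical and serves only to give the oracle a smooth expected outcome in θ.

```python
import numpy as np

def noisy_outcome(theta, rng):
    """Black-box oracle: one random outcome from assigning one customer the
    treatment plan with parameter theta. The click-probability model below is a
    hypothetical stand-in, chosen only so that the expected outcome is a smooth
    function of theta with an interior maximizer."""
    p = 0.3 + 0.2 * np.exp(-(theta - 1.0) ** 2)
    return rng.binomial(1, p)

def fd_gradient_estimate(theta, c, rng):
    """Two-draw central finite-difference approximation of the gradient, built
    only from noisy zeroth-order evaluations: one customer at theta + c and an
    independent customer at theta - c (no common random numbers available)."""
    y_plus = noisy_outcome(theta + c, rng)
    y_minus = noisy_outcome(theta - c, rng)
    return (y_plus - y_minus) / (2.0 * c)

rng = np.random.default_rng(0)
print(fd_gradient_estimate(theta=0.5, c=0.1, rng=rng))
```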
Our main results can be summarized as follows.
1. We argue that even though classical zeroth-order simulation optimization or stochastic opti-
mization algorithms can be used to asymptotically find the optimizer at fast rates under a strong
convexity assumption on the objective function, they do not come with a central limit theorem
that can be used for valid statistical inference, due to a non-vanishing bias.
2. We fix this gap by providing a new optimization algorithm that on one hand maintains the
same fast convergence rate and on the other hand permits the establishment of a valid central
limit theorem with vanishing bias that can be used to construct asymptotically valid confidence
intervals. The new algorithm strategically draws four noisy function values to construct the gradient
estimator and to estimate the objective function value, instead of using the classical finite-difference
gradient estimator based on two draws of noisy function values; a sketch of the classical two-draw
recursion is given after this list.
3. We prove under a strong convexity assumption that our proposed algorithm enjoys a con-
vergence rate of O(T^{-1/3+ε}) for any arbitrarily small ε > 0 for the estimator of the optimizer, and
simultaneously enjoys an almost optimal central limit theorem (CLT) for the estimated optimal
objective function value, a CLT that is almost as good as if the optimal treatment parameter were
known in advance. We conduct numerical experiments to illustrate the theoretical findings.
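For reference, the sketch below spells out the classical two-draw Kiefer-Wolfowitz-type recursion that the comparisons above refer to, using a hypothetical click-outcome oracle and textbook gain sequences. The proposed four-draw construction is specified later in the paper and is intentionally not reproduced here.

```python
import numpy as np

def kiefer_wolfowitz(noisy_outcome, theta0, T, a=1.0, c=1.0, rng=None):
    """Classical two-draw Kiefer-Wolfowitz-type recursion (maximization).

    At each step t, two customers are spent at theta_t + c_t and theta_t - c_t to
    form a finite-difference gradient estimate, followed by a gradient step. The
    gain sequences a_t = a / t and c_t = c / t**0.25 are standard textbook choices,
    not the tuning used by the algorithm proposed in this paper."""
    rng = np.random.default_rng() if rng is None else rng
    theta = theta0
    for t in range(1, T + 1):
        a_t, c_t = a / t, c / t ** 0.25
        g_hat = (noisy_outcome(theta + c_t, rng) - noisy_outcome(theta - c_t, rng)) / (2.0 * c_t)
        theta = theta + a_t * g_hat
    return theta

# Toy usage with a hypothetical Bernoulli click oracle whose click probability
# is maximized at theta = 1.
oracle = lambda th, rng: rng.binomial(1, 0.3 + 0.2 * np.exp(-(th - 1.0) ** 2))
print(kiefer_wolfowitz(oracle, theta0=0.0, T=10_000, rng=np.random.default_rng(0)))
```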
1.1. Connections to Literature
Our work is closely related to work on simulation optimization with continuous variable. There is
a rich literature on this topic, including L’Ecuyer and Glynn (1994), Hu et al. (2008), Fu et al.
(2008), Andradóttir and Prudius (2010), Pasupathy and Ghosh (2013), Hashemi et al. (2014), Zhou
et al. (2014), Zhou and Bhatnagar (2017), Fan and Hu (2018), Kiatsupaibul et al. (2018), Hunter
et al. (2019), Zhang and Hu (2021), Hong and Zhang (2021). The setting considered by our work,
motivated by the A/B tests applications, naturally forms a simulation optimization problem where
only noisy function value evaluations are available. In our setting, each sample comes not from a
computer simulation but from feeding one of the plans to a customer. Therefore, in our setting,
it is not possible to use, for example, infinitesimal perturbation analysis to conveniently obtain a
gradient estimator. This is because the outcome observed from each customer can be viewed as an
outcome observed from a black-box noisy function oracle. There is not much we can do to look into
the black box and obtain more information. Another feature of our setting is that it is difficult,
if not impossible, to apply common random numbers. Typically, one way to use common random
numbers is to keep all randomness the same while adjusting the choice of the decision variable
to obtain random function outputs. However, due to the nature of A/B tests with customers and
the causality requirement, we are generally not able to keep all randomness fixed. With customers,
using common random numbers would be like creating parallel worlds and applying different plans
to the same customer in different parallel worlds to observe their outcomes. Creating such “parallel
worlds” is far easier in a computer simulation than with real customers.
What is discussed above may be summarized as follows: the simulation/stochastic optimization
problem in the A/B tests setting is one of the more restrictive cases of the general zeroth-order
simulation/stochastic optimization problems, with only noisy function value evaluations available
and without access to common random numbers. When constructing a gradient estimator
using noisy function value evaluations, one is not able to use common random numbers for these
noisy function value evaluations to reduce the variance of the gradient estimator. This fact leads
the algorithm design in our work to differ from the literature; we use four noisy function
value draws to construct the finite-difference gradient estimator, instead of the classical two noisy
function value draws. The necessity, as we will discuss in this work, is that the classical two noisy
function value draws do not lead to valid statistical inference in the A/B test setting when no
common randomness can be used.
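The following toy computation, under a deliberately simple additive-noise model that is not the paper's setting, illustrates why the loss of common random numbers inflates the variance of a finite-difference gradient estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, c = 0.5, 0.05

def f(theta, u):
    """Hypothetical noisy objective: a smooth mean plus additive noise driven by u."""
    return -(theta - 1.0) ** 2 + u

# With common random numbers: the same noise u enters both evaluations, so it
# cancels in the finite difference (feasible in a computer simulation).
u = rng.normal(size=100_000)
g_crn = (f(theta + c, u) - f(theta - c, u)) / (2 * c)

# Without common random numbers: independent noise for each evaluation, as in an
# A/B test where the two draws come from two different customers.
u1, u2 = rng.normal(size=100_000), rng.normal(size=100_000)
g_indep = (f(theta + c, u1) - f(theta - c, u2)) / (2 * c)

print(g_crn.var(), g_indep.var())  # the independent-draw variance is far larger
```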
Our work is also connected to the stochastic approximation literature; see Robbins and Monro
(1951), Kiefer and Wolfowitz (1952), Spall (1992), L’Ecuyer and Yin (1998), Kushner and Yin
(2003), Kim et al. (2015), He et al. (2021), Hu et al. (2022) among others. Stochastic approximation
problems generally aim to adaptively find the root of an equation in some region of the variable.
Stochastic approximation and simulation/stochastic optimization are tightly connected. One
connection is that some simulation/stochastic optimization problems can be formulated as finding
the root of the equation that sets the gradient to zero. Our work is especially connected to L’Ecuyer
and Yin (1998), who consider constructing central limit theorems for stochastic approximation
problems with a given budget. Even though the setting is different, our central limit theorem proof
is partially inspired by L’Ecuyer and Yin (1998).
Our work is connected to the literature of stochastic zeroth-order optimization; see Ghadimi and
Lan (2013), Duchi et al. (2015), Nesterov and Spokoiny (2017), Liu et al. (2018), Balasubramanian
and Ghadimi (2018), Ajalloeian and Stich (2020) among others. The commonality with our work is
that we also need to use zeroth-order noisy function evaluations to construct a gradient estimator
and perform gradient search for optimizing the treatment parameter. Our work additionally provides
statistical inference results (because they are needed to assess whether the treatment plan is signif-
icantly better than the control under the recommended value for the treatment parameter) on top
of the optimization results. Our work is also connected to Xie et al. (2016), Wu et al. (2017),
Letham et al. (2019), Wang et al. (2020), Balandat et al. (2020); they consider the problem of
adaptively optimizing a continuous parameter of the treatment plan under a Bayesian framework.
Our work is different in the following respects. Our algorithm takes a frequentist view, constructs
gradient information using noisy function value observations, and performs gradient search, which is
different from the Bayesian approach. Our work also develops a central limit theorem (CLT) that can
be directly used for large-sample statistical inference, while the above-mentioned work does not
emphasize the CLT.
Organization. In Section 2, we describe the problem setting for adaptive A/B tests and simul-
taneous treatment parameter optimization. Section 3 provides a new algorithm design. In Section
4, we provide theoretical guarantees and a central limit theorem for the proposed algorithm. Section
5 presents numerical experiments. Section 6 concludes.
2. Problem Setting
2.1. Adaptive A/B tests
We first describe the problem setting of a simple adaptive A/B test with one treatment plan
and one control plan, without treatment parameter optimization. We then state the problem
setting for adaptive A/B tests and simultaneous treatment parameter optimization in Section 2.2.
To put the discussion in context, we presume the control plan to be the currently in-use system
that decides what type of advertisement (ad) to display for a customer; whereas the treatment
plan is a newly designed system that decides what type of advertisement (ad) to display for a
customer. The term “adaptive” means that (i) the experimenter has timely observations of the
outcomes of treatment/control and (ii) the experimenter can adaptively adjust the assignment of
treatment/control. Such adaptivity is generally feasible for online platforms.
To elaborate, the experimenter sets a sampling budget N, where N is a positive integer. Cus-
tomers arrive independently and sequentially. When the i-th customer arrives, the experimenter
chooses either the treatment plan or the control plan to assign to the customer, through the use of
some pre-specified randomization scheme. In the adaptive settings, the experimenter can immedi-
ately observe the outcome of the treatment/control, for example, whether the customer clicks on a
displayed ad or not. Denote the data collected in the experiment by {I_i, y_i}_{i=1,2,···,N}, where I_i takes
value 1 (or 0) representing the i-th customer receiving the treatment (or control), and y_i denotes
the outcome observed for the i-th customer. When deciding whether to assign the treatment plan
or the control plan to the (i+1)-th customer, the experimenter has access to the observations of
{I_j, y_j}_{j=1,2,···,i}.
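The experiment loop just described can be sketched as follows; the assignment rule and outcome models here are hypothetical placeholders for the experimenter's randomization scheme and the two plans.

```python
import numpy as np

def run_adaptive_ab_test(N, assign_rule, treatment_outcome, control_outcome, rng):
    """Sketch of the adaptive experiment loop: for each arriving customer i, choose
    treatment (I_i = 1) or control (I_i = 0) based on the history observed so far,
    then immediately observe the outcome y_i. The callables assign_rule,
    treatment_outcome, and control_outcome are hypothetical placeholders for the
    experimenter's randomization scheme and the two plans."""
    history = []
    for _ in range(N):
        I_i = assign_rule(history, rng)            # may depend on {I_j, y_j} observed so far
        y_i = treatment_outcome(rng) if I_i == 1 else control_outcome(rng)
        history.append((I_i, y_i))                 # outcome is observed immediately
    return history

rng = np.random.default_rng(0)
data = run_adaptive_ab_test(
    N=1000,
    assign_rule=lambda hist, rng: int(rng.random() < 0.5),   # simple 50/50 randomization
    treatment_outcome=lambda rng: rng.binomial(1, 0.12),     # hypothetical click rates
    control_outcome=lambda rng: rng.binomial(1, 0.10),
    rng=rng,
)
```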
Once the experiment data {I_i, y_i}_{i=1,2,···,N} are collected, a standard estimator of the average treat-
ment effect (ATE) (i.e., the difference between the treatment’s expected outcome and the control’s
expected outcome) is given by
\[
\frac{\sum_{i=1}^{N} \mathbf{1}(I_i = 1)\, y_i}{\sum_{i=1}^{N} \mathbf{1}(I_i = 1)}
\;-\;
\frac{\sum_{i=1}^{N} \mathbf{1}(I_i = 0)\, y_i}{\sum_{i=1}^{N} \mathbf{1}(I_i = 0)},
\]
where the notation 1(·) represents the indicator function. A core statistical inference task is to
build a valid central limit theorem (CLT) and understand the asymptotic distribution for the ATE
estimator when the sample size N is large.
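A minimal sketch of this ATE estimator, together with a naive normal-approximation interval, is given below; the toy data stand in for the history collected by the loop sketched above. The plug-in standard error implicitly treats the assignments as if they were fixed in advance, and establishing a CLT that remains valid under adaptive assignment is precisely the kind of guarantee this paper pursues.

```python
import numpy as np

def ate_estimator(history):
    """Standard ATE estimator from the collected data {(I_i, y_i)}: the mean outcome
    among treated customers minus the mean outcome among control customers."""
    I = np.array([I_i for I_i, _ in history])
    y = np.array([y_i for _, y_i in history])
    return y[I == 1].mean() - y[I == 0].mean()

def ate_confidence_interval(history, z=1.96):
    """Naive normal-approximation interval around the ATE estimator. The plug-in
    standard error below ignores the adaptivity of the assignments; a CLT with the
    correct asymptotic variance is what justifies (or corrects) such intervals."""
    I = np.array([I_i for I_i, _ in history])
    y = np.array([y_i for _, y_i in history])
    y1, y0 = y[I == 1], y[I == 0]
    ate = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return ate - z * se, ate + z * se

# Toy data: (I_i, y_i) pairs from a simple 50/50 randomization with hypothetical
# click rates; in practice this would be the history collected during the experiment.
rng = np.random.default_rng(0)
data = [(int(I), rng.binomial(1, 0.12 if I else 0.10)) for I in rng.integers(0, 2, size=1000)]
print(ate_estimator(data), ate_confidence_interval(data))
```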
2.2. Adaptive A/B tests with simultaneous treatment parameter optimization
This subsection describes the problem setting for adaptive A/B tests and simultaneous treatment
parameter optimization. In some practical scenarios, inside the treatment plan is a tunable parameter.
When the parameter θ takes continuous values, the treatment plan itself effectively has a continuum
of variants, labeled by each choice of value for θ.