Policy Learning with New Treatments
Samuel Higbee
Friday 29th September, 2023
Abstract
I study the problem of a decision maker choosing a policy which allocates treatment to a heterogeneous
population on the basis of experimental data that includes only a subset of possible treatment values.
The effects of new treatments are partially identified by shape restrictions on treatment response. Policies
are compared according to the minimax regret criterion, and I show that the empirical analog of the
population decision problem has a tractable linear- and integer-programming formulation. I prove the
maximum regret of the estimated policy converges to the lowest possible maximum regret at a rate which
is the maximum of $N^{-1/2}$ and the rate at which conditional average treatment effects are estimated in
the experimental data. I apply my results to design targeted subsidies for electrical grid connections in
rural Kenya, and estimate that 97% of the population should be given a treatment not implemented in
the experiment.
Department of Economics, University of Chicago. Email: samuelhigbee@uchicago.edu. I am grateful to Max Tabord-Meehan
and Alex Torgovitsky for helpful feedback and guidance for this paper. I would also like to thank seminar participants
at the University of Chicago for helpful comments.
arXiv:2210.04703v2 [econ.EM] 27 Sep 2023
1 Introduction
Heterogeneous treatment effects are often estimated with a decision problem in mind: should a particular
individual be treated? This question has fostered much research in econometrics, statistics, and machine
learning. However, relatively less attention has been given to another important margin of the decision:
should the treatment itself be adjusted? Whether the treatment is a medical treatment, subsidy, job training,
or audit probability, decision makers can usually entertain changing the treatment value that was observed in
the data. Even experiments with multivalued treatments may not implement an exhaustive list of treatment
values. This is especially true in the social sciences, where testing multiple interventions can be costly, and
in the medical sciences, where specific treatment doses are often tested in clinical trials. In this paper I
propose a method for allocating treatment to a population when the treatment values themselves can be
adjusted to values never before seen in the data. I show how combining the data on existing treatments
with economically motivated shape restrictions can be used to design policies that outperform those possible
when only previously implemented treatments are considered.
I first formulate a decision problem in which the decision maker observes experimental data on some
treatment values and seeks to construct a mapping, or policy, from the space of covariates to the space
of treatments in order to maximize some objective function. I assume all experimentation is done before
the policy is constructed. This setting, which is common in econometrics, is often referred to as treatment
choice or offline policy learning (examples include Athey and Wager (2021), Bhattacharya and Dupas (2012),
Kitagawa and Tetenov (2018) and other examples mentioned in the literature review thereof, Liu (2022),
Mbakop and Tabord-Meehan (2021), Qian and Murphy (2011), Sasaki and Ura (2020), Zhang et al. (2012),
Zhao et al. (2012)). A distinctive feature of this paper as opposed to most policy learning problems is
that the set of treatments that the decision maker can consider may be a strict superset of the support of
the treatment random variable observed in the data. This extends policy learning to practically relevant
situations in which constraints in the design and implementation of experiments or simply differences in the
objectives of the experimenter versus decision maker result in only a few treatment values being piloted in
the experiment, while the decision maker may want to consider many more.
Despite the lack of data on the impacts of these never-before-implemented treatments, I show how to
bound the response to new treatments using simple, economically interpretable restrictions on the shape of
treatment response. For example, a financial incentive may be assumed to have a positive effect, exhibit
diminishing returns, or satisfy smoothness conditions. Such shape restrictions are often exploited to partially
identify treatment effects (e.g. Manski 2009, Mogstad, Santos, and Torgovitsky 2018). The empirical analysis
of the present paper demonstrates that such bounds can be adequately informative for choosing whether and
how to implement new treatment values. Based on these bounds, I construct a population decision problem
to choose which treatment to assign to each covariate value. I use the minimax regret criterion to evaluate
treatment choice under partial identification following Manski (2007).
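
The minimax regret criterion can be illustrated with a minimal sketch. The code below assumes, purely for illustration, that the identified set is approximated by a finite list of candidate response functions and that one treatment is chosen for a single covariate cell. The function names `max_regret` and `minimax_regret_choice` and all welfare numbers are hypothetical and are not taken from the paper.

```python
# Toy illustration of the minimax regret criterion (not the paper's estimator).
# Each "state" is one candidate response function in a finite approximation of
# the identified set, mapping a treatment label to its welfare in that state.


def max_regret(treatment, states):
    """Worst-case regret of `treatment` over all candidate states.

    Regret in a state is the welfare of the best treatment in that state
    minus the welfare of the chosen treatment.
    """
    return max(max(state.values()) - state[treatment] for state in states)


def minimax_regret_choice(states):
    """Return the treatment whose worst-case regret is smallest."""
    treatments = list(states[0].keys())
    return min(treatments, key=lambda t: max_regret(t, states))


# Hypothetical numbers: the response to "old" is point identified (1.0),
# while the response to "new" is only known to lie somewhere in [0.9, 1.4].
example_states = [
    {"old": 1.0, "new": 1.4},  # state where the new treatment does well
    {"old": 1.0, "new": 0.9},  # state where the new treatment does poorly
]

if __name__ == "__main__":
    for t in ("old", "new"):
        print(t, round(max_regret(t, example_states), 3))
    print("choice:", minimax_regret_choice(example_states))
```

In this toy case the new treatment has a maximum regret of 0.1 against 0.4 for the old one, so it is selected even though it is worse than the old treatment in one state. The sketch restricts attention to deterministic choices over a finite list of states; the identified sets in the paper are generally not finite.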
As in Manski (2004), Kitagawa and Tetenov (2018) and the subsequent literature on empirical welfare
maximization methods, I propose a decision rule based on solving the empirical analog of the decision
problem as a surrogate for the infeasible population objective. The resulting empirical minimax regret
estimator is constructed by minimizing maximum regret across an estimate of the partially identified set
of treatment response functions. In this way, the resulting policy is robust to model ambiguity induced
by introducing new treatments. Despite involving nested, non-closed form optimization problems which
characterize the identified set for treatment response, I show how the optimal policy can be computed using
the same linear and integer programming tools common in the policy learning literature. The estimator is
thus computationally feasible and can be implemented by widely available software.
I show that the proposed decision rule possesses desirable regret properties. The maximum regret obtained
under the estimated policy converges to the smallest possible maximum regret that the decision maker could
have achieved in the absence of sampling uncertainty (that is, if the population identified set were observed),
uniformly across a set of data distributions. The rate at which the regret of the estimated policy converges
to its optimum depends on the estimation rate of the response to the treatments which were observed in the
data, and hence is an asymptotic rather than finite-sample convergence guarantee. In the case of discrete
covariates, or more generally parametric rates of convergence for estimated treatment effects, the rate of
convergence of maximum regret is $N^{-1/2}$. Otherwise, maximum regret converges at the nonparametric rate.
I apply the method to data from Lee, Miguel, and Wolfram (2020b), in which households in rural Kenya
were offered one of four prices (0, 15, 25, or 35 thousand shillings) to connect to the electrical grid. I
consider a decision maker able to offer prices in increments of 2.5 thousand shillings based on household size
and income. This represents a much richer set of fifteen possible treatments, allowing for finer targeting
of personalized prices to optimize the cost-effectiveness of the subsidy program. To bound the takeup at
these new prices, I assume demand is downward sloping and convex. The estimated minimax regret optimal
policy assigns prices that were not implemented in the experiment to over 97% of the population, illustrating
that constraining the decision maker to treatments that appear in the experimental pilot data can result in
suboptimal decisions.
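
To make the shape restrictions concrete, the following sketch builds the grid of fifteen candidate prices and computes bounds on takeup at a price that was not in the experiment, assuming only that demand is downward sloping and convex. The takeup values are invented for illustration and are not the estimates from Lee, Miguel, and Wolfram (2020b); the helper names `price_grid` and `takeup_bounds` are likewise hypothetical. The bounds are simple pointwise consequences of the two restrictions, not the paper's linear-programming characterization of the identified set.

```python
# Pointwise bounds on takeup at unobserved prices under two shape restrictions:
# demand is (i) weakly decreasing and (ii) convex in price.
# All takeup numbers used with this code are made up for illustration.


def price_grid(max_price=35.0, step=2.5):
    """Candidate prices 0, 2.5, ..., 35 (in thousand shillings)."""
    n_steps = int(round(max_price / step))
    return [i * step for i in range(n_steps + 1)]


def takeup_bounds(price, observed):
    """Return (lower, upper) bounds on takeup at `price`.

    `observed` maps experimental prices to takeup rates and is assumed to be
    consistent with a decreasing, convex demand curve.
    """
    pts = sorted(observed.items())
    if not pts[0][0] <= price <= pts[-1][0]:
        raise ValueError("price must lie within the range of observed prices")
    for p, y in pts:
        if p == price:
            return (y, y)  # experimental price: treated as point identified

    # Index of the closest observed price below `price`.
    j = max(i for i, (p, _) in enumerate(pts) if p < price)
    (p0, y0), (p1, y1) = pts[j], pts[j + 1]

    # Convexity: the curve lies below the chord joining its two neighbours.
    upper = y0 + (y1 - y0) * (price - p0) / (p1 - p0)

    # Monotonicity: takeup is at least the takeup at the next higher price.
    lower = y1
    # Convexity: the curve lies above the extensions of the adjacent chords.
    if j >= 1:
        pa, ya = pts[j - 1]
        lower = max(lower, y0 + (y0 - ya) / (p0 - pa) * (price - p0))
    if j + 2 < len(pts):
        pb, yb = pts[j + 2]
        lower = max(lower, y1 + (yb - y1) / (pb - p1) * (price - p1))
    return (lower, upper)


if __name__ == "__main__":
    hypothetical_takeup = {0: 0.9, 15: 0.3, 25: 0.15, 35: 0.1}
    for price in price_grid():
        low, high = takeup_bounds(price, hypothetical_takeup)
        print(f"price {price:4.1f}: takeup in [{low:.4f}, {high:.4f}]")
```

With these invented inputs, eleven of the fifteen grid prices are new, and each receives an interval rather than a point. The width of the interval depends on how close the neighbouring experimental prices are and on how much curvature the observed points allow.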
1.a Related Literature
This paper contributes to a growing literature on statistical treatment rules in econometrics beginning with
Manski (2004) and Kitagawa and Tetenov (2018), which introduced the now-common empirical welfare
maximization framework. I follow a similar strategy of constructing an empirical analog of the population
objective, but seek to minimize the worst-case regret that can occur within the identified set of treatment
response.
Forecasting the effects of treatments or policies never before observed in the data is a fundamental goal
of econometrics, especially when applied as a guide for public policy (see Heckman and Vytlacil (2007) and
Manski (2021) for a deep discussion, including a historical overview). Nonetheless, the recent literature on
policy learning and treatment choice has generally not considered the introduction of new treatments with
partially identified effects. A contemporaneous exception is Manski (2023), which studies policies which
change the dosage of a vaccine to levels not observed in the data, but does not consider statistical properties
of estimated decision rules.
Partial identification has appeared in policy learning and related decision problems in contexts other
than consideration of new treatments; examples include Ben-Michael et al. (2021), Christensen, Moon, and
Schorfheide (2022), D’Adamo (2021), Kallus and Zhou (2021), Manski (2006), Manski (2010), Pu and Zhang
(2021), Russell (2020), Stoye (2012), Yata (2021), and Zhang, Ben-Michael, and Imai (2022). Ben-Michael
et al. (2021) considers that the effects of new policies may be partially identified when historical data is
generated by a deterministic policy, violating the common assumption of strong overlap. Kallus and Zhou
(2021) studies policy learning when the effect of a binary treatment is partially identified due to unobserved
confounding, and proposes algorithms that aim to guarantee improvement relative to a baseline policy.
The present work differs not only in that the source of partial identification is new treatments instead of
unobserved confounding, but also in that I focus on minimax regret as opposed to regret relative to a baseline.
The policy resulting from a minimax regret approach will recommend new treatments more often since the
minimax regret criterion considers losses relative to the optimal policy in each state of the world.
D’Adamo (2021) studies policy learning with a binary treatment where the conditional average treatment
effect is identified up to a rectangular set, meaning it is characterized by bounds which depend only on the
covariate value. In contrast, shape restrictions generally yield nonrectangular identified sets. This leads to
difficulties when estimating the optimal policy in my setting because the bounds I identify do not in general
admit a closed form. However, the extra effort proves valuable in the empirical example of Section 5, where
I find that the non-closed form characterization of the identified set using shape restrictions ends up being
substantially more informative than pointwise bounds would be for calculating regret.
Many of the previously mentioned works are concerned with binary treatments, while I am concerned
with multivalued treatments. Zhou, Athey, and Wager (2018) and Kallus and Zhou (2018) consider policy
learning with multivalued treatments and continuous treatments, respectively, but in point-identified settings
where all possible treatment values are implemented in the experiment. Yata (2021) studies a binary decision
between two policies which may not concern the assignment of a binary treatment. Additionally, the new
policy may have partially identified effects. The minimax regret rule is derived for a general class of decision
rules and applied to the problem of changing the eligibility cutoff for a treatment. The decision problem
and assumptions of Yata (2021) and the present paper differ, yet the broad goal of choosing amongst new
policies with partially identified effects makes the two complementary.
Athey and Wager (2021) extends policy learning to observational studies where exogeneity of treatment
only holds after conditioning on high-dimensional covariates. In contrast, I am motivated by settings in
which decision makers have data from a pilot experiment which tested a few treatment values. When this
is the case, estimating the effects of policies involving new treatments only requires conditioning on the set
of covariates used in the treatment rule, which is typically low-dimensional due to exogenous constraints on
the policy class (Kitagawa and Tetenov 2018). Athey and Wager (2021) also considers infinitesimal, local
changes to treatment values; however, I consider new treatments that are sufficiently far from the support
of the data as to make local approximations or parametric extrapolations unreliable, necessitating a partial
identification approach.
An alternative to the plug-in approach used in this paper and common in policy learning is to average
across the parameter space according to some distribution. Christensen, Moon, and Schorfheide (2022)
study optimal decisions in a discrete set under partial identification where Bayes rules and the bootstrap
distribution are used to average over the space of identified parameters, while a minimax approach is taken
over the partially identified parameters. An important finding is that plug-in rules may be dominated in
the asymptotic limit experiment. See Hirano and Porter (2009) and Hirano and Porter (2020) for further
discussion of asymptotic optimality of statistical treatment rules.
The rest of the article is organized as follows: Section 2 describes the decision problem in the population
and shows how to incorporate information from shape restrictions. Section 3 describes the empirical minimax
regret problem and the algorithm for estimating the optimal policy. Section 4 describes the convergence
guarantees. Section 5 applies the method to study personalized subsidies to connect to the electrical grid in
rural Kenya.