

Policy Learning with New Treatments

Samuel Higbee∗

Friday 29th September, 2023

Abstract

I study the problem of a decision maker choosing a policy which allocates treatment to a heterogeneous population on the basis of experimental data that includes only a subset of possible treatment values. The effects of new treatments are partially identified by shape restrictions on treatment response. Policies are compared according to the minimax regret criterion, and I show that the empirical analog of the population decision problem has a tractable linear- and integer-programming formulation. I prove that the maximum regret of the estimated policy converges to the lowest possible maximum regret at a rate which is the maximum of $N^{-1/2}$ and the rate at which conditional average treatment effects are estimated in the experimental data. I apply my results to design targeted subsidies for electrical grid connections in rural Kenya, and estimate that 97% of the population should be given a treatment not implemented in the experiment.

∗Department of Economics, University of Chicago. Email: samuelhigbee@uchicago.edu. I am grateful to Max Tabord-Meehan and Alex Torgovitsky for helpful feedback and guidance on this paper. I would also like to thank seminar participants at the University of Chicago for helpful comments.

arXiv:2210.04703v2 [econ.EM] 27 Sep 2023

1 Introduction

Heterogeneous treatment effects are often estimated with a decision problem in mind: should a particular individual be treated? This question has fostered much research in econometrics, statistics, and machine learning. However, comparatively little attention has been given to another important margin of the decision: should the treatment itself be adjusted? Whether the treatment is a medical treatment, subsidy, job training, or audit probability, decision makers can usually entertain changing the treatment value that was observed in the data. Even experiments with multivalued treatments may not implement an exhaustive list of treatment values. This is especially true in the social sciences, where testing multiple interventions can be costly, and in the medical sciences, where clinical trials often test only a few specific doses. In this paper I propose a method for allocating treatment to a population when the treatment values themselves can be adjusted to values never before seen in the data. I show how data on existing treatments can be combined with economically motivated shape restrictions to design policies that outperform those available when only previously implemented treatments are considered.

I first formulate a decision problem in which the decision maker observes experimental data on some treatment values and seeks to construct a mapping, or policy, from the space of covariates to the space of treatments in order to maximize some objective function. I assume all experimentation is done before the policy is constructed. This setting, which is common in econometrics, is often referred to as treatment choice or offline policy learning (examples include Athey and Wager (2021), Bhattacharya and Dupas (2012), Kitagawa and Tetenov (2018) and other examples mentioned in the literature review thereof, Liu (2022), Mbakop and Tabord-Meehan (2021), Qian and Murphy (2011), Sasaki and Ura (2020), Zhang et al. (2012), and Zhao et al. (2012)). A distinctive feature of this paper, relative to most policy learning problems, is that the set of treatments the decision maker can consider may be a strict superset of the support of the treatment random variable observed in the data. This extends policy learning to practically relevant situations in which constraints on the design and implementation of experiments, or simply differences between the objectives of the experimenter and those of the decision maker, result in only a few treatment values being piloted in the experiment, while the decision maker may want to consider many more.
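In symbols, and using notation chosen here purely for illustration (it need not match the formal notation of Section 2), the setting can be sketched as follows: $\mathcal{X}$ is the covariate space, $T$ is the treatment observed in the experiment with support $\mathcal{T}_0$, $\mathcal{T}$ is the set of treatments the decision maker may assign, and $\mu(x, t)$ is the mean outcome at covariate value $x$ under treatment $t$. For concreteness the objective is taken to be average outcome $W$; the experiment point identifies $\mu$ only on $\mathcal{X} \times \mathcal{T}_0$.

```latex
\pi : \mathcal{X} \to \mathcal{T}, \qquad
\mathcal{T}_0 = \operatorname{supp}(T) \subseteq \mathcal{T}, \qquad
W(\pi; \mu) = \mathbb{E}\big[\mu\big(X, \pi(X)\big)\big].
```

The case of interest is $\mathcal{T}_0 \subsetneq \mathcal{T}$: any policy with $\pi(x) \notin \mathcal{T}_0$ for some $x$ has a value of $W$ that depends on values of $\mu$ the data do not reveal.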

Despite the lack of data on the impacts of these never-before-implemented treatments, I show how to bound the response to new treatments using simple, economically interpretable restrictions on the shape of treatment response. For example, a financial incentive may be assumed to have a positive effect, exhibit diminishing returns, or satisfy smoothness conditions. Such shape restrictions are often exploited to partially identify treatment effects (e.g., Manski 2009; Mogstad, Santos, and Torgovitsky 2018). The empirical analysis of the present paper demonstrates that such bounds can be informative enough to guide the choice of whether and how to implement new treatment values. Based on these bounds, I construct a population decision problem to choose which treatment to assign to each covariate value. Following Manski (2007), I use the minimax regret criterion to evaluate treatment choice under partial identification.

As in Manski (2004), Kitagawa and Tetenov (2018), and the subsequent literature on empirical welfare maximization methods, I propose a decision rule based on solving the empirical analog of the decision problem as a surrogate for the infeasible population objective. The resulting empirical minimax regret estimator is constructed by minimizing maximum regret across an estimate of the partially identified set of treatment response functions. In this way, the resulting policy is robust to the model ambiguity induced by introducing new treatments. Although the identified set for treatment response is characterized by nested optimization problems without closed-form solutions, I show how the optimal policy can be computed using the same linear and integer programming tools common in the policy learning literature. The estimator is thus computationally feasible and can be implemented with widely available software.
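To see why linear and integer programming tools apply, consider a simplified version, assumed here for illustration and not the paper's own formulation: covariates take finitely many values $x$ with population shares $p_x$, and the estimated identified set is replaced by finitely many candidate response functions $\mu_1, \dots, \mu_S$. With binary variables $z_{x,t}$ equal to 1 when cell $x$ is assigned treatment $t$, regret in each state is linear in $z$ (the first-best term is a constant), so an epigraph variable $r$ turns the minimax problem into a mixed-integer linear program:

```latex
\begin{aligned}
\min_{r,\, z} \quad & r \\
\text{s.t.} \quad & r \;\ge\; \sum_{x} p_x \Big[ \max_{t' \in \mathcal{T}} \mu_s(x, t') - \sum_{t \in \mathcal{T}} \mu_s(x, t)\, z_{x,t} \Big], \qquad s = 1, \dots, S, \\
& \sum_{t \in \mathcal{T}} z_{x,t} = 1 \quad \text{for every } x, \qquad z_{x,t} \in \{0, 1\}.
\end{aligned}
```

The paper's own formulation has to handle an identified set defined by inner optimization problems rather than a finite list of states; this display only shows the epigraph step that keeps the outer problem linear in the assignment variables.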

I show that the proposed decision rule possesses desirable regret properties. The maximum regret obtained under the estimated policy converges, uniformly across a set of data distributions, to the smallest possible maximum regret that the decision maker could have achieved in the absence of sampling uncertainty (that is, if the population identified set were observed). The rate at which the regret of the estimated policy converges to its optimum depends on the estimation rate of the response to the treatments which were observed in the data, and hence is an asymptotic rather than finite-sample convergence guarantee. In the case of discrete covariates, or more generally when estimated treatment effects converge at the parametric rate, maximum regret converges at rate $N^{-1/2}$. Otherwise, maximum regret converges at the slower, nonparametric rate at which the conditional average treatment effects are estimated.
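Schematically, write $R(\pi)$ for the maximum regret of policy $\pi$ over the population identified set, $\Pi$ for the policy class, $\hat{\pi}$ for the estimated policy, $\mathcal{P}$ for the class of data distributions, and $\kappa_N$ for the rate at which conditional average treatment effects are estimated. This notation is assumed here for illustration, and the exact mode of convergence and regularity conditions are those of Section 4 rather than this display. The guarantee then has the form, uniformly in $P \in \mathcal{P}$,

```latex
R(\hat{\pi}) - \inf_{\pi \in \Pi} R(\pi)
  \;=\; O_P\big( \max\{ N^{-1/2},\ \kappa_N \} \big).
```

With discrete covariates $\kappa_N = N^{-1/2}$ and the bound is $N^{-1/2}$; when $\kappa_N$ is a nonparametric rate slower than $N^{-1/2}$ (for instance $N^{-2/5}$), the estimation rate of the conditional average treatment effects dominates.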

I apply the method to data from Lee, Miguel, and Wolfram (2020b), in which households in rural Kenya were offered one of four prices (0, 15, 25, or 35 thousand shillings) to connect to the electrical grid. I consider a decision maker able to offer prices in increments of 2.5 thousand shillings based on household size and income. This represents a much richer set of fifteen possible treatments, allowing for finer targeting of personalized prices to optimize the cost-effectiveness of the subsidy program. To bound takeup at these new prices, I assume demand is downward sloping and convex. The estimated minimax regret optimal policy assigns prices that were not implemented in the experiment to over 97% of the population, illustrating that constraining the decision maker to treatments that appear in the experimental pilot data can result in suboptimal decisions.
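A small Python sketch of the two ingredients just described: the fifteen-price grid (assuming, consistently with fifteen values in increments of 2.5, that it runs from 0 to 35 thousand shillings) and bounds on takeup at the eleven prices the experiment did not offer, under demand that is nonincreasing and convex in price. The takeup values in the hypothetical `OBSERVED` table are invented for illustration, not estimates from Lee, Miguel, and Wolfram (2020b). The bounds are also pointwise: each price is treated separately, which discards the joint restrictions across prices that matter when computing regret.

```python
# Hypothetical takeup (share of households connecting) at the four experimental
# prices, in thousands of shillings. These numbers are invented for illustration.
OBSERVED = {0.0: 0.95, 15.0: 0.25, 25.0: 0.10, 35.0: 0.05}

# Decision maker's grid: increments of 2.5 from 0 to 35 (fifteen prices).
STEP = 2.5
GRID = [STEP * k for k in range(15)]
NEW_PRICES = [p for p in GRID if p not in OBSERVED]


def slope(p: float, q: float) -> float:
    """Slope of the chord joining the observed points at prices p and q."""
    return (OBSERVED[q] - OBSERVED[p]) / (q - p)


def bounds(price: float) -> tuple[float, float]:
    """Pointwise (lower, upper) takeup bounds at `price`, assuming takeup is
    nonincreasing and convex in price. Only prices inside the observed range
    are handled; there is no extrapolation beyond the extreme prices."""
    pts = sorted(OBSERVED)
    if not pts[0] <= price <= pts[-1]:
        raise ValueError("price outside the observed range")
    if price in OBSERVED:
        return OBSERVED[price], OBSERVED[price]
    i = next(k for k, p in enumerate(pts) if p > price)  # right neighbour
    left, right = pts[i - 1], pts[i]
    # Convexity: the curve lies below the chord joining the two neighbours.
    upper = OBSERVED[left] + slope(left, right) * (price - left)
    # Monotonicity: takeup cannot fall below its value at the right neighbour.
    lower = OBSERVED[right]
    # Convexity: the curve lies above the adjacent chords extended to `price`.
    if i >= 2:
        a, b = pts[i - 2], left
        lower = max(lower, OBSERVED[b] + slope(a, b) * (price - b))
    if i + 1 < len(pts):
        c, d = right, pts[i + 1]
        lower = max(lower, OBSERVED[c] + slope(c, d) * (price - c))
    return lower, upper


for p in NEW_PRICES:
    lo, hi = bounds(p)
    print(f"price {p:5.1f}: takeup in [{lo:.3f}, {hi:.3f}]")
```

With these made-up numbers, `bounds(20.0)` returns approximately (0.125, 0.175): convexity caps takeup at the chord between the observed prices 15 and 25, while the chord through 25 and 35, extended back to 20, raises the lower bound above the monotonicity bound of 0.10.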

1.a Related Literature

This paper contributes to a growing literature on statistical treatment rules in econometrics beginning with Manski (2004) and Kitagawa and Tetenov (2018), which introduced the now-common empirical welfare maximization framework. I follow a similar strategy of constructing an empirical analog of the population objective, but seek to minimize the worst-case regret that can occur within the identified set of treatment response.

Forecasting the effects of treatments or policies never before observed in the data is a fundamental goal of econometrics, especially when applied as a guide for public policy (see Heckman and Vytlacil (2007) and Manski (2021) for an in-depth discussion, including a historical overview). Nonetheless, the recent literature on policy learning and treatment choice has generally not considered the introduction of new treatments with partially identified effects. A contemporaneous exception is Manski (2023), which studies policies that change the dosage of a vaccine to levels not observed in the data, but does not consider the statistical properties of estimated decision rules.

Partial identification has appeared in policy learning and related decision problems in contexts other than the consideration of new treatments; examples include Ben-Michael et al. (2021), Christensen, Moon, and Schorfheide (2022), D’Adamo (2021), Kallus and Zhou (2021), Manski (2006), Manski (2010), Pu and Zhang (2021), Russell (2020), Stoye (2012), Yata (2021), and Zhang, Ben-Michael, and Imai (2022). Ben-Michael et al. (2021) considers the case in which the effects of new policies are partially identified because historical data are generated by a deterministic policy, violating the common assumption of strong overlap. Kallus and Zhou (2021) studies policy learning when the effect of a binary treatment is partially identified due to unobserved confounding, and proposes algorithms that aim to guarantee improvement relative to a baseline policy. The present work differs not only in that the source of partial identification is new treatments instead of unobserved confounding, but also in that I focus on minimax regret as opposed to regret relative to a baseline. The policy resulting from a minimax regret approach will recommend new treatments more often, since the minimax regret criterion considers losses relative to the optimal policy in each state of the world.

D’Adamo (2021) studies policy learning with a binary treatment where the conditional average treatment effect is identified up to a rectangular set, meaning it is characterized by bounds which depend only on the covariate value. In contrast, shape restrictions generally yield nonrectangular identified sets. This leads to difficulties when estimating the optimal policy in my setting, because the bounds I identify do not in general admit a closed form. However, the extra effort proves valuable in the empirical example of Section 5, where I find that the non-closed-form characterization of the identified set using shape restrictions is substantially more informative than pointwise bounds would be for calculating regret.
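The following Python sketch, with made-up numbers, illustrates why replacing a nonrectangular identified set by pointwise (rectangular) bounds can inflate regret. The hypothetical joint set contains two states in which treatment b always beats treatment a by 0.1; the rectangle built from each treatment's separate bounds also admits states in which a beats b.

```python
from itertools import product

# Hypothetical joint identified set for (mu_a, mu_b), the mean outcomes of two
# treatments: in every admissible state, b beats a by 0.1.
JOINT_SET = [(0.2, 0.3), (0.4, 0.5)]


def regret(choice, state):
    """Regret of choosing treatment `choice` ("a" or "b") in `state`."""
    mu = dict(zip("ab", state))
    return max(mu.values()) - mu[choice]


def max_regret(choice, states):
    """Worst-case regret of `choice` over a finite list of states."""
    return max(regret(choice, s) for s in states)


# Rectangular outer set: separate (pointwise) bounds for each treatment.
lo_a, hi_a = min(s[0] for s in JOINT_SET), max(s[0] for s in JOINT_SET)
lo_b, hi_b = min(s[1] for s in JOINT_SET), max(s[1] for s in JOINT_SET)
# Regret is convex and piecewise linear in (mu_a, mu_b), so its maximum over
# the rectangle is attained at one of the four corners.
RECTANGLE_CORNERS = list(product((lo_a, hi_a), (lo_b, hi_b)))

for choice in "ab":
    print(choice,
          round(max_regret(choice, JOINT_SET), 3),
          round(max_regret(choice, RECTANGLE_CORNERS), 3))
```

Choosing b has zero maximum regret over the joint set but maximum regret 0.1 over the rectangle, so a regret calculation based only on pointwise bounds can overstate the worst-case loss of a policy.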

Many of the previously mentioned works are concerned with binary treatments, while I am concerned with multivalued treatments. Zhou, Athey, and Wager (2018) and Kallus and Zhou (2018) consider policy learning with multivalued treatments and continuous treatments, respectively, but in point-identified settings where all possible treatment values are implemented in the experiment. Yata (2021) studies a binary decision between two policies which need not concern the assignment of a binary treatment, and the new policy may have partially identified effects. The minimax regret rule is derived for a general class of decision rules and applied to the problem of changing the eligibility cutoff for a treatment. The decision problem and assumptions of Yata (2021) and the present paper differ, yet the shared goal of choosing amongst new policies with partially identified effects makes the two complementary.

Athey and Wager (2021) extends policy learning to observational studies where exogeneity of treatment only holds after conditioning on high-dimensional covariates. In contrast, I am motivated by settings in which decision makers have data from a pilot experiment which tested a few treatment values. When this is the case, estimating the effects of policies involving new treatments only requires conditioning on the set of covariates used in the treatment rule, which is typically low-dimensional due to exogenous constraints on the policy class (Kitagawa and Tetenov 2018). Athey and Wager (2021) also considers infinitesimal, local changes to treatment values; however, I consider new treatments that are far enough from the support of the data to make local approximations or parametric extrapolations unreliable, necessitating a partial identification approach.

An alternative to the plug-in approach used in this paper and common in policy learning is to average across the parameter space according to some distribution. Christensen, Moon, and Schorfheide (2022) study optimal decisions over a discrete choice set under partial identification, where Bayes rules and the bootstrap distribution are used to average over the space of identified parameters, while a minimax approach is taken over the partially identified parameters. An important finding is that plug-in rules may be dominated in the asymptotic limit experiment. See Hirano and Porter (2009) and Hirano and Porter (2020) for further discussion of the asymptotic optimality of statistical treatment rules.

The rest of the article is organized as follows. Section 2 describes the decision problem in the population and shows how to incorporate information from shape restrictions. Section 3 describes the empirical minimax regret problem and the algorithm for estimating the optimal policy. Section 4 describes the convergence guarantees. Section 5 applies the method to study personalized subsidies to connect to the electrical grid in rural Kenya.
