Overlap, matching, or entropy weights:
what are we weighting for?
Roland A. Matsouaka1,2,*, Yi Liu1, Yunji Zhou1
1Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
2Program for Comparative Effectiveness Methodology, Duke Clinical Research Institute, Durham, NC, USA
Abstract
There has been a recent surge in statistical methods for handling the lack of adequate positivity
when using inverse probability weights (IPW). However, these nascent developments have raised a
number of questions. Thus, we demonstrate the ability of equipoise estimators (overlap, matching,
and entropy weights) to handle the lack of positivity. Compared to IPW, the equipoise estimators
have been shown to be flexible and easy to interpret. However, promoting their wide use requires
that researchers know clearly why and when to apply them, and what to expect.
In this paper, we provide the rationale to use these estimators to achieve robust results. We
specifically look into the impact imbalances in treatment allocation can have on the positivity and,
ultimately, on the estimates of the treatment effect. We zero in on the typical pitfalls of the IPW
estimator and its relationship with the estimators of the average treatment effect on the treated (ATT)
and on the controls (ATC). Furthermore, we also compare IPW trimming to the equipoise estimators.
We focus particularly on two key points: What fundamentally distinguishes their estimands? When
should we expect similar results? Our findings are illustrated through Monte-Carlo simulation studies
and a data example on healthcare expenditure.
Keywords: Positivity; propensity scores; equipoise; overlap weights; matching weights; entropy weights.
Corresponding author: Roland A. Matsouaka; roland.matsouaka@duke.edu
arXiv:2210.12968v3 [stat.ME] 1 Feb 2024
1 Introduction
To assess the effect of a new treatment regimen (Z = 1) over a standard (or control) treatment (Z = 0) based on
data from an observational study, a number of causal identification assumptions must be made, including
the positivity assumption. For instance, to estimate the average treatment effect (ATE), this assumption requires
0 < e(x) < 1, where e(x) = P(Z = 1 | X = x) is the propensity score (PS), i.e., the probability of treatment
assignment, given the vector of baseline covariates X (Rosenbaum and Rubin, 1983; Rubin, 1997). The positivity
assumption ensures that the distributions of the related baseline covariates have a good overlap and hence a good
common support (Petersen et al., 2012; Li et al., 2018b).
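As a concrete illustration of the positivity check described above, the following minimal sketch (simulated data and numpy-only logistic fit; not code from the paper) estimates e(x) and inspects the overlap of the PS distributions between groups:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
true_logit = 0.8 * X[:, 0] - 0.5 * X[:, 1]
Z = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))  # treatment indicator

# Fit a logistic PS model by plain gradient ascent (numpy only, for illustration).
Xd = np.column_stack([np.ones(n), X])      # add intercept column
beta = np.zeros(3)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-Xd @ beta))
    beta += 0.1 * Xd.T @ (Z - p) / n       # gradient of the mean log-likelihood

e_hat = 1.0 / (1.0 + np.exp(-Xd @ beta))   # estimated e(x) = P(Z = 1 | X = x)

# A quick positivity diagnostic: the supports of e(x) in the two groups should
# overlap and stay away from 0 and 1.
print("PS range, treated :", e_hat[Z == 1].min(), e_hat[Z == 1].max())
print("PS range, controls:", e_hat[Z == 0].min(), e_hat[Z == 0].max())
```

As the paper cautions later, well-behaved estimated PSs alone do not guarantee that positivity holds; this check is a first look, not a proof.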
The inverse probability weighting (IPW) estimator for the ATE assigns to study participants weights that are
inversely proportional to their respective PSs. Thus, IPW creates a pseudo-population of participants, corrects for
imbalances in the observed covariate distributions between the treatment groups, and adjusts for (measured) confounding
bias inherent to most non-randomized studies. Nevertheless, when PSs are equal to (or near) 0 or 1, there is a
violation (or near violation) of the positivity assumption, which we often refer to as lack of adequate positivity
(Petersen et al., 2012). Violations (or near violations) of the positivity assumption occur either at random (or
stochastically), i.e., by chance due to the data (or underlying model) characteristics, or structurally, when some subgroups of
participants can never (or barely) receive one of the treatment options under study. This can lead to moderate or
even poor overlap of the distributions of the PSs and may result in large IPW weights, especially when the ratio
[e(x)(1 − e(x))]^{-1} is highly variable (Li and Greene, 2013; Zhou et al., 2020b). As such, IPW may put a large
amount of weight on a small number of observations, which can unduly influence the estimation of the treatment
effect.
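To make this instability concrete, here is a toy numerical sketch (the PS values are illustrative, not from the paper) of the IPW weights Z/e(x) + (1 − Z)/(1 − e(x)):

```python
import numpy as np

# Illustrative estimated PSs and treatment indicators; one treated unit has a
# PS near 0, which is exactly the near-violation of positivity discussed above.
e_hat = np.array([0.50, 0.60, 0.99, 0.01, 0.40])
Z = np.array([1, 0, 1, 1, 0])

# IPW: weight 1/e(x) for treated units, 1/(1 - e(x)) for controls.
w_ipw = Z / e_hat + (1 - Z) / (1 - e_hat)
print(w_ipw)  # the treated unit with e = 0.01 receives weight 1/0.01 = 100

# Share of the total weight carried by that single observation:
print(w_ipw.max() / w_ipw.sum())
```

A single unit with e(x) = 0.01 ends up carrying over 90% of the total weight in this toy pseudo-population, which is the sense in which a few observations can unduly influence the estimate.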
While violations of the positivity assumption can be remedied by either PS trimming or truncation, recent
advancements have introduced methods that aim to overcome the limitations of these ad hoc solutions. Some of
these novel methods propose bias-corrected estimators (Chaudhuri and Hill, 2014; Ma and Wang, 2020; Sasaki
and Ura, 2022), while others reparametrize the PS estimation by adding a priori covariate balancing constraints
to modify the PS model (Graham et al., 2012; Imai and Ratkovic, 2014). Some consider direct optimization
techniques to derive sample weights under covariate constraints (Hainmueller, 2012; Zubizarreta, 2015; Wong and
Chan, 2017; Hirshberg and Zubizarreta, 2017) or redefine the target population altogether and bypass the need
to account for the lack of positivity (Li et al., 2018a; Matsouaka and Zhou, 2020; Zhou et al., 2020b).
1.1 The positivity assumption and propensity score weighting methods
The literature defines two specific types of violations of the positivity assumption: random (i.e., by chance) and structural
violations (Westreich, 2019; Petersen et al., 2012). Random (or stochastic) violations of the positivity assumption
arise by happenstance, e.g., when the sample size is small or the PS model is misspecified. In such cases, an increased
sample size, bias-corrected IPW trimming, PS reparameterization or direct optimization offer better alternatives
to estimate the ATE (Chaudhuri and Hill, 2014; Ma and Wang, 2020; Sasaki and Ura, 2022). Alternatively, methods
for equipoise treatment effects, i.e., the overlap weight (OW), matching weight (MW), and Shannon's entropy
weight (EW) estimators (Matsouaka and Zhou, 2020; Li et al., 2018b), can also be considered. These estimators
target treatment effects defined within the subgroup of participants for whom treatment equipoise exists.
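For concreteness, the tilting functions h(e) behind these three weights, as given in Li et al. (2018b) and Matsouaka and Zhou (2020), are e(1 − e) for OW, min{e, 1 − e} for MW, and −[e log e + (1 − e) log(1 − e)] for EW, with each participant weighted by h(e)/e if treated and h(e)/(1 − e) if control. A minimal numpy sketch (illustrative PS values, not the paper's code):

```python
import numpy as np

def tilt(e, kind):
    """Tilting function h(e) for each equipoise weight."""
    if kind == "OW":   # overlap weight
        return e * (1 - e)
    if kind == "MW":   # matching weight
        return np.minimum(e, 1 - e)
    if kind == "EW":   # (Shannon) entropy weight
        return -(e * np.log(e) + (1 - e) * np.log(1 - e))
    raise ValueError(kind)

def equipoise_weights(e, Z, kind):
    """Weight h(e)/e for treated units and h(e)/(1 - e) for controls."""
    h = tilt(e, kind)
    return np.where(Z == 1, h / e, h / (1 - e))

e = np.array([0.05, 0.50, 0.95])   # illustrative PSs, including extreme values
Z = np.array([1, 1, 0])
for kind in ("OW", "MW", "EW"):
    print(kind, equipoise_weights(e, Z, kind))
```

Unlike IPW, the OW and MW weights stay bounded by 1 even as e(x) approaches 0 or 1, and the EW weights grow only logarithmically, far more slowly than IPW's 1/e(x); this is the mechanical reason these estimators tolerate the lack of positivity.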
As noted by Petersen et al. (2012), violations of the positivity assumption can lead to substantial bias and
sometimes an increased variance of the causal effect estimator. While checking the PS distributions (or the PS
weights) between treatment groups can help assess such violations, it is important to recognize that well-behaved
weights alone may not guarantee the satisfaction of the positivity assumption (Ma and Wang, 2020; Petersen
et al., 2012). A thorough investigation into violations of the positivity assumption must always be preceded by an
expert-knowledge elicitation of the scientific questions at hand, as well as the source and the nature of the data.
1.2 The positivity assumption and imbalance in treatment allocations
Correct estimation of the treatment effect is challenging when treatment (or exposure) allocation is rare (Pirrac-
chio et al., 2012; Rudolph et al., 2022; Hajage et al., 2016). Nevertheless, assessment of treatment effects with a
small proportion of treated participants is a common occurrence, particularly in pharmacoepidemiologic observational
studies of drugs (Hajage et al., 2016; Platt et al., 2012). The evaluation of the risk-benefit profile of a
newly released drug is often conducted using observational studies where data on the effectiveness and safety of
the drug are collected during routine care (Schneeweiss, 2007; Rassen and Schneeweiss, 2012). Schneeweiss et al.
(2011) provide an example in the comparative effectiveness of newly marketed medications, which presents addi-
tional challenges. These challenges include potential bias due to patient channeling toward the newly marketed
medication (due to patient, provider, and system related factors), shifts in the user population (due to varying
background characteristics and comorbidities), timely data availability issues, and a smaller number of users in
the initial months of marketing. As Schneeweiss et al. (2011) indicated, “Of these challenges, channeling is often
the biggest threat to the validity of nonrandomized studies...” Therefore, there is a pressing need for the use
and development of sound statistical methods that aim at consistent and robust estimation of the treatment
effects when the lack of positivity is expected or unavoidable.
While some authors have investigated the use of PS methods when the proportion of treated participants is small,
their focus has primarily been on traditional PS methods, overlooking alternative methods that are well suited
for lack of positivity. These alternatives go beyond the traditional use of PS matching, truncation, or trimming
(Hajage et al., 2016; Franklin et al., 2017; Austin, 2011).
Since causal inference is inherently a missing data problem (Holland, 1986), we often overlook the fundamental
task of any causal estimator: to use the available data to adequately impute the unknown potential outcome values. For
IPW, this means weighting participants to create a pseudo-population in which causal inferences can be drawn. When
the treatment allocation is imbalanced and extreme weights emerge, estimation and inference of the treatment
effects rely heavily on a few participants with extremely large inverse probability weights, which can introduce
severe bias due to data disparity. Therefore, regardless of whether violations of the positivity assumption are
structural or not, it is crucial to ensure that the estimated treatment effects are not disproportionately driven by a
small number of outlying participants, especially if there is a substantial treatment allocation imbalance. For
instance, in a tutorial for PS analysis, Austin uses a sample of current smokers discharged alive from a hospital
following an acute myocardial infarction (Austin, 2011). What is remarkable in this paper are the small proportion
(32.20%) of patients who did not benefit from in-patient smoking cessation counseling, the wide range of estimated
PSs, the presence of a few extreme weights, and the results on the 3-year survival outcomes (binary and time-
to-event). Thus, reading this paper, we cannot help but raise some questions. Were the different conclusions
drawn solely due to methodological differences? In fact, some of these methods showed a significantly reduced
risk of mortality, while others just indicated that the treatment effect was not different from the null. Could
the discrepancy in the study conclusions be solely driven by the differences in the selected methods and their
underlying estimands? Did the imbalance in the number of participants between the two treatment groups also
play a role? As we will demonstrate in this paper, we believe both the choice of specific methods and the imbalance
in treatment allocation play a preeminent role.
1.3 How about trimming or truncating extreme weights?
The IPW estimator of the ATE targets E[B/A], with B = (Z − e(X))Y and A = e(X)(1 − e(X)). When there is
a violation of the positivity assumption, some observations have A ≈ 0, which can have an undue influence on the
naïve sample mean of B/A. The primary objective of trimming (i.e., dropping participants) and truncation (i.e.,
capping weights) is to curtail such undue influences and provide a stable estimator. Trimming (or truncating)
participants with A ≈ 0 (above given thresholds) is a common practice, as it effectively constrains the weights
within reasonable bounds. However, the resulting estimate is often highly sensitive to the choice of the
threshold(s) (Chaudhuri and Hill, 2014; Ma and Wang, 2020; Sasaki and Ura, 2022). Unfortunately, the choice
of a threshold is often ad hoc and subjective, as it rests solely on the user's discretion (Crump et al., 2006).
In many applications, a user-selected threshold can drastically change the number of participants we discard
or for whom we curtail the weights (see, for instance, Zhou et al., 2020b). This tremendously affects the
finite-sample performance of the estimator, influencing both its bias and efficiency. Moreover, the corresponding
estimator may not target the ATE in the original population since, for instance, under structural violations
of the positivity assumption, both trimming and truncation can alter their target estimands and the underlying populations of interest,
depending on the threshold considered (Chaudhuri and Hill, 2014; Zhou et al., 2020b). For example, ATE trimming
by a threshold α ∈ (0, 0.5) shifts the target population to the population of participants whose PS (of receiving
either treatment or control) is inside the interval (α, 1 − α). Often, standard trimming and truncation methods
result in a non-negligible bias (even asymptotically) when estimating the ATE, which may have some implications
for inference.
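The two ad hoc fixes can be contrasted in a minimal sketch (illustrative names and values, not the authors' implementation): trimming drops units whose PS falls outside (α, 1 − α), while truncation keeps every unit but caps the weights at a chosen maximum.

```python
import numpy as np

def trim(e, Z, Y, alpha):
    """Drop units with PS outside (alpha, 1 - alpha); the sample shrinks."""
    keep = (e > alpha) & (e < 1 - alpha)
    return e[keep], Z[keep], Y[keep]

def truncate_weights(w, cap):
    """Cap weights at a maximum value instead of dropping units."""
    return np.minimum(w, cap)

# Illustrative data: two units violate positivity (PS near 0 and near 1).
e = np.array([0.02, 0.30, 0.50, 0.70, 0.98])
Z = np.array([1, 1, 0, 1, 0])
Y = np.array([1.0, 0.5, 0.2, 0.8, 0.3])

e_t, Z_t, Y_t = trim(e, Z, Y, alpha=0.10)
print(len(e_t))  # two of the five units are discarded

w = Z / e + (1 - Z) / (1 - e)      # raw IPW weights (max is 1/0.02 = 50)
print(truncate_weights(w, cap=10.0))
```

Both α = 0.10 and cap = 10.0 here are arbitrary user choices, which is precisely the sensitivity-to-threshold problem the paragraph above describes.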
Landmark bias-correction strategies for trimming have been proposed. Chaudhuri and Hill (2014) proposed a
bias-corrected, tail-trimmed IPW estimator of the ATE, based on the tail behavior of |B/A|. Their estimator
is robust and asymptotically valid, even under substantially limited overlap of the PS distributions. Rather than
trimming on the ratio B/A, Ma and Wang (2020) and Sasaki and Ura (2022) considered trimming observations
with A ≈ 0 to build flexible, bias-corrected estimators that allow for larger trimming and hence smaller variances
or faster rates of convergence. Robustness of the estimator by Ma and Wang (2020) is achieved by combining
resampling with a local polynomial-based bias-correction technique, where a data-driven threshold is selected by
minimizing the mean squared error. Sasaki and Ura (2022), on the other hand, leverage the smoothness of the
conditional moment function a ↦ E[B | A = a] to achieve more robust inference and a faster convergence rate.
The above strategies (implicitly) assume a random violation of the positivity assumption, under which the true
ATE exists and can be point estimated; their respective trimming and bias-correction solutions aim at improving
inference. Unlike these papers, our proposed estimators do not even rely on the positivity assumption; thus,
they are applicable to either random or structural violations of the positivity assumption. In addition, they
automatically focus on the area of common support and identify a specific subgroup of participants where the estimate
has strong internal validity, without involving the outcome Y.
Furthermore, an important imbalance in treatment allocation (i.e., when the proportion of participants
in one of the treatment groups is small) often exacerbates the lack of adequate positivity, which can lead to more
trimming or truncation in one group instead of both. Such practices not only reduce the number of participants
in the final sample (after trimming), but also influence the contribution of those whose extreme weights are
capped (by truncation). This further complicates point estimation and inference when structural violations
of the positivity assumption are expected. Therefore, there is a growing interest in new practical methods that
do not leave room to manually and subjectively pick a threshold; methods that can leverage inherent data-driven
mechanisms to control the impact of extreme weights and provide robust assessments of treatment effects.
1.4 Are there better alternatives?
The overlap weight (OW), matching weight (MW), and Shannon’s entropy weight (EW) estimators (hereafter
referred to as equipoise treatment effect estimators (Matsouaka and Zhou, 2020)) effectively circumvent the lack of
positivity without specifying any user-driven threshold. Besides, they provide both better causal estimates and
larger effective sample sizes (Li and Greene, 2013; Li et al., 2018a; Zhou et al., 2020b; Li and Li, 2021). However,
it remains to be seen whether imbalances in treatment allocation can directly affect their estimates (compared to
IPW estimation of the ATE) and to what extent.
Therefore, the main objective of this paper is to provide a formal assessment of the impact of equipoise
estimators (i.e., OW, MW, EW) on the treatment effect estimation in studies where there is a disproportionate
distribution of treated or control participants in the population and how it relates to the lack of positivity. The
title of our paper, “Overlap, matching, or entropy weights: what are we weighting for?”, is thus a call to action,
i.e., to delve into the unique characteristics of these equipoise estimators, which grant them the flexibility to
address the lack of positivity effectively (Zhou et al.,2020b). In the process, we showcase how OW, MW, and
EW methods can be strategically used to estimate treatment effects and make asymptotically correct inferences
of the corresponding estimators, when there is a violation of the positivity assumption.
The rest of the paper is organized as follows. We start in the next section with key questions that help
define the purpose of our study and what we intend to accomplish. Then, in Section 3, we introduce notations
and present the family of balancing weights. We specify their related estimands and define the corresponding
estimators. Of particular interest are questions related to what is being estimated and what the target populations
are when using these balancing weights. Next, we explore how the estimators are impacted by the proportion of
treated participants and provide proper interpretations of their estimates. The illustrative example in Section 2.1
sets the scene for the main idea of this paper and informs the simulations in Section 4.
We evaluate the performance of the estimators using Monte-Carlo simulation studies in Section 4, covering
three different treatment allocations, under various treatment effects and model specifications. The methods are