Overlap, matching, or entropy weights:
what are we weighting for?
Roland A. Matsouaka1,2,*, Yi Liu1, Yunji Zhou1
1Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
2Program for Comparative Effectiveness Methodology, Duke Clinical Research Institute, Durham, NC, USA
Abstract
There has been a recent surge in statistical methods for handling the lack of adequate positivity
when using inverse probability weights (IPW). However, these nascent developments have raised a
number of questions. Thus, we demonstrate the ability of equipoise estimators (overlap, matching,
and entropy weights) to handle the lack of positivity. Compared to IPW, the equipoise estimators
have been shown to be flexible and easy to interpret. However, promoting their wide use requires
that researchers know clearly why and when to apply them, and what to expect.
In this paper, we provide the rationale to use these estimators to achieve robust results. We
specifically look into the impact imbalances in treatment allocation can have on the positivity and,
ultimately, on the estimates of the treatment effect. We zero in on the typical pitfalls of the IPW
estimator and its relationship with the estimators of the average treatment effect on the treated (ATT)
and on the controls (ATC). Furthermore, we also compare IPW trimming to the equipoise estimators.
We focus particularly on two key points: What fundamentally distinguishes their estimands? When
should we expect similar results? Our findings are illustrated through Monte-Carlo simulation studies
and a data example on healthcare expenditure.
Keywords: Positivity; propensity scores; equipoise; overlap weights; matching weights; entropy weights.
Corresponding author: Roland A. Matsouaka; roland.matsouaka@duke.edu
arXiv:2210.12968v3 [stat.ME] 1 Feb 2024
1 Introduction
To assess the effect of a new treatment regimen (Z = 1) over a standard (or control) treatment (Z = 0) based on
data from an observational study, a number of causal identification assumptions must be made, including
the positivity assumption. For instance, to estimate the average treatment effect (ATE), this assumption requires
0 < e(x) < 1, where e(x) = P(Z = 1 | X = x) is the propensity score (PS), i.e., the probability of treatment
assignment, given the vector of baseline covariates X (Rosenbaum and Rubin, 1983; Rubin, 1997). The positivity
assumption ensures that the distributions of the related baseline covariates have a good overlap and hence a good
common support (Petersen et al., 2012; Li et al., 2018b).
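As a concrete illustration of the positivity check described above, the following minimal sketch (simulated data and numpy-only logistic fit; not code from the paper) estimates e(x) and inspects the overlap of the PS distributions between groups:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
true_logit = 0.8 * X[:, 0] - 0.5 * X[:, 1]
Z = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))  # treatment indicator

# Fit a logistic PS model by plain gradient ascent (numpy only, for illustration).
Xd = np.column_stack([np.ones(n), X])      # add intercept column
beta = np.zeros(3)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-Xd @ beta))
    beta += 0.1 * Xd.T @ (Z - p) / n       # gradient of the mean log-likelihood

e_hat = 1.0 / (1.0 + np.exp(-Xd @ beta))   # estimated e(x) = P(Z = 1 | X = x)

# A quick positivity diagnostic: the supports of e(x) in the two groups should
# overlap and stay away from 0 and 1.
print("PS range, treated :", e_hat[Z == 1].min(), e_hat[Z == 1].max())
print("PS range, controls:", e_hat[Z == 0].min(), e_hat[Z == 0].max())
```

As the paper cautions later, well-behaved estimated PSs alone do not guarantee that positivity holds; this check is a first look, not a proof.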
The inverse probability weighting (IPW) estimator for the ATE assigns to study participants weights that are
inversely proportional to their respective PSs. Thus, IPW creates a pseudo-population of participants, corrects for
imbalances in the observed covariate distributions between the treatment groups, and adjusts for (measured) confounding
bias inherent to most non-randomized studies. Nevertheless, when PSs are equal to (or near) 0 or 1, there is a
violation (or near violation) of the positivity assumption, which we often refer to as lack of adequate positivity
(Petersen et al., 2012). Violations (or near violations) of the positivity assumption occur either at random (or
stochastically), i.e., by chance due to the data (or underlying model) characteristics, or structurally, when some subgroups of
participants can never (or barely) receive one of the treatment options under study. This can lead to moderate or
even poor overlap of the distributions of the PSs and may result in large IPW weights, especially when the ratio
[e(x)(1 − e(x))]^{-1} is highly variable (Li and Greene, 2013; Zhou et al., 2020b). As such, IPW may put a large
amount of weight on a small number of observations, which can unduly influence the estimation of the treatment
effect.
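To make this instability concrete, here is a toy numerical sketch (the PS values are illustrative, not from the paper) of the IPW weights Z/e(x) + (1 − Z)/(1 − e(x)):

```python
import numpy as np

# Illustrative estimated PSs and treatment indicators; one treated unit has a
# PS near 0, which is exactly the near-violation of positivity discussed above.
e_hat = np.array([0.50, 0.60, 0.99, 0.01, 0.40])
Z = np.array([1, 0, 1, 1, 0])

# IPW: weight 1/e(x) for treated units, 1/(1 - e(x)) for controls.
w_ipw = Z / e_hat + (1 - Z) / (1 - e_hat)
print(w_ipw)  # the treated unit with e = 0.01 receives weight 1/0.01 = 100

# Share of the total weight carried by that single observation:
print(w_ipw.max() / w_ipw.sum())
```

A single unit with e(x) = 0.01 ends up carrying over 90% of the total weight in this toy pseudo-population, which is the sense in which a few observations can unduly influence the estimate.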
While violations of the positivity assumption can be remedied by either PS trimming or truncation, recent
advancements have introduced methods that aim to overcome the limitations of these ad hoc solutions. Some of
these novel methods propose bias-corrected estimators (Chaudhuri and Hill, 2014; Ma and Wang, 2020; Sasaki
and Ura, 2022), while others reparametrize the PS estimation by adding a priori covariate balancing constraints
to modify the PS model (Graham et al., 2012; Imai and Ratkovic, 2014). Some consider direct optimization
techniques to derive sample weights under covariate constraints (Hainmueller, 2012; Zubizarreta, 2015; Wong and
Chan, 2017; Hirshberg and Zubizarreta, 2017) or redefine the target population altogether and bypass the need
to account for the lack of positivity (Li et al., 2018a; Matsouaka and Zhou, 2020; Zhou et al., 2020b).
1.1 The positivity assumption and propensity score weighting methods
The literature defines two specific types of violations of the positivity assumption: random (i.e., by chance) and structural
violations (Westreich, 2019; Petersen et al., 2012). Random (or stochastic) violations of the positivity assumption
arise by happenstance, e.g., when the sample size is small or the PS model is misspecified. In such cases, an increased
sample size, bias-corrected IPW trimming, PS reparameterization or direct optimization offer better alternatives
to estimate the ATE (Chaudhuri and Hill, 2014; Ma and Wang, 2020; Sasaki and Ura, 2022). Alternatively, methods
for equipoise treatment effects, i.e., the overlap weight (OW), matching weight (MW), and Shannon's entropy
weight (EW) estimators (Matsouaka and Zhou, 2020; Li et al., 2018b), can also be considered. These estimators
target treatment effects defined within the subgroup of participants for whom treatment equipoise exists.
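For concreteness, the tilting functions h(e) behind these three weights, as given in Li et al. (2018b) and Matsouaka and Zhou (2020), are e(1 − e) for OW, min{e, 1 − e} for MW, and −[e log e + (1 − e) log(1 − e)] for EW, with each participant weighted by h(e)/e if treated and h(e)/(1 − e) if control. A minimal numpy sketch (illustrative PS values, not the paper's code):

```python
import numpy as np

def tilt(e, kind):
    """Tilting function h(e) for each equipoise weight."""
    if kind == "OW":   # overlap weight
        return e * (1 - e)
    if kind == "MW":   # matching weight
        return np.minimum(e, 1 - e)
    if kind == "EW":   # (Shannon) entropy weight
        return -(e * np.log(e) + (1 - e) * np.log(1 - e))
    raise ValueError(kind)

def equipoise_weights(e, Z, kind):
    """Weight h(e)/e for treated units and h(e)/(1 - e) for controls."""
    h = tilt(e, kind)
    return np.where(Z == 1, h / e, h / (1 - e))

e = np.array([0.05, 0.50, 0.95])   # illustrative PSs, including extreme values
Z = np.array([1, 1, 0])
for kind in ("OW", "MW", "EW"):
    print(kind, equipoise_weights(e, Z, kind))
```

Unlike IPW, the OW and MW weights stay bounded by 1 even as e(x) approaches 0 or 1, and the EW weights grow only logarithmically, far more slowly than IPW's 1/e(x); this is the mechanical reason these estimators tolerate the lack of positivity.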
As noted by Petersen et al. (2012), violations of the positivity assumption can lead to substantial bias and
sometimes an increased variance of the causal effect estimator. While checking the PS distributions (or the PS
weights) between treatment groups can help assess such violations, it is important to recognize that well-behaved
weights alone may not guarantee the satisfaction of the positivity assumption (Ma and Wang, 2020; Petersen
et al., 2012). A thorough investigation into violations of the positivity assumption must always be preceded by an
expert-knowledge elicitation of the scientific questions at hand, as well as the source and the nature of the data.
1.2 The positivity assumption and imbalance in treatment allocations
Correct estimation of the treatment effect is challenging when treatment (or exposure) allocation is rare (Pirrac-
chio et al., 2012; Rudolph et al., 2022; Hajage et al., 2016). Nevertheless, assessment of treatment effects with a
small proportion of treated participants is a common occurrence, particularly in pharmacoepidemiologic observational
studies of drugs (Hajage et al., 2016; Platt et al., 2012). The evaluation of the risk-benefit profile of a
newly released drug is often conducted using observational studies where data on the effectiveness and safety of
the drug are collected during routine care (Schneeweiss, 2007; Rassen and Schneeweiss, 2012). Schneeweiss et al.
(2011) provide an example in the comparative effectiveness of newly marketed medications, which presents addi-
tional challenges. These challenges include potential bias due to patient channeling toward the newly marketed
medication (due to patient, provider, and system related factors), shifts in the user population (due to varying
background characteristics and comorbidities), timely data availability issues, and a smaller number of users in
the initial months of marketing. As Schneeweiss et al. (2011) indicated, “Of these challenges, channeling is often
the biggest threat to the validity of nonrandomized studies...” Therefore, there is a pressing need for the use
and development of sound statistical methods that aim at consistent and robust estimation of the treatment
effects when the lack of positivity is expected or unavoidable.
While some authors have investigated the use of PS methods when the proportion of treated participants is small,
their focus has primarily been on traditional PS methods, overlooking alternative methods that are well suited
for lack of positivity. These alternatives go beyond the traditional use of PS matching, truncation, or trimming
(Hajage et al., 2016; Franklin et al., 2017; Austin, 2011).
Since causal inference is inherently a missing data problem (Holland, 1986), we often overlook the fundamental
task of any causal estimator: to use the available data to adequately impute the unknown potential outcome values. For
IPW, this means weighting participants to create a pseudo-population in which causal inferences can be drawn. When
the treatment allocation is imbalanced and extreme weights emerge, estimation and inference of the treatment
effects rely heavily on a few participants with extremely large inverse probability weights, which can introduce
severe bias due to data disparity. Therefore, regardless of whether violations of the positivity assumption are
structural or not, it is crucial to ensure that the estimated treatment effects are not disproportionately driven by a
small number of outlying participants, especially if there is a substantial treatment allocation imbalance. For
instance, in a tutorial for PS analysis, Austin uses a sample of current smokers discharged alive from a hospital
following an acute myocardial infarction (Austin, 2011). What is remarkable in this paper are the small proportion
(32.20%) of patients who did not benefit from in-patient smoking cessation counseling, the wide range of estimated
PSs, the presence of a few extreme weights, and the results on the 3-year survival outcomes (binary and time-
to-event). Thus, reading this paper, we cannot help but raise some questions. Were the different conclusions
drawn solely due to methodological differences? In fact, some of these methods showed a significantly reduced
risk of mortality, while others just indicated that the treatment effect was not different from the null. Could
the discrepancy in the study conclusions be solely driven by the differences in the selected methods and their
underlying estimands? Did the imbalance in the number of participants between the two treatment groups also
play a role? As we will demonstrate in this paper, we believe both the choice of specific methods and the imbalance
in treatment allocation play a preeminent role.
1.3 How about trimming or truncating extreme weights?
The IPW estimator of the ATE targets E[B/A], with B = (Z − e(X))Y and A = e(X)(1 − e(X)). When there is
a violation of the positivity assumption, some observations have A ≈ 0, which can have an undue influence on the
naïve sample mean of B/A. The primary objective of trimming (i.e., dropping participants) and truncation (i.e.,
capping weights) is to curtail such undue influences and provide a stable estimator. Trimming (or truncating)
participants with A ≈ 0 (above given thresholds) is a common practice, as it effectively constrains the weights
within reasonable bounds. However, the resulting estimate is often highly sensitive to the choice of the
threshold(s) (Chaudhuri and Hill, 2014; Ma and Wang, 2020; Sasaki and Ura, 2022). Unfortunately, the choice
of a threshold is often ad hoc and subjective, as it rests solely on the user's discretion (Crump et al., 2006).
In many applications, a user-selected threshold can drastically change the number of participants we discard
or for whom we curtail the weights (see, for instance, Zhou et al., 2020b). This tremendously affects the
finite-sample performance of the estimator, influencing both its bias and efficiency. Moreover, the corresponding
estimator may not target the ATE in the original population since, for instance, under structural violations
of the positivity assumption, both trimming and truncation can alter their target estimands and the underlying populations of interest,
depending on the threshold considered (Chaudhuri and Hill, 2014; Zhou et al., 2020b). For example, ATE trimming
by a threshold α ∈ (0, 0.5) shifts the target population to the population of participants whose PS (of receiving
either treatment or control) is inside the interval (α, 1 − α). Often, standard trimming and truncation methods
result in a non-negligible bias (even asymptotically) when estimating the ATE, which may have some implications
for inference.
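The two ad hoc fixes can be contrasted in a minimal sketch (illustrative names and values, not the authors' implementation): trimming drops units whose PS falls outside (α, 1 − α), while truncation keeps every unit but caps the weights at a chosen maximum.

```python
import numpy as np

def trim(e, Z, Y, alpha):
    """Drop units with PS outside (alpha, 1 - alpha); the sample shrinks."""
    keep = (e > alpha) & (e < 1 - alpha)
    return e[keep], Z[keep], Y[keep]

def truncate_weights(w, cap):
    """Cap weights at a maximum value instead of dropping units."""
    return np.minimum(w, cap)

# Illustrative data: two units violate positivity (PS near 0 and near 1).
e = np.array([0.02, 0.30, 0.50, 0.70, 0.98])
Z = np.array([1, 1, 0, 1, 0])
Y = np.array([1.0, 0.5, 0.2, 0.8, 0.3])

e_t, Z_t, Y_t = trim(e, Z, Y, alpha=0.10)
print(len(e_t))  # two of the five units are discarded

w = Z / e + (1 - Z) / (1 - e)      # raw IPW weights (max is 1/0.02 = 50)
print(truncate_weights(w, cap=10.0))
```

Both α = 0.10 and cap = 10.0 here are arbitrary user choices, which is precisely the sensitivity-to-threshold problem the paragraph above describes.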
Landmark bias-correction strategies for trimming have been proposed. Chaudhuri and Hill (2014) proposed a
bias-corrected, tail-trimmed IPW estimator of the ATE, based on the tail behavior of |B/A|. Their estimator
is robust and asymptotically valid, even under substantially limited overlap of the PS distributions. Rather than
trimming on the ratio B/A, Ma and Wang (2020) and Sasaki and Ura (2022) considered trimming observations
with A ≈ 0 to build flexible, bias-corrected estimators that allow for larger trimming and hence smaller variances
or faster rates of convergence. Robustness of the estimator by Ma and Wang (2020) is achieved by combining
resampling with a local polynomial-based bias-correction technique, where a data-driven threshold is selected by
minimizing the mean squared error. Sasaki and Ura (2022), on the other hand, leverage the smoothness of the
conditional moment function a ↦ E[B | A = a] to achieve more robust inference and a faster convergence rate.
The above strategies (implicitly) assume a random violation of the positivity assumption, under which the true
ATE exists and can be point estimated; their respective trimming and bias-correction solutions aim at improving
inference. Unlike these papers, our proposed estimators do not even rely on the positivity assumption; thus,
they are applicable to either random or structural violations of the positivity assumption. In addition, they
automatically focus on the area of common support and identify a specific subgroup of participants where the estimate
has strong internal validity, without involving the outcome Y.
Furthermore, an important imbalance in treatment allocation (i.e., when the proportion of participants
in one of the treatment groups is small) often exacerbates the lack of adequate positivity, which can lead to more
trimming or truncation in one group instead of both. Such practices not only reduce the number of participants
in the final sample (after trimming), but also influence the contribution of those whose extreme weights are
capped (by truncation). This further complicates point estimation and inference when structural violations
of the positivity assumption are expected. Therefore, there is a growing interest in new practical methods that
do not leave room to manually and subjectively pick a threshold; methods that can leverage inherent data-driven
mechanisms to control the impact of extreme weights and provide robust assessments of treatment effects.
1.4 Are there better alternatives?
The overlap weight (OW), matching weight (MW), and Shannon’s entropy weight (EW) estimators (hereafter
referred to as equipoise treatment effect estimators (Matsouaka and Zhou, 2020)) effectively circumvent the lack of
positivity without specifying any user-driven threshold. Besides, they provide both better causal estimates and
larger effective sample sizes (Li and Greene, 2013; Li et al., 2018a; Zhou et al., 2020b; Li and Li, 2021). However,
it remains to be seen whether imbalances in treatment allocation can directly affect their estimates (compared to
IPW estimation of the ATE) and to what extent.
Therefore, the main objective of this paper is to provide a formal assessment of the impact of equipoise
estimators (i.e., OW, MW, EW) on the treatment effect estimation in studies where there is a disproportionate
distribution of treated or control participants in the population and how it relates to the lack of positivity. The
title of our paper, “Overlap, matching, or entropy weights: what are we weighting for?”, is thus a call to action,
i.e., to delve into the unique characteristics of these equipoise estimators, which grant them the flexibility to
address the lack of positivity effectively (Zhou et al.,2020b). In the process, we showcase how OW, MW, and
EW methods can be strategically used to estimate treatment effects and make asymptotically correct inferences
of the corresponding estimators, when there is a violation of the positivity assumption.
The rest of the paper is organized as follows. We start in the next section with key questions that help
define the purpose of our study and what we intend to accomplish. Then, in Section 3, we introduce notations
and present the family of balancing weights. We specify their related estimands and define the corresponding
estimators. Of particular interest are questions related to what is being estimated and what the target populations
are when using these balancing weights. Next, we explore how the estimators are impacted by the proportion of
treated participants and provide proper interpretations of their estimates. The illustrative example in Section 2.1
sets the scene for the main idea of this paper and informs the simulations in Section 4.
We evaluate the performance of the estimators using Monte-Carlo simulation studies in Section 4, covering
three different treatment allocations, under various treatment effects and model specifications. The methods are