Matching Estimators of Causal Effects in Clustered Observational Studies with Application to Quantifying the Impact of Marine Protected Areas on Biodiversity
Can Cui, Shu Yang, Brian J Reich
Department of Statistics, North Carolina State University
and
David A Gill
Nicholas School of the Environment, Duke University
Abstract
Marine conservation preserves fish biodiversity, protects marine and coastal ecosystems, and supports climate resilience and adaptation. Despite the importance of establishing marine protected areas (MPAs), research on the effectiveness of MPAs with different conservation policies is limited by the lack of quantitative MPA information. In this paper, leveraging a global MPA database, we investigate the causal impact of MPA policies on fish biodiversity. To address the challenges posed by this clustered and confounded observational study, we construct a matching estimator of the average treatment effect and a cluster-weighted bootstrap method for variance estimation. We establish theoretical guarantees for the matching estimator and its variance estimator. Under our proposed matching framework, we recommend matching on both cluster-level and unit-level covariates to achieve efficiency. The simulation results demonstrate that our matching strategy minimizes bias and achieves nominal confidence interval coverage. Applying our proposed matching method to compare different MPA policies reveals that the no-take policy is more effective than the multi-use policy in preserving fish biodiversity.
Key words: Causal inference, conservation, potential outcomes, weighted bootstrap
arXiv:2210.03890v1 [stat.ME] 8 Oct 2022
1 Introduction
1.1 Causal Impact of Marine Protected Areas on Biodiversity
Preserving marine biological diversity is an important objective of governments, scientists, local communities, and conservationists. Marine protected areas (MPAs) have been established worldwide to maintain sustainable and resilient marine ecosystems by restricting destructive and extractive activities within their boundaries (Grorud-Colvert et al., 2021; UNEP-WCMC et al., 2021). Despite their widespread use, the effectiveness of many MPAs, and of different types of MPA policies, in conserving marine biodiversity remains unclear (Grorud-Colvert et al., 2021). Very few studies employ rigorous causal inference methods to assess MPA impacts, and even fewer investigate the relative effects of different conservation policies (Ferraro et al., 2019). Such studies are nevertheless important and have significant policy implications, because prohibiting fishing activities that may be important for local food and livelihood security can impose substantial social costs and harm (e.g., Kamat, 2014; Bennett and Dearden, 2014).
Gill et al. (2017) investigated the effectiveness of MPA management and its impacts on fish populations. They developed a database of ecological, management, social, and environmental conditions in and around hundreds of MPAs globally. In their study, management attributes such as available capacity were strongly associated with increases in fish biomass observed in MPAs. Nonetheless, the relative effects of different types of MPAs (referred to as policies or treatments), such as those that restrict fishing (hereafter called multi-use or MU MPAs) and those that prohibit all fishing (hereafter called no-take or NT MPAs), require further investigation.
While the Gill et al. (2017) database represents one of the largest global datasets of MPA conditions and ecological outcomes to date, its properties present significant challenges for applying traditional causal inference methods. First, because randomized experiments are infeasible in many conservation settings, the global MPA dataset is observational and thus subject to confounding biases that are absent when treatment is randomized (Pynegar et al., 2021). MU and NT MPAs are likely to be located in areas with different social, environmental, and regulatory conditions, so direct comparisons of biodiversity between MU and NT MPAs can be misleading. Second, the MPA data are spatially clustered, because nearby sites are usually under the same conservation policy, whether because they lie within the same MPA, the same management zone within an MPA (e.g., a no-diving area), or the same larger-scale management policy area (e.g., regional or national fishing policies). Nearby sites also share geographical, environmental, and social features, so their observations are likely to be dependent. Therefore, estimating the causal impacts of policies such as MPAs requires methods appropriate for clustered and confounded data.
1.2 Previous Work: Causal Inference in Observational Studies
Although randomized experiments serve as the gold standard, observational studies can also estimate causal effects when all confounding variables are well balanced between treatment groups. To adjust for imbalance in observed confounding covariates, matching (Stuart, 2010) is often applied because of its transparency and intuitive appeal. While statistical methods for estimating causal effects in observational studies are growing in number, most apply to unstructured data (i.e., data without clustering). However, clustering often arises because subjects are grouped by experimental design, by geography, or by shared higher-level features. Examples include health and educational studies, where patients are nested in hospitals and students are clustered in classrooms or schools. Such a clustered data structure poses additional challenges for causal inference. In our motivating example, the MPA database is naturally clustered, with several sites nested within each MPA. Capturing MPA-level as well as site-level features (e.g., local social or environmental conditions) is important for removing confounding biases when evaluating the effectiveness of environmental policies.
For causal effect estimation in clustered data, Cafri et al. (2019) showed that treatment effect estimates are more accurate when cluster-level confounding variables are accounted for: even if sufficient individual-level covariates are included, ignoring cluster-level confounders leaves the estimate biased. Within the matching framework, several propensity score methods have been developed for clustered data (Hong and Raudenbush, 2006; Arpino and Mealli, 2011; Li et al., 2013; Yang, 2018). However, King and Nielsen (2019) discussed the inefficiency of propensity score matching and its failure to balance covariate distributions between treatment groups. They attribute this inefficiency to two sources: matching on the propensity score mimics a completely randomized trial rather than a block-randomized trial, and the propensity score itself must be estimated, which introduces error.
1.3 Our Contribution: A Matching Strategy in Clustered Observational Studies
This article focuses on matching, a nonparametric approach that intuitively mimics a cluster-randomized experiment. We estimate causal effects with matching estimators under the framework of Abadie and Imbens (2006). Reflecting the structure of the MPA database, we analyze clustered data in which treatment is clustered within the MPA: nearby sites tend to be assigned the same MPA policy, and an MPA usually contains only a single policy. Both cluster-level and unit-level covariates are available, and the outcome is collected at the unit level. To account for the conditional bias that arises when matching on multiple covariates, we adopt the bias-corrected matching estimator (Abadie and Imbens, 2011) in clustered data for two common estimands, the average treatment effect and the average treatment effect on the treated, and establish their large-sample properties. Under this data structure, matching on cluster-level covariates is sufficient to remove the confounding biases. However, we recommend also including relevant unit-level covariates in matching to reduce variance and thus improve efficiency. We demonstrate this variance reduction both in theory and in simulation, highlighting the advantages of matching on both cluster-level and unit-level covariates.
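The bias-corrected matching estimator described above can be sketched in a few lines of NumPy. This is a minimal illustration of the Abadie and Imbens (2011) construction for the average treatment effect, not the authors' implementation: it assumes Mahalanobis-distance matching with replacement, M = 3 matches per unit, and a linear outcome model fitted separately within each treatment group. The function names `mahalanobis_nn` and `bias_corrected_ate` are hypothetical. The covariate matrix `x` may stack cluster-level and unit-level columns, which is how the recommendation to match on both would enter.

```python
import numpy as np


def mahalanobis_nn(x_query, x_pool, vi, m):
    """Indices (into x_pool) of the m nearest pool rows to each query row."""
    diff = x_query[:, None, :] - x_pool[None, :, :]            # (n_query, n_pool, p)
    d2 = np.einsum("qki,ij,qkj->qk", diff, vi, diff)           # squared Mahalanobis distances
    return np.argsort(d2, axis=1, kind="stable")[:, :m]        # ties broken by index order


def bias_corrected_ate(y, w, x, m=3):
    """Bias-corrected matching estimate of the ATE (sketch).

    y: outcomes (n,); w: 0/1 treatment (n,); x: covariates (n, p), which may stack
    cluster-level and unit-level columns. Matching is with replacement, and the
    outcome model is assumed linear, fitted separately within each treatment group.
    """
    y, w, x = np.asarray(y, float), np.asarray(w, int), np.asarray(x, float)
    vi = np.linalg.inv(np.atleast_2d(np.cov(x, rowvar=False)))  # inverse sample covariance
    design = np.column_stack([np.ones(len(y)), x])
    # mu[g] holds the fitted regression of Y on X in group g, evaluated at every unit.
    mu = {}
    for g in (0, 1):
        beta, *_ = np.linalg.lstsq(design[w == g], y[w == g], rcond=None)
        mu[g] = design @ beta

    y1, y0 = y.copy(), y.copy()
    for g in (0, 1):
        rows = np.flatnonzero(w == g)        # units in group g: Y(1 - g) is missing
        pool = np.flatnonzero(w == 1 - g)    # candidate matches from the other group
        nn = pool[mahalanobis_nn(x[rows], x[pool], vi, m)]   # (len(rows), m) unit indices
        # Average the matched outcomes, each shifted by mu_{1-g}(X_i) - mu_{1-g}(X_j)
        # to correct for the covariate discrepancy between unit i and its match j.
        imputed = (y[nn] + mu[1 - g][rows][:, None] - mu[1 - g][nn]).mean(axis=1)
        if g == 1:
            y0[rows] = imputed   # treated units: impute the missing control outcome
        else:
            y1[rows] = imputed   # control units: impute the missing treated outcome
    return float(np.mean(y1 - y0))
```

As a sanity check, if outcomes are exactly linear in `x` with a constant treatment effect, the linear bias correction removes every matching discrepancy and the sketch returns that effect up to floating-point error. Comparing matching on cluster-level columns only with matching on cluster-level plus unit-level columns amounts to changing which columns are passed as `x`.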
To account for clustered dependence, we propose a cluster-weighted bootstrap method for variance estimation, which combines the ideas of the cluster bootstrap (Davison and Hinkley, 1997) and the weighted bootstrap (Otsu and Rai, 2017). Based on a linearization of the matching estimator, the weighted bootstrap constructs residuals so that the matching estimator can be viewed as a sample average of these residuals. The variance of the matching estimator can then be approximated by bootstrapping the residuals with appropriate weights. This method preserves the distribution of the number of times each unit is matched in the resampling procedure, and thus avoids the failure of the standard bootstrap in this setting discussed by Abadie and Imbens (2008).
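The linearization and the cluster-level resampling can be sketched as follows. Under the Abadie and Imbens (2011) linearization, the bias-corrected ATE estimator equals the average of per-unit terms xi_i = mu1(X_i) - mu0(X_i) + (2W_i - 1)(1 + K_M(i)/M)(Y_i - mu_{W_i}(X_i)), where K_M(i) is the number of times unit i is used as a match. The sketch computes these terms once, so K_M(i) stays fixed, and then reweights whole clusters. The multinomial cluster weights (equivalent to resampling MPAs with replacement) are an assumption made for illustration; the paper's exact weighting scheme may differ. `linearized_terms` and `cluster_weighted_bootstrap_var` are hypothetical names.

```python
import numpy as np


def linearized_terms(y, w, mu1, mu0, k_count, m):
    """Per-unit terms whose average is the bias-corrected matching ATE estimator.

    y, w: outcomes and 0/1 treatment; mu1, mu0: fitted outcome models evaluated at X_i;
    k_count: K_M(i), the number of times unit i was used as a match; m: matches per unit.
    """
    y, w = np.asarray(y, float), np.asarray(w, int)
    mu1, mu0 = np.asarray(mu1, float), np.asarray(mu0, float)
    resid = np.where(w == 1, y - mu1, y - mu0)            # Y_i - mu_{W_i}(X_i)
    sign = 2 * w - 1                                       # +1 treated, -1 control
    return mu1 - mu0 + sign * (1 + np.asarray(k_count) / m) * resid


def cluster_weighted_bootstrap_var(xi, cluster, n_boot=2000, seed=None):
    """Bootstrap variance of mean(xi), resampling whole clusters with replacement.

    xi is computed once from the original matches, so K_M(i) stays fixed across
    replicates; only the cluster weights change.
    """
    rng = np.random.default_rng(seed)
    xi = np.asarray(xi, float)
    labels, idx = np.unique(cluster, return_inverse=True)   # idx[i] = cluster index of unit i
    n_clusters = len(labels)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        draws = rng.integers(0, n_clusters, size=n_clusters)      # clusters drawn with replacement
        weights = np.bincount(draws, minlength=n_clusters)[idx]   # unit weight = its cluster's count
        reps[b] = np.sum(weights * xi) / np.sum(weights)
    return float(reps.var(ddof=1))
```

Drawing one weight per cluster, rather than per unit, is what lets the bootstrap variance reflect dependence among sites in the same MPA: if all sites in an MPA carry similar terms, resampling MPAs gives a larger variance than resampling sites independently would.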
The rest of this paper is organized as follows. In Section 2, we introduce the motivating data and describe the challenges that its structure poses for establishing causal effects. Section 3 introduces the notation, assumptions, and estimands of interest. Section 4 explores the large-sample properties of matching estimators in clustered data. Section 5 presents the cluster-weighted bootstrap procedure for variance estimation. An extension of the matching estimators to unit-level treatment assignment is described in Section 6. In Section 7, we apply the proposed matching estimator to the MPA data to investigate the causal effect of different marine protection policies on fish biodiversity. In Section 8, we report a simulation study evaluating the performance of the proposed matching estimator in clustered data. Finally, Section 9 concludes.
2 MPA Data and Exploratory Analysis
The MPA dataset created by Gill et al. (2017) includes social, environmental, and ecological information on 9987 sites within 215 MPAs worldwide (Figure 1). The number of sites per MPA ranges from 1 to 1619, with a mean of 46 and a median of 8. The multi-use (MU) and no-take (NT) policies represent two broad categories of MPAs. The MU policy regulates fishing activities to reduce their negative impacts, while the NT policy is more stringent and prohibits all fishing within the MPA boundaries. Of the 9987 sites, 3988 are under the NT policy and 5999 are under the MU policy. The outcome variable is the total fish biomass at each site, recorded in underwater visual surveys. There are 13 continuous and 4 categorical covariates describing MPA-level and site-level features (Table 1). MPA-level covariates include MPA size and country. The other covariates include site-level social and environmental conditions, as well as sampling protocol, location, and date.
Figure 1: Map showing MPA location, size, and policy type (MU = multi-use, NT = no-take); each MPA is shown with the policy held by the majority of its sites.
Table 1: Feature list in the MPA database with units in parentheses for continuous variables
and number of levels in parentheses for categorical variables. A detailed summary of the
covariates is given in Gill et al. (2017) Supplementary Table 5.
Site-level Covariates MPA-level Covariates
Continuous (13)
Latitude (degree)
Longitude (degree)
Depth (m)
Wave exposure (kW/m)
Distance to shoreline (km)
Distance to population center (“market”; km)
Coastal population (million/100 km²)
Sample date (year)
Minimum sea surface temperature (°C)
Chlorophyll-A (mg/m³)
Reef area within 15 km (km²)
MPA age (years)
MPA size (km²)
Categorical (4)
Habitat type (16)
Marine ecoregion (56)
Sampling protocol (6)
Country (43)
Sites within the same MPA usually receive the same policy (i.e., the same fishing regulations), and each site belongs to only one MPA. As a result, the dataset is naturally clustered: observed sites are nested within MPAs, and conservation policies are geographically clustered. This cluster structure complicates causal inference because confounding can arise at both levels; both cluster-level and site-level covariates could contribute to confounding bias. Sites in the same MPA share common environmental, MPA-level, and geographical characteristics that affect both the fish population and MPA policy assignment (Ahmadia et al., 2015; Gill et al., 2017; Ferraro et al., 2019). Site-specific covariates, including depth, distance to population centers (also called “markets”), size of the neighboring human population, and chlorophyll concentration, are also relevant to the ecological outcome (Brewer et al., 2013; Edgar et al., 2014; Gill et al., 2017; Campbell et al., 2020). Within the same MPA, the choice between the MU and NT policy could be heavily influenced by pre-existing ecological conditions, local tourism, fishing, or politics (Toth et al., 2014; Karr et al., 2015), which are ideally captured as site-specific covariates.
Confounding and clustering thus present two major challenges. We compare the covariate distributions under the two MPA policies for both the unadjusted and the adjusted samples. The unadjusted sample consists of the raw observations, while the adjusted sample results from one-to-three matching with replacement using the Mahalanobis distance. Figure 2 illustrates this multiple matching with a hypothetical example. The letters A, G, and K represent sites under the multi-use policy, while the remaining letters represent sites under the no-take policy. With 1:3 matching, each multi-use site is matched with three no-take sites, and because matching is with replacement, a matched no-take site can also be paired with other multi-use sites. For example, no-take site E is used twice, matching both site A and site G.
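The 1:3 matching with replacement illustrated in Figure 2 can be sketched as below. This is a hypothetical helper, not the code behind the paper's figures: it assumes the Mahalanobis covariance is estimated from the pooled MU and NT sample, and it reports how many times each NT site is reused (site E in Figure 2 would have a count of 2). The standardized mean difference is included as one common balance diagnostic for comparing unadjusted and adjusted samples; the source does not state which diagnostic underlies its comparison. The names `match_with_replacement` and `standardized_mean_diff` are illustrative.

```python
import numpy as np


def match_with_replacement(x_mu, x_nt, m=3):
    """Match each MU site to its m nearest NT sites by Mahalanobis distance, with replacement.

    x_mu: (n_mu, p) covariates of multi-use sites.
    x_nt: (n_nt, p) covariates of no-take sites.
    Returns (matches, reuse): matches[a] holds the NT indices matched to MU site a,
    and reuse[j] counts how many MU sites use NT site j.
    """
    x_mu, x_nt = np.asarray(x_mu, float), np.asarray(x_nt, float)
    # Assumption: covariance estimated from the pooled sample of both policies.
    pooled = np.vstack([x_mu, x_nt])
    vi = np.linalg.inv(np.atleast_2d(np.cov(pooled, rowvar=False)))
    diff = x_mu[:, None, :] - x_nt[None, :, :]              # (n_mu, n_nt, p)
    d2 = np.einsum("aki,ij,akj->ak", diff, vi, diff)        # squared Mahalanobis distances
    # With replacement: the same NT site may appear in several rows of `matches`.
    matches = np.argsort(d2, axis=1, kind="stable")[:, :m]
    reuse = np.bincount(matches.ravel(), minlength=len(x_nt))
    return matches, reuse


def standardized_mean_diff(a, b):
    """Difference in means of each covariate, scaled by the pooled standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled_sd = np.sqrt((a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)) / 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / pooled_sd
```

For instance, with one-dimensional covariates, MU sites at 0 and 10, and NT sites at 0.1, 0.2, 0.3, 9.9, and 5.0, the NT site at 0.3 is among the three nearest neighbours of both MU sites and is therefore used twice, just like site E. For the adjusted comparison, `x_nt[matches.ravel()]` repeats reused NT sites, so each MU site contributes exactly three matched NT rows.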