DeepMed: Semiparametric Causal Mediation
Analysis with Debiased Deep Learning
Siqi Xu
Department of Statistics and Actuarial Science
University of Hong Kong
Hong Kong SAR, China
sqxu@hku.hk
Lin Liu
Institute of Natural Sciences, MOE-LSC,
School of Mathematical Sciences, CMA-Shanghai,
and SJTU-Yale Joint Center for Biostatistics and Data Science
Shanghai Jiao Tong University and Shanghai Artificial Intelligence Laboratory
Shanghai, China
linliu@sjtu.edu.cn
Zhonghua Liu
Department of Biostatistics
Columbia University
New York, NY, USA
zl2509@cumc.columbia.edu
Abstract
Causal mediation analysis can unpack the black box of causality and is therefore a
powerful tool for disentangling causal pathways in biomedical and social sciences,
and also for evaluating machine learning fairness. To reduce bias for estimating
Natural Direct and Indirect Effects in mediation analysis, we propose a new method called DeepMed that uses deep neural networks (DNNs) to cross-fit the infinite-dimensional nuisance functions in the efficient influence functions. We obtain novel theoretical results that our DeepMed method (1) can achieve the semiparametric efficiency bound without imposing sparsity constraints on the DNN architecture and (2) can adapt to certain low-dimensional structures of the nuisance functions, significantly advancing the existing literature on DNN-based semiparametric causal inference. Extensive synthetic experiments are conducted to support our findings and also expose the gap between theory and practice. As a proof of concept, we apply DeepMed to analyze two real datasets on machine learning fairness and reach conclusions consistent with previous findings.
36th Conference on Neural Information Processing Systems (NeurIPS 2022).

arXiv:2210.04389v2 [stat.ML] 26 Dec 2022

Co-corresponding authors, alphabetical order

1 Introduction

Tremendous progress has been made in this decade on deploying deep neural networks (DNNs) in real-world problems (Krizhevsky et al., 2012; Wolf et al., 2019; Jumper et al., 2021; Brown et al., 2022). Causal inference is no exception. In semiparametric causal inference, a series of seminal works (Chen et al., 2020; Chernozhukov et al., 2020; Farrell et al., 2021) initiated the investigation of statistical properties of causal effect estimators when the nuisance functions (the outcome regressions and propensity scores) are estimated by DNNs. However, there are a few limitations in the current literature that need to be addressed before the theoretical results can be used to guide practice:
(1) Most recent works mainly focus on the total effect (Chen et al., 2020; Farrell et al., 2021). In many settings, however, more intricate causal parameters are often of greater interest. In biomedical and social sciences, one is often interested in "mediation analysis" to decompose the total effect into direct and indirect effects to unpack the underlying black-box causal mechanism (Baron and Kenny, 1986). More recently, mediation analysis has also percolated into machine learning fairness. For instance, in the context of predicting recidivism risk, Nabi and Shpitser (2018) argued that, for a "fair" algorithm, sensitive features such as race should have no direct effect on the predicted recidivism risk. If such direct effects can be accurately estimated, one can detect the potential unfairness of a machine learning algorithm. We will revisit such applications in Section 5 and Appendix G.
(2) Statistical properties of DNN-based causal estimators in recent works mostly follow from several (recent) results on the convergence rates of DNN-based nonparametric regression estimators (Suzuki, 2019; Schmidt-Hieber, 2020; Tsuji and Suzuki, 2021), with the limitation of relying on sparse DNN architectures. The theoretical properties are in turn evaluated by relatively simple synthetic experiments not designed to generate nearly infinite-dimensional nuisance functions, the setting considered by almost all the above related works.
The above limitations raise the tantalizing question of whether the available statistical guarantees for DNN-based causal inference have practical relevance. In this work, we partially fill these gaps by developing a new method called DeepMed for semiparametric mediation analysis with DNNs. We focus on the Natural Direct/Indirect Effects (NDE/NIE) (Robins and Greenland, 1992; Pearl, 2001) (defined in Section 2.1), but our results can also be applied to more general settings; see Remark 2. The DeepMed estimators leverage the "multiply-robust" property of the efficient influence function (EIF) of NDE/NIE (Tchetgen Tchetgen and Shpitser, 2012; Farbmacher et al., 2022) (see Proposition 1 in Section 2.2), together with the flexibility and superior predictive power of DNNs (see Section 3.1 and Algorithm 1). In particular, we also make the following novel contributions to deepen our understanding of DNN-based semiparametric causal inference:
• On the theoretical side, we obtain new results that our DeepMed method can achieve the semiparametric efficiency bound without imposing sparsity constraints on the DNN architecture and can adapt to certain low-dimensional structures of the nuisance functions (see Section 3.2), thus significantly advancing the existing literature on DNN-based semiparametric causal inference. Non-sparse DNN architectures are more commonly employed in practice (Farrell et al., 2021), and the low-dimensional structures of nuisance functions can help avoid the curse of dimensionality. These two points, taken together, significantly advance our understanding of the statistical guarantees of DNN-based causal inference.

• More importantly, on the empirical side, in Section 4, we design sophisticated synthetic experiments to simulate nearly infinite-dimensional functions, which are much more complex than those in previous related works (Chen et al., 2020; Farrell et al., 2021; Adcock and Dexter, 2021). We emphasize that these nontrivial experiments could be of independent interest to the theory of deep learning beyond causal inference, to further expose the gap between deep learning theory and practice (Adcock and Dexter, 2021; Gottschling et al., 2020); see Remark 9 for an extended discussion. As a proof of concept, in Section 5 and Appendix G, we also apply DeepMed to re-analyze two real-world datasets on algorithmic fairness and reach similar conclusions to related works.

• Finally, a user-friendly R package can be found at https://github.com/siqixu/DeepMed. Making such resources available helps enhance reproducibility, a highly recognized problem in all scientific disciplines, including (causal) machine learning (Pineau et al., 2021; Kaddour et al., 2022).
2 Definition, identification, and estimation of NDE and NIE
2.1 Definition of NDE and NIE
Throughout this paper, we denote $Y$ as the primary outcome of interest, $D$ as a binary treatment variable, $M$ as the mediator on the causal pathway from $D$ to $Y$, and $X \in [0,1]^p$ (or, more generally, compactly supported in $\mathbb{R}^p$) as baseline covariates including all potential confounders. We denote the observed data vector as $O \equiv (X, D, M, Y)$. Let $M(d)$ denote the potential outcome for the mediator when setting $D = d$ and $Y(d, m)$ be the potential outcome of $Y$ under $D = d$ and $M = m$, where $d \in \{0, 1\}$ and $m$ is in the support $\mathcal{M}$ of $M$. We define the average total (treatment) effect as $\tau_{\mathrm{tot}} := E[Y(1, M(1)) - Y(0, M(0))]$, the average NDE of the treatment $D$ on the outcome $Y$ when the mediator takes the natural potential outcome under $D = d$ as $\tau_{\mathrm{NDE}}(d) := E[Y(1, M(d)) - Y(0, M(d))]$, and the average NIE of the treatment $D$ on the outcome $Y$ via the mediator $M$ as $\tau_{\mathrm{NIE}}(d) := E[Y(d, M(1)) - Y(d, M(0))]$. We have the trivial decomposition $\tau_{\mathrm{tot}} \equiv \tau_{\mathrm{NDE}}(d) + \tau_{\mathrm{NIE}}(d')$ for $d \neq d'$. In causal mediation analysis, the parameters of interest are $\tau_{\mathrm{NDE}}(d)$ and $\tau_{\mathrm{NIE}}(d)$.
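To make the decomposition concrete, the following minimal simulation sketch (a hypothetical linear structural model with illustrative coefficients, not one used in the paper) checks $\tau_{\mathrm{tot}} = \tau_{\mathrm{NDE}}(0) + \tau_{\mathrm{NIE}}(1)$ by direct Monte Carlo over the potential outcomes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical structural model: X -> M, (X, D, M) -> Y; all coefficients
# below are illustrative assumptions, not estimates from the paper.
X = rng.normal(size=n)
eps0, eps1 = rng.normal(size=n), rng.normal(size=n)
M0 = 0.5 * X + eps0              # potential mediator M(0)
M1 = 0.5 * X + 1.0 + eps1        # potential mediator M(1)

def Y_pot(d, m):                 # mean potential outcome Y(d, m)
    return 1.0 * d + 0.8 * m + 0.3 * X

tau_tot  = np.mean(Y_pot(1, M1) - Y_pot(0, M0))
tau_nde0 = np.mean(Y_pot(1, M0) - Y_pot(0, M0))  # NDE with mediator held at M(0)
tau_nie1 = np.mean(Y_pot(1, M1) - Y_pot(1, M0))  # NIE under treatment d = 1

# The decomposition tau_tot = tau_NDE(0) + tau_NIE(1) is an algebraic identity
assert abs(tau_tot - (tau_nde0 + tau_nie1)) < 1e-10
print(round(tau_nde0, 2), round(tau_nie1, 2))  # direct effect 1.0; indirect about 0.8
```

Note that the identity holds sample-by-sample, so the assertion passes up to floating-point error regardless of the Monte Carlo noise in the individual effect estimates.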
2.2 Semiparametric multiply-robust estimators of NDE/NIE
Estimating $\tau_{\mathrm{NDE}}(d)$ and $\tau_{\mathrm{NIE}}(d)$ can be reduced to estimating $\phi(d, d') := E[Y(d, M(d'))]$ for $d, d' \in \{0, 1\}$. We make the following standard identification assumptions:
i. Consistency: if $D = d$, then $M = M(d)$ for all $d \in \{0, 1\}$; while if $D = d$ and $M = m$, then $Y = Y(d, m)$ for all $d \in \{0, 1\}$ and all $m$ in the support of $M$.

ii. Ignorability: $Y(d, m) \perp D \mid X$, $Y(d, m) \perp M \mid X, D$, $M(d) \perp D \mid X$, and $Y(d, m) \perp M(d') \mid X$, almost surely for all $d, d' \in \{0, 1\}$ and all $m \in \mathcal{M}$. The first three conditions are, respectively, no unmeasured treatment-outcome, mediator-outcome and treatment-mediator confounding, whereas the fourth condition is often referred to as the "cross-world" condition. We provide more detailed comments on these four conditions in Appendix A.

iii. Positivity: the propensity score $a(d \mid X) \equiv \Pr(D = d \mid X) \in (c, C)$ for some constants $0 < c \leq C < 1$, almost surely for all $d \in \{0, 1\}$; $f(m \mid X, d)$, the conditional density (mass) function of $M = m$ (when $M$ is discrete) given $X$ and $D = d$, is strictly bounded between $[\underline{\rho}, \bar{\rho}]$ for some constants $0 < \underline{\rho} \leq \bar{\rho} < \infty$, almost surely for all $m \in \mathcal{M}$ and all $d \in \{0, 1\}$.
Under the above assumptions, the causal parameter $\phi(d, d')$ for $d, d' \in \{0, 1\}$ can be identified as any of the following three observed-data functionals:

$$\phi(d, d') \equiv E\left[\frac{\mathbb{1}\{D = d\}\, f(M \mid X, d')}{a(d \mid X)\, f(M \mid X, d)}\, Y\right] \equiv E\left[\frac{\mathbb{1}\{D = d'\}}{a(d' \mid X)}\, \mu(X, d, M)\right] \equiv \int \mu(x, d, m)\, f(m \mid x, d')\, p(x)\, \mathrm{d}m\, \mathrm{d}x, \quad (1)$$

where $\mathbb{1}\{\cdot\}$ denotes the indicator function, $p(x)$ denotes the marginal density of $X$, and $\mu(x, d, m) := E[Y \mid X = x, D = d, M = m]$ is the outcome regression model, for which we also make the following standard boundedness assumption:

iv. $\mu(x, d, m)$ is also strictly bounded between $[-R, R]$ for some constant $R > 0$.
Following the convention in the semiparametric causal inference literature, we call $a, f, \mu$ "nuisance functions". Tchetgen Tchetgen and Shpitser (2012) derived the EIF of $\phi(d, d')$: $\mathrm{EIF}_{d,d'} \equiv \psi_{d,d'}(O) - \phi(d, d')$, where

$$\psi_{d,d'}(O) = \frac{\mathbb{1}\{D = d\}\, f(M \mid X, d')}{a(d \mid X)\, f(M \mid X, d)} \left(Y - \mu(X, d, M)\right) + \left(1 - \frac{\mathbb{1}\{D = d'\}}{a(d' \mid X)}\right) \int_{m \in \mathcal{M}} \mu(X, d, m)\, f(m \mid X, d')\, \mathrm{d}m + \frac{\mathbb{1}\{D = d'\}}{a(d' \mid X)}\, \mu(X, d, M). \quad (2)$$
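To make the uncentered EIF $\psi_{d,d'}$ concrete, here is a minimal sketch on a fully discrete toy law (all probability values and regression coefficients are illustrative assumptions, not from the paper). It evaluates $\psi_{1,0}$ at the true (oracle) nuisance functions and checks that its sample mean recovers $\phi(1, 0) = E[Y(1, M(0))]$, which equals $1.30$ under this toy law:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
d, dp = 1, 0   # target phi(d, d') = E[Y(1, M(0))]

# Hypothetical discrete nuisances (illustrative numbers only)
a1 = lambda x: np.where(x == 1, 0.6, 0.4)                    # Pr(D=1 | X=x)
f1 = lambda x, t: np.where(x == 1, np.where(t == 1, 0.8, 0.5),
                           np.where(t == 1, 0.7, 0.3))       # Pr(M=1 | X=x, D=t)
mu = lambda x, t, m: 1.0 * t + 0.5 * m + 0.2 * x             # E[Y | X, D, M]

# Simulate observed data O = (X, D, M, Y) from these nuisances
X = rng.binomial(1, 0.5, n)
D = rng.binomial(1, a1(X))
M = rng.binomial(1, f1(X, D))
Y = mu(X, D, M) + rng.normal(size=n)

# Evaluate psi_{d,d'} at the true nuisance functions
a_d  = a1(X) if d == 1 else 1 - a1(X)                        # a(d | X)
a_dp = a1(X) if dp == 1 else 1 - a1(X)                       # a(d' | X)
f_d  = np.where(M == 1, f1(X, d),  1 - f1(X, d))             # f(M | X, d)
f_dp = np.where(M == 1, f1(X, dp), 1 - f1(X, dp))            # f(M | X, d')
eta  = mu(X, d, 0) * (1 - f1(X, dp)) + mu(X, d, 1) * f1(X, dp)  # sum_m mu f

psi = ((D == d) * f_dp / (a_d * f_d) * (Y - mu(X, d, M))
       + (1 - (D == dp) / a_dp) * eta
       + (D == dp) / a_dp * mu(X, d, M))

print(psi.mean())   # close to phi(1, 0) = 1.30 under this toy law
```

Because the nuisances here are the truth, the sample mean of $\psi_{1,0}$ is an unbiased Monte Carlo estimate of $\phi(1,0)$; the next section replaces these oracles with estimators.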
The nuisance functions $\mu(x, d, m)$, $a(d \mid x)$ and $f(m \mid x, d)$ appearing in $\psi_{d,d'}(o)$ are unknown and generally high-dimensional. But with a sample $\mathcal{D} \equiv \{O_j\}_{j=1}^{N}$ of the observed data, based on $\psi_{d,d'}(o)$, one can construct the following generic sample-splitting multiply-robust estimator of $\phi(d, d')$:

$$\tilde{\phi}(d, d') = \frac{1}{n} \sum_{i \in \mathcal{D}_n} \tilde{\psi}_{d,d'}(O_i), \quad (3)$$

where $\mathcal{D}_n \equiv \{O_i\}_{i=1}^{n}$ is a subset of all $N$ data, and $\tilde{\psi}_{d,d'}(o)$ replaces the unknown nuisance functions $a, f, \mu$ in $\psi_{d,d'}(o)$ by some generic estimators $\tilde{a}, \tilde{f}, \tilde{\mu}$ computed using the remaining $N - n$ nuisance sample data, denoted as $\mathcal{D}_\nu$. Cross-fitting is then needed to recover the information lost due to sample splitting; see Algorithm 1. It is clear from (2) that $\tilde{\phi}(d, d')$ is a consistent estimator of $\phi(d, d')$ as long as any two of $\tilde{a}, \tilde{f}, \tilde{\mu}$ are consistent estimators of the corresponding true nuisance functions, hence the name "multiply-robust". Throughout this paper, we take $n \asymp N - n$ and assume:

v. Any nuisance function estimators are strictly bounded within the respective lower and upper bounds of $a, f, \mu$.
To further ease notation, we define, for any $d \in \{0, 1\}$,

$$r_{a,d} := \left(\int \delta_{a,d}(x)^2\, \mathrm{d}F(x)\right)^{1/2}, \quad r_{f,d} := \left(\int \delta_{f,d}(x, m)^2\, \mathrm{d}F(x, m \mid d = 0)\right)^{1/2}, \quad r_{\mu,d} := \left(\int \delta_{\mu,d}(x, m)^2\, \mathrm{d}F(x, m \mid d = 0)\right)^{1/2},$$

where $\delta_{a,d}(x) := \tilde{a}(d \mid x) - a(d \mid x)$, $\delta_{f,d}(x, m) := \tilde{f}(m \mid x, d) - f(m \mid x, d)$ and $\delta_{\mu,d}(x, m) := \tilde{\mu}(x, d, m) - \mu(x, d, m)$ are the point-wise estimation errors of the estimated nuisance functions. In defining the above $L_2$-estimation errors, we choose to take the expectation with respect to (w.r.t.) the law $F(m, x \mid d = 0)$ only for convenience, with no loss of generality by Assumptions iii and v.
To show that the cross-fit version of $\tilde{\phi}(d, d')$ is semiparametric efficient for $\phi(d, d')$, we shall demonstrate under what conditions $\sqrt{n}\,(\tilde{\phi}(d, d') - \phi(d, d')) \xrightarrow{\mathcal{L}} N(0, E[\mathrm{EIF}_{d,d'}^2])$ (Newey, 1990). The following proposition on the statistical properties of $\tilde{\phi}(d, d')$ is a key step towards this objective.
Proposition 1. Denote $\mathrm{Bias}(\tilde{\phi}(d, d')) := E[\tilde{\phi}(d, d') - \phi(d, d') \mid \mathcal{D}_\nu]$ as the bias of $\tilde{\phi}(d, d')$ conditional on the nuisance sample $\mathcal{D}_\nu$. Under Assumptions i-v, $\mathrm{Bias}(\tilde{\phi}(d, d'))$ is of second order:

$$\left|\mathrm{Bias}(\tilde{\phi}(d, d'))\right| \lesssim \max\left\{ r_{a,d} \cdot r_{f,d},\ \max_{d'' \in \{0,1\}} r_{f,d''} \cdot r_{\mu,d},\ r_{a,d} \cdot r_{\mu,d} \right\}. \quad (4)$$

Furthermore, if the RHS of (4) is $o(n^{-1/2})$, then

$$\sqrt{n}\left(\tilde{\phi}(d, d') - \phi(d, d')\right) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left(\psi_{d,d'}(O_i) - \phi(d, d')\right) + o_P(1) \xrightarrow{d} N\left(0, E\left[\mathrm{EIF}_{d,d'}^2\right]\right). \quad (5)$$

Although the above result is a direct consequence of the EIF $\psi_{d,d'}(O)$, we prove Proposition 1 in Appendix B for completeness.
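The product structure of the bias in (4) is what drives multiple robustness: if, say, only the propensity score estimate is inconsistent, every product on the right-hand side vanishes. A minimal numerical sketch (a hypothetical toy law with no covariates; the deliberately wrong propensity value 0.9 is an assumption chosen for illustration) shows that $\psi_{1,0}$ evaluated with a badly misspecified $\tilde{a}$ but the correct $f$ and $\mu$ still averages to $\phi(1,0)$, whereas the plain weighting functional in (1) with the same wrong $\tilde{a}$ does not:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000

# Toy law without covariates: a(1) = 0.5, f(M=1 | d) = 0.3 + 0.4 d,
# mu(d, m) = d + 0.5 m.  Then phi(1, 0) = 0.7 * 1.0 + 0.3 * 1.5 = 1.15.
D = rng.binomial(1, 0.5, n)
M = rng.binomial(1, 0.3 + 0.4 * D)
Y = D + 0.5 * M + rng.normal(size=n)

a_tilde = 0.9                          # deliberately wrong: the truth is 0.5
f0 = np.where(M == 1, 0.3, 0.7)        # true f(M | d' = 0)
f1 = np.where(M == 1, 0.7, 0.3)        # true f(M | d = 1)
mu1 = 1.0 + 0.5 * M                    # true mu(1, M)
eta = 0.7 * 1.0 + 0.3 * 1.5            # sum_m mu(1, m) f(m | 0) = 1.15

# Multiply-robust psi_{1,0} with wrong a but correct f and mu: still unbiased
psi = ((D == 1) * f0 / (a_tilde * f1) * (Y - mu1)
       + (1 - (D == 0) / (1 - a_tilde)) * eta
       + (D == 0) / (1 - a_tilde) * mu1)

# Plain weighting formula from (1) with the same wrong a: clearly biased
ipw = (D == 1) * f0 / (a_tilde * f1) * Y

print(psi.mean(), ipw.mean())  # psi stays close to 1.15; ipw is badly off
```

With only one inconsistent nuisance, every error product $r_{a}\cdot r_{f}$, $r_{f}\cdot r_{\mu}$, $r_{a}\cdot r_{\mu}$ in (4) contains a vanishing factor, which is exactly what the simulation displays.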
Remark 2. The total effect $\tau_{\mathrm{tot}} = \phi(1, 1) - \phi(0, 0)$ can be viewed as a special case, for which $d = d'$ in $\phi(d, d')$. Then $\mathrm{EIF}_{d,d} \equiv \mathrm{EIF}_d$ corresponds to the nonparametric EIF of $\phi(d, d) \equiv \phi(d) \equiv E[Y(d, M(d))]$:

$$\mathrm{EIF}_d = \psi_d(O) - \phi(d) \quad \text{with} \quad \psi_d(O) = \frac{\mathbb{1}\{D = d\}}{a(d \mid X)}\, Y + \left(1 - \frac{\mathbb{1}\{D = d\}}{a(d \mid X)}\right) \mu(X, d),$$

where $\mu(x, d) := E[Y \mid X = x, D = d]$. Hence all the theoretical results in this paper are applicable to total effect estimation. Our framework can also be applied to all the statistical functionals that satisfy the so-called "mixed-bias" property, characterized recently in Rotnitzky et al. (2021). This class includes the quadratic functional, which is important for uncertainty quantification in machine learning.
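For the total effect, $\psi_d$ above is the familiar augmented inverse-probability-weighted (AIPW) score. A short sketch (hypothetical data-generating process with illustrative coefficients; the true nuisances serve as oracle stand-ins for fitted estimators):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical data-generating process (illustration only)
X = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-X))            # a(1 | X), true propensity score
D = rng.binomial(1, e)
mu1 = 1.0 + 0.5 * X                     # E[Y | X, D = 1]
mu0 = 0.2 * X                           # E[Y | X, D = 0]
Y = np.where(D == 1, mu1, mu0) + rng.normal(size=n)

# psi_d(O) = 1{D=d}/a(d|X) * Y + (1 - 1{D=d}/a(d|X)) * mu(X, d)
psi1 = D / e * Y + (1 - D / e) * mu1
psi0 = (1 - D) / (1 - e) * Y + (1 - (1 - D) / (1 - e)) * mu0

tau_tot_hat = psi1.mean() - psi0.mean()
print(tau_tot_hat)   # close to the true total effect E[mu1 - mu0] = 1.0
```

Subtracting the two scores estimates $\tau_{\mathrm{tot}} = \phi(1) - \phi(0)$, and the same recipe with $\psi_{d,d'}$ in place of $\psi_d$ yields the NDE/NIE contrasts.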
3 Estimation and inference of NDE/NIE using DeepMed
We now introduce DeepMed, a method for mediation analysis with nuisance functions estimated by DNNs. By leveraging the second-order bias property of the multiply-robust estimators of NDE/NIE (Proposition 1), we will derive the statistical properties of DeepMed in this section. The nuisance function estimators based on DNNs are denoted as $\hat{a}, \hat{f}, \hat{\mu}$.
3.1 Details on DeepMed
First, we introduce the fully-connected feed-forward neural network with rectified linear units (ReLU) as the activation function for the hidden-layer neurons (FNN-ReLU), which will be used to estimate the nuisance functions. Then, we introduce an estimation procedure using $V$-fold cross-fitting with sample splitting to avoid the Donsker-type empirical-process assumption on the nuisance functions, which, in general, is violated in high-dimensional setups. Finally, we provide the asymptotic statistical properties of the DNN-based estimators of $\tau_{\mathrm{tot}}$, $\tau_{\mathrm{NDE}}(d)$ and $\tau_{\mathrm{NIE}}(d)$.
We denote the ReLU activation function as $\sigma(u) := \max(u, 0)$ for any $u \in \mathbb{R}$. Given vectors $x, b$, we denote $\sigma_b(x) := \sigma(x - b)$, with $\sigma$ acting on the vector $x - b$ component-wise. Let $\mathcal{F}_{\mathrm{nn}}$ denote the class of FNN-ReLU functions

$$\mathcal{F}_{\mathrm{nn}} := \left\{ f: \mathbb{R}^p \to \mathbb{R};\ f(x) = W^{(L)} \sigma_{b^{(L)}} \circ \cdots \circ W^{(1)} \sigma_{b^{(1)}}(x) \right\},$$

where $\circ$ is the composition operator, $L$ is the number of layers (i.e., depth) of the network, and for $l = 1, \cdots, L$, $W^{(l)}$ is a $K_{l+1} \times K_l$-dimensional weight matrix with $K_l$ being the number of neurons in the $l$-th layer (i.e., width) of the network, with $K_1 = p$ and $K_{L+1} = 1$, and $b^{(l)}$ is a $K_l$-dimensional vector. To avoid notational clutter, we concatenate all the network parameters as $\Theta = (W^{(l)}, b^{(l)}, l = 1, \cdots, L)$ and simply take $K_2 = \cdots = K_L = K$. We also assume $\Theta$ to be bounded: $\|\Theta\|_\infty \leq B$ for some universal constant $B > 0$. We may make the dependence on $L$, $K$, $B$ explicit by writing $\mathcal{F}_{\mathrm{nn}}$ as $\mathcal{F}_{\mathrm{nn}}(L, K, B)$.
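A direct numpy transcription of this function class may help fix the notation (a minimal sketch; the random weights below are arbitrary illustrative values satisfying the bound $\|\Theta\|_\infty \leq B$ with $B = 1$):

```python
import numpy as np

def relu(u):
    return np.maximum(u, 0.0)

def fnn_relu(x, Ws, bs):
    """Evaluate f(x) = W^(L) sigma_{b^(L)} o ... o W^(1) sigma_{b^(1)}(x),
    where sigma_b(u) = relu(u - b) acts component-wise, as in the display above."""
    h = np.asarray(x, dtype=float)
    for W, b in zip(Ws, bs):          # l = 1, ..., L
        h = W @ relu(h - b)           # shifted ReLU, then the linear map W^(l)
    return h

# A tiny member of F_nn(L = 2, K = 3, B = 1): p = 2 inputs, width K = 3
rng = np.random.default_rng(0)
p, K = 2, 3
Ws = [rng.uniform(-1, 1, (K, p)), rng.uniform(-1, 1, (1, K))]  # W^(1), W^(2)
bs = [np.zeros(p), np.zeros(K)]                                 # b^(1), b^(2)

out = fnn_relu(np.array([0.5, -1.0]), Ws, bs)
print(out.shape)   # (1,): the output dimension is K_{L+1} = 1
```

The loop applies $\sigma_{b^{(1)}}$ directly to the input and ends with the linear map $W^{(L)}$, exactly matching the composition order written in the definition of $\mathcal{F}_{\mathrm{nn}}$.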
DeepMed estimates $\tau_{\mathrm{tot}}, \tau_{\mathrm{NDE}}(d), \tau_{\mathrm{NIE}}(d)$ by (3), with the nuisance functions $a, f, \mu$ estimated using $\mathcal{F}_{\mathrm{nn}}$ with the $V$-fold cross-fitting strategy, summarized in Algorithm 1 below; also see Farbmacher et al. (2022). DeepMed inputs the observed data $\mathcal{D} \equiv \{O_i\}_{i=1}^{N}$ and outputs the estimated total effect $\hat{\tau}_{\mathrm{tot}}$, NDE $\hat{\tau}_{\mathrm{NDE}}(d)$ and NIE $\hat{\tau}_{\mathrm{NIE}}(d)$, together with their variance estimators $\hat{\sigma}^2_{\mathrm{tot}}$, $\hat{\sigma}^2_{\mathrm{NDE}}(d)$ and $\hat{\sigma}^2_{\mathrm{NIE}}(d)$.
Algorithm 1 DeepMed with $V$-fold cross-fitting
1: Choose some integer $V$ (usually $V \in \{2, 3, \cdots, 10\}$)
2: Split the $N$ observations into $V$ subsamples $I_v \subset \{1, \cdots, N\} \equiv [N]$ with equal size $n = N/V$
3: for $v = 1, \cdots, V$ do
4: Fit the nuisance functions by DNNs using observations in $[N] \setminus I_v$
5: Compute the nuisance functions in the subsample $I_v$ using the estimated DNNs in step 4
6: Obtain $\{\hat{\psi}_d(O_i), \hat{\psi}_{d,d'}(O_i)\}_{i \in I_v}$ for the subsample $I_v$ based on (2), respectively, with the nuisance functions replaced by their estimates in step 5
7: end for
8: Estimate the average potential outcomes by $\hat{\phi}(d) := \frac{1}{N} \sum_{i=1}^{N} \hat{\psi}_d(O_i)$ and $\hat{\phi}(d, d') := \frac{1}{N} \sum_{i=1}^{N} \hat{\psi}_{d,d'}(O_i)$
9: Estimate the causal effects $\hat{\tau}_{\mathrm{tot}}$, $\hat{\tau}_{\mathrm{NDE}}(d)$ and $\hat{\tau}_{\mathrm{NIE}}(d)$ with $\hat{\phi}(d)$ and $\hat{\phi}(d, d')$
10: Estimate the variances of $\hat{\tau}_{\mathrm{tot}}$, $\hat{\tau}_{\mathrm{NDE}}(d)$ and $\hat{\tau}_{\mathrm{NIE}}(d)$ by
$$\hat{\sigma}^2_{\mathrm{tot}} := \frac{1}{N^2} \sum_{i=1}^{N} \left(\hat{\psi}_1(O_i) - \hat{\psi}_0(O_i)\right)^2 - \frac{1}{N} \hat{\tau}^2_{\mathrm{tot}}; \quad \hat{\sigma}^2_{\mathrm{NDE}}(d) := \frac{1}{N^2} \sum_{i=1}^{N} \left(\hat{\psi}_{1,d}(O_i) - \hat{\psi}_{0,d}(O_i)\right)^2 - \frac{1}{N} \hat{\tau}^2_{\mathrm{NDE}}(d); \quad \hat{\sigma}^2_{\mathrm{NIE}}(d) := \frac{1}{N^2} \sum_{i=1}^{N} \left(\hat{\psi}_{d,1}(O_i) - \hat{\psi}_{d,0}(O_i)\right)^2 - \frac{1}{N} \hat{\tau}^2_{\mathrm{NIE}}(d)$$
Output: $\hat{\tau}_{\mathrm{tot}}$, $\hat{\tau}_{\mathrm{NDE}}(d)$, $\hat{\tau}_{\mathrm{NIE}}(d)$, $\hat{\sigma}^2_{\mathrm{tot}}$, $\hat{\sigma}^2_{\mathrm{NDE}}(d)$ and $\hat{\sigma}^2_{\mathrm{NIE}}(d)$
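The following Python sketch mirrors Algorithm 1 on a hypothetical discrete toy dataset. Everything here is an illustrative assumption — the data-generating process, the cell-frequency "learners" that stand in for the DNN fits of step 4, and the fold logic — and is not the DeepMed implementation itself (which is an R package):

```python
import numpy as np

rng = np.random.default_rng(3)
N, V = 100_000, 5
d, dp = 1, 0                       # target phi(d, d') = E[Y(1, M(0))]

# Hypothetical discrete data-generating process (illustration only)
X = rng.binomial(1, 0.5, N)
D = rng.binomial(1, 0.4 + 0.2 * X)
M = rng.binomial(1, 0.2 + 0.3 * X + 0.2 * D)
Y = D + 0.5 * M + 0.2 * X + rng.normal(size=N)

psi_hat = np.empty(N)
folds = np.array_split(rng.permutation(N), V)        # step 2
for Iv in folds:                                     # step 3
    mask = np.ones(N, dtype=bool); mask[Iv] = False  # training part [N] \ I_v
    Xt, Dt, Mt, Yt = X[mask], D[mask], M[mask], Y[mask]
    # step 4: "fit" nuisances on [N] \ I_v (frequency tables stand in for DNNs)
    a1 = {x: Dt[Xt == x].mean() for x in (0, 1)}
    f1 = {(x, t): Mt[(Xt == x) & (Dt == t)].mean() for x in (0, 1) for t in (0, 1)}
    mu = {(x, t, m): Yt[(Xt == x) & (Dt == t) & (Mt == m)].mean()
          for x in (0, 1) for t in (0, 1) for m in (0, 1)}
    # steps 5-6: evaluate the estimated psi_{d,d'} on the held-out fold I_v
    for i in Iv:
        x, m = X[i], M[i]
        a_d  = a1[x] if d == 1 else 1 - a1[x]
        a_dp = a1[x] if dp == 1 else 1 - a1[x]
        f_d  = f1[(x, d)]  if m == 1 else 1 - f1[(x, d)]
        f_dp = f1[(x, dp)] if m == 1 else 1 - f1[(x, dp)]
        eta = mu[(x, d, 0)] * (1 - f1[(x, dp)]) + mu[(x, d, 1)] * f1[(x, dp)]
        psi_hat[i] = ((D[i] == d) * f_dp / (a_d * f_d) * (Y[i] - mu[(x, d, m)])
                      + (1 - (D[i] == dp) / a_dp) * eta
                      + (D[i] == dp) / a_dp * mu[(x, d, m)])

phi_hat = psi_hat.mean()                             # step 8
se_hat = psi_hat.std(ddof=1) / np.sqrt(N)            # plug-in standard error
print(f"phi_hat = {phi_hat:.3f} +/- {1.96 * se_hat:.3f}")  # truth here is 1.275
```

The key discipline is that each $\hat{\psi}_{d,d'}(O_i)$ is evaluated with nuisance fits trained on the folds not containing $O_i$, which is what lets Proposition 1 dispense with Donsker conditions.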
Remark 3 (Continuous or multi-dimensional mediators). For binary treatment $D$ and continuous or multi-dimensional $M$, to avoid nonparametric/high-dimensional conditional density estimation, we can rewrite $\frac{f(m \mid x, d')}{a(d \mid x) f(m \mid x, d)}$ as $\frac{1 - a(d \mid x, m)}{a(d \mid x, m)(1 - a(d \mid x))}$ by Bayes' rule, and the integral w.r.t. $f(m \mid x, d')$ in (2) as $E[\mu(X, d, M) \mid X = x, D = d']$. Then we can first estimate $\mu(x, d, m)$ by $\hat{\mu}(x, d, m)$ and in turn estimate $E[\mu(X, d, M) \mid X = x, D = d']$ by regressing $\hat{\mu}(X, d, M)$ against $(X, D)$ using the FNN-ReLU class. We mainly consider binary $M$ to avoid unnecessary complications; but see Appendix G for an example in which this strategy is used. Finally, the potential incompatibility between the models posited for $a(d \mid x)$ and $a(d \mid x, m)$ and the joint distribution of $(X, D, M, Y)$ is not of great concern under the semiparametric framework because all nuisance functions are estimated nonparametrically; again, see Appendix G for an extended discussion.
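The Bayes'-rule rewrite in Remark 3 can be sanity-checked numerically on a toy joint law of $(D, M)$ at a fixed $x$ (the probability table below is an arbitrary illustrative assumption):

```python
import numpy as np

# Toy joint law of (D, M) given a fixed x (hypothetical numbers):
# rows = d, columns = m; entries are Pr(D = d, M = m | X = x)
P = np.array([[0.12, 0.28],    # d = 0
              [0.24, 0.36]])   # d = 1

a  = P.sum(axis=1)             # a(d | x)    = Pr(D = d | x)
f  = P / a[:, None]            # f(m | x, d) = Pr(M = m | x, d)
am = P / P.sum(axis=0)         # a(d | x, m) = Pr(D = d | x, m)

d, dp = 1, 0
for m in (0, 1):
    lhs = f[dp, m] / (a[d] * f[d, m])                    # density-ratio weight
    rhs = (1 - am[d, m]) / (am[d, m] * (1 - a[d]))       # Bayes'-rule rewrite
    assert np.isclose(lhs, rhs)
print("identity holds for both values of m")
```

Since the rewritten weight involves only the two propensity scores $a(d \mid x)$ and $a(d \mid x, m)$, it extends directly to continuous $M$, where the left-hand side would otherwise require estimating two conditional densities.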
3.2 Statistical properties of DeepMed: Non-sparse DNN architecture and low-dimensional
structures of the nuisance functions
According to Proposition 1, to analyze the statistical properties of DeepMed, it is sufficient to control the $L_2$-estimation errors of the nuisance function estimates $\hat{a}, \hat{f}, \hat{\mu}$ fit by DNNs. To ease presentation,