DeepMed: Semiparametric Causal Mediation
Analysis with Debiased Deep Learning
Siqi Xu
Department of Statistics and Actuarial Science
University of Hong Kong
Hong Kong SAR, China
sqxu@hku.hk
Lin Liu
Institute of Natural Sciences, MOE-LSC,
School of Mathematical Sciences, CMA-Shanghai,
and SJTU-Yale Joint Center for Biostatistics and Data Science
Shanghai Jiao Tong University and Shanghai Artificial Intelligence Laboratory
Shanghai, China
linliu@sjtu.edu.cn
Zhonghua Liu
Department of Biostatistics
Columbia University
New York, NY, USA
zl2509@cumc.columbia.edu
Abstract
Causal mediation analysis can unpack the black box of causality and is therefore a
powerful tool for disentangling causal pathways in biomedical and social sciences,
and also for evaluating machine learning fairness. To reduce bias for estimating
Natural Direct and Indirect Effects in mediation analysis, we propose a new method called DeepMed that uses deep neural networks (DNNs) to cross-fit the infinite-dimensional nuisance functions in the efficient influence functions. We obtain novel theoretical results that our DeepMed method (1) can achieve the semiparametric efficiency bound without imposing sparsity constraints on the DNN architecture and (2) can adapt to certain low-dimensional structures of the nuisance functions, significantly advancing the existing literature on DNN-based semiparametric causal inference. Extensive synthetic experiments are conducted to support our findings and also expose the gap between theory and practice. As a proof of concept, we apply DeepMed to analyze two real datasets on machine learning fairness and reach conclusions consistent with previous findings.
36th Conference on Neural Information Processing Systems (NeurIPS 2022).

arXiv:2210.04389v2 [stat.ML] 26 Dec 2022

Co-corresponding authors, alphabetical order

1 Introduction

Tremendous progress has been made in this decade on deploying deep neural networks (DNNs) in real-world problems (Krizhevsky et al., 2012; Wolf et al., 2019; Jumper et al., 2021; Brown et al., 2022). Causal inference is no exception. In semiparametric causal inference, a series of seminal works (Chen et al., 2020; Chernozhukov et al., 2020; Farrell et al., 2021) initiated the investigation of statistical properties of causal effect estimators when the nuisance functions (the outcome regressions and propensity scores) are estimated by DNNs. However, there are a few limitations in the current literature that need to be addressed before the theoretical results can be used to guide practice:
(1) Most recent works mainly focus on the total effect (Chen et al., 2020; Farrell et al., 2021). In many settings, however, more intricate causal parameters are often of greater interest. In biomedical and social sciences, one is often interested in "mediation analysis" to decompose the total effect into direct and indirect effects to unpack the underlying black-box causal mechanism (Baron and Kenny, 1986). More recently, mediation analysis has also percolated into machine learning fairness. For instance, in the context of predicting recidivism risk, Nabi and Shpitser (2018) argued that, for a "fair" algorithm, sensitive features such as race should have no direct effect on the predicted recidivism risk. If such direct effects can be accurately estimated, one can detect the potential unfairness of a machine learning algorithm. We will revisit such applications in Section 5 and Appendix G.
(2) Statistical properties of DNN-based causal estimators in recent works mostly follow from several (recent) results on the convergence rates of DNN-based nonparametric regression estimators (Suzuki, 2019; Schmidt-Hieber, 2020; Tsuji and Suzuki, 2021), with the limitation of relying on sparse DNN architectures. The theoretical properties are in turn evaluated by relatively simple synthetic experiments not designed to generate nearly infinite-dimensional nuisance functions, the setting considered by almost all the above related works.
The above limitations raise the tantalizing question of whether the available statistical guarantees for DNN-based causal inference have practical relevance. In this work, we partially fill these gaps by developing a new method called DeepMed for semiparametric mediation analysis with DNNs. We focus on the Natural Direct/Indirect Effects (NDE/NIE) (Robins and Greenland, 1992; Pearl, 2001) (defined in Section 2.1), but our results can also be applied to more general settings; see Remark 2. The DeepMed estimators leverage the "multiply-robust" property of the efficient influence function (EIF) of NDE/NIE (Tchetgen Tchetgen and Shpitser, 2012; Farbmacher et al., 2022) (see Proposition 1 in Section 2.2), together with the flexibility and superior predictive power of DNNs (see Section 3.1 and Algorithm 1). In particular, we also make the following novel contributions to deepen our understanding of DNN-based semiparametric causal inference:
• On the theoretical side, we obtain new results that our DeepMed method can achieve the semiparametric efficiency bound without imposing sparsity constraints on the DNN architecture and can adapt to certain low-dimensional structures of the nuisance functions (see Section 3.2), thus significantly advancing the existing literature on DNN-based semiparametric causal inference. Non-sparse DNN architectures are more commonly employed in practice (Farrell et al., 2021), and the low-dimensional structures of nuisance functions can help avoid the curse of dimensionality. These two points, taken together, significantly advance our understanding of the statistical guarantees of DNN-based causal inference.

• More importantly, on the empirical side, in Section 4, we design sophisticated synthetic experiments to simulate nearly infinite-dimensional functions, which are much more complex than those in previous related works (Chen et al., 2020; Farrell et al., 2021; Adcock and Dexter, 2021). We emphasize that these nontrivial experiments could be of independent interest to the theory of deep learning beyond causal inference, to further expose the gap between deep learning theory and practice (Adcock and Dexter, 2021; Gottschling et al., 2020); see Remark 9 for an extended discussion. As a proof of concept, in Section 5 and Appendix G, we also apply DeepMed to re-analyze two real-world datasets on algorithmic fairness and reach similar conclusions to related works.

• Finally, a user-friendly R package can be found at https://github.com/siqixu/DeepMed. Making such resources available helps enhance reproducibility, a highly recognized problem in all scientific disciplines, including (causal) machine learning (Pineau et al., 2021; Kaddour et al., 2022).
2 Definition, identification, and estimation of NDE and NIE
2.1 Definition of NDE and NIE
Throughout this paper, we denote $Y$ as the primary outcome of interest, $D$ as a binary treatment variable, $M$ as the mediator on the causal pathway from $D$ to $Y$, and $X \in [0,1]^p$ (or, more generally, compactly supported in $\mathbb{R}^p$) as baseline covariates including all potential confounders. We denote the observed data vector as $O \equiv (X, D, M, Y)$. Let $M(d)$ denote the potential outcome for the mediator when setting $D = d$ and $Y(d, m)$ be the potential outcome of $Y$ under $D = d$ and $M = m$, where $d \in \{0, 1\}$ and $m$ is in the support $\mathcal{M}$ of $M$. We define the average total (treatment) effect as $\tau_{\mathrm{tot}} := E[Y(1, M(1)) - Y(0, M(0))]$, the average NDE of the treatment $D$ on the outcome $Y$ when the mediator takes the natural potential outcome under $D = d$ as $\tau_{\mathrm{NDE}}(d) := E[Y(1, M(d)) - Y(0, M(d))]$, and the average NIE of the treatment $D$ on the outcome $Y$ via the mediator $M$ as $\tau_{\mathrm{NIE}}(d) := E[Y(d, M(1)) - Y(d, M(0))]$. We have the trivial decomposition $\tau_{\mathrm{tot}} \equiv \tau_{\mathrm{NDE}}(d) + \tau_{\mathrm{NIE}}(d')$ for $d \neq d'$. In causal mediation analysis, the parameters of interest are $\tau_{\mathrm{NDE}}(d)$ and $\tau_{\mathrm{NIE}}(d)$.
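To make the decomposition concrete, the following minimal simulation sketch (a hypothetical linear structural model with illustrative coefficients, not one used in the paper) checks $\tau_{\mathrm{tot}} = \tau_{\mathrm{NDE}}(0) + \tau_{\mathrm{NIE}}(1)$ by direct Monte Carlo over the potential outcomes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical structural model: X -> M, (X, D, M) -> Y; all coefficients
# below are illustrative assumptions, not estimates from the paper.
X = rng.normal(size=n)
eps0, eps1 = rng.normal(size=n), rng.normal(size=n)
M0 = 0.5 * X + eps0              # potential mediator M(0)
M1 = 0.5 * X + 1.0 + eps1        # potential mediator M(1)

def Y_pot(d, m):                 # mean potential outcome Y(d, m)
    return 1.0 * d + 0.8 * m + 0.3 * X

tau_tot  = np.mean(Y_pot(1, M1) - Y_pot(0, M0))
tau_nde0 = np.mean(Y_pot(1, M0) - Y_pot(0, M0))  # NDE with mediator held at M(0)
tau_nie1 = np.mean(Y_pot(1, M1) - Y_pot(1, M0))  # NIE under treatment d = 1

# The decomposition tau_tot = tau_NDE(0) + tau_NIE(1) is an algebraic identity
assert abs(tau_tot - (tau_nde0 + tau_nie1)) < 1e-10
print(round(tau_nde0, 2), round(tau_nie1, 2))  # direct effect 1.0; indirect about 0.8
```

Note that the identity holds sample-by-sample, so the assertion passes up to floating-point error regardless of the Monte Carlo noise in the individual effect estimates.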
2.2 Semiparametric multiply-robust estimators of NDE/NIE
Estimating $\tau_{\mathrm{NDE}}(d)$ and $\tau_{\mathrm{NIE}}(d)$ can be reduced to estimating $\phi(d, d') := E[Y(d, M(d'))]$ for $d, d' \in \{0, 1\}$. We make the following standard identification assumptions:
i. Consistency: if $D = d$, then $M = M(d)$ for all $d \in \{0, 1\}$; while if $D = d$ and $M = m$, then $Y = Y(d, m)$ for all $d \in \{0, 1\}$ and all $m$ in the support of $M$.

ii. Ignorability: $Y(d, m) \perp D \mid X$, $Y(d, m) \perp M \mid X, D$, $M(d) \perp D \mid X$, and $Y(d, m) \perp M(d') \mid X$, almost surely for all $d, d' \in \{0, 1\}$ and all $m \in \mathcal{M}$. The first three conditions are, respectively, no unmeasured treatment-outcome, mediator-outcome and treatment-mediator confounding, whereas the fourth condition is often referred to as the "cross-world" condition. We provide more detailed comments on these four conditions in Appendix A.

iii. Positivity: the propensity score $a(d \mid X) \equiv \Pr(D = d \mid X) \in (c, C)$ for some constants $0 < c \leq C < 1$, almost surely for all $d \in \{0, 1\}$; $f(m \mid X, d)$, the conditional density (mass) function of $M = m$ (when $M$ is discrete) given $X$ and $D = d$, is strictly bounded between $[\underline{\rho}, \bar{\rho}]$ for some constants $0 < \underline{\rho} \leq \bar{\rho} < \infty$, almost surely for all $m \in \mathcal{M}$ and all $d \in \{0, 1\}$.
Under the above assumptions, the causal parameter $\phi(d, d')$ for $d, d' \in \{0, 1\}$ can be identified as any of the following three observed-data functionals:

$$\phi(d, d') \equiv E\left[\frac{\mathbb{1}\{D = d\}\, f(M \mid X, d')}{a(d \mid X)\, f(M \mid X, d)}\, Y\right] \equiv E\left[\frac{\mathbb{1}\{D = d'\}}{a(d' \mid X)}\, \mu(X, d, M)\right] \equiv \int \mu(x, d, m)\, f(m \mid x, d')\, p(x)\, \mathrm{d}m\, \mathrm{d}x, \quad (1)$$

where $\mathbb{1}\{\cdot\}$ denotes the indicator function, $p(x)$ denotes the marginal density of $X$, and $\mu(x, d, m) := E[Y \mid X = x, D = d, M = m]$ is the outcome regression model, for which we also make the following standard boundedness assumption:

iv. $\mu(x, d, m)$ is also strictly bounded between $[-R, R]$ for some constant $R > 0$.
Following the convention in the semiparametric causal inference literature, we call $a, f, \mu$ "nuisance functions". Tchetgen Tchetgen and Shpitser (2012) derived the EIF of $\phi(d, d')$: $\mathrm{EIF}_{d,d'} \equiv \psi_{d,d'}(O) - \phi(d, d')$, where

$$\psi_{d,d'}(O) = \frac{\mathbb{1}\{D = d\}\, f(M \mid X, d')}{a(d \mid X)\, f(M \mid X, d)} \left(Y - \mu(X, d, M)\right) + \left(1 - \frac{\mathbb{1}\{D = d'\}}{a(d' \mid X)}\right) \int_{m \in \mathcal{M}} \mu(X, d, m)\, f(m \mid X, d')\, \mathrm{d}m + \frac{\mathbb{1}\{D = d'\}}{a(d' \mid X)}\, \mu(X, d, M). \quad (2)$$
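To make the uncentered EIF $\psi_{d,d'}$ concrete, here is a minimal sketch on a fully discrete toy law (all probability values and regression coefficients are illustrative assumptions, not from the paper). It evaluates $\psi_{1,0}$ at the true (oracle) nuisance functions and checks that its sample mean recovers $\phi(1, 0) = E[Y(1, M(0))]$, which equals $1.30$ under this toy law:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
d, dp = 1, 0   # target phi(d, d') = E[Y(1, M(0))]

# Hypothetical discrete nuisances (illustrative numbers only)
a1 = lambda x: np.where(x == 1, 0.6, 0.4)                    # Pr(D=1 | X=x)
f1 = lambda x, t: np.where(x == 1, np.where(t == 1, 0.8, 0.5),
                           np.where(t == 1, 0.7, 0.3))       # Pr(M=1 | X=x, D=t)
mu = lambda x, t, m: 1.0 * t + 0.5 * m + 0.2 * x             # E[Y | X, D, M]

# Simulate observed data O = (X, D, M, Y) from these nuisances
X = rng.binomial(1, 0.5, n)
D = rng.binomial(1, a1(X))
M = rng.binomial(1, f1(X, D))
Y = mu(X, D, M) + rng.normal(size=n)

# Evaluate psi_{d,d'} at the true nuisance functions
a_d  = a1(X) if d == 1 else 1 - a1(X)                        # a(d | X)
a_dp = a1(X) if dp == 1 else 1 - a1(X)                       # a(d' | X)
f_d  = np.where(M == 1, f1(X, d),  1 - f1(X, d))             # f(M | X, d)
f_dp = np.where(M == 1, f1(X, dp), 1 - f1(X, dp))            # f(M | X, d')
eta  = mu(X, d, 0) * (1 - f1(X, dp)) + mu(X, d, 1) * f1(X, dp)  # sum_m mu f

psi = ((D == d) * f_dp / (a_d * f_d) * (Y - mu(X, d, M))
       + (1 - (D == dp) / a_dp) * eta
       + (D == dp) / a_dp * mu(X, d, M))

print(psi.mean())   # close to phi(1, 0) = 1.30 under this toy law
```

Because the nuisances here are the truth, the sample mean of $\psi_{1,0}$ is an unbiased Monte Carlo estimate of $\phi(1,0)$; the next section replaces these oracles with estimators.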
The nuisance functions $\mu(x, d, m)$, $a(d \mid x)$ and $f(m \mid x, d)$ appearing in $\psi_{d,d'}(o)$ are unknown and generally high-dimensional. But with a sample $\mathcal{D} \equiv \{O_j\}_{j=1}^{N}$ of the observed data, based on $\psi_{d,d'}(o)$, one can construct the following generic sample-splitting multiply-robust estimator of $\phi(d, d')$:

$$\tilde{\phi}(d, d') = \frac{1}{n} \sum_{i \in \mathcal{D}_n} \tilde{\psi}_{d,d'}(O_i), \quad (3)$$

where $\mathcal{D}_n \equiv \{O_i\}_{i=1}^{n}$ is a subset of all $N$ data, and $\tilde{\psi}_{d,d'}(o)$ replaces the unknown nuisance functions $a, f, \mu$ in $\psi_{d,d'}(o)$ by some generic estimators $\tilde{a}, \tilde{f}, \tilde{\mu}$ computed using the remaining $N - n$ nuisance sample data, denoted as $\mathcal{D}_\nu$. Cross-fitting is then needed to recover the information lost due to sample splitting; see Algorithm 1. It is clear from (2) that $\tilde{\phi}(d, d')$ is a consistent estimator of $\phi(d, d')$ as long as any two of $\tilde{a}, \tilde{f}, \tilde{\mu}$ are consistent estimators of the corresponding true nuisance functions, hence the name "multiply-robust". Throughout this paper, we take $n \asymp N - n$ and assume:

v. Any nuisance function estimators are strictly bounded within the respective lower and upper bounds of $a, f, \mu$.
To further ease notation, we define, for any $d \in \{0, 1\}$,

$$r_{a,d} := \left(\int \delta_{a,d}(x)^2\, \mathrm{d}F(x)\right)^{1/2}, \quad r_{f,d} := \left(\int \delta_{f,d}(x, m)^2\, \mathrm{d}F(x, m \mid d = 0)\right)^{1/2}, \quad r_{\mu,d} := \left(\int \delta_{\mu,d}(x, m)^2\, \mathrm{d}F(x, m \mid d = 0)\right)^{1/2},$$

where $\delta_{a,d}(x) := \tilde{a}(d \mid x) - a(d \mid x)$, $\delta_{f,d}(x, m) := \tilde{f}(m \mid x, d) - f(m \mid x, d)$ and $\delta_{\mu,d}(x, m) := \tilde{\mu}(x, d, m) - \mu(x, d, m)$ are the point-wise estimation errors of the estimated nuisance functions. In defining the above $L_2$-estimation errors, we choose to take the expectation with respect to (w.r.t.) the law $F(m, x \mid d = 0)$ only for convenience, with no loss of generality by Assumptions iii and v.
To show that the cross-fit version of $\tilde{\phi}(d, d')$ is semiparametric efficient for $\phi(d, d')$, we shall demonstrate under what conditions $\sqrt{n}\,(\tilde{\phi}(d, d') - \phi(d, d')) \xrightarrow{\mathcal{L}} N(0, E[\mathrm{EIF}_{d,d'}^2])$ (Newey, 1990). The following proposition on the statistical properties of $\tilde{\phi}(d, d')$ is a key step towards this objective.
Proposition 1. Denote $\mathrm{Bias}(\tilde{\phi}(d, d')) := E[\tilde{\phi}(d, d') - \phi(d, d') \mid \mathcal{D}_\nu]$ as the bias of $\tilde{\phi}(d, d')$ conditional on the nuisance sample $\mathcal{D}_\nu$. Under Assumptions i-v, $\mathrm{Bias}(\tilde{\phi}(d, d'))$ is of second order:

$$\left|\mathrm{Bias}(\tilde{\phi}(d, d'))\right| \lesssim \max\left\{ r_{a,d} \cdot r_{f,d},\ \max_{d'' \in \{0,1\}} r_{f,d''} \cdot r_{\mu,d},\ r_{a,d} \cdot r_{\mu,d} \right\}. \quad (4)$$

Furthermore, if the RHS of (4) is $o(n^{-1/2})$, then

$$\sqrt{n}\left(\tilde{\phi}(d, d') - \phi(d, d')\right) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left(\psi_{d,d'}(O_i) - \phi(d, d')\right) + o_P(1) \xrightarrow{d} N\left(0, E\left[\mathrm{EIF}_{d,d'}^2\right]\right). \quad (5)$$

Although the above result is a direct consequence of the EIF $\psi_{d,d'}(O)$, we prove Proposition 1 in Appendix B for completeness.
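The product structure of the bias in (4) is what drives multiple robustness: if, say, only the propensity score estimate is inconsistent, every product on the right-hand side vanishes. A minimal numerical sketch (a hypothetical toy law with no covariates; the deliberately wrong propensity value 0.9 is an assumption chosen for illustration) shows that $\psi_{1,0}$ evaluated with a badly misspecified $\tilde{a}$ but the correct $f$ and $\mu$ still averages to $\phi(1,0)$, whereas the plain weighting functional in (1) with the same wrong $\tilde{a}$ does not:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000

# Toy law without covariates: a(1) = 0.5, f(M=1 | d) = 0.3 + 0.4 d,
# mu(d, m) = d + 0.5 m.  Then phi(1, 0) = 0.7 * 1.0 + 0.3 * 1.5 = 1.15.
D = rng.binomial(1, 0.5, n)
M = rng.binomial(1, 0.3 + 0.4 * D)
Y = D + 0.5 * M + rng.normal(size=n)

a_tilde = 0.9                          # deliberately wrong: the truth is 0.5
f0 = np.where(M == 1, 0.3, 0.7)        # true f(M | d' = 0)
f1 = np.where(M == 1, 0.7, 0.3)        # true f(M | d = 1)
mu1 = 1.0 + 0.5 * M                    # true mu(1, M)
eta = 0.7 * 1.0 + 0.3 * 1.5            # sum_m mu(1, m) f(m | 0) = 1.15

# Multiply-robust psi_{1,0} with wrong a but correct f and mu: still unbiased
psi = ((D == 1) * f0 / (a_tilde * f1) * (Y - mu1)
       + (1 - (D == 0) / (1 - a_tilde)) * eta
       + (D == 0) / (1 - a_tilde) * mu1)

# Plain weighting formula from (1) with the same wrong a: clearly biased
ipw = (D == 1) * f0 / (a_tilde * f1) * Y

print(psi.mean(), ipw.mean())  # psi stays close to 1.15; ipw is badly off
```

With only one inconsistent nuisance, every error product $r_{a}\cdot r_{f}$, $r_{f}\cdot r_{\mu}$, $r_{a}\cdot r_{\mu}$ in (4) contains a vanishing factor, which is exactly what the simulation displays.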
Remark 2. The total effect $\tau_{\mathrm{tot}} = \phi(1, 1) - \phi(0, 0)$ can be viewed as a special case, for which $d = d'$ in $\phi(d, d')$. Then $\mathrm{EIF}_{d,d} \equiv \mathrm{EIF}_d$ corresponds to the nonparametric EIF of $\phi(d, d) \equiv \phi(d) \equiv E[Y(d, M(d))]$:

$$\mathrm{EIF}_d = \psi_d(O) - \phi(d) \quad \text{with} \quad \psi_d(O) = \frac{\mathbb{1}\{D = d\}}{a(d \mid X)}\, Y + \left(1 - \frac{\mathbb{1}\{D = d\}}{a(d \mid X)}\right) \mu(X, d),$$

where $\mu(x, d) := E[Y \mid X = x, D = d]$. Hence all the theoretical results in this paper are applicable to total effect estimation. Our framework can also be applied to all the statistical functionals that satisfy the so-called "mixed-bias" property, characterized recently in Rotnitzky et al. (2021). This class includes the quadratic functional, which is important for uncertainty quantification in machine learning.
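For the total effect, $\psi_d$ above is the familiar augmented inverse-probability-weighted (AIPW) score. A short sketch (hypothetical data-generating process with illustrative coefficients; the true nuisances serve as oracle stand-ins for fitted estimators):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical data-generating process (illustration only)
X = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-X))            # a(1 | X), true propensity score
D = rng.binomial(1, e)
mu1 = 1.0 + 0.5 * X                     # E[Y | X, D = 1]
mu0 = 0.2 * X                           # E[Y | X, D = 0]
Y = np.where(D == 1, mu1, mu0) + rng.normal(size=n)

# psi_d(O) = 1{D=d}/a(d|X) * Y + (1 - 1{D=d}/a(d|X)) * mu(X, d)
psi1 = D / e * Y + (1 - D / e) * mu1
psi0 = (1 - D) / (1 - e) * Y + (1 - (1 - D) / (1 - e)) * mu0

tau_tot_hat = psi1.mean() - psi0.mean()
print(tau_tot_hat)   # close to the true total effect E[mu1 - mu0] = 1.0
```

Subtracting the two scores estimates $\tau_{\mathrm{tot}} = \phi(1) - \phi(0)$, and the same recipe with $\psi_{d,d'}$ in place of $\psi_d$ yields the NDE/NIE contrasts.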
3 Estimation and inference of NDE/NIE using DeepMed
We now introduce DeepMed, a method for mediation analysis with nuisance functions estimated by DNNs. By leveraging the second-order bias property of the multiply-robust estimators of NDE/NIE (Proposition 1), we will derive the statistical properties of DeepMed in this section. The nuisance function estimators based on DNNs are denoted as $\hat{a}, \hat{f}, \hat{\mu}$.
3.1 Details on DeepMed
First, we introduce the fully-connected feed-forward neural network with rectified linear units (ReLU) as the activation function for the hidden-layer neurons (FNN-ReLU), which will be used to estimate the nuisance functions. Then, we introduce an estimation procedure using $V$-fold cross-fitting with sample splitting to avoid the Donsker-type empirical-process assumption on the nuisance functions, which, in general, is violated in high-dimensional setups. Finally, we provide the asymptotic statistical properties of the DNN-based estimators of $\tau_{\mathrm{tot}}$, $\tau_{\mathrm{NDE}}(d)$ and $\tau_{\mathrm{NIE}}(d)$.
We denote the ReLU activation function as $\sigma(u) := \max(u, 0)$ for any $u \in \mathbb{R}$. Given vectors $x, b$, we denote $\sigma_b(x) := \sigma(x - b)$, with $\sigma$ acting on the vector $x - b$ component-wise. Let $\mathcal{F}_{\mathrm{nn}}$ denote the class of FNN-ReLU functions

$$\mathcal{F}_{\mathrm{nn}} := \left\{ f: \mathbb{R}^p \to \mathbb{R};\ f(x) = W^{(L)} \sigma_{b^{(L)}} \circ \cdots \circ W^{(1)} \sigma_{b^{(1)}}(x) \right\},$$

where $\circ$ is the composition operator, $L$ is the number of layers (i.e., depth) of the network, and for $l = 1, \cdots, L$, $W^{(l)}$ is a $K_{l+1} \times K_l$-dimensional weight matrix with $K_l$ being the number of neurons in the $l$-th layer (i.e., width) of the network, with $K_1 = p$ and $K_{L+1} = 1$, and $b^{(l)}$ is a $K_l$-dimensional vector. To avoid notational clutter, we concatenate all the network parameters as $\Theta = (W^{(l)}, b^{(l)}, l = 1, \cdots, L)$ and simply take $K_2 = \cdots = K_L = K$. We also assume $\Theta$ to be bounded: $\|\Theta\|_\infty \leq B$ for some universal constant $B > 0$. We may make the dependence on $L$, $K$, $B$ explicit by writing $\mathcal{F}_{\mathrm{nn}}$ as $\mathcal{F}_{\mathrm{nn}}(L, K, B)$.
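A direct numpy transcription of this function class may help fix the notation (a minimal sketch; the random weights below are arbitrary illustrative values satisfying the bound $\|\Theta\|_\infty \leq B$ with $B = 1$):

```python
import numpy as np

def relu(u):
    return np.maximum(u, 0.0)

def fnn_relu(x, Ws, bs):
    """Evaluate f(x) = W^(L) sigma_{b^(L)} o ... o W^(1) sigma_{b^(1)}(x),
    where sigma_b(u) = relu(u - b) acts component-wise, as in the display above."""
    h = np.asarray(x, dtype=float)
    for W, b in zip(Ws, bs):          # l = 1, ..., L
        h = W @ relu(h - b)           # shifted ReLU, then the linear map W^(l)
    return h

# A tiny member of F_nn(L = 2, K = 3, B = 1): p = 2 inputs, width K = 3
rng = np.random.default_rng(0)
p, K = 2, 3
Ws = [rng.uniform(-1, 1, (K, p)), rng.uniform(-1, 1, (1, K))]  # W^(1), W^(2)
bs = [np.zeros(p), np.zeros(K)]                                 # b^(1), b^(2)

out = fnn_relu(np.array([0.5, -1.0]), Ws, bs)
print(out.shape)   # (1,): the output dimension is K_{L+1} = 1
```

The loop applies $\sigma_{b^{(1)}}$ directly to the input and ends with the linear map $W^{(L)}$, exactly matching the composition order written in the definition of $\mathcal{F}_{\mathrm{nn}}$.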
DeepMed estimates $\tau_{\mathrm{tot}}, \tau_{\mathrm{NDE}}(d), \tau_{\mathrm{NIE}}(d)$ by (3), with the nuisance functions $a, f, \mu$ estimated using $\mathcal{F}_{\mathrm{nn}}$ with the $V$-fold cross-fitting strategy, summarized in Algorithm 1 below; also see Farbmacher et al. (2022). DeepMed inputs the observed data $\mathcal{D} \equiv \{O_i\}_{i=1}^{N}$ and outputs the estimated total effect $\hat{\tau}_{\mathrm{tot}}$, NDE $\hat{\tau}_{\mathrm{NDE}}(d)$ and NIE $\hat{\tau}_{\mathrm{NIE}}(d)$, together with their variance estimators $\hat{\sigma}^2_{\mathrm{tot}}$, $\hat{\sigma}^2_{\mathrm{NDE}}(d)$ and $\hat{\sigma}^2_{\mathrm{NIE}}(d)$.
Algorithm 1 DeepMed with $V$-fold cross-fitting
1: Choose some integer $V$ (usually $V \in \{2, 3, \cdots, 10\}$)
2: Split the $N$ observations into $V$ subsamples $I_v \subset \{1, \cdots, N\} \equiv [N]$ with equal size $n = N/V$
3: for $v = 1, \cdots, V$ do
4: Fit the nuisance functions by DNNs using observations in $[N] \setminus I_v$
5: Compute the nuisance functions in the subsample $I_v$ using the estimated DNNs in step 4
6: Obtain $\{\hat{\psi}_d(O_i), \hat{\psi}_{d,d'}(O_i)\}_{i \in I_v}$ for the subsample $I_v$ based on (2), respectively, with the nuisance functions replaced by their estimates in step 5
7: end for
8: Estimate the average potential outcomes by $\hat{\phi}(d) := \frac{1}{N} \sum_{i=1}^{N} \hat{\psi}_d(O_i)$ and $\hat{\phi}(d, d') := \frac{1}{N} \sum_{i=1}^{N} \hat{\psi}_{d,d'}(O_i)$
9: Estimate the causal effects $\hat{\tau}_{\mathrm{tot}}$, $\hat{\tau}_{\mathrm{NDE}}(d)$ and $\hat{\tau}_{\mathrm{NIE}}(d)$ with $\hat{\phi}(d)$ and $\hat{\phi}(d, d')$
10: Estimate the variances of $\hat{\tau}_{\mathrm{tot}}$, $\hat{\tau}_{\mathrm{NDE}}(d)$ and $\hat{\tau}_{\mathrm{NIE}}(d)$ by
$$\hat{\sigma}^2_{\mathrm{tot}} := \frac{1}{N^2} \sum_{i=1}^{N} \left(\hat{\psi}_1(O_i) - \hat{\psi}_0(O_i)\right)^2 - \frac{1}{N} \hat{\tau}^2_{\mathrm{tot}}; \quad \hat{\sigma}^2_{\mathrm{NDE}}(d) := \frac{1}{N^2} \sum_{i=1}^{N} \left(\hat{\psi}_{1,d}(O_i) - \hat{\psi}_{0,d}(O_i)\right)^2 - \frac{1}{N} \hat{\tau}^2_{\mathrm{NDE}}(d); \quad \hat{\sigma}^2_{\mathrm{NIE}}(d) := \frac{1}{N^2} \sum_{i=1}^{N} \left(\hat{\psi}_{d,1}(O_i) - \hat{\psi}_{d,0}(O_i)\right)^2 - \frac{1}{N} \hat{\tau}^2_{\mathrm{NIE}}(d)$$
Output: $\hat{\tau}_{\mathrm{tot}}$, $\hat{\tau}_{\mathrm{NDE}}(d)$, $\hat{\tau}_{\mathrm{NIE}}(d)$, $\hat{\sigma}^2_{\mathrm{tot}}$, $\hat{\sigma}^2_{\mathrm{NDE}}(d)$ and $\hat{\sigma}^2_{\mathrm{NIE}}(d)$
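The following Python sketch mirrors Algorithm 1 on a hypothetical discrete toy dataset. Everything here is an illustrative assumption — the data-generating process, the cell-frequency "learners" that stand in for the DNN fits of step 4, and the fold logic — and is not the DeepMed implementation itself (which is an R package):

```python
import numpy as np

rng = np.random.default_rng(3)
N, V = 100_000, 5
d, dp = 1, 0                       # target phi(d, d') = E[Y(1, M(0))]

# Hypothetical discrete data-generating process (illustration only)
X = rng.binomial(1, 0.5, N)
D = rng.binomial(1, 0.4 + 0.2 * X)
M = rng.binomial(1, 0.2 + 0.3 * X + 0.2 * D)
Y = D + 0.5 * M + 0.2 * X + rng.normal(size=N)

psi_hat = np.empty(N)
folds = np.array_split(rng.permutation(N), V)        # step 2
for Iv in folds:                                     # step 3
    mask = np.ones(N, dtype=bool); mask[Iv] = False  # training part [N] \ I_v
    Xt, Dt, Mt, Yt = X[mask], D[mask], M[mask], Y[mask]
    # step 4: "fit" nuisances on [N] \ I_v (frequency tables stand in for DNNs)
    a1 = {x: Dt[Xt == x].mean() for x in (0, 1)}
    f1 = {(x, t): Mt[(Xt == x) & (Dt == t)].mean() for x in (0, 1) for t in (0, 1)}
    mu = {(x, t, m): Yt[(Xt == x) & (Dt == t) & (Mt == m)].mean()
          for x in (0, 1) for t in (0, 1) for m in (0, 1)}
    # steps 5-6: evaluate the estimated psi_{d,d'} on the held-out fold I_v
    for i in Iv:
        x, m = X[i], M[i]
        a_d  = a1[x] if d == 1 else 1 - a1[x]
        a_dp = a1[x] if dp == 1 else 1 - a1[x]
        f_d  = f1[(x, d)]  if m == 1 else 1 - f1[(x, d)]
        f_dp = f1[(x, dp)] if m == 1 else 1 - f1[(x, dp)]
        eta = mu[(x, d, 0)] * (1 - f1[(x, dp)]) + mu[(x, d, 1)] * f1[(x, dp)]
        psi_hat[i] = ((D[i] == d) * f_dp / (a_d * f_d) * (Y[i] - mu[(x, d, m)])
                      + (1 - (D[i] == dp) / a_dp) * eta
                      + (D[i] == dp) / a_dp * mu[(x, d, m)])

phi_hat = psi_hat.mean()                             # step 8
se_hat = psi_hat.std(ddof=1) / np.sqrt(N)            # plug-in standard error
print(f"phi_hat = {phi_hat:.3f} +/- {1.96 * se_hat:.3f}")  # truth here is 1.275
```

The key discipline is that each $\hat{\psi}_{d,d'}(O_i)$ is evaluated with nuisance fits trained on the folds not containing $O_i$, which is what lets Proposition 1 dispense with Donsker conditions.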
Remark 3 (Continuous or multi-dimensional mediators). For binary treatment $D$ and continuous or multi-dimensional $M$, to avoid nonparametric/high-dimensional conditional density estimation, we can rewrite $\frac{f(m \mid x, d')}{a(d \mid x) f(m \mid x, d)}$ as $\frac{1 - a(d \mid x, m)}{a(d \mid x, m)(1 - a(d \mid x))}$ by Bayes' rule, and the integral w.r.t. $f(m \mid x, d')$ in (2) as $E[\mu(X, d, M) \mid X = x, D = d']$. Then we can first estimate $\mu(x, d, m)$ by $\hat{\mu}(x, d, m)$ and in turn estimate $E[\mu(X, d, M) \mid X = x, D = d']$ by regressing $\hat{\mu}(X, d, M)$ against $(X, D)$ using the FNN-ReLU class. We mainly consider binary $M$ to avoid unnecessary complications; but see Appendix G for an example in which this strategy is used. Finally, the potential incompatibility between the models posited for $a(d \mid x)$ and $a(d \mid x, m)$ and the joint distribution of $(X, D, M, Y)$ is not of great concern under the semiparametric framework because all nuisance functions are estimated nonparametrically; again, see Appendix G for an extended discussion.
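The Bayes'-rule rewrite in Remark 3 can be sanity-checked numerically on a toy joint law of $(D, M)$ at a fixed $x$ (the probability table below is an arbitrary illustrative assumption):

```python
import numpy as np

# Toy joint law of (D, M) given a fixed x (hypothetical numbers):
# rows = d, columns = m; entries are Pr(D = d, M = m | X = x)
P = np.array([[0.12, 0.28],    # d = 0
              [0.24, 0.36]])   # d = 1

a  = P.sum(axis=1)             # a(d | x)    = Pr(D = d | x)
f  = P / a[:, None]            # f(m | x, d) = Pr(M = m | x, d)
am = P / P.sum(axis=0)         # a(d | x, m) = Pr(D = d | x, m)

d, dp = 1, 0
for m in (0, 1):
    lhs = f[dp, m] / (a[d] * f[d, m])                    # density-ratio weight
    rhs = (1 - am[d, m]) / (am[d, m] * (1 - a[d]))       # Bayes'-rule rewrite
    assert np.isclose(lhs, rhs)
print("identity holds for both values of m")
```

Since the rewritten weight involves only the two propensity scores $a(d \mid x)$ and $a(d \mid x, m)$, it extends directly to continuous $M$, where the left-hand side would otherwise require estimating two conditional densities.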
3.2 Statistical properties of DeepMed: Non-sparse DNN architecture and low-dimensional
structures of the nuisance functions
According to Proposition 1, to analyze the statistical properties of DeepMed, it is sufficient to control the $L_2$-estimation errors of the nuisance function estimates $\hat{a}, \hat{f}, \hat{\mu}$ fit by DNNs. To ease presentation,