statistical properties of causal effect estimators when the nuisance functions (the outcome regressions
and propensity scores) are estimated by DNNs. However, there are a few limitations in the current
literature that need to be addressed before the theoretical results can be used to guide practice:
(1) Most recent works focus mainly on the total effect (Chen et al., 2020; Farrell et al., 2021). In many
settings, however, more intricate causal parameters are of greater interest. In the biomedical and
social sciences, one is often interested in “mediation analysis”, which decomposes the total effect into
direct and indirect effects to unpack the underlying black-box causal mechanism (Baron and Kenny,
1986). More recently, mediation analysis has also percolated into machine learning fairness. For instance,
in the context of predicting recidivism risk, Nabi and Shpitser (2018) argued that, for a “fair”
algorithm, sensitive features such as race should have no direct effect on the predicted recidivism
risk. If such direct effects can be accurately estimated, one can detect the potential unfairness of a
machine learning algorithm. We will revisit such applications in Section 5 and Appendix G.
(2) Statistical properties of DNN-based causal estimators in recent works mostly follow from
several (recent) results on the convergence rates of DNN-based nonparametric regression estimators
(Suzuki, 2019; Schmidt-Hieber, 2020; Tsuji and Suzuki, 2021), with the limitation of relying on
sparse DNN architectures. The theoretical properties are in turn evaluated by relatively simple
synthetic experiments not designed to generate nearly infinite-dimensional nuisance functions, a
setting considered by almost all the above related works.
The above limitations raise the tantalizing question of whether the available statistical guarantees for
DNN-based causal inference have practical relevance. In this work, we aim to partially fill these gaps
by developing a new method called DeepMed for semiparametric mediation analysis with DNNs. We
focus on the Natural Direct/Indirect Effects (NDE/NIE) (Robins and Greenland, 1992; Pearl, 2001)
(defined in Section 2.1), but our results can also be applied to more general settings; see Remark 2.
The DeepMed estimators leverage the “multiply-robust” property of the efficient influence function
(EIF) of NDE/NIE (Tchetgen Tchetgen and Shpitser, 2012; Farbmacher et al., 2022) (see Proposition
1 in Section 2.2), together with the flexibility and superior predictive power of DNNs (see Section
3.1 and Algorithm 1). In particular, we make the following novel contributions to deepen our
understanding of DNN-based semiparametric causal inference:
• On the theoretical side, we obtain new results showing that our DeepMed method can achieve the
semiparametric efficiency bound without imposing sparsity constraints on the DNN architecture
and can adapt to certain low-dimensional structures of the nuisance functions (see Section
3.2), thus significantly advancing the existing literature on DNN-based semiparametric
causal inference. Non-sparse DNN architectures are more commonly employed in practice
(Farrell et al., 2021), and the low-dimensional structures of nuisance functions can help
avoid the curse of dimensionality. Taken together, these two points strengthen our
understanding of the statistical guarantees of DNN-based causal inference.
• More importantly, on the empirical side, in Section 4 we design sophisticated synthetic
experiments to simulate nearly infinite-dimensional functions, which are much more complex
than those in previous related works (Chen et al., 2020; Farrell et al., 2021; Adcock and
Dexter, 2021). We emphasize that these nontrivial experiments could be of independent
interest to the theory of deep learning beyond causal inference, as they further expose the gap
between deep learning theory and practice (Adcock and Dexter, 2021; Gottschling et al.,
2020); see Remark 9 for an extended discussion. As a proof of concept, in Section 5 and
Appendix G, we also apply DeepMed to re-analyze two real-world datasets on algorithmic
fairness and reach conclusions similar to those of related works.
• Finally, a user-friendly R package can be found at https://github.com/siqixu/DeepMed.
Making such resources available helps enhance reproducibility, a widely recognized concern
in all scientific disciplines, including (causal) machine learning (Pineau et al., 2021; Kaddour
et al., 2022).
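To make concrete the decomposition of a total effect into direct and indirect effects discussed in this introduction, the following is a minimal sketch on simulated data. It is not the DeepMed estimator: it uses the classical linear product-of-coefficients approach in the spirit of Baron and Kenny (1986), with ordinary least squares in place of DNNs. The data-generating process, the coefficient values, and the function names (`simulate_linear_mediation`, `product_of_coefficients`) are all assumptions made for illustration only.

```python
import numpy as np


def simulate_linear_mediation(n=20000, a=0.8, b=0.5, c=1.0, seed=0):
    """Simulate a toy linear mediation model (illustrative assumption).

    D -> M with coefficient a, M -> Y with coefficient b, D -> Y with
    coefficient c. In this linear model without interaction, the direct
    effect is c and the indirect effect is a * b.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)                 # covariate in [0, 1]
    d = rng.binomial(1, 0.3 + 0.4 * x)                # binary treatment
    m = a * d + 0.5 * x + rng.normal(size=n)          # mediator
    y = c * d + b * m + 0.5 * x + rng.normal(size=n)  # outcome
    return x, d, m, y


def ols(design, target):
    """Least-squares coefficients of target regressed on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def product_of_coefficients(x, d, m, y):
    """Linear estimates of the direct, indirect, and total effects."""
    ones = np.ones_like(x)
    # Mediator model: m ~ 1 + d + x
    a_hat = ols(np.column_stack([ones, d, x]), m)[1]
    # Outcome model: y ~ 1 + d + m + x
    coef_y = ols(np.column_stack([ones, d, m, x]), y)
    c_hat, b_hat = coef_y[1], coef_y[2]
    # Total-effect model: y ~ 1 + d + x
    total_hat = ols(np.column_stack([ones, d, x]), y)[1]
    return {"nde": c_hat, "nie": a_hat * b_hat, "total": total_hat}


if __name__ == "__main__":
    estimates = product_of_coefficients(*simulate_linear_mediation())
    print(estimates)
```

Under these linear assumptions the estimated total effect equals the sum of the estimated direct and indirect effects. This sketch is only valid when the mediator and outcome models are correctly specified as linear, which is the restriction that flexible nuisance estimators such as DNNs are meant to relax.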
2 Definition, identification, and estimation of NDE and NIE
2.1 Definition of NDE and NIE
Throughout this paper, we denote $Y$ as the primary outcome of interest, $D$ as a binary treatment
variable, $M$ as the mediator on the causal pathway from $D$ to $Y$, and $X \in [0,1]^p$ (or more generally,