
Figure 1: Inaccessibility to counterfactual data (e.g., a parallel
universe where the treatments are reversed) makes
transferring causal knowledge more challenging.
A method is needed for selecting the optimal source model
from multiple source tasks. This is discussed in Section 5,
where we introduce a framework endowed with a new task
affinity, namely the Causal Inference Task Affinity (CITA),
tailored explicitly for causal knowledge transfer. This task
affinity is used for selecting the “closest” source task. Subsequently,
its knowledge (e.g., trained models, source dataset)
is utilized in the learning of the target task, as depicted in
Figure 2. Our contributions are summarized below:
1. We establish a new lower bound to demonstrate the
challenges of transferring ITE knowledge. Addition-
ally, we prove new regret bounds for learning the
counterfactual outcomes and ITEs of the target tasks
in causal transfer learning scenarios. These bounds
demonstrate the feasibility of transferring ITE knowl-
edge by stating that the error of any source model on
the target task is upper bounded by quantifiable mea-
sures related to (i) the performance of the source model
on the source task and (ii) the differences between the
source and the target causal inference tasks.
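Schematically, and not as the exact statement of the theorems, such an upper bound takes the following form, where $\varepsilon$ and $D$ are placeholder symbols for the task errors and the task discrepancy, respectively:

```latex
% Schematic form of the transfer bound (illustrative only): the target-task
% error of a source model f is controlled by (i) its source-task error and
% (ii) a quantifiable discrepancy between the source and target causal tasks.
\varepsilon_{\mathrm{target}}(f_{\mathrm{source}})
  \;\le\; \varepsilon_{\mathrm{source}}(f_{\mathrm{source}})
  \;+\; D\big(\mathcal{T}_{\mathrm{source}},\, \mathcal{T}_{\mathrm{target}}\big)
```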
2. We introduce CITA, a task affinity for causal inference,
which captures the symmetry of ITEs (i.e., invariance
to the relabeling of treatment assignments under the ac-
tion of the symmetric group). Additionally, we provide
theoretical (e.g., Theorem F.3) and empirical evidence
to show that CITA is highly correlated with the counterfactual
loss, which is not measurable in practice.
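To illustrate the symmetry property, a task distance can be made invariant to the relabeling of treatment assignments by minimizing any base distance over permutations of the treatment labels. The sketch below is purely illustrative and is not the definition of CITA; `base_dist` and the per-arm feature summaries are hypothetical placeholders.

```python
from itertools import permutations
import numpy as np

def symmetrized_task_distance(src_feats, tgt_feats, base_dist):
    """Distance between two causal tasks that is invariant to
    relabeling of treatment assignments.

    src_feats, tgt_feats: dicts mapping a treatment label to a
    feature vector summarizing that treatment arm of the task.
    base_dist: any distance between two such dicts.
    """
    labels = sorted(src_feats)
    best = np.inf
    # Minimize over the symmetric group acting on the treatment labels.
    for perm in permutations(labels):
        relabeled = {old: src_feats[new] for old, new in zip(labels, perm)}
        best = min(best, base_dist(relabeled, tgt_feats))
    return best

# Toy base distance: Euclidean distance summed over treatment arms.
def euclid(a, b):
    return sum(float(np.linalg.norm(a[k] - b[k])) for k in a)

src = {0: np.array([0.0, 1.0]), 1: np.array([2.0, 0.0])}
tgt = {0: np.array([2.0, 0.0]), 1: np.array([0.0, 1.0])}  # labels swapped
print(symmetrized_task_distance(src, tgt, euclid))  # 0.0: the swap is matched
```

Because the minimum ranges over all relabelings, two tasks that differ only by swapping which arm is called "treated" are assigned distance zero.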
3. We propose an ITE estimation framework and a set of
causal inference datasets suitable for learning causal
knowledge transfer. The empirical evidence on the
above datasets demonstrates that our methods can es-
timate the ITEs for the target task with significantly
fewer (up to 95% reduction) data samples compared to
the case where transfer learning is not performed.
2 RELATED WORK
Many approaches in transfer learning [Thrun and Pratt,
2012, Blum and Mitchell, 1998, Silver and Bennett, 2008,
Sharif Razavian et al., 2014, Finn et al., 2016, Fernando
et al., 2017, Rusu et al., 2016, Le et al., 2020] have been
proposed, analyzed and applied in various machine learning
applications. Transfer learning techniques inherently assume
that prior knowledge in the selected source model helps with
learning a target task [Pan and Yang, 2010, Zhuang et al.,
2021]. In other words, these methods often do not consider
the selection of the base task to perform knowledge transfer.
Consequently, in some rare cases, transfer learning may
even degrade the performance of the model Standley et al.
[2020]. In order to avoid potential performance loss during
knowledge transfer to a target task, task affinity (or task
similarity) is considered as a selection method that identifies
a group of closest base candidates from the set of the prior
learned tasks. Task affinity has been investigated and applied
to various domains (e.g., transfer learning [Zamir et al.,
2018, Dwivedi and Roig, 2019, Wang et al., 2019], neural
architecture search [Le et al., 2021, 2022a],
few-shot learning [Pal and Balasubramanian, 2019, Le et al.,
2022b], multi-task learning [Standley et al., 2020], continual
learning [Kirkpatrick et al., 2017, Chen et al., 2018]).
While transfer learning and task affinity have been investigated
in numerous application areas, their application
to causal inference has yet to be thoroughly explored.
The Neyman-Rubin Causal Model [Neyman, 1923, Donald,
2005] and Pearl’s Do-calculus [Pearl, 2009] are popular
frameworks for causal studies based on different perspec-
tives. A central question in the Neyman-Rubin Causal Model
framework is determining conditions for identifiability of
causal quantities such as Average and Individual Treatment
Effects. Previous work considered estimators of the Average
Treatment Effect based on methods such as Covariate
Adjustment [Rubin, 1978], weighting methods utilizing
propensity scores [Rosenbaum and Rubin,
1983], and Doubly Robust estimators [Funk et al., 2011].
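For reference, the Doubly Robust (AIPW) estimator combines a per-arm outcome model $\hat{\mu}_t(x)$ with a propensity model $\hat{e}(x)$, and remains consistent when either model is correctly specified:

```latex
\hat{\tau}_{\mathrm{AIPW}}
  = \frac{1}{n}\sum_{i=1}^{n}\Big[
      \hat{\mu}_1(x_i) - \hat{\mu}_0(x_i)
      + \frac{T_i\,\big(Y_i - \hat{\mu}_1(x_i)\big)}{\hat{e}(x_i)}
      - \frac{(1 - T_i)\,\big(Y_i - \hat{\mu}_0(x_i)\big)}{1 - \hat{e}(x_i)}
    \Big]
```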
With the emergence of Machine Learning techniques, more
recent approaches to causal inference include the appli-
cations of decision trees [Wager and Athey, 2018, Athey
and Imbens, 2016], Gaussian Processes [Alaa and Van
Der Schaar, 2017], and Generative Modeling [Yoon et al.,
2018] to ITE estimation. In particular, deep neural networks
have successfully learned ITEs and estimated counterfactual
outcomes by data balancing in the latent domain [Johansson
et al., 2016, Shalit et al., 2017]. Note that the transportability
of causal graphs is another well-studied, closely
related field in the causality literature [Bareinboim and Pearl,
2012]; it studies the transfer of causal relationships
within Pearl’s do-calculus framework. In contrast, in this
paper, we are interested in transferring knowledge of ITE
from a source task to a target task in the Neyman-Rubin
framework using representation learning. A closely related
problem to ours is the domain adaptation problem for ITE
estimation, as explored in [Bica and van der Schaar, 2022,
Vo et al., 2022, Aglietti et al., 2020]. These works primarily
focus on situations where only the distribution of popula-
tions changes, leaving the causal functions unaltered. In
our research, we provide theoretical analysis and empirical
studies for the case where both the population distributions
and the causal mechanisms can change.
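As background on ITE estimation in the Neyman-Rubin framework, the following minimal sketch is a plain T-learner: fit one outcome model per treatment arm and contrast their predictions for each unit. It is only a baseline for intuition (here with ordinary least squares on noiseless toy data), not the representation-learning methods discussed above.

```python
import numpy as np

def t_learner_ite(X, T, Y, fit, predict):
    """Estimate Individual Treatment Effects with a T-learner:
    fit one outcome model per treatment arm, then contrast their
    predictions for every unit. `fit`/`predict` can be any
    regression routines (here: least squares with an intercept)."""
    m1 = fit(X[T == 1], Y[T == 1])  # model of Y(1) given X
    m0 = fit(X[T == 0], Y[T == 0])  # model of Y(0) given X
    return predict(m1, X) - predict(m0, X)  # estimated ITE per unit

def ols_fit(X, y):
    Xb = np.hstack([X, np.ones((len(X), 1))])  # add intercept column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def ols_predict(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

# Toy observational data whose true ITE is constant and equal to 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
T = rng.integers(0, 2, size=200)
Y = X @ np.array([1.0, -1.0, 0.5]) + 2.0 * T
ite = t_learner_ite(X, T, Y, ols_fit, ols_predict)
```

Transfer approaches, including the one proposed here, can warm-start or regularize such per-arm models using knowledge from a selected source task instead of fitting them from scratch.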