
Figure 1: Comparisons between the causal graphs of CCM (a), (b) and the previous methods (c) (Liu et al. 2021; Sun et al. 2021), (d) (Mahajan, Tople, and Sharma 2021), and (e) (Wald et al. 2021). In Figure (a), based on Figures (c), (d), and (e), we add prior knowledge Z as a bridge linking unseen image X and label Y, and make domain D point to object O and category factor C to explain the limitations that source domains impose on models in DG. In Figure (b), by controlling domain D, the remaining part is a standard causal graph in which the causal effects from X to Y can be calculated via the front-door criterion.
category C. Compared to Figure 1 (c), Figure 1 (d), and Figure 1 (e), we add prior knowledge Z as a bridge to link unseen image X and label Y. The relationships between domain D and the other factors can be explained more clearly by Figure 1 (a): there, the domain D acts as a confounder that disturbs models in learning the causal effects from image X to label Y. We therefore control domain D to cut off D→O and D→E, and the remaining part is a standard causal graph in which the causal effects from X to Y can be calculated via the front-door criterion, as shown in Figure 1 (b).
To learn the causal effects of X on Y shown in Figure 1 (b), we introduce the front-door criterion. It splits the causal effect P(Y | do(X)) into the estimation of three parts: P(X), P(Z | X), and P(Y | Z, X). Furthermore, to permit stable distribution estimation during causal learning, we design a contrastive training paradigm that calibrates the learning process with the similarity between the current and the previous knowledge to strengthen the true causal effects.
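For reference, this split is exactly the standard front-door adjustment (Pearl's front-door criterion); with Z as the mediator between X and Y, the three terms combine as

\[
P(Y \mid do(X)) = \sum_{z} P(z \mid X) \sum_{x'} P(Y \mid z, x')\, P(x'),
\]

where P(z | X) describes how an image is mapped to prior knowledge, P(Y | z, x') describes how knowledge and images jointly determine the label, and the outer average over P(x') marginalizes out the residual confounding on Y.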
Our main contributions are summarized as follows. (i) We develop a Contrastive Causal Model that transfers unseen images into taught knowledge and quantifies the causal effects between images and labels based on that taught knowledge. (ii) We propose an inclusive causal graph that can explain the interference of the domain in the DG task. Based on this graph, our model cuts off the excess causal paths and quantifies the causal effects between images and labels via the front-door criterion. (iii) Extensive experiments on public benchmark datasets demonstrate the effectiveness and superiority of our method.
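To make the contrastive calibration above concrete, the following is a minimal sketch, assuming an InfoNCE-style objective that scores current-batch features against a memory bank of features from previous steps (standing in for the "previous knowledge"); the function name, the memory-bank design, and the exact loss form are illustrative assumptions rather than this paper's formulation.

import torch
import torch.nn.functional as F

def contrastive_calibration_loss(z_current, z_previous,
                                 y_current, y_previous,
                                 temperature=0.1):
    # Normalize so that dot products are cosine similarities.
    z_cur = F.normalize(z_current, dim=1)    # (B, d) current-batch features
    z_prev = F.normalize(z_previous, dim=1)  # (M, d) previous-knowledge memory
    logits = z_cur @ z_prev.t() / temperature  # (B, M) similarity scores
    # Previous features with the same label act as positives (assumption).
    pos = y_current[:, None].eq(y_previous[None, :]).float()
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # Average log-probability of the positives; the clamp guards against
    # current samples whose class is absent from the memory.
    loss = -(pos * log_prob).sum(dim=1) / pos.sum(dim=1).clamp(min=1.0)
    return loss.mean()

# Example with random stand-ins for encoder outputs (hypothetical shapes):
# z_now, z_mem = torch.randn(32, 128), torch.randn(256, 128)
# y_now, y_mem = torch.randint(0, 7, (32,)), torch.randint(0, 7, (256,))
# loss = contrastive_calibration_loss(z_now, z_mem, y_now, y_mem)

A loss of this form rewards representations that stay close to previously learned same-class knowledge, which is one way to read the idea of calibrating the learning process with the similarity of the current and previous knowledge.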
Related Work
Domain Generalization
Domain generalization (DG) aims to learn, from multiple source domains, a model that can perform well on unseen target domains. Data augmentation-based methods (Volpi et al. 2018; Shankar et al. 2018; Carlucci et al. 2019; Wang et al. 2020; Zhou et al. 2020b,a, 2021) try to improve the generalization robustness of the model by learning from data with novel distributions. Among them, some works (Volpi et al. 2018; Shankar et al. 2018) generate new data based on model gradients and leverage them to train the model, boosting its robustness, while others (Wang et al. 2020; Carlucci et al. 2019) introduce a jigsaw-puzzle strategy that improves out-of-distribution generalization via self-supervised learning. Adversarial training (Zhou et al.
2020b,a) is also employed to generate data with various
styles yet consistent semantic information. Meta-learning (Balaji, Sankaranarayanan, and Chellappa 2018; Li et al. 2018a; Dou et al. 2019; Li et al. 2019a,b) is also a popular topic in DG. The idea is similar to the problem setting of DG: learning from the known and preparing for inference from the unknown. However, it might not be easy to design effective meta-learning strategies for training a generalizable model. Another conventional direction is to perform invariant representation learning (Zhao et al. 2020; Matsuura and Harada 2020; Li et al. 2018d,c). These methods try to learn the feature representations that are discriminative for the classification task but invariant to the domain changes. For example, (Zhao et al. 2020) proposes conditional entropy regularization to extract effective conditional invariant
feature representations. While favorable results have been achieved by these approaches, they tend to model the statistical dependence between the input features and the labels, and hence can be biased by spurious correlations (Liu et al. 2021).
Domain Generalization with Causality
In this paper, we assume the data is generated from the root factors of the object O and domain D, as shown in Figure 1 (a). The class features C control both the input feature X and the label Y; meanwhile, the environment feature E only affects X. We aim to learn an informative representation from X to predict Y. (Liu et al. 2021) proposes a causal semantic generative model (see Figure 1 (c)). It separates the latent semantic factor S and variation factor V from data, where only the former causes the change in label Y. Similarly, (Sun et al. 2021) introduces latent causal invariant models based on the same causal model structure. Their semantic factor S and variation factor V are similar to the class feature C and the environment feature E in our causal graph respectively, while we further show their causal relationship with domain and object. (Mahajan, Tople, and Sharma 2021) proposes a causal graph with the domain D and object O which is similar to ours, as shown in Figure 1 (d). It assumes that the input feature X is determined by causal feature X_C and domain-dependent feature X_A, and the label Y is determined by X_C. Actually, the representation Z (Figure 1 (a)) that we aim to learn is to capture the information of causal feature X_C (Fig-