Preprint

COUNTERFACTUAL GENERATION UNDER CONFOUNDING
Abbavaram Gowtham Reddy*
IIT Hyderabad, India
cs19resch11002@iith.ac.in
Saloni Dash*
Microsoft Research Bengaluru, India
salonidash77@gmail.com
Amit Sharma
Microsoft Research Bengaluru, India
amshar@microsoft.com
Vineeth N Balasubramanian
IIT Hyderabad, India
vineethnb@iith.ac.in

*Equal contribution
ABSTRACT
A machine learning model, under the influence of observed or unobserved confounders in the training data, can learn spurious correlations and fail to generalize when deployed. For image classifiers, augmenting a training dataset using counterfactual examples has been empirically shown to break spurious correlations. However, the counterfactual generation task itself becomes more difficult as the level of confounding increases. Existing methods for counterfactual generation under confounding consider a fixed set of interventions (e.g., texture, rotation) and are not flexible enough to capture diverse data-generating processes. Given a causal generative process, we formally characterize the adverse effects of confounding on any downstream tasks and show that the correlation between generative factors (attributes) can be used to quantitatively measure confounding between generative factors. To minimize such correlation, we propose a counterfactual generation method that learns to modify the value of any attribute in an image and generate new images given a set of observed attributes, even when the dataset is highly confounded. These counterfactual images are then used to regularize the downstream classifier such that the learned representations are the same across various generative factors conditioned on the class label. Our method is computationally efficient, simple to implement, and works well for any number of generative factors and confounding variables. Our experimental results on both synthetic (MNIST variants) and real-world (CelebA) datasets show the usefulness of our approach.
1 INTRODUCTION
A confounder is a variable that causally influences two or more variables that are not necessarily directly causally dependent (Pearl, 2001). Often, the presence of confounders in a data-generating process is the reason for spurious correlations among variables in the observational data. The bias caused by such confounders is inevitable in observational data, making it challenging to identify invariant features representative of a target variable (Rothenhäusler et al., 2021; Meinshausen & Bühlmann, 2015; Wang et al., 2022). For example, the demographic area an individual resides in often confounds the race and, perhaps, the level of education that individual receives. Using such observational data, if the goal is to predict an individual's salary, a machine learning model may exploit the spurious correlation between education and race even though those two variables should ideally be treated as independent. Removing the effects of confounding in trained machine learning models has been shown to be helpful in various applications such as zero- or few-shot learning, disentanglement, domain generalization, counterfactual generation, algorithmic fairness, and healthcare (Suter et al., 2019; Kilbertus et al., 2020; Atzmon et al., 2020; Zhao et al., 2020; Yue et al., 2021; Sauer & Geiger, 2021; Goel et al., 2021; Dash et al., 2022; Reddy et al., 2022; Dinga et al., 2020).
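To make the salary example concrete, the following minimal simulation (ours, not from the paper; variable names and coefficients are illustrative) shows how a confounder induces correlation between two variables that do not causally influence each other:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

c = rng.integers(0, 2, size=n)           # confounder C: demographic area
z1 = c + rng.normal(0, 0.5, size=n)      # education level, driven by C
z2 = c + rng.normal(0, 0.5, size=n)      # second attribute, also driven by C

# Z1 and Z2 have no causal link, yet they correlate in observational data
# because both inherit variation from C.
print(np.corrcoef(z1, z2)[0, 1])         # ~0.5: spurious correlation

# Under the intervention do(C = 1), i.e., holding the area fixed, the
# correlation disappears.
z1_do = 1 + rng.normal(0, 0.5, size=n)
z2_do = 1 + rng.normal(0, 0.5, size=n)
print(np.corrcoef(z1_do, z2_do)[0, 1])   # ~0
```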
In observational data, confounding may be observed or unobserved and can pose various challenges in learning models depending on the task. For example, disentangling spuriously correlated features using generative modeling when there are confounders is challenging (Sauer & Geiger, 2021; Reddy et al., 2022; Funke et al., 2022). As stated earlier, a classifier may rely on non-causal features to make predictions in the presence of confounders (Schölkopf et al., 2021). Recent years have seen a few efforts to handle the spurious correlations caused by confounding effects in observational data (Träuble et al., 2021; Sauer & Geiger, 2021; Goel et al., 2021; Reddy et al., 2022). However, these methods either make strong assumptions about the underlying causal generative process or require strong supervision. In this paper, we study the adverse effect of confounding in observational data on a classifier's performance and propose a mechanism to marginalize such effects when performing data augmentation using counterfactual data. Counterfactual data generation provides a mechanism to address such issues arising from confounding and to build robust learning models without the additional task of building complex generative models.
The causal generative processes considered throughout this paper are shown in Figure 1(a). We assume that a set of generative factors (attributes) Z_1, Z_2, ..., Z_n (e.g., background, shape, texture) and a label Y (e.g., cow) cause a real-world observation X (e.g., an image of a cow in a particular background) through an unknown causal mechanism g (Peters et al., 2017b). To study the effects of confounding, we consider Y, Z_1, Z_2, ..., Z_n to be confounded by a set of confounding variables C_1, ..., C_m (e.g., certain breeds of cows appear only in certain shapes or colors and appear only in certain countries). Such causal generative processes have been considered earlier for other kinds of tasks such as disentanglement (Suter et al., 2019; von Kügelgen et al., 2021; Reddy et al., 2022). The presence of confounding variables results in spurious correlations among generative factors in the observed data, whose effect we aim to remove using counterfactual data augmentation.
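As an illustration of this generative process, the sketch below (our toy construction; the 95% confounding level mirrors the confounded MNIST-style setups referenced later, but the attribute names and probabilities are assumptions) samples a label Y and a color attribute Z that are spuriously correlated through a hidden confounder:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_attributes(n, p_confound=0.95, num_values=10):
    """Sample (Y, Z_color) pairs confounded through a hidden C."""
    y = rng.integers(0, num_values, size=n)               # label Y
    # With probability p_confound the color matches the label's
    # "preferred" color; otherwise it is drawn uniformly at random.
    keep = rng.random(n) < p_confound
    z_color = np.where(keep, y, rng.integers(0, num_values, size=n))
    # An image X would then be rendered from (y, z_color) by the
    # unknown mechanism g; we only inspect the attributes here.
    return y, z_color

y, z = sample_attributes(100_000)
print((y == z).mean())  # ~0.955: Y and Z_color are spuriously correlated
```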
Figure 1: (a) Causal data-generating process considered in this paper (CONIC = ours); (b) causal data-generating process considered in CGN (Sauer & Geiger, 2021).
A related recent effort by Sauer & Geiger (2021) proposes Counterfactual Generative Networks (CGN) to address this problem using a data augmentation approach. This work assumes each image to be composed of three Independent Causal Mechanisms (ICMs) (Peters et al., 2017a) responsible for three fixed factors of variation: shape, texture, and background (represented by Z_1, Z_2, and Z_3 in Figure 1(b)). It then trains a generative model that learns the three ICMs for shape, texture, and background separately and combines them in a deterministic fashion to generate observations. Once the ICMs are learned, sampling images after intervening on these mechanisms gives counterfactual data that can be used along with the training data to improve classification results. However, fixing the architecture to a specific number and type of mechanisms (shape, texture, background) is not generalizable and may not directly be applicable to settings where the number of underlying generative factors is unknown. It is also computationally expensive to train a different generative model for each aspect of an image such as texture, shape, or background.
In this work, we begin by quantifying confounding in observational data that is generated by an underlying causal graph (more general than the one considered by CGN) of the form shown in Figure 1(a). We then provide a counterfactual data augmentation methodology called CONIC (COunterfactual geNeratIon under Confounding). We hypothesize that the counterfactual images generated using the proposed CONIC method provide a mechanism to marginalize the causal mechanisms responsible for spurious correlations (i.e., the causal arrows from C_i to Z_j for some i, j). We take a generative modeling approach and propose a neural network architecture based on conditional CycleGAN (Zhu et al., 2017) to generate counterfactual images. The proposed architecture improves CycleGAN's ability to generate quality counterfactual images under confounded data by adding contrastive losses that distinguish between fixed and modified features while learning the cross-domain translations. To demonstrate the usefulness of such counterfactual images, we consider classification as a downstream task and study the performance of various models on an unconfounded test set. Our key contributions include:
• We formally quantify confounding in causal generative processes of the form in Figure 1(a) and study the relationship between correlation and confounding between any pair of generative factors.
• We present a counterfactual data augmentation methodology that generates counterfactual instances of observed data, works even on highly confounded data (95% confounding), and provides a mechanism to marginalize the causal mechanisms responsible for confounding.
• We modify conditional CycleGAN to improve the quality of generated counterfactuals. Our method is computationally efficient and easy to implement.
• Following previous work, we perform extensive experiments on well-known benchmarks – three MNIST variants and the CelebA dataset – to showcase the usefulness of our proposed methodology in improving the accuracy of a downstream classifier.
2 RELATED WORK
Counterfactual Inference: Pearl (2009), in his seminal text on causality, provided a three-step procedure for generating a counterfactual data instance from an observed instance: (i) Abduction: abduct/recover the values of the exogenous noise variables; (ii) Action: perform the required intervention; and (iii) Prediction: generate the counterfactual instance. One, however, needs access to the underlying structural causal model (SCM) to perform the above steps for counterfactual generation. Since real-world data do not come with an underlying SCM, many recent efforts have focused on modeling the underlying causal mechanisms generating data under various assumptions. These methods then perform the required intervention on specific variables in the learned model to generate counterfactual instances that can be used for various downstream tasks such as classification, fairness, and explanations (Kusner et al., 2017; Joo & Kärkkäinen, 2020; Denton et al., 2019; Zmigrod et al., 2019; Pitis et al., 2020; Yoon et al., 2018; Bica et al., 2020; Pawlowski et al., 2020).
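For intuition, the following toy example (ours; the linear SCM and its coefficients are purely illustrative) walks through the three steps on a hand-written structural causal model:

```python
# Toy SCM:  Z := U_z,   X := 2*Z + U_x,  with exogenous noise U_z, U_x.

def abduction(z_obs, x_obs):
    """(i) Abduction: recover the exogenous noise consistent with the data."""
    u_z = z_obs                # Z := U_z
    u_x = x_obs - 2 * z_obs    # X := 2*Z + U_x
    return u_z, u_x

def counterfactual(z_obs, x_obs, z_cf):
    u_z, u_x = abduction(z_obs, x_obs)
    z = z_cf                   # (ii) Action: perform do(Z = z_cf)
    x = 2 * z + u_x            # (iii) Prediction: re-run the SCM with the
    return x                   #       recovered noise and the new Z

# Observed: Z = 1, X = 2.5 (so U_x = 0.5). Counterfactual "had Z been 3":
print(counterfactual(1.0, 2.5, 3.0))  # 6.5
```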
Generating Counterfactuals by Learning ICMs: In a more recent effort, assuming any real-world image is generated by three independent causal mechanisms for shape, texture, and background, plus a composition mechanism of the first three, Sauer & Geiger (2021) developed Counterfactual Generative Networks (CGN) that generate counterfactual images of a given image. CGN trains three Generative Adversarial Networks (GANs) (Goodfellow et al., 2014b) to learn the shape, texture, and background mechanisms and combines them using a composition mechanism g as g(shape, texture, background) = shape ⊙ texture + (1 − shape) ⊙ background, where ⊙ is the Hadamard product. Each of these independent mechanisms is given as input a noise vector u and a label y specific to that mechanism during training. Once the independent mechanisms are trained, counterfactual images are generated by sampling a label and a noise vector corresponding to each mechanism and then feeding the input to CGN. Finally, a classifier is trained with both original and counterfactual images to achieve better test-time accuracy, showing the usefulness of CGN. However, this deterministic architecture does not generalize to the case where the number of underlying generative factors is unknown, and it is computationally infeasible to train generative models for each specific aspect of an image such as texture or background.
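The composition mechanism g itself is straightforward to write down; the snippet below is a direct transcription of the formula above for image tensors (the tensor shapes are our assumptions, and this is not CGN's training code):

```python
import torch

def compose(shape: torch.Tensor, texture: torch.Tensor,
            background: torch.Tensor) -> torch.Tensor:
    # g(shape, texture, background) = shape ⊙ texture + (1 - shape) ⊙ background,
    # where `shape` is a soft mask in [0, 1] and ⊙ is elementwise product.
    return shape * texture + (1.0 - shape) * background

mask = torch.rand(1, 1, 32, 32)        # soft shape mask (broadcasts over RGB)
texture = torch.rand(1, 3, 32, 32)
background = torch.rand(1, 3, 32, 32)
image = compose(mask, texture, background)
print(image.shape)                     # torch.Size([1, 3, 32, 32])
```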
Disentanglement and Data Augmentation: Spurious correlations among generative factors have been considered in disentanglement (Funke et al., 2022; von Kügelgen et al., 2021). The general idea in these efforts is to separate the causal predictive features from non-causal/spurious predictive features when predicting an outcome. Our goal is different from disentanglement: we focus on the performance of a downstream classifier instead of separating the sources of generative factors. Traditional data augmentation methods such as rotation, scaling, and corruption (Hendrycks et al., 2020; Devries & Taylor, 2017; Zhang et al., 2018; Yun et al., 2019) do not consider the causal generative process and hence cannot remove the confounding in the images via data augmentation (e.g., the color and shape of an object cannot be separated using simple augmentations). We hence focus on counterfactual data augmentation aimed at marginalizing the effect of confounders.
An effort similar to ours is that of Goel et al. (2021), who use CycleGAN to generate counterfactual data points. However, they focus on the performance of a subgroup (a subset of data with specific properties), which differs from our goal of controlling confounding in the entire dataset. Another recent work by Wang et al. (2022) considers spurious correlations among generative factors and uses CycleGAN to generate counterfactual images. Compared to these efforts, rather than using CycleGAN directly, we propose a CycleGAN-based architecture that is optimized for controlled generation using contrastive losses.
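As a rough illustration of this idea (not the paper's exact objective; the embedding inputs, margin, and loss weights below are our assumptions, and we use a triplet-style stand-in for the contrastive terms), the extra losses can be assembled on top of the usual conditional-CycleGAN objective as follows:

```python
import torch
import torch.nn.functional as F

def contrastive_term(emb_real, emb_fake, emb_neg, margin=1.0):
    # Fixed features: an image and its counterfactual should embed closely.
    pos = F.mse_loss(emb_fake, emb_real)
    # Modified attribute: the counterfactual should move away from an
    # embedding of the source attribute class, up to a margin.
    neg = F.relu(margin - F.mse_loss(emb_fake, emb_neg))
    return pos + neg

def generator_objective(gan_loss, cycle_loss, emb_real, emb_fake, emb_neg,
                        lam_cyc=10.0, lam_con=1.0):
    # Usual adversarial + cycle-consistency terms plus the contrastive
    # regularizer that separates fixed from modified features.
    return gan_loss + lam_cyc * cycle_loss + lam_con * contrastive_term(
        emb_real, emb_fake, emb_neg)

# Toy usage with random embeddings standing in for a feature extractor:
e_real, e_fake, e_neg = (torch.randn(8, 128) for _ in range(3))
loss = generator_objective(torch.tensor(0.7), torch.tensor(0.3),
                           e_real, e_fake, e_neg)
print(loss.item())
```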
Applications of Counterfactuals: Augmenting the training data with appropriate counterfactual data has been shown to be helpful in many applications ranging from vision to natural language tasks (Joo & Kärkkäinen, 2020; Lample et al., 2017; Kusner et al., 2017; Kaushik et al., 2019; Dash et al., 2022). Joo & Kärkkäinen (2020) identified existing biases in computer vision APIs deployed in the real world by Amazon, Google, IBM, and Clarifai by looking at the differences in the outputs of those APIs on counterfactual images that differ by protected/sensitive attributes (e.g., race and gender). Using locally independent causal mechanisms, Pitis et al. (2020) augmented training data with counterfactual data points in a model-free reinforcement learning setting. Here, the idea is to take any two factual trajectories of an episode and combine them at a particular point in time to generate a counterfactual data point, which is then added to the replay buffer. Independently factored samples are essential to obtain plausible and realistic counterfactual instances.
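A minimal sketch of this trajectory-splicing idea (under the simplifying assumption that each state factors into two locally independent components; the state layout and function names are ours, for illustration only):

```python
from typing import List, Tuple

State = Tuple[float, float]  # (factor_a, factor_b), locally independent

def splice(traj1: List[State], traj2: List[State], t: int) -> List[State]:
    """Combine two factual trajectories at time t: keep factor_a from traj1
    and factor_b from traj2 to form a counterfactual trajectory."""
    return [(a, b) for (a, _), (_, b) in zip(traj1[t:], traj2[t:])]

traj1 = [(0.0, 9.0), (1.0, 8.0), (2.0, 7.0)]
traj2 = [(5.0, 0.0), (6.0, 1.0), (7.0, 2.0)]
print(splice(traj1, traj2, t=1))  # [(1.0, 1.0), (2.0, 2.0)]
```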
3 INFORMATION-THEORETIC MEASURE OF CONFOUNDING
Background and Problem Formulation: Let {Z_1, Z_2, ..., Z_n} be a set of random variables denoting the generative factors of an observed data point X, and let Y be the label of the observation X. Each generative factor Z_i (e.g., color) can take a value from a discrete set of values {z_i^1, ..., z_i^d} (e.g., red, green, etc.). Let the set S = {Y, Z_1, ..., Z_n} generate N real-world observations {X_i}_{i=1}^N through an unknown causal mechanism g. Each X_i can be thought of as an observation generated using the causal mechanism g with a certain intervention on the variables in the set S. Variables in S may potentially be confounded by a set of confounders C = {C_1, ..., C_m} that denote real-world confounding such as selection bias. Let D be the dataset of real-world observations along with the corresponding values taken by {Y, Z_1, ..., Z_n}. The causal graph in Figure 1(a) shows the general form of this setting. From a causal effect perspective, each variable in S has a direct causal influence on the observation X (e.g., the causal edge Z_i → X) and also has a non-causal influence on X via the confounding variables C_1, ..., C_m (e.g., Z_i ← C_j → Z_k → X for some C_j and Z_k). These paths via the confounding variables, in which there is an incoming arrow into the variables in S, are also referred to as backdoor paths (Pearl, 2001). Due to the presence of backdoor paths, we may observe spurious correlations among the variables in S in the observational data D.
In any downstream application where D is used to train a model (e.g., classification, disentanglement, etc.), it is desirable to minimize or remove the effect of the confounding variables to ensure that the model is not exploiting the spurious correlations in the data to arrive at a decision. In this paper, we present a method to remove the effect of such confounding variables using counterfactual data augmentation. We start by studying the relationship between the amount of confounding and the correlation between any pair of generative factors in causal processes of the form shown in Figure 1(a).
Definition 3.1 (No Confounding (Pearl, 2009)). In a causal directed acyclic graph (DAG) G = (V, E), where V denotes the set of variables and E denotes the set of directed edges encoding the direction of causal influence among the variables in V, an ordered pair (Z_i, Z_j), with Z_i, Z_j ∈ V, is unconfounded if and only if p(Z_i = z_i | do(Z_j = z_j)) = p(Z_i = z_i | Z_j = z_j) for all z_i, z_j, where do(Z_j = z_j) denotes an intervention setting the variable Z_j to the value z_j. This definition can also be extended to disjoint sets of random variables.
Definition 3.1 provides the notion of no confounding; however, to quantify confounding between a pair of variables, we consider the following definition, which relates the interventional distribution p(Z_i | do(Z_j)) to the conditional distribution p(Z_i | Z_j).
Definition 3.2 (Directed Information (Raginsky, 2011; Wieczorek & Roth, 2019)). In a causal directed acyclic graph (DAG) G = (V, E), where V denotes the set of variables and E denotes the set of directed edges encoding the direction of causal influence among the variables in V, the directed information from a variable Z_i ∈ V to another variable Z_j ∈ V is denoted by I(Z_i → Z_j) and is defined as follows:

I(Z_i → Z_j) := D_KL( p(Z_j | Z_i) || p(Z_j | do(Z_i)) | p(Z_i) ) := E_{p(Z_i, Z_j)} [ log p(Z_j | Z_i) / p(Z_j | do(Z_i)) ]    (1)
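To make Definition 3.2 concrete, the following numeric check (our toy discrete SCM, not from the paper) computes I(Z_j → Z_i) by enumeration for a binary confounder C that drives both Z_j and Z_i; since Z_j has no directed path to Z_i here, p(Z_i | do(Z_j)) = p(Z_i), and the directed information is strictly positive exactly because of confounding:

```python
import itertools
import math

eps = 0.1  # flip probability: each Z equals C with probability 1 - eps

def p_z_given_c(z, c):
    return 1 - eps if z == c else eps

# Joint p(c, z_j, z_i) for C ~ Bernoulli(0.5), with Z_j, Z_i noisy copies of C.
joint = {(c, zj, zi): 0.5 * p_z_given_c(zj, c) * p_z_given_c(zi, c)
         for c, zj, zi in itertools.product([0, 1], repeat=3)}

def p(zj=None, zi=None):
    """Marginal/joint probabilities of (Z_j, Z_i) from the full joint."""
    return sum(v for (c, j, i), v in joint.items()
               if (zj is None or j == zj) and (zi is None or i == zi))

# I(Z_j -> Z_i) = E_{p(z_j, z_i)}[ log p(z_i | z_j) / p(z_i | do(z_j)) ],
# with p(z_i | do(z_j)) = p(z_i) in this graph (no directed path Z_j -> Z_i).
di = sum(p(zj=j, zi=i) * math.log(p(zj=j, zi=i) / p(zj=j) / p(zi=i))
         for j, i in itertools.product([0, 1], repeat=2))
print(di)  # ~0.22 nats > 0: Z_i and Z_j are confounded
```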
Using Definitions 3.1 and 3.2, it is easy to see that the variables Z_i and Z_j are unconfounded if and only if I(Z_j → Z_i) = 0. Non-zero directed information I(Z_j → Z_i) entails that p(Z_i | Z_j) ≠ p(Z_i | do(Z_j)), and hence the presence of confounding (if there is no confounder, p(Z_i | Z_j) should be equal to p(Z_i | do(Z_j))). It is also important to note that directed information is not symmetric, i.e., I(Z_i → Z_j) ≠ I(Z_j → Z_i) (Jiao et al., 2013). We use this fact in defining the measure