
Preprint
using generative modeling when there are confounders is challenging (Sauer & Geiger, 2021; Reddy
et al., 2022; Funke et al., 2022). As stated earlier, a classifier may rely on non-causal features to
make predictions in the presence of confounders (Schölkopf et al., 2021). Recent years have seen
a few efforts to handle the spurious correlations caused by confounding effects in observational
data (Träuble et al., 2021; Sauer & Geiger, 2021; Goel et al., 2021; Reddy et al., 2022). However,
these methods either make strong assumptions on the underlying causal generative process or require
strong supervision. In this paper, we study the adversarial effect of confounding in observational data
on a classifier’s performance and propose a mechanism to marginalize such effects when performing
data augmentation using counterfactual data. Counterfactual data generation provides a mechanism
to address such issues arising from confounding and to build robust learning models without the
additional task of building complex generative models.
The causal generative processes considered throughout this paper are shown in Figure 1(a). We
assume that a set of generative factors (attributes) Z1, Z2, . . . , Zn (e.g., background, shape, texture)
and a label Y (e.g., cow) cause a real-world observation X (e.g., an image of a cow in a particular
background) through an unknown causal mechanism g (Peters et al., 2017b). To study the effects
of confounding, we consider Y, Z1, Z2, . . . , Zn to be confounded by a set of confounding variables
C1, . . . , Cm (e.g., certain breeds of cows appear only in certain shapes or colors and appear only in
certain countries). Such causal generative processes have been considered earlier for other kinds of
tasks such as disentanglement (Suter et al., 2019; Von Kügelgen et al., 2021; Reddy et al., 2022).
The presence of confounding variables results in spurious correlations among generative factors in
the observed data, whose effect we aim to remove using counterfactual data augmentation.
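As a toy illustration of how such a confounder induces spurious correlations, the following sketch simulates a binary confounder C that influences both a label Y and a generative factor Z (the variable names and probabilities here are our own illustrative choices, not part of the paper's setup). Although Y and Z share no direct causal link, their common cause makes them strongly correlated in the observed data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical confounder C (e.g., country) influences both the label Y
# (e.g., cow breed) and a generative factor Z (e.g., background type).
C = rng.integers(0, 2, size=n)
Y = (rng.random(n) < np.where(C == 1, 0.9, 0.1)).astype(int)
Z = (rng.random(n) < np.where(C == 1, 0.9, 0.1)).astype(int)

# Y and Z have no causal arrow between them, yet confounding by C
# induces a substantial correlation in the observational data.
corr = np.corrcoef(Y, Z)[0, 1]
print(f"corr(Y, Z) = {corr:.2f}")  # substantially above zero
```

A classifier trained on such data can exploit Z as a shortcut for predicting Y, which is exactly the failure mode the counterfactual augmentation aims to remove.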
Figure 1: (a) Causal data generating process considered in this paper (CONIC = ours); (b) causal data generating process considered in CGN (Sauer & Geiger, 2021).
A related recent effort by Sauer & Geiger (2021) proposes Counterfactual Generative Networks (CGN) to address this problem using a data augmentation approach. This work assumes each image to be composed of three Independent Causal Mechanisms (ICMs) (Peters et al., 2017a) responsible for three fixed factors of variation: shape, texture, and background (represented by Z1, Z2, and Z3 in Figure 1(b)). It then trains a generative model that learns the three ICMs for shape, texture, and background separately, and combines their outputs in a deterministic fashion to generate observations. Once the ICMs are learned, sampling images after intervening on these mechanisms gives counterfactual data that can be used along with the training data to improve classification results. However, fixing the architecture to a specific number and type of mechanisms (shape, texture, background) does not generalize, and may not be directly applicable to settings where the number of underlying generative factors is unknown. It is also computationally expensive to train a different generative model for each aspect of an image such as texture, shape, or background.
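The deterministic composition step described above can be sketched as follows. This is an illustrative simplification with our own toy arrays, not CGN's actual networks: the shape mechanism yields a mask m, the texture mechanism a foreground f, and the background mechanism a background b, which are blended analytically; an intervention swaps out one mechanism's output while the others stay fixed:

```python
import numpy as np

def compose(m, f, b):
    """Deterministic alpha-blend of shape mask, foreground texture,
    and background into an observation: x = m*f + (1 - m)*b."""
    return m * f + (1.0 - m) * b

H, W = 4, 4
m = np.zeros((H, W)); m[1:3, 1:3] = 1.0   # toy "shape" mask
f = np.full((H, W), 0.8)                  # toy "texture" foreground
b = np.full((H, W), 0.2)                  # toy background

x = compose(m, f, b)                      # observation
# A counterfactual intervenes on one mechanism (here, the background)
# while keeping shape and texture fixed:
x_cf = compose(m, f, np.full((H, W), 0.6))
```

Because each factor requires its own learned mechanism, this design hard-codes the number of factors into the architecture, which is the limitation discussed above.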
In this work, we begin by quantifying confounding in observational data generated by an underlying causal graph (more general than the one considered by CGN) of the form shown in Figure 1(a).
We then provide a counterfactual data augmentation methodology called CONIC (COunterfactual
geNeratIon under Confounding). We hypothesize that the counterfactual images generated using the
proposed CONIC method provide a mechanism to marginalize the causal mechanisms responsible
for spurious correlations (i.e., causal arrows from Ci to Zj for some i, j). We take a generative modeling approach and propose a neural network architecture based on conditional CycleGAN (Zhu et al., 2017) to generate counterfactual images. The proposed architecture improves CycleGAN's ability to generate quality counterfactual images from confounded data by adding contrastive losses that distinguish between fixed and modified features while learning the cross-domain translations. To demonstrate the usefulness of such counterfactual images, we consider classification as a downstream task and study the performance of various models on an unconfounded test set. Our
key contributions include:
• We formally quantify confounding in causal generative processes of the form in Figure 1(a), and
study the relationship between correlation and confounding for any pair of generative factors.