arbitrary style-transfer (Reif et al., 2022). CORE prompts GPT-3 (Brown et al., 2020; Wei et al., 2022) with a few demonstrations of using these excerpts for counterfactual editing.
The CORE retriever is a transformer-based bi-encoder model trained with contrastive learning (Le-Khac et al., 2020) on a small set of human-authored counterfactuals for the task. For each training example, CORE retrieves excerpts from an unlabeled task-related corpus that bear a label-flipping counterfactual relationship with the original input instance. Retrieval may extract excerpts that drift semantically from the input text while still containing relevant counterfactual phrases (Table 1). Using prompts, the CORE GPT-3 editor generates counterfactual edits to the input, conditioned on the retrieved excerpts and the original input. The prompts consist of instructions and a few demonstrations of using the retrieved text for editing. Unlike prior work that uses rule-based methods (Ribeiro et al., 2020) or semantic frameworks (Wu et al., 2021; Ross et al., 2022) and restricts perturbation types, CORE uses naturally occurring data to encourage perturbation diversity.
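The retrieve-then-edit pipeline can be sketched as follows. This is a minimal illustration, not the CORE implementation: the toy hash-based bag-of-words embedding stands in for the trained transformer bi-encoder, and all function names (`embed`, `retrieve`, `build_prompt`) are ours, not from the paper.

```python
import zlib
import numpy as np

def embed(text, dim=16):
    # Toy deterministic bag-of-words embedding: each token seeds a random
    # vector, and the sum is L2-normalized. A stand-in for CORE's
    # contrastively trained bi-encoder.
    v = np.zeros(dim)
    for tok in text.lower().split():
        rng = np.random.default_rng(zlib.crc32(tok.encode()))
        v += rng.standard_normal(dim)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def retrieve(query, corpus, k=2):
    # Bi-encoder retrieval: embed the input instance and each unlabeled
    # excerpt independently, then rank excerpts by cosine similarity.
    q = embed(query)
    return sorted(corpus, key=lambda ex: -float(q @ embed(ex)))[:k]

def build_prompt(instruction, demos, query, excerpts):
    # Few-shot prompt for the editor LM: an instruction, demonstrations of
    # rewriting an input with help from retrieved text, and finally the new
    # input with its retrieved excerpts, left open for the model to
    # complete with a label-flipping edit.
    parts = [instruction]
    for inp, exc, edit in demos:
        parts.append(f"Input: {inp}\nRetrieved: {exc}\nEdit: {edit}")
    parts.append(f"Input: {query}\nRetrieved: {'; '.join(excerpts)}\nEdit:")
    return "\n\n".join(parts)
```

For example, retrieving for a positive movie review against a small corpus and assembling the editor prompt might look like:

```python
corpus = [
    "the plot was dull and the acting lifeless",
    "a thrilling, heartfelt ride from start to finish",
]
excerpts = retrieve("the acting was dull throughout", corpus, k=1)
prompt = build_prompt(
    "Rewrite the input so its sentiment label flips, reusing the retrieved text.",
    [("the food was superb", "bland, overpriced dishes",
      "the food was bland and overpriced")],
    "the acting was dull throughout",
    excerpts,
)
```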
Intrinsic evaluation shows that CORE counterfactuals cover the perturbation types generated by existing methods like Wu et al. (2021) (Table 7) as well as new perturbation types (Table 5), with more diverse outputs (Table 6), all without explicit supervision. Our extensive data augmentation experiments and analyses show that combining retrieval with few-shot editing generates CAD that is effective in reducing model biases and improving performance on out-of-distribution (OOD) and challenge test sets. Perturbing only 3% and 7% of the data for NLI and sentiment analysis respectively, we achieve improvements of up to 4.5% and 6.2% over standard DA (Tables 2, 3). Additionally, we show that CORE's learned retriever can assist humans in generating more diverse counterfactuals, spurring their creativity and reducing priming effects (Gardner et al., 2021).
2 Related Work
Counterfactual Data Augmentation
There is growing interest in the area of CDA for model robustness, with early efforts focused on human-authored counterfactuals (Kaushik et al., 2020; Gardner et al., 2020). However, manual rewrites can be costly and prone to systematic omissions.
Techniques have been proposed for the automatic generation of counterfactual data or contrast sets (Wu et al., 2021; Ross et al., 2022, 2021; Bitton et al., 2021; Asai and Hajishirzi, 2020; Geva et al., 2021; Madaan et al., 2021; Li et al., 2020). Existing techniques rely on rules/heuristics for perturbing sentences (Webster et al., 2020; Dua et al., 2021; Ribeiro et al., 2020; Asai and Hajishirzi, 2020), or on sentence-level semantic representations (e.g., SRL) and a finite set of structured control codes (Geva et al., 2021; Ross et al., 2022; Wu et al., 2021). However, Joshi and He (2022) find that a limited set of perturbation types further exacerbates biases, resulting in poor generalization to unseen perturbation types. In general, creating an assorted set of instance-specific perturbations is challenging, often requiring external knowledge (Paranjape et al., 2022).
Retrieval Augmented Generation
Retrieving task-relevant knowledge from a large corpus of unstructured and unlabeled text has proven to be very effective for knowledge-intensive language generation tasks like question answering (Lewis et al., 2020), machine translation (Gu et al., 2018), and dialogue generation (Weston et al., 2018). Retrieval has also been used for paraphrase generation (Chen et al., 2019) and style-transfer (Xiao et al., 2021), specifically to address the lack of diversity in generations from pretrained language models. In a similar vein, CORE uses learned retrieval for counterfactual generation. While Paranjape et al. (2022) use off-the-shelf retrieval models to generate counterfactuals for QA, learning to retrieve counterfactuals is non-trivial for problems other than QA. CORE provides a recipe to train retrieval for general tasks.
In-context learning
Massive language models like GPT-3 have been found to be effective at controlled generation tasks like arbitrary style-transfer (Reif et al., 2022), counterfactual reasoning (Frohberg and Binder, 2022), step-wise reasoning for complex problems (Wei et al., 2022; Zhou et al., 2022), and dataset generation (Liu et al., 2022), by learning in-context from few-shot demonstrations and natural language instructions (Wei et al., 2021). While GPT-3 has been used for data augmentation, it has not been used for counterfactual generation, which is fundamentally different in nature.
3 Method
A high-level overview of CORE is shown in Figure 2. The first stage (§3.1) retrieves counterfactual