
CORE: A Retrieve-then-Edit Framework
for Counterfactual Data Generation
Tanay Dixit¹ Bhargavi Paranjape² Hannaneh Hajishirzi²,³ Luke Zettlemoyer²,⁴
¹Indian Institute of Technology, Madras
²Paul G. Allen School of Computer Science & Engineering, University of Washington
³Allen Institute for Artificial Intelligence, Seattle ⁴Meta AI
tanay.dixit@smail.iitm.ac.in
{bparan,hannaneh,lsz}@cs.washington.edu
Abstract
Counterfactual data augmentation (CDA) – i.e., adding minimally perturbed inputs during training – helps reduce model reliance on spurious correlations and improves generalization to out-of-distribution (OOD) data. Prior work on generating counterfactuals only considered restricted classes of perturbations, limiting their effectiveness. We present COunterfactual Generation via Retrieval and Editing (CORE), a retrieval-augmented generation framework for creating diverse counterfactual perturbations for CDA. For each training example, CORE first performs a dense retrieval over a task-related unlabeled text corpus using a learned bi-encoder and extracts relevant counterfactual excerpts. CORE then incorporates these into prompts to a large language model with few-shot learning capabilities, for counterfactual editing. Conditioning language model edits on naturally occurring data results in diverse perturbations. Experiments on natural language inference and sentiment analysis benchmarks show that CORE counterfactuals are more effective at improving generalization to OOD data compared to other DA approaches. We also show that the CORE retrieval framework can be used to encourage diversity in manually authored perturbations.¹
1 Introduction
Contrast sets (Gardner et al., 2020) and counterfactual data (Kaushik et al., 2020) provide minimal input perturbations that change model predictions, and serve as an effective means to evaluate brittleness to out-of-distribution data (Wang et al., 2021). Counterfactual data augmentation (CDA) has been shown to improve model robustness to OOD data and input perturbations (Geva et al., 2021; Wu et al., 2021; Paranjape et al., 2022; Khashabi et al., 2020). Alternate methods like debiasing data (Wu et al., 2022) have also shown promising results on improving model robustness, but in this work we focus on CDA strategies.
¹ Code at https://github.com/tanay2001/CORE
[Figure 1: Diverse counterfactuals are generated for an MNLI example. The red arrow represents the most trivial way of generating a counterfactual hypothesis, while the violet arrows depict several other perturbations that intervene on different predictive features.]
Recently, Joshi and He (2022) find that diversity in the set of perturbations of different predictive features is key to the effectiveness of CDA (see Figure 1). In this paper, we introduce COunterfactual Generation via Retrieval and Editing (CORE), a retrieval-augmented generation framework for creating diverse counterfactual perturbations. CORE combines dense retrieval with the few-shot learning capabilities of large language models, while using minimal supervision about the perturbation type.
Retrieval-augmented models (Guu et al., 2020; Lewis et al., 2020) learn to search over a dense index of a text corpus to condition generation on retrieved texts, and are especially effective at improving the diversity of generated text for paraphrase generation (Chen et al., 2019) and style-transfer (Xiao et al., 2021). CORE uses this insight by learning to retrieve counterfactual excerpts from a large text corpus. Arbitrarily conditioning on these retrieved text excerpts to generate a rich set of counterfactual perturbations, without explicit supervision, can be challenging (Qin et al., 2022). Instead, CORE uses few-shot prompting of massive pretrained language models, which has been found to be effective at controlled generation tasks like
arbitrary style-transfer (Reif et al., 2022). CORE prompts GPT-3 (Brown et al., 2020; Wei et al., 2022) with a few demonstrations of using these excerpts for counterfactual editing.
The CORE retriever is a transformer-based bi-encoder model trained using contrastive learning (Le-Khac et al., 2020) on a small set of human-authored counterfactuals for the task. For each training example, CORE retrieves excerpts from an unlabeled task-related corpus that bear a label-flipping counterfactual relationship with the original input instance. Retrieval may extract excerpts that have significant semantic drift from the input text, while still containing relevant counterfactual phrases (Table 1). Using prompts, the CORE GPT-3 editor generates counterfactual edits to the input conditioned on the retrieved excerpts (and the original inputs). The prompts consist of instructions and a few demonstrations of using the retrieved text for editing. Unlike prior work that uses rule-based (Ribeiro et al., 2020) or semantic frameworks (Wu et al., 2021; Ross et al., 2022) and restricts perturbation types, CORE uses naturally occurring data to encourage perturbation diversity.
Intrinsic evaluation of CORE counterfactuals demonstrates a rich set of perturbation types, covering both those that existing methods like Wu et al. (2021) generate (Table 7) and new perturbation types (Table 5), with more diverse outputs (Table 6), without explicit supervision. Our extensive data augmentation experiments and analyses show that the combination of retrieval and few-shot editing generates data for CDA that is effective in reducing model biases and improving performance on out-of-distribution (OOD) and challenge test sets. Perturbing only 3% and 7% of the data for NLI and sentiment analysis respectively, we achieve improvements of up to 4.5% and 6.2% over standard DA (Tables 2, 3). Additionally, we show that CORE's learned retriever can assist humans in generating more diverse counterfactuals, spurring their creativity and reducing priming effects (Gardner et al., 2021).
2 Related Work
Counterfactual Data Augmentation. There is growing interest in the area of CDA for model robustness, with early efforts focused on human-authored counterfactuals (Kaushik et al., 2020; Gardner et al., 2020). However, manual rewrites can be costly and prone to systematic omissions. Techniques have been proposed for the automatic generation of counterfactual data or contrast sets (Wu et al., 2021; Ross et al., 2022, 2021; Bitton et al., 2021; Asai and Hajishirzi, 2020; Geva et al., 2021; Madaan et al., 2021; Li et al., 2020). Existing techniques rely on rules/heuristics for perturbing sentences (Webster et al., 2020; Dua et al., 2021; Ribeiro et al., 2020; Asai and Hajishirzi, 2020), or on sentence-level semantic representations (e.g., SRL) and a finite set of structured control codes (Geva et al., 2021; Ross et al., 2022; Wu et al., 2021). However, Joshi and He (2022) find that a limited set of perturbation types further exacerbates biases, resulting in poor generalization to unseen perturbation types. Generally, creating an assorted set of instance-specific perturbations is challenging, often requiring external knowledge (Paranjape et al., 2022).
Retrieval Augmented Generation. Retrieving task-relevant knowledge from a large corpus of unstructured and unlabeled text has proven to be very effective for knowledge-intensive language generation tasks like question answering (Lewis et al., 2020), machine translation (Gu et al., 2018), and dialogue generation (Weston et al., 2018). Retrieval has also been used for paraphrase generation (Chen et al., 2019) and style-transfer (Xiao et al., 2021), specifically to address the lack of diversity in generations from pretrained language models. In a similar vein, CORE uses learned retrieval for counterfactual generation. While Paranjape et al. (2022) use off-the-shelf retrieval models to generate counterfactuals for QA, learning to retrieve counterfactuals is non-trivial for problems other than QA. CORE provides a recipe to train retrieval for general tasks.
In-context learning. Massive language models like GPT-3 have been found to be effective at controlled generation tasks like arbitrary style-transfer (Reif et al., 2022), counterfactual reasoning (Frohberg and Binder, 2022), step-wise reasoning for complex problems (Wei et al., 2022; Zhou et al., 2022), and dataset generation (Liu et al., 2022), by learning in-context from few-shot demonstrations and natural language instructions (Wei et al., 2021). While GPT-3 has been used for data augmentation, it has not been used for counterfactual generation, which is fundamentally different in nature.
3 Method
[Figure 2: Overview of CORE, the COunterfactual Retrieval and Editing framework. With the help of (1) the trained counterfactual retriever, we retrieve text excerpts from a large text corpus. These excerpts are passed through a simple word-extraction module that extracts all non-stopwords, which are then used by (2) the Editor to edit the given training instances and generate minimally edited, label-flipped instances.]

A high-level overview of CORE is shown in Figure 2. The first stage (§3.1) retrieves counterfactual
excerpts from a large unlabeled corpus related to the target task. In the second stage (§3.2), retrieved excerpts are supplied, along with instructions and demonstrations, as a language-modeling prompt to GPT-3 to counterfactually edit the input text. The resultant data is used for augmentation in §5.1.2. We describe each stage below; additional implementation details are provided in Appendix A.
3.1 CF-DPR: Counterfactual Dense Passage Retriever
Our counterfactual retriever is based on the dense passage retrieval (DPR) framework (Karpukhin et al., 2020). CF-DPR retrieves similar instances from a large unlabeled corpus that have different labels. Formally, given a training set $N(x) = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ for a text classification task and a large corpus $S$, CF-DPR retrieves samples $C(x) = \{\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n\}$ from $S$ such that the associated labels for samples in $C(x)$ are not the same as the corresponding labels in $N(x)$. Specifically, $\hat{y}_i \neq y_i \ \forall i \in (0, n)$, where $\hat{y}_i$ is the class label for retrieved sample $\hat{x}_i$. In Figure 2, for the input "The most opaque, self-indulgent and just plain boring sequel to End Game.", CF-DPR retrieves the excerpts "The most delightful trailer ever" and "Spot on awesomeness in this".
Training. We use the same contrastive learning objective as Karpukhin et al. (2020) to train the bi-encoder model. It consists of two independent BERT (Devlin et al., 2018) encoders: a query encoder $Q$ that encodes $x_i$ in $N(x)$ as $q_i$, and a document encoder $P$ that encodes text excerpts in $S$ as $p_i$. To train the bi-encoder, we use a small seed training dataset $[q_i, p_i^+, p_i^-]_{i=1}^{m}$ of size $m$, containing $m < |N(x)|$ positive and negative retrieval samples. For a given training instance $q_i$, we use its corresponding positive sample $p_i^+$ and hard negatives $p_i^-$ to optimize the following loss function:
$$L\big(\{q_i, p_i^{+}, p_{i,1}^{-}, \ldots, p_{i,n}^{-}\}_{i=1}^{m}\big) = -\log \frac{e^{\mathrm{sim}(q_i, p_i^{+})}}{e^{\mathrm{sim}(q_i, p_i^{+})} + \sum_{j=1}^{n} e^{\mathrm{sim}(q_i, p_{i,j}^{-})}} \tag{1}$$
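In implementation terms, Equation 1 is the standard softmax cross-entropy over similarity scores with the positive in the first slot. The following PyTorch sketch (ours, not the authors' released code) assumes dot-product similarity and pre-computed embeddings for one positive and $n$ hard negatives per query:

```python
import torch
import torch.nn.functional as F

def cfdpr_loss(q_emb, pos_emb, neg_emb):
    """Contrastive loss of Eq. 1 for a batch of queries.

    q_emb:   (B, d)    query embeddings q_i
    pos_emb: (B, d)    positive (counterfactual) embeddings p_i^+
    neg_emb: (B, n, d) hard-negative (paraphrase) embeddings p_{i,j}^-
    """
    # sim(q, p) as a dot product, as in DPR (Karpukhin et al., 2020).
    pos_score = torch.einsum("bd,bd->b", q_emb, pos_emb)            # (B,)
    neg_score = torch.einsum("bd,bnd->bn", q_emb, neg_emb)          # (B, n)
    scores = torch.cat([pos_score.unsqueeze(1), neg_score], dim=1)  # (B, 1+n)
    # With the positive always at index 0, cross-entropy against an
    # all-zeros label vector is exactly -log softmax(scores)[:, 0], i.e. Eq. 1.
    labels = torch.zeros(scores.size(0), dtype=torch.long, device=scores.device)
    return F.cross_entropy(scores, labels)
```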
To model the task of counterfactual retrieval, for each training instance $x_i = q_i$, we use the corresponding counterfactual instance as the positive sample ($p_i^+$) and paraphrases of $q_i$ as the hard negatives ($p_i^-$). Positive samples can be obtained from a seed dataset consisting of manually authored counterfactuals for existing NLU datasets like IMDb and SNLI (Kaushik et al., 2020; Gardner et al., 2020). This manual data is of the form $T = \{(q_1, p_1^+), (q_2, p_2^+), \ldots, (q_m, p_m^+)\}$. We use the diverse paraphraser model (Krishna et al., 2020) to generate paraphrases of $\{q_1, q_2, \ldots, q_m\}$ as the hard negatives $\{p_1^-, p_2^-, \ldots, p_m^-\}$. Contrastive training pulls counterfactual samples $p_i^+$ closer to $q_i$ and pushes semantically identical sentences $p_i^-$ away from $q_i$. We show that this counterfactual retrieval framework can be used to retrieve counterfactuals for tasks with only a small amount of seed training data (§4.1). Additional details about training and evaluation of the trained CF-DPR are in Appendix A.
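To make the construction of training triples concrete, the sketch below pairs each seed query with its manually authored counterfactual as the positive and label-preserving paraphrases as hard negatives. The `paraphrase` callable is a hypothetical stand-in for the diverse paraphraser of Krishna et al. (2020):

```python
def build_cfdpr_triples(seed_pairs, paraphrase, n_negs=2):
    """Assemble (query, positive, hard-negative) training triples.

    seed_pairs: list of (q_i, p_i_plus) pairs, where p_i_plus is a manually
                authored counterfactual of q_i (e.g., from Kaushik et al., 2020).
    paraphrase: callable str -> list[str]; hypothetical stand-in for the
                diverse paraphraser of Krishna et al. (2020).
    """
    triples = []
    for query, counterfactual in seed_pairs:
        # Paraphrases preserve the original label, so they are lexically
        # close to the query but NOT counterfactual: ideal hard negatives.
        hard_negatives = paraphrase(query)[:n_negs]
        triples.append({
            "query": query,                    # q_i
            "positive": counterfactual,        # p_i^+
            "hard_negatives": hard_negatives,  # p_i^-
        })
    return triples
```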
Inference. We create $S$ for a specific task dataset using (1) text corpora that have similar domains as that dataset and (2) other datasets for the same task. For instance, for sentiment analysis over IMDb, we use a large (1.6 million) corpus of movie reviews from Amazon (McAuley and Leskovec, 2013) and the Yelp review dataset (Asghar, 2016). We encode the entire search corpus using the trained document encoder and index it using FAISS (Johnson et al., 2019). For every input training instance $x_i$, we retrieve the top $k$ relevant counterfactuals $\{\hat{x}_i^1, \hat{x}_i^2, \ldots, \hat{x}_i^k\}$. We refer to these as CF-DPR counterfactuals.
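A minimal sketch of this indexing-and-search step is shown below, assuming encoders that map batches of sentences to fixed-size NumPy arrays; the flat inner-product index is an illustrative choice, not necessarily the exact FAISS configuration used in the paper:

```python
import faiss
import numpy as np

def build_index(corpus_sentences, doc_encoder, batch_size=256):
    """Encode the search corpus S with the trained document encoder
    and index it with FAISS."""
    vecs = []
    for i in range(0, len(corpus_sentences), batch_size):
        vecs.append(doc_encoder(corpus_sentences[i:i + batch_size]))  # (b, d)
    vecs = np.concatenate(vecs).astype("float32")
    index = faiss.IndexFlatIP(vecs.shape[1])  # exact inner-product search
    index.add(vecs)
    return index

def retrieve_counterfactuals(text, query_encoder, index, corpus_sentences, k=5):
    """Return the top-k CF-DPR counterfactuals for one training instance."""
    q = query_encoder([text]).astype("float32")  # (1, d)
    _, ids = index.search(q, k)
    return [corpus_sentences[j] for j in ids[0]]
```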
3.2 GPT-3 Editor
The retrieved counterfactuals often contain relevant phrases that perturb predictive features in different ways ("opaque" → "delightful", "boring" → "spot-on awesome" in Figure 2), but are typically not a minimally edited version of the training sample. "The most delightful trailer ever" has the opposite sentiment of the original review, but is about another entity. To incorporate perturbation diversity while controlling for minimality, CORE uses an editor module. The Editor takes the training sample and retrieved counterfactuals as input and generates a minimally edited counterfactual. This involves selecting parts of the retrieved text that may be useful in flipping the original text's label and then seamlessly integrating them into the original input. Supervision for this fine-grained, instance-specific style-transfer task can be hard to find.
We use GPT-3 (Brown et al., 2020) since it has been successfully used for fine-grained style transfer with few-shot supervision in prior work (Reif et al., 2022). For instance, GPT-3 can learn from natural language constraints in its prompt, such as "include the word balloon," for constrained decoding. To prompt GPT-3, we make use of the set of instructions that Kaushik et al. (2020) provided to crowd workers for authoring counterfactuals. We also append four human-authored demonstrations of incorporating retrieved data, depicting various perturbation types.
Following Reif et al. (2022), we simplify our demonstrations by extracting keywords from the retrieved samples and providing them as token-level constraints in the prompt. To encourage the model to perturb certain classes of words, we remove determiners, conjunctions, and punctuation from the retrieved samples and tokenize the rest of the input into a list of keywords $[w_1, \ldots, w_n]$. The resultant demonstration in our prompt thus becomes: "Input: ... Words to use: [$w_1, \ldots, w_n$], Edited: ...". This is motivated by Wu et al. (2021)'s observation that perturbing certain classes of words (like prepositions and adjectives) leads to better counterfactual generation. More details about the prompt construction are in Appendix A.
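The word-extraction and prompt-assembly steps can be sketched as follows. This is a simplified illustration: the real instructions and demonstrations come from Kaushik et al. (2020) and Appendix A, and a POS tagger would identify determiners and conjunctions more reliably than the small stopword list used here:

```python
import string

# Word classes dropped from retrieved text (§3.2): determiners,
# conjunctions, and punctuation. This small list keeps the sketch
# self-contained; a POS tagger would be more precise.
DROP_WORDS = {"a", "an", "the", "and", "or", "but", "nor", "so", "yet", "for"}

def extract_keywords(retrieved_text):
    """Tokenize a retrieved excerpt into the keyword list [w_1, ..., w_n]."""
    no_punct = retrieved_text.translate(str.maketrans("", "", string.punctuation))
    return [t for t in no_punct.split() if t.lower() not in DROP_WORDS]

def build_prompt(instructions, demonstrations, input_text, keywords):
    """Assemble the few-shot editing prompt in the format described in §3.2.

    demonstrations: list of (input, keyword_list, edited) triples, mirroring
    the four human-authored examples appended to the instructions.
    """
    parts = [instructions]
    for demo_input, demo_words, demo_edited in demonstrations:
        parts.append(
            f"Input: {demo_input}\nWords to use: {demo_words}\nEdited: {demo_edited}")
    parts.append(f"Input: {input_text}\nWords to use: {keywords}\nEdited:")
    return "\n\n".join(parts)
```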
4 Experimental Setup
We generate CORE counterfactuals for two tasks: sentiment classification of movie reviews and natural language inference. We describe task-specific details about CORE training and inference below.
4.1 Sentiment Classification
Task Dataset ($N(x)$). We create CORE counterfactuals for the IMDb movie review dataset (Maas et al., 2011), which has been used to manually create contrastive data (Kaushik et al., 2020; Gardner et al., 2020). This dataset presents unique challenges due to the longer average length of reviews (233 words), with which existing counterfactual generation techniques (Wu et al., 2021) struggle.
CF-DPR training data ($p_i^+, p_i^-$). Kaushik et al. (2020) augment a subset of the IMDb dataset (1.7K examples) with human-edited counterfactuals, which we use to train CF-DPR. Negative pairs $p_i^-$ are created by paraphrase models.
Task-specific corpus ($S$). We use datasets of similar domain: Amazon movie reviews (McAuley and Leskovec, 2013), Yelp reviews (Asghar, 2016), and IMDb reviews (Maas et al., 2011). Our initial experiments indicated that indexing full movie reviews did not yield good CF-DPR performance, owing to more dense retrieval noise when encoding longer contexts (Luan et al., 2021). Hence, we sentence-tokenize the reviews and index each sentence independently. The search corpus contains approximately 8 million sentences.
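A sketch of this preprocessing, assuming NLTK's sentence tokenizer (the paper does not specify which tokenizer was used):

```python
from nltk.tokenize import sent_tokenize  # requires: nltk.download("punkt")

def build_search_corpus(reviews):
    """Split each long review into sentences and collect them, so that
    every sentence can be indexed independently (see §4.1)."""
    sentences = []
    for review in reviews:
        sentences.extend(sent_tokenize(review))
    return sentences
```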
Subset to augment ($N(x)$). We generate CORE counterfactuals for the same subset of 1.7K reviews from IMDb chosen by Kaushik et al. (2020), in order to make fair comparisons in §5.
Keyword list $[w_1, \ldots, w_n]$. We observed that several sentences in a review contain descriptive information (plot, cast, etc.) that does not convey sentiment. In Kaushik et al. (2020), too, humans edited only a few sentences to change the sentiment of a long review. We extract the sentences containing polarity features, use them as queries to CF-DPR, and retrieve their counterfactuals. This selection strategy can also be extended to any review by learning a sentence selection model on annotated data (DeYoung et al., 2020). The keyword list is extracted from the retrieved outputs as described in §3.2.
Table 1: CORE-generated counterfactuals for IMDb and MNLI, along with the CF-DPR retrieved outputs and the outputs of the independent GPT-3 Editor. For both tasks, the retriever introduces several new words/phrases in the outputs.

IMDb
Review: It's the worst movie I've ever seen. The action is so unclear, work of cameras is so poor, actors are so affected ... and this lamentable 5 minutes of Arnie on the screen. My advice from the bottom of my heart - don't watch it unless you like such a low class torture. (Sentiment: Negative)
  CF-DPR: Probably the dozenth time I've seen it. This is another such effort. This is one spectacular production. that is definitely worth taking advantage of.
  GPT-3: It's one of the best movies I've ever seen. The action is so clear, work of cameras is so great, actors are so unaffected ... and this wonderful 5 minutes of Arnie on the screen. My advice from the bottom of my heart - watch it unless you don't like such a high class torture.
  CORE: It's the best movie I've ever seen. The action is so clear, work of cameras is so great, actors are so professional ... and this spectacular 5 minutes of Arnie on the screen. My advice from the bottom of my heart - definitely watch it if you like such a high class production.

MNLI
Premise: and my my uh taxes are a hundred and thirty five. Hypothesis: My taxes are $135 (Entails)
  CF-DPR: My sister spent over $2,000 on a computer that she'll probably never use.
  GPT-3: My taxes are not $135.
  CORE: My taxes are probably over $135

Premise: After the battle against the slave lord, the group needed it. Hypothesis: The slaves battled the group. (Contradicts)
  CF-DPR: The slaves never fought against their tyrannical captors.
  GPT-3: The slaves battled the slave lord.
  CORE: The slaves fought against their captors.
Note that we do not impose any restrictions on the editor regarding which sentences to edit.
4.2 Natural Language Inference
Task Dataset ($N(x)$). We focus on MNLI (Williams et al., 2018), a popular NLI dataset that tests for complex language reasoning.
CF-DPR training data ($p_i^+, p_i^-$). We use the inherently paired nature of MNLI. In MNLI, given a premise, annotators are asked to manually write sentences that entail, contradict, or are neutral to the premise. These three hypotheses serve as mutual counterfactuals. In this work, we limit counterfactual perturbations to entailment (E) → contradiction (C) and vice versa, to simplify the different permutations of positives and negatives required for CF-DPR training. We find that including the neutral class leads to increasingly noisy retrieved data, as the semantic differences between the neutral class and the other two NLI classes are subtle and hard to distinguish. In Equation 1, $q_i$ is generated by concatenating the premise and hypothesis separated by the special token [SEP]. For every such input, $p_i^+$ is a hypothesis from the counterfactual class, while $p_i^-$ are diverse paraphrases of the original hypothesis.
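A sketch of how one such training example can be assembled for the E → C direction; the `paraphrase` helper is the same hypothetical stand-in as in §3.1:

```python
def make_nli_query(premise, hypothesis, sep_token="[SEP]"):
    """CF-DPR query q_i for MNLI: premise and hypothesis joined by the
    BERT separator token (§4.2)."""
    return f"{premise} {sep_token} {hypothesis}"

def nli_seed_example(premise, entail_hyp, contra_hyp, paraphrase):
    """One CF-DPR training example for the entailment -> contradiction
    direction. `paraphrase` is a hypothetical label-preserving paraphraser."""
    return {
        "query": make_nli_query(premise, entail_hyp),
        "positive": contra_hyp,                    # hypothesis of the flipped class
        "hard_negatives": paraphrase(entail_hyp),  # paraphrases of the original
    }
```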
Task-specific corpus ($S$). The corpus is constructed by combining NLI datasets (Williams et al., 2018; Bowman et al., 2015; Liu et al., 2022) with the source corpus² that was used to generate premises in MNLI. We also include tokenized Wikipedia articles (Merity et al., 2016), as several domains in MNLI (e.g., travel, government) are related. The search corpus contains approximately 7 million text excerpts.
Subset to augment ($N(x)$). To compare with the state-of-the-art data augmentation technique for MNLI, WaNLI (Liu et al., 2022), we choose a subset of the MNLI dataset for augmentation based on their selection strategy. WaNLI uses dataset cartography (Swayamdipta et al., 2020) to select the most ambiguous examples for augmentation: those where model confidence across training epochs is low and variance is high. We generate 9.5K additional examples in two classes.
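A sketch of this cartography-based selection, following Swayamdipta et al. (2020): confidence is the mean probability assigned to the gold label across epochs and variability is its standard deviation. The ranking scheme here is illustrative; the exact criterion used by WaNLI may differ:

```python
import numpy as np

def select_ambiguous(gold_probs, n_select):
    """Pick the most ambiguous training examples for augmentation.

    gold_probs: (n_epochs, n_examples) array holding the probability the
    model assigns to the gold label at each training epoch.
    """
    confidence = gold_probs.mean(axis=0)   # mean gold-label probability
    variability = gold_probs.std(axis=0)   # spread across epochs
    # Primary key: high variability; tie-break: low confidence.
    # np.lexsort sorts by the LAST key first.
    order = np.lexsort((confidence, -variability))
    return order[:n_select]
```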
Cross-Encoder. We incorporate a re-ranker module to boost retrieval results for MNLI. It uses a cross-encoder architecture (Thakur et al., 2021) to jointly encode the query $q_i$ and the top-$K$ documents retrieved by the bi-encoder. Given $q_i$ and $K$ retrieved sentences from the bi-encoder, the re-ranker learns to classify them as positive or negative. During inference, bi-encoder outputs are re-ranked based on their cross-encoder probability. The cross-encoder is trained on the binary classification task on the same seed dataset as the bi-encoder. Retrieval performances are reported in Appendix C.
² http://www.anc.org/
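A minimal sketch of the re-ranking step described above, treating the trained cross-encoder as a black-box scorer of (query, candidate) pairs:

```python
def rerank(query, candidates, cross_encoder_prob, top_n=5):
    """Re-order bi-encoder retrievals by cross-encoder probability.

    cross_encoder_prob: callable (query, candidate) -> float, the trained
    cross-encoder's probability that the candidate is a positive retrieval.
    """
    scored = sorted(candidates,
                    key=lambda c: cross_encoder_prob(query, c),
                    reverse=True)
    return scored[:top_n]
```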