arbitrary style-transfer (Reif et al., 2022). CORE prompts GPT-3 (Brown et al., 2020; Wei et al., 2022) with a few demonstrations of using these excerpts for counterfactual editing.
The CORE retriever is a transformer-based bi-encoder model trained with contrastive learning (Le-Khac et al., 2020) on a small set of human-authored counterfactuals for the task. For each training example, CORE retrieves excerpts from an unlabeled task-related corpus that bear a label-flipping counterfactual relationship with the original input instance. Retrieval may extract excerpts that drift semantically from the input text while still containing relevant counterfactual phrases (Table 1). Using prompts, the CORE GPT-3 editor generates counterfactual edits to the input, conditioned on the retrieved excerpts and the original input. The prompts consist of instructions and a few demonstrations of using the retrieved text for editing. Unlike prior work that uses rule-based methods (Ribeiro et al., 2020) or semantic frameworks (Wu et al., 2021; Ross et al., 2022) and restricts perturbation types, CORE uses naturally occurring data to encourage perturbation diversity.
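The retrieve-then-edit pipeline can be sketched as follows. This is a minimal illustration, not the CORE implementation: the toy hash-based bag-of-words embedding stands in for the trained transformer bi-encoder, and all function names (`embed`, `retrieve`, `build_prompt`) are ours, not from the paper.

```python
import zlib
import numpy as np

def embed(text, dim=16):
    # Toy deterministic bag-of-words embedding: each token seeds a random
    # vector, and the sum is L2-normalized. A stand-in for CORE's
    # contrastively trained bi-encoder.
    v = np.zeros(dim)
    for tok in text.lower().split():
        rng = np.random.default_rng(zlib.crc32(tok.encode()))
        v += rng.standard_normal(dim)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def retrieve(query, corpus, k=2):
    # Bi-encoder retrieval: embed the input instance and each unlabeled
    # excerpt independently, then rank excerpts by cosine similarity.
    q = embed(query)
    return sorted(corpus, key=lambda ex: -float(q @ embed(ex)))[:k]

def build_prompt(instruction, demos, query, excerpts):
    # Few-shot prompt for the editor LM: an instruction, demonstrations of
    # rewriting an input with help from retrieved text, and finally the new
    # input with its retrieved excerpts, left open for the model to
    # complete with a label-flipping edit.
    parts = [instruction]
    for inp, exc, edit in demos:
        parts.append(f"Input: {inp}\nRetrieved: {exc}\nEdit: {edit}")
    parts.append(f"Input: {query}\nRetrieved: {'; '.join(excerpts)}\nEdit:")
    return "\n\n".join(parts)
```

For example, retrieving for a positive movie review against a small corpus and assembling the editor prompt might look like:

```python
corpus = [
    "the plot was dull and the acting lifeless",
    "a thrilling, heartfelt ride from start to finish",
]
excerpts = retrieve("the acting was dull throughout", corpus, k=1)
prompt = build_prompt(
    "Rewrite the input so its sentiment label flips, reusing the retrieved text.",
    [("the food was superb", "bland, overpriced dishes",
      "the food was bland and overpriced")],
    "the acting was dull throughout",
    excerpts,
)
```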
Intrinsic evaluation shows that CORE counterfactuals cover the perturbation types generated by existing methods like Wu et al. (2021) (Table 7) as well as new perturbation types (Table 5), with more diverse outputs (Table 6), all without explicit supervision. Our extensive data augmentation experiments and analyses show that combining retrieval with few-shot editing generates CAD that is effective in reducing model biases and improving performance on out-of-distribution (OOD) and challenge test sets. Perturbing only 3% and 7% of the data for NLI and sentiment analysis respectively, we achieve improvements of up to 4.5% and 6.2% over standard DA (Tables 2, 3). Additionally, we show that CORE's learned retriever can assist humans in generating more diverse counterfactuals, spurring their creativity and reducing priming effects (Gardner et al., 2021).
2 Related Work
Counterfactual Data Augmentation
There is growing interest in the area of CDA for model robustness, with early efforts focused on human-authored counterfactuals (Kaushik et al., 2020; Gardner et al., 2020). However, manual rewrites can be costly and prone to systematic omissions.
Techniques have been proposed for the automatic generation of counterfactual data or contrast sets (Wu et al., 2021; Ross et al., 2022, 2021; Bitton et al., 2021; Asai and Hajishirzi, 2020; Geva et al., 2021; Madaan et al., 2021; Li et al., 2020). Existing techniques rely on rules/heuristics for perturbing sentences (Webster et al., 2020; Dua et al., 2021; Ribeiro et al., 2020; Asai and Hajishirzi, 2020), or on sentence-level semantic representations (e.g., SRL) and a finite set of structured control codes (Geva et al., 2021; Ross et al., 2022; Wu et al., 2021). However, Joshi and He (2022) find that a limited set of perturbation types further exacerbates biases, resulting in poor generalization to unseen perturbation types. In general, creating an assorted set of instance-specific perturbations is challenging, often requiring external knowledge (Paranjape et al., 2022).
Retrieval Augmented Generation
Retrieving task-relevant knowledge from a large corpus of unstructured and unlabeled text has proven to be very effective for knowledge-intensive language generation tasks like question answering (Lewis et al., 2020), machine translation (Gu et al., 2018), and dialogue generation (Weston et al., 2018). Retrieval has also been used for paraphrase generation (Chen et al., 2019) and style-transfer (Xiao et al., 2021), specifically to address the lack of diversity in generations from pretrained language models. In a similar vein, CORE uses learned retrieval for counterfactual generation. While Paranjape et al. (2022) use off-the-shelf retrieval models to generate counterfactuals for QA, learning to retrieve counterfactuals is non-trivial for problems other than QA. CORE provides a recipe to train retrieval for general tasks.
In-context learning
Massive language models like GPT-3 have been found to be effective at controlled generation tasks like arbitrary style-transfer (Reif et al., 2022), counterfactual reasoning (Frohberg and Binder, 2022), step-wise reasoning for complex problems (Wei et al., 2022; Zhou et al., 2022), and dataset generation (Liu et al., 2022), by learning in-context from few-shot demonstrations and natural language instructions (Wei et al., 2021). While GPT-3 has been used for data augmentation, it has not been used for counterfactual generation, which is fundamentally different in nature.
3 Method
A high-level overview of CORE is shown in Figure 2. The first stage (§3.1) retrieves counterfactual