
sic edits to the input text, either manually (Gardner et al., 2020; Kaushik et al., 2019) or automatically (Yang et al., 2021; Wang and Culotta, 2021; Wu et al., 2021), such that the target label changes. These minimal edits are made via substitution, insertion, or deletion of tokens in the original sentence, resulting in simplistic generations that are often unrealistic and lack linguistic diversity (for instance, the minimal edit counterfactual in Figure 1 contains the phrase “loose collection of intelligible analogies”, a somewhat unnatural construction for a positive movie review). As a result, counterfactuals via minimal edits often fail to provide adequate inductive biases to promote robustness (Khashabi et al., 2020; Huang et al., 2020; Joshi and He, 2022).
In this paper, we investigate the potential of more
realistic and creative counterfactuals, which go be-
yond simple token-level edits, towards improving
robust generalization. While allowing larger edits
reduces proximity to the original sentence, we be-
lieve that this is a worthwhile trade-off for more
realistic and creative counterfactuals, which offer
greater flexibility in sentiment steering, increasing
the likelihood that the counterfactual possesses the
desired label. We propose a novel approach that
can generate diverse counterfactuals via concept-
controlled text generation, illustrated in Figure 1.
In particular, our approach combines the benefits of domain-adaptive pretraining (Gururangan et al., 2020) for soft steering of the target label (Liu et al., 2021), with those of NeuroLogic decoding (Lu et al., 2021), an unsupervised, inference-time algorithm that generates fluent text while strictly satisfying complex lexical constraints. As constraints, we use tokens that evoke salient concepts derived from ConceptNet (Speer et al., 2017). Our resulting generations, called NeuroCounterfactuals (NeuroCFs, for short), provide loose counterfactuals to the original, while demonstrating nuanced linguistic alterations to change the target label (§2).
Compared to minimal-edit counterfactuals, our counterfactuals are more natural and linguistically diverse, resulting in syntactic, semantic, and pragmatic changes which alter the label while preserving relevance to the original concepts (Table 1). In experiments with training data augmentation for sentiment classification, our approach achieves better performance than competitive baselines using minimal-edit counterfactuals (§3). In some settings, our performance even matches that of baselines using human-annotated counterfactuals, while avoiding the cost of human annotation. While NeuroCFs are designed to be loose counterfactuals, our detailed analyses show that it is still important to augment training data with examples possessing a moderately high degree of similarity to the original examples (§4). When the ultimate goal is improving robust generalization, we show that going beyond minimal-edit counterfactuals can result in richer data augmentation. Our code and data are available at https://github.com/IntelLabs/NeuroCounterfactuals.
2 NeuroCounterfactuals
We describe our methodology for automatically generating loose counterfactuals, NeuroCFs, for sentiment classification. The key idea underlying our approach is to retain concepts that ensure content similarity to the original text, while steering the sentiment to the opposite polarity. Our method, illustrated in Figure 1, combines a concept-constrained decoding strategy with a sentiment-steered language model. First, we detail our approach for extracting salient concepts from a document (§2.1). Next, we discuss language model adaptation to produce sentiment-steered LMs (§2.2). Finally, we provide an overview of the NeuroLogic decoding algorithm for controlled text generation, and how it can be adapted to the task of generating sentiment counterfactuals (§2.3).
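To preview how these pieces fit together, the sketch below approximates the pipeline with off-the-shelf stand-ins: Hugging Face's constrained beam search (force_words_ids) in place of NeuroLogic decoding, and an unadapted GPT-2 in place of the sentiment-steered LM of §2.2. Both substitutions, and the example concept list, are illustrative assumptions rather than our actual implementation.

```python
# Illustrative approximation only: Hugging Face constrained beam search
# (force_words_ids) stands in for NeuroLogic decoding, and an unadapted
# GPT-2 stands in for the sentiment-steered LM of Section 2.2.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Concept constraints extracted from the original review (Section 2.1).
concepts = ["movie", "analogies", "plot devices"]
force_words_ids = [
    tokenizer(" " + c, add_special_tokens=False).input_ids for c in concepts
]

prompt = tokenizer("This movie", return_tensors="pt")
outputs = model.generate(
    **prompt,
    force_words_ids=force_words_ids,  # every concept must appear
    num_beams=8,                      # constrained search requires beams
    max_new_tokens=40,
    no_repeat_ngram_size=3,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```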
2.1 Extracting Salient Concepts
Our first step is the extraction of concepts from the original document which, when used as constraints during decoding (§2.3), can reconstruct its content. Specifically, we aim to identify a set of constraints which require the counterfactual to be similar in content to the original sentence while still allowing the generation to be steered towards the opposite polarity. Using extracted concepts as constraints achieves this because the concepts consist of content-bearing noun phrases rather than sentiment-bearing adjectives. For example, in the original sentence from Figure 1, we seek to constrain our generated counterfactual to contain concept-oriented phrases, such as “movie”, “analogy”, and “plot devices”, without explicitly requiring the presence of other tokens which may indicate the sentiment (e.g., “unintelligible”, “ill-conceived”).
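As a minimal sketch of this noun-phrase bias, the example below uses spaCy's noun chunker to propose concept candidates from the Figure 1 sentence; the library choice and the lemma-based filtering are illustrative assumptions, not necessarily our exact extraction procedure.

```python
# Minimal sketch: propose concept candidates from content-bearing noun
# phrases. spaCy's noun chunker and the POS/lemma filtering here are
# illustrative assumptions, not necessarily the exact procedure used.
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_concept_candidates(text: str) -> list[str]:
    doc = nlp(text)
    candidates = []
    for chunk in doc.noun_chunks:
        # Keeping only the lemma of each chunk's head noun drops the
        # sentiment-bearing adjective modifiers (e.g., "unintelligible").
        if chunk.root.pos_ in {"NOUN", "PROPN"}:
            candidates.append(chunk.root.lemma_.lower())
    return candidates

print(extract_concept_candidates(
    "The movie is a loose collection of unintelligible analogies "
    "and ill-conceived plot devices."
))
# Expected candidates along the lines of:
# ['movie', 'collection', 'analogy', 'device']
```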
We achieve this mapping by linking tokens and phrases in the document to nodes in the ConceptNet graph.
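As one possible realization of this linking step, the sketch below checks candidate phrases against ConceptNet's public HTTP API at api.conceptnet.io; whether linking is performed via this API or an offline dump, and the edge-presence test used here, are illustrative assumptions.

```python
# Illustrative sketch of phrase-to-ConceptNet linking via the public
# HTTP API. The API choice and the "has edges" test are assumptions,
# not necessarily the mechanism used in the actual pipeline.
import requests

def link_to_conceptnet(phrase: str) -> str | None:
    """Return the ConceptNet node URI for `phrase`, or None if unlinked."""
    node = "/c/en/" + phrase.lower().replace(" ", "_")
    resp = requests.get("https://api.conceptnet.io" + node, timeout=10)
    resp.raise_for_status()
    # A node that returns no edges is effectively absent from the graph.
    return node if resp.json().get("edges") else None

for candidate in ["movie", "analogy", "plot device"]:
    print(candidate, "->", link_to_conceptnet(candidate))
```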