
REV: Information-Theoretic Evaluation of Free-Text Rationales
Hanjie Chen♡∗Faeze Brahman♠♢ Xiang Ren♠♣ Yangfeng Ji♡
Yejin Choi♠♢ Swabha Swayamdipta♣
♡Department of Computer Science, University of Virginia
♠Allen Institute for AI ♣University of Southern California
♢Paul G. Allen School of Computer Science & Engineering, University of Washington
{hc9mx,yangfeng}@virginia.edu {faezeb,xiangr,yejinc}@allenai.org swabhas@usc.edu
Abstract
Generating free-text rationales is a promising
step towards explainable NLP, yet evaluating
such rationales remains a challenge. Existing
metrics have mostly focused on measuring the
association between the rationale and a given
label. We argue that an ideal metric should fo-
cus on the new information uniquely provided
in the rationale that is otherwise not provided
in the input or the label. We investigate this research problem from an information-theoretic perspective using conditional V-information (Hewitt et al., 2021). More concretely, we propose a metric called REV (Rationale Evaluation with conditional V-information), to quantify the amount of new, label-relevant information in a rationale beyond the information already available in the input or the label. Experiments
across four benchmarks with reasoning tasks,
including chain-of-thought, demonstrate the ef-
fectiveness of REV in evaluating rationale-label
pairs, compared to existing metrics. We fur-
ther demonstrate REV is consistent with hu-
man judgments on rationale evaluations and
provides more sensitive measurements of new
information in free-text rationales. When used
alongside traditional performance metrics, REV
provides deeper insights into models’ reasoning
and prediction processes.¹

∗Work done during an internship at AI2.
¹Our code is publicly available at https://github.com/HanjieChen/REV

1 Introduction

Model explanations have been indispensable for trust and interpretability in natural language processing (NLP) (Ribeiro et al., 2016, 2020; Lipton, 2018; Chen et al., 2020, 2021a). Free-text rationales, which explain a model prediction in natural language, have been especially appealing due to their flexibility in eliciting the reasoning process behind the model's decision making (Camburu et al., 2018; Narang et al., 2020; Rajani et al., 2019; Kumar and Talukdar, 2020; Brahman et al., 2021), making them closer to human explanations. How-
ever, existing metrics for free-text rationale eval-
uation remain narrowly focused on the extent to
which a rationale can help a (proxy) model predict
the label it explains (i.e., accuracy based) (Hase
et al.,2020;Wiegreffe et al.,2021). These metrics
offer little understanding of the new information
contained in the rationale, as added to the original
input, that could explain why the label is selected—
the very purpose a rationale is designed to serve.
For instance, the two rationales r∗1 and ˆr1,a in Fig. 1 would be considered equally valuable under existing metrics, even though they supply different amounts of novel and relevant information.
In this paper, we overcome this shortcoming by introducing an automatic evaluation for free-text rationales along two dimensions: (1) whether the rationale supports (i.e., is predictive of) the intended label, and (2) how much new information it provides to justify the label, beyond what is contained in the input. For example, rationale ˆr1,b in Fig. 1 violates (1) because it is not predictive of the label, “enjoy nature”. Rationale ˆr1,a does support the label but contains no new information that justifies it beyond what is stated in the input x; thus, it violates (2). Rationale r∗1 satisfies both dimensions: it supports the label and does so by providing new and relevant information beyond what is in the input. Our proposed evaluation is designed to penalize both ˆr1,a and ˆr1,b, while rewarding rationales like r∗1.
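Dimension (2) is naturally cast as an information-theoretic quantity. As a minimal sketch, assuming the standard conditional V-information of Hewitt et al. (2021) cited in the abstract rather than REV's exact formulation:

$$
I_{\mathcal{V}}(R \rightarrow Y \mid X) \;=\; H_{\mathcal{V}}(Y \mid X) \;-\; H_{\mathcal{V}}(Y \mid X, R),
$$

where $H_{\mathcal{V}}(\cdot \mid \cdot)$ denotes conditional $\mathcal{V}$-entropy under a model family $\mathcal{V}$: the first term is a model's uncertainty about the label $Y$ given only the input $X$, the second is its remaining uncertainty once the rationale $R$ is also available, and their difference is the new, label-relevant information the rationale contributes.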
We introduce REV², which adapts an information-theoretic framework from Xu et al. (2020) for evaluating free-text rationales along the two dimensions mentioned above. Specifically, REV is based on conditional V-information
²For Rationale Evaluation with conditional V-information.