
Metric-guided Distillation: Distilling Knowledge from the Metric to
Ranker and Retriever for Generative Commonsense Reasoning
Xingwei He1∗, Yeyun Gong2†, A-Long Jin1, Weizhen Qi3, Hang Zhang2,
Jian Jiao4, Bartuer Zhou2, Biao Cheng2, Siu Ming Yiu1, Nan Duan2
1The University of Hong Kong, 2Microsoft Research Asia,
3University of Science and Technology of China, 4Microsoft
hexingwei15@gmail.com, ajin@eee.hku.hk,
smyiu@cs.hku.hk, weizhen@mail.ustc.edu.cn,
{yegong, v-zhhang, jian.jiao, bazhou, bicheng, nanduan}@microsoft.com
∗Work done during internship at Microsoft Research Asia.
†Corresponding author.
Abstract
Commonsense generation aims to generate a realistic sentence describing a daily scene with the given concepts, which is very challenging, since it requires models to have relational reasoning and compositional generalization capabilities. Previous work focuses on retrieving prototype sentences for the provided concepts to assist generation. These methods first use a sparse retriever to retrieve candidate sentences, then re-rank the candidates with a ranker. However, the candidates returned by their ranker may not be the most relevant sentences, since the ranker treats all candidates equally without considering their relevance to the reference sentences of the given concepts. Another problem is that re-ranking is very expensive, but using only retrievers seriously degrades the performance of their generation models. To solve these problems, we propose the metric distillation rule to distill knowledge from the metric (e.g., BLEU) to the ranker. We further transfer the critical knowledge summarized by the distilled ranker to the retriever. In this way, the relevance scores of candidate sentences predicted by the ranker and retriever will be more consistent with their quality measured by the metric. Experimental results on the CommonGen benchmark verify the effectiveness of our proposed method: (1) our generation model with the distilled ranker achieves a new state-of-the-art result; (2) our generation model with the distilled retriever even surpasses the previous SOTA.
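As a minimal illustration of the metric distillation rule sketched above (a sketch under our own assumptions, not the paper's implementation): sentence-level BLEU scores of the retrieved candidates against the reference sentences can serve as soft labels, and the ranker can be trained to match the induced distribution via KL divergence. The function names, the temperature parameter, and the use of NLTK's smoothed sentence_bleu here are all hypothetical choices for exposition.

import torch
import torch.nn.functional as F
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def metric_soft_labels(candidates, references, temperature=0.1):
    # Score each candidate against all references with sentence-level BLEU,
    # then soften the scores into a target distribution over the candidates.
    smooth = SmoothingFunction().method1
    tokenized_refs = [ref.split() for ref in references]
    bleu = torch.tensor([
        sentence_bleu(tokenized_refs, cand.split(), smoothing_function=smooth)
        for cand in candidates
    ])
    return F.softmax(bleu / temperature, dim=0)

def metric_distillation_loss(ranker_logits, candidates, references):
    # KL divergence between the ranker's distribution over the candidates
    # and the BLEU-derived target distribution, for one training example.
    target = metric_soft_labels(candidates, references)
    log_pred = F.log_softmax(ranker_logits, dim=0)
    return F.kl_div(log_pred, target, reduction="sum")

A lower temperature concentrates the target distribution on the highest-BLEU candidates, so the ranker's predicted relevance is pushed toward the quality ordering induced by the metric.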
1 Introduction
Commonsense reasoning is the ability to make reasonable and logical assumptions about daily scenes, which is a long-standing challenge in natural language processing. Recently, many discriminative tasks, such as CommonsenseQA (Talmor et al., 2019) and SWAG (Sap et al., 2019), have been proposed to evaluate the commonsense reasoning ability by testing whether models can select the correct answer from the choices according to the given context. To test whether models acquire the generative commonsense reasoning ability, Lin et al. (2020) proposed the commonsense generation (CommonGen) task, which requires models to produce a plausible sentence describing a specific daily life scenario based on the given concepts.

Concepts: eye, hang, head, shut, squeeze
Reference: A man squeezes his eyes shut and hangs his head.
BART: He squeezes her head shut, then grasps her eyes shut.
Ours: A baby with a blue shirt hangs his head and squeezes his eyes shut.

Table 1: Sentences generated by BART and our proposed model, DKMR².
CommonGen poses two main challenges to models: it expects them to (1) reason over the commonsense relations among concepts to generate sentences in line with our commonsense, and (2) possess the compositional generalization ability to generate realistic sentences with unseen concept compositions. Experimental results (Lin et al., 2020) show that large-scale pre-trained models (e.g., BART) alone are not competent for this task (see Table 1). The main reason is that the source information is very limited; therefore, the models can only rely on the internal implicit knowledge acquired during pre-training to solve this problem, resulting in generated sentences that violate commonsense.
To enrich the source information, EKI-BART (Fan et al., 2020) first retrieves prototype sentences for the input concepts, and then feeds the concepts and retrieved sentences into the generation model. Recent work, such as RE-T5 (Wang et al., 2021), KFCNet (Li et al., 2021), and KGR⁴ (Liu et al., 2022), extends this retrieve-and-generate framework by introducing a binary classifier to re-rank the retrieved candidate sentences and filter out can-