
Retrieval Augmentation for Commonsense Reasoning: A Unified Approach
Wenhao Yu1, Chenguang Zhu2, Zhihan Zhang1, Shuohang Wang2,
Zhuosheng Zhang3, Yuwei Fang2, Meng Jiang1
1University of Notre Dame, Indiana, USA
2Microsoft Cognitive Services Research, Washington, USA
3Shanghai Jiao Tong University, Shanghai, China
1{wyu1, zzhang23, mjiang2}@nd.edu;
2{chezhu, shuow, yuwfan}@microsoft.com; 3zhangzs@sjtu.edu.cn
Abstract
A common thread of retrieval-augmented methods in the existing literature focuses on retrieving encyclopedic knowledge, such as Wikipedia, which offers a well-defined space of entities and relations that can be modeled. However, applying such methods to commonsense reasoning tasks faces two unique challenges, i.e., the lack of a general large-scale corpus for retrieval and of a correspondingly effective commonsense retriever. In this paper, we systematically investigate how to leverage commonsense knowledge retrieval to improve commonsense reasoning tasks. We propose a unified framework of Retrieval-Augmented Commonsense reasoning (called RACo), including a newly constructed commonsense corpus with over 20 million documents and novel strategies for training a commonsense retriever. We conduct experiments on four different commonsense reasoning tasks. Extensive evaluation results show that our proposed RACo can significantly outperform other knowledge-enhanced method counterparts, achieving new state-of-the-art performance on the CommonGen (https://inklab.usc.edu/CommonGen/leaderboard.html) and CREAK (https://www.cs.utexas.edu/~yasumasa/creak/leaderboard.html) leaderboards. Our code is available at https://github.com/wyu97/RACo.
1 Introduction
Recent work has shown that scaling language models with considerably more data and parameters, such as GPT3-175B (Brown et al., 2020), could drive significant advances in commonsense reasoning tasks. Nevertheless, such models make predictions by only “looking up information” stored in their parameters, making it difficult to determine what knowledge is stored or has already been forgotten by the neural network (Guu et al., 2020). Moreover, storage space is limited by the size of the
neural network. In order to memorize more world
knowledge, one must train ever-larger networks,
which can be prohibitively expensive and slow.
The solution that may seem obvious at first glance is to grant language models free access to open-world sources of commonsense knowledge in a plug-and-play manner, instead of memorizing all world knowledge. To achieve this capability, language models must be able to retrieve commonsense knowledge relevant to an unbounded set of situations. The language model can then leverage both the input text and the retrieved information to produce the desired output.
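To make this retrieve-then-read paradigm concrete, below is a minimal sketch, not the paper's actual pipeline: an off-the-shelf BERT encoder with mean pooling stands in for a trained commonsense retriever (so retrieval quality is purely illustrative), a three-sentence toy corpus stands in for the 20-million-document corpus, and T5 serves as the reader. All model and corpus choices here are assumptions for illustration.

```python
# Sketch of retrieve-then-read: embed a query and a toy commonsense corpus
# with a shared encoder, pick the top-scoring document by dot product, then
# feed "query + retrieved document" to a seq2seq reader. The models and
# corpus below are illustrative stand-ins, not RACo's trained components.
import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForSeq2SeqLM

corpus = [
    "An umbrella keeps a person dry when it rains.",
    "A refrigerator is used to keep food cold.",
    "Dogs are common pets that enjoy playing fetch.",
]

retr_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
retriever = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    # Mean-pool the last hidden states into one vector per text.
    batch = retr_tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = retriever(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

query = "What do people carry to stay dry in the rain?"
scores = embed([query]) @ embed(corpus).T      # (1, |corpus|) relevance scores
best_doc = corpus[scores.argmax().item()]

# Reader: condition generation on the input plus the retrieved evidence.
reader_tok = AutoTokenizer.from_pretrained("t5-base")
reader = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
inputs = reader_tok(f"question: {query} context: {best_doc}",
                    return_tensors="pt")
output = reader.generate(**inputs, max_new_tokens=32)
print(reader_tok.decode(output[0], skip_special_tokens=True))
```

The key design point is the interface: the retriever only needs to return text, so the corpus can be swapped or extended without retraining the reader.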
Compared with large-scale language model counterparts, e.g., UNICORN (Lourie et al., 2021), retrieval-augmented methods have three remarkable advantages: first, the knowledge is not stored implicitly in the model parameters, but is explicitly acquired in a plug-and-play manner, leading to great scalability; second, the paradigm generates text based on retrieved references, which alleviates the difficulty of generating from scratch (Li et al., 2022); third, the knowledge corpus can be constantly edited and updated by experts, keeping the model aware of the latest information.
In addition, compared with knowledge graph inference model counterparts, e.g., QA-GNN (Yasunaga et al., 2021), retrieval-augmented methods allow more flexibility in accessing and using knowledge from different sources, because commonsense knowledge by its nature cannot all be contained in a single knowledge graph defined by a fixed schema (Yu et al., 2022b).
A common thread of retrieval-augmented methods in the existing literature focuses on retrieving encyclopedic knowledge such as Wikipedia, which lends itself to a well-defined space of entities and relations that can be modeled (Karpukhin et al., 2020; Lewis et al., 2020b; Yu et al., 2022a). However, retrieval-augmented methods for commonsense reasoning have rarely been studied in the