
Explanations from Large Language Models Make Small Reasoners Better
Shiyang Li1, Jianshu Chen2, Yelong Shen3, Zhiyu Chen1, Xinlu Zhang1, Zekun Li1
Hong Wang1, Jing Qian1, Baolin Peng3, Yi Mao3, Wenhu Chen4 and Xifeng Yan1
1University of California, Santa Barbara
2Tencent AI Lab, 3Microsoft
4University of Waterloo, Vector Institute
{shiyangli,zhiyuchen,xinluzhang,zekunli,hongwang600,jing_qian,xyan}@cs.ucsb.edu
jianshuchen@tencent.com,wenhuchen@uwaterloo.ca
{yelong.shen,bapeng,maoyi}@microsoft.com
Abstract
Integrating free-text explanations into the in-context learning of large language models (LLMs) has been shown to elicit strong reasoning capabilities along with reasonable explanations. In this paper, we consider the problem of leveraging the explanations generated by LLMs to improve the training of small reasoners, which are more favorable for real-world deployment due to their low cost. We systematically explore three approaches to generating explanations from LLMs and utilize a multi-task learning framework to help small models acquire strong reasoning power together with explanation-generation capabilities. Experiments on multiple reasoning tasks show that our method consistently and significantly outperforms finetuning baselines across different settings, and even performs better than finetuning or prompting a 60x larger GPT-3 (175B) model by up to 9.5% in accuracy. As a side benefit, human evaluation further shows that our method can generate high-quality explanations to justify its predictions, moving towards the goal of explainable AI.
1 Introduction
Large language models (LLMs) have achieved impressive results with in-context learning: by adding a few demonstrations as prompts, they can solve unseen tasks without any parameter updates (Brown et al., 2020; Thoppilan et al., 2022; Chowdhery et al., 2022; Wei et al., 2022a). Recently, it has been shown that adding explanation-augmented prompts elicits strong performance on various reasoning tasks (Wei et al., 2022b; Lampinen et al., 2022), such as math word problems (Cobbe et al., 2021), symbolic reasoning (Wei et al., 2022b), numerical reasoning (Zhou et al., 2022) and commonsense reasoning (Talmor et al., 2019). In addition, such prompts also enable LLMs to generate reasonable explanations that justify the reasoning outcomes.
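To make this concrete, the snippet below shows what an explanation-augmented, chain-of-thought style prompt looks like; the worked example is illustrative (in the style of Wei et al., 2022b) and is not taken from this paper.

```python
# Illustrative chain-of-thought style prompt (not from the paper): each
# in-context demonstration pairs a question with a free-text explanation
# that precedes the final answer.
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: A juggler can juggle 16 balls. Half of the balls are golf balls,
and half of the golf balls are blue. How many blue golf balls are there?
A:"""
# An LLM prompted this way tends to continue with an explanation before the
# answer, e.g. "There are 16 / 2 = 8 golf balls. Half of 8 is 4. The answer is 4."
```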
In this paper, we consider the problem of leveraging these elicited explanations from LLMs to improve the training of small reasoners. Small language models (SLMs)¹ can be more favorable than LLMs in many real-world situations due to their low storage and computation costs. Nevertheless, an important open question is how to close the performance gap with LLMs on complicated reasoning tasks, as observed in Zelikman et al. (2022), especially in few-shot settings (Li et al., 2019). Surprisingly, Hase et al. (2020) show that training with human-annotated explanations does not improve performance over standard finetuning of T5 (Raffel et al., 2019). One possible reason is that many human-annotated explanations collected via crowdsourcing (Wiegreffe and Marasović, 2021) can be logically inconsistent and grammatically incorrect (Narang et al., 2020), which restricts the amount of available high-quality explanations. On the other hand, explanation-augmented prompts enable LLMs to automatically generate decent explanations (Wiegreffe et al., 2021a), making them a plausible alternative for producing an arbitrary amount of explanations. Therefore, a key question is: can the explanations generated by LLMs improve the reasoning capability of SLMs?
In this paper, we show that explanations generated by LLMs can consistently improve the reasoning capability of SLMs. Our framework is shown in Figure 1. Specifically, we first use several examples with human-written explanations as demonstrations for the LLM and then generate explanations for the training set. We systematically explore three approaches to generating explanations. The first uses chain-of-thought prompting, keeping an explanation only if the LLM's prediction is correct and rejecting it otherwise (Zelikman et al., 2022). The second generates explanations via rationalization prompting conditioned on the gold labels (Wiegreffe et al., 2021a).
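As a rough illustration of these two strategies, the sketch below shows how explanation generation and filtering could be wired up; it is our own minimal sketch, not the authors' code, and `llm_complete` and `parse_answer` are hypothetical placeholders for an LLM completion call and an answer-extraction helper.

```python
# Minimal sketch (not the authors' code) of the two explanation-generation
# strategies described above. `llm_complete` and `parse_answer` are
# hypothetical placeholders: the former queries the LLM with a prompt, the
# latter extracts the predicted answer string from its completion.

def cot_explanations(train_set, demos):
    """Chain-of-thought prompting: keep an explanation only when the LLM's
    own prediction matches the gold label (Zelikman et al., 2022)."""
    kept = []
    for ex in train_set:
        completion = llm_complete(demos + f"Q: {ex['question']}\nA:")
        if parse_answer(completion) == ex["label"]:
            kept.append({**ex, "explanation": completion})
    return kept


def rationalization_explanations(train_set, demos):
    """Rationalization prompting: condition the LLM on the gold label so that
    it produces a justification for every training example
    (Wiegreffe et al., 2021a)."""
    out = []
    for ex in train_set:
        prompt = demos + f"Q: {ex['question']}\nThe answer is {ex['label']} because"
        out.append({**ex, "explanation": llm_complete(prompt)})
    return out
```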
¹We argue that small and large models are relative concepts; the same model can be small or large depending on the context.