Analogy Generation by Prompting Large Language Models:
A Case Study of InstructGPT
Bhavya Bhavya1, Jinjun Xiong2and ChengXiang Zhai1
1Department of Computer Science, University of Illinois at Urbana-Champaign
1{bhavya2, czhai}@illinois.edu
2Department of Computer Science and Engineering, University at Buffalo
2jinjun@buffalo.edu
Abstract
We propose a novel application of prompting
Pre-trained Language Models (PLMs) to gen-
erate analogies and study how to design effec-
tive prompts for two task settings: generating
a source concept analogous to a given target
concept (aka Analogous Concept Generation
or ACG), and generating an explanation of the
similarity between a given pair of target con-
cept and source concept (aka Analogous Ex-
planation Generation or AEG). We found that
it is feasible to prompt InstructGPT to generate meaningful analogies, and that the best prompts tend to be precise imperative statements, especially with a low temperature setting. We also
systematically analyzed the sensitivity of the
InstructGPT model to prompt design, temper-
ature, and injected spelling errors, and found
that the model is particularly sensitive to cer-
tain variations (e.g., questions vs. imperative
statements). Further, we conducted human
evaluation on 1.4k of the generated analogies
and found that the quality of generations varies
substantially by model size. The largest In-
structGPT model can achieve human-level per-
formance at generating meaningful analogies
for a given target while there is still room for
improvement on the AEG task.1
1 Introduction
Large Pre-trained Language Models (PLMs) such as BERT (Devlin et al., 2018) and GPT (Brown et al., 2020) have been applied to many text generation tasks (e.g., summarization, dialogue systems) with promising results (Li et al., 2021). However, no existing work has studied how to apply PLMs to generate different kinds of textual analogies, such as conceptual metaphors (e.g., "Life is a journey"2) and instructional analogies (e.g., "A red blood cell is like a truck in that they both transport essential supplies" (Newby et al., 1995)).
1 Our code and datasets are available for public use: https://github.com/Bhaavya/InstructGPT-Analogies
2 https://en.wikipedia.org/wiki/Conceptual_metaphor
Table 1: Selected prompts and InstructGPT-generated analogies for natural selection

Prompt (P7): What is analogous to natural selection?
InstructGPT Output: The analogous process to natural selection is artificial selection. (9 words)

Prompt (P2): Explain natural selection using a well-known analogy.
InstructGPT Output: Imagine that you have a jar of mixed nuts ... If you shake the jar ... the big nuts will fall out first ... analogy is that natural selection is like a sieve that separates the fit from the unfit ... (136 words)
Generating analogies has a wide range of appli-
cations, such as explaining concepts and scientific
innovation, and analogies play a crucial role in hu-
man cognition. Analogical matching and reasoning
enables humans to understand and learn unfamiliar
concepts (aka target concepts) by means of familiar
ones (aka source concepts) and to make scientific
innovations. Unsurprisingly, analogy modeling and
generation has been a long-standing goal of AI
(Mitchell,2021). This is a challenging problem
because it often requires computing deep semantic
similarities that are beyond the surface-level simi-
larity. For example, Bohr's atomic model and the solar system are analogous due to their structural and relational similarities (i.e., electrons orbit around the nucleus like planets orbit around the sun).
Much work has been done to compute such analogical similarities between concepts. However, existing approaches mostly rely on structured representations and thus can only be applied where such representations already exist. For example, one of the most popular models is the Structural Mapping Engine (SME) (Forbus et al., 2017), which aligns structured representations of the target and source concepts using predicate logic. Moreover, such methods cannot generate analogies in natural language.
arXiv:2210.04186v2 [cs.CL] 11 Oct 2022
Inspired by the recent success in applying PLMs to many NLP tasks (e.g., (Li et al., 2021)), we propose and study the application of PLMs to analogy generation. We consider two typical application scenarios: 1) Analogous Concept Generation (ACG): given a target concept (e.g., Bohr's model), generate a source concept analogous to it (e.g., the solar system), possibly with an explanation of their similarities; 2) Analogy Explanation Generation (AEG): given a target concept and an analogous source concept, generate an explanation of their similarities.
Noting the similarity of the two tasks defined above to other text generation problems, and inspired by the recent success of using prompted PLMs for text generation, we propose analogy generation by using a PLM with appropriately designed prompts. We adopt the promising emerging paradigm of prompting language models (Liu et al., 2021), which uses textual prompts with unfilled slots and directly leverages the language model to fill those slots and obtain the desired output. For example, Table 1 shows sample prompts and PLM-generated outputs for ACG from our experiments.
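The slot-filling prompting paradigm can be sketched as follows. The two template wordings are taken from Table 1 (P2 and P7); the surrounding scaffolding (dictionary, function name) is illustrative and not the paper's code.

```python
# Minimal sketch of the prompting paradigm: textual templates with an
# unfilled slot that is filled with the target concept at query time.
ACG_TEMPLATES = {
    "P2": "Explain {target} using a well-known analogy.",  # imperative style
    "P7": "What is analogous to {target}?",                # question style
}

def fill_prompt(template_id: str, target: str) -> str:
    """Fill the target-concept slot of a prompt template."""
    return ACG_TEMPLATES[template_id].format(target=target)

print(fill_prompt("P7", "natural selection"))
# -> What is analogous to natural selection?
```

The filled prompt is then sent to the PLM as-is, and the model's continuation is taken as the generated analogy.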
Specifically, we study the following main re-
search questions: RQ1) How effective is a modern
PLM such as InstructGPT in generating meaning-
ful analogies? RQ2) How sensitive are the gener-
ated analogies to prompt design, the temperature
hyperparameter, and spelling errors? RQ3) How
does the model size impact the quality of generated
analogies?
To study these questions, we design several ex-
periments on analogies generated from the Instruct-
GPT (Ouyang et al.,2022) model. First, we man-
ually validate whether InstructGPT can generate
meaningful analogies for ten well-known analo-
gies in the science domain. Next, we design and
systematically vary prompt variants (e.g., imper-
ative statements vs. questions) and temperature,
and investigate the corresponding variations in the
generated text by comparing them to a reference
dataset of science analogies. Finally, we study the
impact of model size on the quality of generated
analogies both by automatically comparing against
the reference data and using human evaluation.
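The sensitivity study above amounts to crossing prompt-style variants with temperature settings and generating under each configuration. A minimal sketch of that design; the specific style wordings and temperature values here are assumptions for illustration, not the paper's exact grid:

```python
from itertools import product

# Illustrative prompt-style variants (imperative vs. question) and
# temperature settings for the sensitivity study.
PROMPT_STYLES = {
    "imperative": "Explain {target} using an analogy.",
    "question": "What is analogous to {target}?",
}
TEMPERATURES = [0.0, 0.5, 1.0]

def build_runs(target: str):
    """Enumerate (style, temperature, filled prompt) configurations."""
    return [
        (style, temp, template.format(target=target))
        for (style, template), temp in product(PROMPT_STYLES.items(), TEMPERATURES)
    ]

runs = build_runs("natural selection")
print(len(runs))  # 2 styles x 3 temperatures = 6 configurations
```

Each configuration's generations can then be scored against the reference analogies to measure how much the output shifts under each variation.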
Our experimental results show that PLMs
(specifically, InstructGPT) offer a promising gen-
eral approach to generating analogies with properly
designed prompts. Furthermore, the InstructGPT
model is found to be sensitive to the prompt design,
temperature, and spelling errors for this task, par-
ticularly to the prompt style (i.e., question vs. im-
perative statement). Precise imperative statements
in a low-temperature setting are found to be the best
prompts. Finally, the quality of the generated analo-
gies depends heavily on the model size. While
the largest model can achieve human-level perfor-
mance on the ACG task, the smallest model barely
generates any meaningful analogies. The AEG task
proved to be more challenging based on human
evaluation and could be a better test of the analogi-
cal reasoning capabilities of PLMs especially for
explaining analogies not seen during training.
2 Related Work
2.1 Computational Models of Analogies
There has been a lot of work on computational modeling of analogies (Mitchell, 2021). The SME model (Forbus et al., 2017) is one of the most popular symbolic models; it finds the mapping, or connections, between structured representations of source and target concepts and their attributes. However, such methods cannot generate new analogous source concepts with analogical explanations.
Recent deep learning-based approaches, including those using pre-trained language models (Mikolov et al., 2013; Rossiello et al., 2019; Ushio et al., 2021), are able to generate analogies to some extent, but are currently limited to simple word-level and proportional analogies, such as (ostrich:bird :: lion:?). In contrast, we aim to generate and explain more complex analogies of concepts, e.g., instructional analogies (Newby et al., 1995).
Another line of work is on finding analogous
documents for scientific innovation, such as prod-
uct descriptions and research papers, based on their
semantic similarities (Kittur et al.,2019). In con-
trast, we operate in a generative task setup.
To the best of our knowledge, none of the exist-
ing work has studied the problem of automatically
generating complex analogies in natural language.
Recently, research on more “generative” analogy-
making tasks has been recommended (Mitchell,
2021). Along this direction, we believe that our
proposed task is challenging and more practically useful than the existing text-based generative analogical tasks, including letter-string analogies (e.g., if "abc" changes to "abd", what does "pqrs" change to?) and word-level analogies.
2.2 Prompting Language Models
Recently, prompts have been either manually cre-
ated or learned to successfully leverage PLMs
for several natural language tasks (Liu et al.,
2021). Our work is closest to prompting for lexical and proportional analogy generation (Ushio et al., 2021). However, none of the existing work has performed an in-depth study on prompting PLMs both to generate analogous concepts given a single query concept and to explain the analogical similarities between two query concepts.
3 Problem Formulation
Motivated by the practical applications of this task
(e.g., explaining concepts), we study analogy gen-
eration in the following settings.
1. Analogous Concept Generation (ACG) or No Source (NO_SRC): Here, only the target concept is provided as the input. The goal is to generate an analogous source concept or scenario, along with some explanation to justify the analogy. For example, "Explain Bohr's atomic model using an analogy."
2. Analogy Explanation Generation (AEG) or With Source (WSRC): Here, in addition to the target, the source concept is also a part of the input. The goal is to generate an explanation of how the target and source are analogous. For example, "Explain how Bohr's atomic model is analogous to the solar system."
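The two settings differ only in their inputs, which a minimal sketch makes concrete. The prompt wordings reuse the examples in the definitions above; the function names are illustrative.

```python
def acg_prompt(target: str) -> str:
    """ACG / NO_SRC: only the target concept is given as input."""
    return f"Explain {target} using an analogy."

def aeg_prompt(target: str, source: str) -> str:
    """AEG / WSRC: both the target and the source concept are given."""
    return f"Explain how {target} is analogous to {source}."

print(acg_prompt("Bohr's atomic model"))
# -> Explain Bohr's atomic model using an analogy.
print(aeg_prompt("Bohr's atomic model", "the solar system"))
# -> Explain how Bohr's atomic model is analogous to the solar system.
```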
Our problem setup is similar to the use of PLMs for text generation (Li et al., 2021), and is most closely related to single-relation analogy generation (e.g., ostrich : bird :: animal : lion) (Ushio et al., 2021), where the input is a pair of query concepts (e.g., ostrich : bird), and the task is to choose an analogical pair from a pre-defined list of candidate pairs. However, our proposed task is different in nature and much more challenging (e.g., requiring more creativity in some cases). First, both our inputs and outputs are different. For example, in the proposed ACG setup, our input is a single concept (e.g., "Bohr's model"), not a pair of concepts. Our task is to identify another concept (or
scenario) that has an equivalence to the query con-
cept based on their deep and non-trivial semantic
similarities. No previous work has studied this kind
of “single-concept-based” analogy generation with
pre-trained language models. Even in the proposed
AEG setup where we also use a pair of concepts
as input, they are different from the pair used in
the previous work. For example, our input could
be a pair (e.g., "Bohr's model" and "solar system")
and the output is an explanation of their analogi-
cal relations (e.g., how their structures are similar).
Second, we do not have a pre-defined finite list of
candidates to choose from, which is a more realis-
tic and interesting setting than previous work from
application perspectives, and is also much more
challenging for evaluation.
4 Experiment Setup
In this section, we discuss InstructGPT PLM and
datasets used in our experiments.
InstructGPT Model: Recently, several PLMs have been developed and trained on massive web data (Devlin et al., 2018; Brown et al., 2020; Raffel et al., 2019). In this study, we probe the aligned GPT-3 models, InstructGPT. These are GPT-3 models that have been optimized to follow instructions better (Ouyang et al., 2022). InstructGPT has four variants depending on the model size (number of parameters), namely Ada (350M), Babbage (1.3B), Curie (6.7B), and Davinci (175B)3. Unless otherwise mentioned, we use the Davinci model for the experiments as it is expected to have the best performance.
We used the OpenAI API4 to generate all analogies. The main hyperparameters are described in Section 5.2.2 and the rest in Appendix A.
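A generation call to the API is parameterized by the prompt, model variant, and sampling hyperparameters. The sketch below only assembles the request parameters rather than making a network call; the model name, default temperature, and max_tokens value are illustrative assumptions, not the paper's exact configuration.

```python
def build_completion_request(prompt: str,
                             model: str = "text-davinci-002",
                             temperature: float = 0.0,
                             max_tokens: int = 256) -> dict:
    """Assemble keyword arguments for a completions-endpoint call.
    All defaults here are illustrative assumptions."""
    return {
        "model": model,
        "prompt": prompt,
        "temperature": temperature,  # low temperatures worked best in this study
        "max_tokens": max_tokens,
    }

params = build_completion_request("What is analogous to natural selection?")
print(params["temperature"])  # -> 0.0
```

Swapping the `model` argument among the four variants is how the model-size comparison in RQ3 would be run.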
Dataset: As the task of analogy generation, as defined in this paper, has not been previously studied, there is no existing dataset available to use directly for evaluation. We thus opted to create new datasets for evaluation. Table 2 shows sample data from these datasets.
Standard Science Analogies (STD): As far as we
could find, the closest dataset consisting of concep-
tual analogies is from (Turney,2008). It consists
of ten standard science analogies. However, these
only contain the source and target concepts but not
any explanation in natural language.
Science Analogies from academic Q&A sites (SAQA): We searched for quiz questions that asked to create analogies on academic Q&A sites like Chegg.com and Study.com5 by using search queries
3 https://blog.eleuther.ai/gpt3-model-sizes/
4 https://beta.openai.com/docs/api-reference/completions/create
5 https://chegg.com/, https://study.com/. We manually inspected the data and found no personal identifiers or offensive content. We manually compiled the datasets; no scraping was done.