Inspired by the recent success in applying PLMs
to many NLP tasks (e.g., Li et al., 2021), we
propose and study the application of PLMs to
analogy generation. We consider two typical
application scenarios of analogy generation: 1)
Analogous Concept Generation (ACG): given a
target concept (e.g., Bohr's model), generate a
source concept analogous to the target concept
(e.g., the solar system), possibly with an
explanation of their similarities; 2) Analogy
Explanation Generation (AEG): given a target
concept and an analogous source concept,
generate an explanation of their similarities.
Noting the similarity of these two tasks to
other text generation problems, and inspired by
the recent success of prompted PLMs for text
generation, we propose to generate analogies
using a PLM with appropriately designed prompts.
We adopt the promising emerging paradigm of
prompting language models (Liu et al., 2021),
which uses textual prompts with unfilled slots
and directly leverages the language model to
fill those slots and obtain the desired output.
For example, Table 1 shows sample prompts and
PLM-generated outputs for ACG from our
experiments.
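The slot-filling view of prompting can be sketched as follows; the template wordings here are illustrative placeholders, not the exact prompts used in our experiments.

```python
# Minimal sketch of slot-based prompt construction for the two tasks.
# The template strings are hypothetical examples, not the paper's prompts.

ACG_TEMPLATE = "Explain {target} using an analogy."
AEG_TEMPLATE = "Explain how {target} is analogous to {source}."

def build_prompt(task, target, source=None):
    """Fill the unfilled slot(s) of a textual prompt template."""
    if task == "ACG":
        return ACG_TEMPLATE.format(target=target)
    if task == "AEG":
        if source is None:
            raise ValueError("AEG requires both a target and a source concept")
        return AEG_TEMPLATE.format(target=target, source=source)
    raise ValueError(f"unknown task: {task}")

print(build_prompt("ACG", "Bohr's model"))
```

A completion endpoint of a PLM (e.g., InstructGPT) is then asked to continue the filled prompt, and its completion serves as the generated analogy or explanation.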
Specifically, we study the following main re-
search questions: RQ1) How effective is a modern
PLM such as InstructGPT in generating meaning-
ful analogies? RQ2) How sensitive are the gener-
ated analogies to prompt design, the temperature
hyperparameter, and spelling errors? RQ3) How
does the model size impact the quality of generated
analogies?
To study these questions, we design several
experiments on analogies generated by the
InstructGPT (Ouyang et al., 2022) model. First,
we manually validate whether InstructGPT can generate
meaningful analogies for ten well-known analo-
gies in the science domain. Next, we design and
systematically vary prompt variants (e.g., imper-
ative statements vs. questions) and temperature,
and investigate the corresponding variations in the
generated text by comparing them to a reference
dataset of science analogies. Finally, we study the
impact of model size on the quality of generated
analogies both by automatically comparing against
the reference data and using human evaluation.
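The sensitivity study for RQ2 amounts to enumerating a grid of conditions per concept; the sketch below illustrates this setup, where the prompt styles, wordings, and temperature values are assumptions for illustration rather than our exact experimental settings.

```python
# Illustrative sketch of the prompt-style x temperature grid used to probe
# sensitivity (RQ2). Styles, wordings, and temperatures are assumed examples.
from itertools import product

PROMPT_STYLES = {
    "imperative": "Explain {target} using an analogy.",
    "question": "What analogy can be used to explain {target}?",
}
TEMPERATURES = [0.0, 0.5, 1.0]

def experiment_grid(target):
    """Enumerate (style, temperature, prompt) conditions for one concept."""
    return [
        (style, temp, template.format(target=target))
        for (style, template), temp in product(PROMPT_STYLES.items(), TEMPERATURES)
    ]

conditions = experiment_grid("Bohr's model")
```

Each condition is sent to the PLM, and the generated texts are then compared against the reference dataset of science analogies.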
Our experimental results show that PLMs
(specifically, InstructGPT) offer a promising gen-
eral approach to generating analogies with properly
designed prompts. Furthermore, the InstructGPT
model is found to be sensitive to the prompt design,
temperature, and spelling errors for this task, par-
ticularly to the prompt style (i.e., question vs. im-
perative statement). Precise imperative statements
in a low-temperature setting are found to be the best
prompts. Finally, the quality of the generated
analogies depends heavily on the model size. While
the largest model can achieve human-level perfor-
mance on the ACG task, the smallest model barely
generates any meaningful analogies. The AEG task
proved to be more challenging based on human
evaluation and could be a better test of the
analogical reasoning capabilities of PLMs,
especially for explaining analogies not seen
during training.
2 Related Work
2.1 Computational Models of Analogies
There has been a lot of work on computational
modeling of analogies (Mitchell, 2021). The SME
model (Forbus et al., 2017) is one of the most
popular symbolic models; it finds mappings, or
connections, between structured representations
of source and target concepts and their attributes.
However, such methods cannot generate new
analogous source concepts with analogical
explanations.
Recent deep learning-based approaches, including
those using pre-trained language models (Mikolov
et al., 2013; Rossiello et al., 2019; Ushio et
al., 2021), can generate analogies to some
extent, but are currently limited to simple
word-level and proportional analogies, such as
(ostrich:bird :: lion:?). In contrast, we aim to
generate and explain more complex analogies of
concepts, e.g., instructional analogies (Newby
et al., 1995).
Another line of work is on finding analogous
documents for scientific innovation, such as prod-
uct descriptions and research papers, based on their
semantic similarities (Kittur et al., 2019). In
contrast, we operate in a generative task setup.
To the best of our knowledge, none of the exist-
ing work has studied the problem of automatically
generating complex analogies in natural language.
Recently, research on more “generative” analogy-
making tasks has been recommended (Mitchell,
2021). Along this direction, we believe that our
proposed task is challenging and more practically
useful than existing text-based generative
analogical tasks, including letter-string
analogies (e.g., if “abc” changes to “abd”, what
does “pqrs” change to?) and word-level analogies.