Inspired by the recent success in applying PLMs
to many NLP tasks (e.g., Li et al., 2021), we
propose and study the application of PLMs to
analogy generation. We consider two typical
application scenarios of analogy generation: 1)
Analogous Concept Generation (ACG): given a
target concept (e.g., Bohr's model), generate a
source concept analogous to the target concept
(e.g., the solar system), possibly with an
explanation of their similarities; 2) Analogy
Explanation Generation (AEG): given a target
concept and an analogous source concept,
generate an explanation of their similarities.
Noting the similarity of these two tasks to
other text generation problems, and inspired by
the recent success of prompted PLMs for text
generation, we propose to generate analogies
using a PLM with appropriately designed prompts.
We adopt the promising emerging paradigm of
prompting language models (Liu et al., 2021),
which uses textual prompts with unfilled slots
and directly leverages the language model to
fill those slots and obtain the desired output.
For example, Table 1 shows sample prompts and
PLM-generated outputs for ACG from our
experiments.
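The slot-filling view of prompting can be sketched as follows; the template wordings here are illustrative placeholders, not the exact prompts used in our experiments.

```python
# Minimal sketch of slot-based prompt construction for the two tasks.
# The template strings are hypothetical examples, not the paper's prompts.

ACG_TEMPLATE = "Explain {target} using an analogy."
AEG_TEMPLATE = "Explain how {target} is analogous to {source}."

def build_prompt(task, target, source=None):
    """Fill the unfilled slot(s) of a textual prompt template."""
    if task == "ACG":
        return ACG_TEMPLATE.format(target=target)
    if task == "AEG":
        if source is None:
            raise ValueError("AEG requires both a target and a source concept")
        return AEG_TEMPLATE.format(target=target, source=source)
    raise ValueError(f"unknown task: {task}")

print(build_prompt("ACG", "Bohr's model"))
```

A completion endpoint of a PLM (e.g., InstructGPT) is then asked to continue the filled prompt, and its completion serves as the generated analogy or explanation.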
Specifically, we study the following main re-
search questions: RQ1) How effective is a modern
PLM such as InstructGPT in generating meaning-
ful analogies? RQ2) How sensitive are the gener-
ated analogies to prompt design, the temperature
hyperparameter, and spelling errors? RQ3) How
does the model size impact the quality of generated
analogies?
To study these questions, we design several
experiments on analogies generated by the
InstructGPT (Ouyang et al., 2022) model. First,
we manually validate whether InstructGPT can generate
meaningful analogies for ten well-known analo-
gies in the science domain. Next, we design and
systematically vary prompt variants (e.g., imper-
ative statements vs. questions) and temperature,
and investigate the corresponding variations in the
generated text by comparing them to a reference
dataset of science analogies. Finally, we study the
impact of model size on the quality of generated
analogies both by automatically comparing against
the reference data and using human evaluation.
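The sensitivity study for RQ2 amounts to enumerating a grid of conditions per concept; the sketch below illustrates this setup, where the prompt styles, wordings, and temperature values are assumptions for illustration rather than our exact experimental settings.

```python
# Illustrative sketch of the prompt-style x temperature grid used to probe
# sensitivity (RQ2). Styles, wordings, and temperatures are assumed examples.
from itertools import product

PROMPT_STYLES = {
    "imperative": "Explain {target} using an analogy.",
    "question": "What analogy can be used to explain {target}?",
}
TEMPERATURES = [0.0, 0.5, 1.0]

def experiment_grid(target):
    """Enumerate (style, temperature, prompt) conditions for one concept."""
    return [
        (style, temp, template.format(target=target))
        for (style, template), temp in product(PROMPT_STYLES.items(), TEMPERATURES)
    ]

conditions = experiment_grid("Bohr's model")
```

Each condition is sent to the PLM, and the generated texts are then compared against the reference dataset of science analogies.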
Our experimental results show that PLMs
(specifically, InstructGPT) offer a promising gen-
eral approach to generating analogies with properly
designed prompts. Furthermore, the InstructGPT
model is found to be sensitive to the prompt design,
temperature, and spelling errors for this task, par-
ticularly to the prompt style (i.e., question vs. im-
perative statement). Precise imperative statements
in a low-temperature setting are found to be the best
prompts. Finally, the quality of the generated
analogies depends heavily on the model size. While
the largest model can achieve human-level perfor-
mance on the ACG task, the smallest model barely
generates any meaningful analogies. The AEG task
proved to be more challenging based on human
evaluation and could be a better test of the
analogical reasoning capabilities of PLMs,
especially for explaining analogies not seen
during training.
2 Related Work
2.1 Computational Models of Analogies
There has been a lot of work on computational
modeling of analogies (Mitchell, 2021). The SME
model (Forbus et al., 2017) is one of the most
popular symbolic models; it finds mappings, or
connections, between structured representations
of source and target concepts and their attributes.
However, such methods cannot generate new
analogous source concepts with analogical
explanations.
Recent deep learning-based approaches, including
those using pre-trained language models (Mikolov
et al., 2013; Rossiello et al., 2019; Ushio et
al., 2021), can generate analogies to some
extent, but are currently limited to simple
word-level and proportional analogies, such as
(ostrich:bird :: lion:?). In contrast, we aim to
generate and explain more complex analogies of
concepts, e.g., instructional analogies (Newby
et al., 1995).
Another line of work is on finding analogous
documents for scientific innovation, such as prod-
uct descriptions and research papers, based on their
semantic similarities (Kittur et al., 2019). In
contrast, we operate in a generative task setup.
To the best of our knowledge, none of the exist-
ing work has studied the problem of automatically
generating complex analogies in natural language.
Recently, research on more “generative” analogy-
making tasks has been recommended (Mitchell,
2021). Along this direction, we believe that our
proposed task is challenging and more practically
useful than existing text-based generative
analogical tasks, including letter-string
analogies (e.g., if “abc” changes to “abd”, what
does “pqrs” change to?) and word-level analogies.