
LLM to iteratively propose and evaluate different candidate
explanations.
For evaluation, we curate a diverse collection of datasets written in natural language (Table 1) and measure iPrompt’s ability to accurately explain a ground-truth pattern. We find that iPrompt outperforms baseline methods in accurately finding a correct description; moreover, the generated descriptions are interpretable, allowing human auditing and enabling strong generalization when used as a prompt in a new setting (i.e. when used for a different LLM). On real-world sentiment classification datasets, iPrompt even produces prompts that match or improve upon human-written prompts for GPT-3, while only using smaller, locally run language models. Finally, we find that iPrompt is able to extract information from real-world scientific datasets.
2. Related work
Prompting and autoprompting. With the advent of large-scale models, prompting (i.e. finding the right prompt to use to query an LLM for a given task) has exploded as an area of inquiry, often yielding impressive improvements in performance (Brown et al., 2020; Petroni et al., 2019; Liu et al., 2021a) and spurring a line of work aiming to make prompting easier (Strobelt et al., 2022; Lu et al., 2022; Bach et al., 2022; Logan IV et al., 2022). Recently, autoprompting (i.e. automatically searching for a prompt or prompt embedding via optimization) has emerged, with methods such as prefix-tuning (Li & Liang, 2021), P-tuning (Liu et al., 2021b), prompt-tuning with rules (Han et al., 2021), knowledgeable prompt tuning (Hu et al., 2021), and many more (Liu et al., 2021a). These strategies use gradient descent to find a set of “adapter” parameters that maximize model performance, but do not require that the new parameters map back to tokens in discrete space, rendering them uninterpretable.
A few methods tackle the more difficult problem of searching for prompts that can be expressed in natural language tokens. RLPrompt (Deng et al., 2022) searches for such a prompt using reinforcement learning, and one recent work (Honovich et al., 2022) queries an LLM to produce a prompt. AutoPrompt (Shin et al., 2020) performs autoprompting via input gradients (see Sec. 3). Similarly, adversarial triggers (Wallace et al., 2019) use autoprompting to identify adversarial inputs which can be used to change a model’s prediction. These methods effectively alter a model’s predictions, but do not constrain the discovered prompts to be semantically meaningful, resulting in prompts that are difficult to interpret (Webson & Pavlick, 2021). Another related work directly finetunes an LLM to describe the difference between two datasets (Zhong et al., 2022). Concurrent work proposes a method for natural language prompting similar to the one here, with a focus on improving prediction performance rather than on explaining data patterns (Zhou et al., 2022).
Problems related to dataset explanation. The problem statement presented in this work closely resembles the widely studied problems of symbolic regression (Augusto & Barbosa, 2000; Schmidt & Lipson, 2009), program synthesis (Gulwani et al., 2017; Manna & Waldinger, 1980), text/table summarization (Kryściński et al., 2019; Liu et al., 2018), and pattern discovery in data mining (Hand, 2007). iPrompt can be viewed as an algorithm for symbolic regression, in which the set of allowable symbols consists of semantically meaningful natural language strings. One recent work proposes the task of inferring prompts that improve supervised prediction (Honovich et al., 2022), which we generalize here to diverse use cases for dataset explanation.
Alternative methods for neural-network interpretation. A popular method for interpreting neural networks is to inspect an LLM’s individual predictions via feature importances (Lundberg et al., 2019; Ribeiro et al., 2016), feature-interaction importances (Singh et al., 2019; Tsang et al., 2017), extractive rationales (Zaidan & Eisner, 2008; Sha et al., 2021), or natural language explanations for individual predictions (Hendricks et al., 2016; Camburu et al., 2018). These works can provide meaningful insights for individual predictions, but it is difficult to parse them into an understanding of an entire dataset. Alternatively, one can investigate an LLM’s learned representations via probing (Conneau et al., 2018; Liu & Avci, 2019) or by directly analyzing a model’s internal weights and activations (Wang et al., 2021; Olah et al., 2018; Meng et al., 2022). However, these approaches are limited in their ability to generate previously unknown descriptions of data. A different approach involves distilling information into a transparent model (Tan et al., 2018; Ha et al., 2021; Singh & Gao, 2022) or simply using a transparent model in the first place (Breiman et al., 1984; Tan et al., 2022; Singh et al., 2021; Agarwal et al., 2022).
3. Methods: Defining the task and approach
3.1. Task: Dataset Explanation
Given a dataset composed of input-output string pairs $\{(x_1, y_1), \ldots, (x_N, y_N)\}$, the goal is to produce a “semantically meaningful” natural language string that explains the relationship between $x$ and $y$. We require that the string consist of human-understandable text rather than a sequence of incongruous tokens. For example, in the dataset shown in Fig. 1, given samples of data performing addition, our task is to recover text synonymous with “Add the inputs.” This
dataset explanation can then be used for various downstream
tasks, such as prompting a different LLM.
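For concreteness, the snippet below sketches this setup: a toy addition dataset of $(x, y)$ string pairs and a scoring function that ranks candidate explanations by how well each one works as a prompt for a small causal LLM. This is an illustrative sketch rather than the iPrompt implementation; the “gpt2” model, the Input/Output template, and the average log-likelihood score are assumptions made for the example.

```python
# Minimal sketch (not the paper's implementation) of the dataset-explanation task:
# given (x, y) string pairs, score a candidate natural language explanation by how
# well it works as a prompt for a small causal LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Toy dataset mirroring the addition example in Fig. 1.
dataset = [("2 5", "7"), ("3 1", "4"), ("6 2", "8")]

def prompt_score(explanation: str) -> float:
    """Average log-likelihood of each y when the explanation prefixes x."""
    total = 0.0
    for x, y in dataset:
        prompt = f"{explanation}\nInput: {x}\nOutput:"
        prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
        full_ids = tokenizer(prompt + " " + y, return_tensors="pt").input_ids
        # Mask the prompt tokens so the loss is computed only on the y tokens.
        labels = full_ids.clone()
        labels[:, : prompt_ids.shape[1]] = -100
        with torch.no_grad():
            loss = model(full_ids, labels=labels).loss  # mean NLL over y tokens
        total += -loss.item()
    return total / len(dataset)

# Candidate explanations; with a sufficiently capable model, the correct
# description should tend to score higher.
for candidate in ["Add the inputs.", "Subtract the inputs."]:
    print(candidate, round(prompt_score(candidate), 3))
```

Masking the prompt tokens ensures the score reflects only how well the explanation helps predict $y$; the explanation string itself remains plain natural language, so it can later be reused verbatim as a prompt for a different LLM.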