
Task          Prompting pattern                        Mining pattern
Sentiment     {INPUT}. It was {VERBALIZER}.            (is|was) {VERBALIZER}*. {INPUT}
Topic class.  {INPUT}. It is about {VERBALIZER}.       {VERBALIZER}*. {INPUT}
NLI           {INPUT:HYP} {VERBALIZER}, {INPUT:PREM}   {INPUT:HYP} {VERBALIZER}, {INPUT:PREM}

Table 1: Patterns. {VERBALIZER} is replaced with the verbalizers in Table 2. For mining, *. captures everything
up to a sentence boundary, and {INPUT}, {INPUT:HYP} and {INPUT:PREM} capture a single sentence.
Task   Lbl    Verbalizers
Sent.  Pos.   good, great, awesome, incredible
       Neg.   bad, awful, terrible, horrible
NLI    Ent.   Yes, Therefore, Thus, Accordingly, Hence, For this reason
       Con.   No, However, But, On the contrary, In contrast
       Neu.   Maybe, Also, Furthermore, Secondly, Additionally, Moreover, In addition

Table 2: Verbalizers for sentiment classification and NLI. See Table 9 for verbalizers used in topic classification.
When using a single verbalizer, we choose the one underlined (the first listed for each label). Multi-token
verbalizers are in italic. Lbl: label, Ent./Con./Neu.: entailment, contradiction, neutral.
in-context demonstrations (Min et al., 2022).
In this paper, we propose an alternative approach
to zero-shot learning that is more flexible and inter-
pretable than prompting, while obtaining stronger
results in our experiments. Similar to prompting,
our method requires a pretrained language model,
a pattern, and a verbalizer, in addition to an unlabeled
corpus (e.g., the one used for pretraining). As il-
lustrated in Figure 1, our approach works by using
the pattern and verbalizer to mine labeled examples
from the corpus through regular expressions, and
leveraging them as supervision to finetune the pre-
trained language model. This allows us to naturally
combine multiple patterns and verbalizers for each
task, while providing a signal to interactively de-
sign them by looking at the mined examples. In
addition, we show that better results are obtained
by filtering the mined examples through prompting.
Experiments in sentiment analysis, topic
classification and natural language inference (NLI)
confirm the effectiveness of our approach, which
outperforms prompting by a large margin when
using the exact same verbalizers and comparable
patterns. Our results offer a new perspective on
how language models can perform downstream
tasks in a zero-shot fashion, showing that similar
examples often exist in the pretraining corpus,
which can be directly retrieved through simple
extraction patterns.
2 Proposed Method
As shown in Figure 1, our method has three steps:
Mine.
We first use the pattern and a set of verbal-
izers to extract labeled examples from the corpus.
To that end, we define patterns that are filled with
verbalizers and expanded into regular expressions.
For instance, the pattern and verbalizer in Figure 1
would extract every sentence following “is good.”
or “was good.” as an example of the positive class,
and every sentence following “is bad.” or “was
bad.” as an example of the negative class. In prac-
tice, the patterns that we define are comparable to
the ones used for prompting, and the verbalizers
are exactly the same (see Tables 1 and 2). Ap-
pendix A gives more details on how we expand
patterns into regular expressions. While prior work
in prompting typically uses a single verbalizer per
class, our approach allows us to naturally combine
examples mined through multiple verbalizers in a
single dataset. To mitigate class imbalance
and keep the mined dataset to a reasonable size, we
mine a maximum of 40k examples per class after
balancing across the different verbalizers.
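To make this concrete, the following is a minimal Python sketch of the mining step for sentiment, assuming a simple sentence-boundary heuristic; the regular expression, the function names, and the toy corpus are illustrative and not the exact implementation described in Appendix A.

```python
import re

# Illustrative sketch of the mining step. The pattern "(is|was) {VERBALIZER}*. {INPUT}"
# matches a cue such as "was awesome", skips to the end of that sentence, and captures
# the following sentence as the labeled input.
VERBALIZERS = {
    "positive": ["good", "great", "awesome", "incredible"],
    "negative": ["bad", "awful", "terrible", "horrible"],
}

def build_regex(verbalizers):
    # Expand the pattern into a regex: "(is|was) <verbalizer>" + rest of sentence,
    # then capture the next sentence as {INPUT}.
    alternation = "|".join(re.escape(v) for v in verbalizers)
    return re.compile(
        rf"\b(?:is|was) (?:{alternation})[^.!?]*[.!?]\s+([A-Z][^.!?]*[.!?])"
    )

def mine(corpus_lines):
    """Return (sentence, label) pairs extracted from raw text."""
    patterns = {label: build_regex(verbs) for label, verbs in VERBALIZERS.items()}
    examples = []
    for line in corpus_lines:
        for label, pattern in patterns.items():
            for match in pattern.finditer(line):
                examples.append((match.group(1), label))
    # In practice, the mined data is additionally balanced across verbalizers
    # and capped at 40k examples per class; that bookkeeping is omitted here.
    return examples

corpus = [
    "The food was awesome. I will definitely come back soon.",
    "The service is terrible. We waited an hour and nobody showed up.",
]
print(mine(corpus))
# [('I will definitely come back soon.', 'positive'),
#  ('We waited an hour and nobody showed up.', 'negative')]
```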
Filter.
As an optional second step, we explore
automatically removing noisy examples from the
mined data. To that end, we classify the mined
examples using zero-shot prompting, and remove
examples for which the predicted and the mined
label do not match. Since this filtering step relies
on the performance of prompting, we only remove
the 10% of mismatching examples for which zero-
shot prompting is most confident.
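As an illustration, the sketch below scores each mined example with the prompting pattern from Table 1 and drops the most confidently mismatching ones. The choice of roberta-base, the single-verbalizer setup, and the helper names are assumptions made for the example rather than the exact configuration used in the paper.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumed setup: a masked LM scored with "{INPUT}. It was {VERBALIZER}."
# using a single (single-token) verbalizer per class.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()
VERBALIZER_IDS = {
    "positive": tokenizer(" good", add_special_tokens=False)["input_ids"][0],
    "negative": tokenizer(" bad", add_special_tokens=False)["input_ids"][0],
}

@torch.no_grad()
def prompt_predict(sentence):
    """Zero-shot prompting prediction and its confidence for one sentence."""
    inputs = tokenizer(f"{sentence} It was {tokenizer.mask_token}.", return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
    probs = model(**inputs).logits[0, mask_pos].softmax(dim=-1)
    scores = {label: probs[tok].item() for label, tok in VERBALIZER_IDS.items()}
    label = max(scores, key=scores.get)
    return label, scores[label]

def filter_mined(mined, drop_fraction=0.10):
    """Remove the most confidently mismatching fraction of the mined examples."""
    mismatches = []
    for sentence, mined_label in mined:
        predicted, confidence = prompt_predict(sentence)
        if predicted != mined_label:
            mismatches.append((confidence, sentence, mined_label))
    mismatches.sort(reverse=True)  # most confident disagreements first
    dropped = {(s, l) for _, s, l in mismatches[: int(drop_fraction * len(mismatches))]}
    return [example for example in mined if example not in dropped]
```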
Finetune.
Finally, we use the mined dataset to
finetune a pretrained language model in the stan-
dard supervised fashion (Devlin et al., 2019), learn-
ing a new classification head.
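A minimal sketch of this step with the HuggingFace Trainer follows; the backbone model, label mapping, and hyperparameters are placeholders rather than the configuration used in our experiments.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = {"positive": 0, "negative": 1}  # illustrative label mapping

def finetune(mined_examples, model_name="roberta-base"):
    """Finetune an encoder with a fresh classification head on mined (sentence, label) pairs."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=len(LABELS)  # new, randomly initialized head
    )
    dataset = Dataset.from_dict({
        "text": [sentence for sentence, _ in mined_examples],
        "label": [LABELS[label] for _, label in mined_examples],
    }).map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="mined-clf", num_train_epochs=3,
                               per_device_train_batch_size=32),
        train_dataset=dataset,
        tokenizer=tokenizer,  # enables dynamic padding via the default collator
    )
    trainer.train()
    return trainer
```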