
2 Background
2.1 Weakly-supervised text classification
In weakly-supervised text classification, the system is allowed to view the entire test set at evaluation time. Having access to all test data allows novel preprocessing approaches unavailable in traditional text classification, such as preliminary clustering of test samples (Mekala and Shang, 2020; Wang et al., 2021) before attempting final classification. In the process, the system has an opportunity to examine overall characteristics of the test set.
Existing methods for weakly-supervised text classification focus on effectively leveraging such additional information. The dominant approach involves generating pseudo-data to train a neural text classifier. Most methods assign labels to samples in the test set by identifying operative keywords within the text (Meng et al., 2018). They obtain seed words that best represent each category name. Then, each sample in the test set is assigned the label whose keywords are most relevant to its content.
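The seed-word matching step above can be sketched as follows. This is an illustrative simplification, not the exact procedure of any cited method: the seed lists and the overlap-count scoring rule are assumptions made for the example.

```python
# Hypothetical sketch of seed-word-based pseudo-labeling:
# assign each test document the label whose seed words
# appear most often in its text. Seed lists are invented
# for illustration only.

SEED_WORDS = {
    "sports": ["game", "team", "score"],
    "politics": ["election", "senate", "vote"],
}

def pseudo_label(document: str) -> str:
    tokens = document.lower().split()
    # Count seed-word hits per candidate label.
    scores = {
        label: sum(tokens.count(w) for w in seeds)
        for label, seeds in SEED_WORDS.items()
    }
    # The best-matching label becomes the pseudo-label.
    return max(scores, key=scores.get)

print(pseudo_label("The team won the game with a late score"))
# -> sports
```

As the section notes, this scheme fails silently when a document contains none of the seed words, which motivates the entailment-based alternative below.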
Later works improve this pipeline by automatically generating seed words (Meng et al., 2020b) or incorporating pre-trained language models to utilize contextual information of representative keywords (Mekala and Shang, 2020).
Seed-word-based pseudo-labeling, however, is heavily dependent on the existence of representative seed words in test samples. Seed-word-based matching cannot fully utilize the information in contextual language representations, because the classification of each document involves brittle global hyperparameters such as the total number of seed words (Meng et al., 2020b) or word embedding distance (Wang et al., 2021).
In this work, we entirely forgo the seed-word generation process during pseudo-labeling. We show that replacing seed-word generation with entailment-based text classification is more reliable and performant for text classification with weak supervision.
2.2 Entailment-based text classification
Textual entailment (Fyodorov et al., 2000; MacCartney and Manning, 2009) measures the likelihood of a sentence appearing after another. Since entailment is evaluated to a probability value, the task can be extended for use in text classification. In entailment-based text classification, classification is posed as a textual entailment problem: given a test document, the system ranks the probabilities that sentences each containing a possible class label (hypotheses) will immediately follow the document text. The class label belonging to the most probable hypothesis is selected as the classification prediction. A hypothesis for topic classification, for example, could be “This text is about <topic>”. The flexibility in prompt choices for constructing the hypotheses makes entailment-based classification extremely adaptable to different task types.
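A minimal sketch of this ranking procedure is below. The `entail_prob` scorer is a stand-in assumption: in practice it would be an NLI model's entailment probability, whereas here it is a toy word-overlap score used only to keep the example self-contained.

```python
# Sketch of entailment-based classification: build one hypothesis
# per class label, score each against the document with an
# entailment scorer, and return the label of the most probable
# hypothesis.

def entail_prob(premise: str, hypothesis: str) -> float:
    # Toy stand-in for an NLI model: normalized word overlap
    # between premise and hypothesis, in [0, 1].
    norm = lambda s: {w.strip(".,").lower() for w in s.split()}
    p, h = norm(premise), norm(hypothesis)
    return len(p & h) / max(len(h), 1)

def classify(document: str, labels: list[str]) -> str:
    template = "This text is about {}."  # prompt from the section above
    hypotheses = {lbl: template.format(lbl) for lbl in labels}
    scores = {lbl: entail_prob(document, hyp)
              for lbl, hyp in hypotheses.items()}
    # The label whose hypothesis is most probable wins.
    return max(scores, key=scores.get)

print(classify("This article is about politics and the senate",
               ["sports", "politics"]))
# -> politics
```

Note that only the prompt template changes across tasks, which is the source of the adaptability described above.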
Although entailment-based sentence scoring is popular in zero- and few-shot text classification (Yin et al., 2019, 2020), the robustness of such approaches has recently been called into question (Ma et al., 2021). Since entailment-based classifiers rely heavily on lexical patterns, a large variance is observed in classification performance across different domains. We find that the self-training commonly found in weakly-supervised classification mitigates such robustness issues in entailment-based classification to a large degree.
3 The LIME Framework
LIME enhances the two-phase weakly-supervised
classification pipeline with an entailment-based
pseudo-labeling scheme.
Examples
Test sample (t)   “I love the food.”
Class label (c)   “Positive”
Verbalizer        “Positive” → “good”
Prompt            “It was <verbalizer(c)>.”
Hypothesis (h)    “It was good.”
Table 1: Example test sample, class label, verbalizer, prompt, and entailment hypothesis. Converting class labels with a verbalizer is an optional procedure.
3.1 Phase 1: Pseudo-labeling
Textual entailment evaluates the likelihood of a hypothesis h succeeding some text t. Given C = {c_1, c_2, ..., c_n}, the set of all possible labels for t, we generate H = {h_1, h_2, ..., h_n}, the set of all entailment hypotheses. Every sentence h_i asserts that its corresponding c_i ∈ C is the correct label for t. Each h_i is constructed from a designated prompt and an optional verbalizer for each dataset (Schick and Schütze, 2021):

h_i = prompt(verbalizer(c_i))
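The construction h_i = prompt(verbalizer(c_i)) can be sketched directly. The verbalizer mapping and prompt string follow the running example in Table 1; the “Negative” entry and the function names are illustrative assumptions.

```python
# Sketch of hypothesis construction: each class label is optionally
# mapped through a verbalizer, then inserted into a prompt template,
# following the running example in Table 1.

VERBALIZER = {"Positive": "good", "Negative": "bad"}  # "Negative" is assumed

def verbalize(label: str) -> str:
    # Optional step: fall back to the raw label if unmapped.
    return VERBALIZER.get(label, label)

def prompt(word: str) -> str:
    return f"It was {word}."

def build_hypotheses(labels: list[str]) -> list[str]:
    # One hypothesis h_i per candidate label c_i.
    return [prompt(verbalize(c)) for c in labels]

print(build_hypotheses(["Positive", "Negative"]))
# -> ['It was good.', 'It was bad.']
```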
Prompts dictate the wording of the hypotheses,