
Conformal Predictor for Improving Zero-shot Text Classification Efficiency
Prafulla Kumar Choubey1, Yu Bai1, Chien-Sheng Wu1, Wenhao Liu2†, Nazneen Rajani3†
1Salesforce AI Research, 2Faire.com, 3Hugging Face
{pchoubey, yu.bai, wu.jason}@salesforce.com
wenhao@faire.com, nazneen@hf.co
†Work was done at Salesforce AI Research.
Abstract
Pre-trained language models (PLMs) have been shown to be effective for zero-shot (0shot) text classification. 0shot models based on natural language inference (NLI) and next sentence prediction (NSP) employ a cross-encoder architecture and infer by making a forward pass through the model for each label-text pair separately. This increases the computational cost of inference linearly in the number of labels. In this work, we improve the efficiency of such cross-encoder-based 0shot models by restricting the number of likely labels using a conformal predictor (CP) built on another, faster base classifier and calibrated on samples labeled by the 0shot model. Since a CP generates prediction sets with coverage guarantees, it reduces the number of target labels without excluding the label the 0shot model would rank most probable. We experiment with three intent and two topic classification datasets. With a suitable CP for each dataset, we reduce the average inference time of NLI- and NSP-based models by 25.6% and 22.2% respectively, while keeping the performance drop within the predefined error rate of 1%.
1 Introduction
Zero-shot (0shot) text classification is an important NLP problem with many real-world applications. The earliest approaches to 0shot text classification use a similarity score between text and labels mapped to a common embedding space (Chang et al., 2008; Gabrilovich and Markovitch, 2007; Chen et al., 2015; Li et al., 2016; Sappadla et al., 2016; Xia et al., 2018). These models compute text and label embeddings independently and make only one forward pass over the text, resulting in a minimal increase in computation. Later approaches explicitly incorporate label information when processing the text: e.g., Yogatama et al. (2017) use generative modeling to generate text given the label embedding, and Rios and Kavuluru (2018) use label-embedding-based attention over the text; both require multiple passes over the text and increase the computational cost.
Most recently, NLI- (Condoravdi et al., 2003; Williams et al., 2018; Yin et al., 2019) and NSP-based (Ma et al., 2021) 0shot text classification formulations have been proposed. NLI and NSP make inferences by defining a representative hypothesis sentence for each label and producing a score for every pair of input text and hypothesis. To compute each score, they employ a cross-encoder architecture, i.e., full self-attention over the concatenated text and hypothesis, which requires re-encoding the text together with each hypothesis separately. This increases the computational cost of inference linearly in the number of target labels.
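To make the per-label cost concrete, the following is a minimal sketch of NLI-based 0shot classification; the model (facebook/bart-large-mnli) and the hypothesis template are illustrative choices, not necessarily the exact setup used in this paper:

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "facebook/bart-large-mnli"  # illustrative NLI cross-encoder
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

    text = "Play some jazz in the living room."
    labels = ["play music", "set alarm", "check weather"]

    entail_probs = []
    for label in labels:
        hypothesis = f"This example is about {label}."
        # One full cross-encoder forward pass per (text, hypothesis) pair,
        # so inference cost grows linearly with the number of labels.
        inputs = tokenizer(text, hypothesis, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits  # [contradiction, neutral, entailment]
        entail_probs.append(logits.softmax(dim=-1)[0, 2].item())

    prediction = labels[max(range(len(labels)), key=entail_probs.__getitem__)]

Every added label costs one more forward pass through the large PLM, which is the inefficiency this work targets.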
NLI and NSP use large transformer-based PLMs (Devlin et al., 2019; Liu et al., 2019b; Lewis et al., 2019) and outperform previous non-transformer-based models by a large margin. However, the size of the PLMs and the number of target labels drastically reduce prediction efficiency, increasing computation and inference time, and may significantly increase the carbon footprint of making predictions (Strubell et al., 2019; Moosavi et al., 2020; Schwartz et al., 2020; Zhou et al., 2021).
In this work, we focus on the correlation between the number of labels and prediction efficiency and propose to use a conformal predictor (CP) (Vovk et al., 2005; Shafer and Vovk, 2008) to filter out unlikely labels from the target set. Conformal prediction provides a model-agnostic framework for generating a label set, instead of a single label prediction, within a pre-defined error rate. Consequently, we use a CP built on another, faster base classifier, with a small error rate that we select, to generate candidate target labels. The candidate labels are then passed to the larger NLI/NSP-based 0shot model to make the final prediction.
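As a concrete illustration, the following is a minimal sketch of split conformal prediction for label filtering; it assumes a fast base classifier that outputs class probabilities and a calibration set whose labels were produced by the 0shot model, and all names are illustrative rather than the exact procedure used in this paper:

    import numpy as np

    def conformal_threshold(cal_probs, cal_labels, eps=0.01):
        # Nonconformity score: 1 - probability the base classifier assigns
        # to the calibration label (here, the label from the 0shot model).
        n = len(cal_labels)
        scores = 1.0 - cal_probs[np.arange(n), cal_labels]
        # Finite-sample-corrected quantile for (1 - eps) coverage
        # (method="higher" requires numpy >= 1.22).
        level = min(np.ceil((n + 1) * (1 - eps)) / n, 1.0)
        return np.quantile(scores, level, method="higher")

    def prediction_set(probs, threshold):
        # Keep every label whose nonconformity score is within the threshold;
        # the set contains the 0shot model's label with probability >= 1 - eps.
        return np.where(1.0 - probs <= threshold)[0]

Only the labels in the returned set are then scored by the NLI/NSP cross-encoder, so the number of expensive forward passes drops from the full label count to the prediction-set size.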