
Pseudo-OOD training for robust language models
Dhanasekar Sundararaman1∗, Nikhil Mehta1∗, Lawrence Carin1
1Duke University
{ds448,nm208}@duke.edu
∗The authors contributed equally to this work.
Abstract
While pre-trained large-scale deep models have garnered attention as an important topic for many downstream natural language processing (NLP) tasks, such models often make unreliable predictions on out-of-distribution (OOD) inputs. As such, OOD detection is a key component of a reliable machine learning model for any industry-scale application. Common approaches often assume access to additional OOD samples during the training stage; however, the outlier distribution is often unknown in advance. Instead, we propose a post hoc framework, POsthoc pseudo Ood REgularization (POORE), that generates pseudo-OOD samples using in-distribution (IND) data. The model is fine-tuned by introducing a new regularization loss that separates the embeddings of IND and OOD data, which leads to significant gains on the OOD detection task during testing. We extensively evaluate our framework on three real-world dialogue systems, achieving a new state of the art in OOD detection.
1 Introduction
Detecting Out-of-Distribution (OOD) samples (Goodfellow et al., 2014; Hendrycks and Gimpel, 2016; Yang et al., 2021) is vital for developing reliable machine learning systems for various industry-scale applications of natural language processing (NLP) (Shen et al., 2019; Sundararaman et al., 2020), including intent understanding in conversational dialogues (Zheng et al., 2020; Li et al., 2017), language translation (Denkowski and Lavie, 2011; Sundararaman et al., 2019), and text classification (Aggarwal and Zhai, 2012; Sundararaman et al., 2022). For instance, a language understanding model deployed to support a chat system for medical inquiries should reliably detect whether the symptoms reported in a conversation constitute an OOD query, so that the model may abstain from making an incorrect diagnosis (Siedlikowski et al., 2021).
Although OOD detection has attracted a great deal of interest from the research community (Goodfellow et al., 2014; Hendrycks and Gimpel, 2017; Lee et al., 2018), these approaches are not specifically designed to leverage the structure of textual inputs. Consequently, commonly used OOD approaches often have limited success in real-world NLP applications. Most prior OOD methods for NLP systems (Larson et al., 2019; Chen and Yu, 2021; Kamath et al., 2020) typically assume access to additional OOD data for outlier exposure (Hendrycks et al., 2018). However, such methods risk overfitting to the chosen OOD set, while assuming that a relevant OOD set is available during the training stage. Other methods (Gangal et al., 2020; Li et al., 2021; Kamath et al., 2020) train a calibration model, in addition to the classifier, for detecting OOD inputs. These methods are computationally expensive, as they often require re-training the model on the downstream task.
Motivated by the above limitations, we propose a framework called POsthoc pseudo Ood REgularization (POORE) that generates pseudo-OOD data using the trained classifier and the In-Distribution (IND) samples. As opposed to methods that use outlier exposure, our framework does not rely on any external OOD set. Moreover, POORE can be easily applied to already-deployed large-scale models trained on a classification task, without re-training the classifier from scratch. In summary, we make the following contributions:
1. We propose a Mahalanobis-based context masking scheme for generating pseudo-OOD samples that can be used during fine-tuning (see the sketch after this list).
2. We introduce a new Pseudo Ood Regularization (POR) loss that maximizes the distance between IND and pseudo-OOD representations during fine-tuning (a generic stand-in is sketched below).
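To make the first contribution concrete, below is a minimal sketch of one plausible Mahalanobis-based masking scheme, not the paper's exact procedure. The class-conditional Gaussian fit with a shared covariance follows Lee et al. (2018); the function names (estimate_gaussian_params, mahalanobis_score, pseudo_ood_by_masking, encode_fn) and the greedy choice to mask the k tokens whose removal most increases the Mahalanobis distance are our own assumptions.

```python
import torch

def estimate_gaussian_params(feats, labels, num_classes):
    """Fit class-conditional Gaussians with a shared covariance to IND
    sentence features, as in Lee et al. (2018). feats: [N, d], labels: [N]."""
    means = torch.stack([feats[labels == c].mean(0) for c in range(num_classes)])
    centered = feats - means[labels]
    precision = torch.linalg.pinv(centered.T @ centered / feats.shape[0])
    return means, precision

def mahalanobis_score(x, means, precision):
    """Minimum squared Mahalanobis distance from a feature x: [d]
    to any IND class mean."""
    diff = means - x  # [C, d]
    return torch.einsum('cd,de,ce->c', diff, precision, diff).min()

def pseudo_ood_by_masking(tokens, encode_fn, means, precision, mask_token, k=3):
    """Mask the k tokens whose individual masking most increases the
    Mahalanobis distance, i.e. the tokens that most anchor the input
    to the IND manifold (an assumed heuristic, not the paper's spec)."""
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        scores.append(mahalanobis_score(encode_fn(masked), means, precision))
    top = set(torch.stack(scores).topk(min(k, len(tokens))).indices.tolist())
    return [mask_token if i in top else tok for i, tok in enumerate(tokens)]
```

Scoring every token individually costs one encoder pass per token; a cheaper single-pass saliency could substitute if sentence lengths are large.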
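The exact form of the POR loss is defined later in the paper; as a placeholder consistent with the stated goal of separating IND and pseudo-OOD embeddings, here is a generic margin-based formulation. The names por_loss and fine_tune_objective, the margin, and the weighting coefficient lam are all illustrative assumptions.

```python
import torch.nn.functional as F

def por_loss(ind_emb, pood_emb, margin=1.0):
    """Margin-based separation: penalize pseudo-OOD embeddings that lie
    within `margin` of the paired IND embeddings. Shapes: [B, d]."""
    dist = F.pairwise_distance(ind_emb, pood_emb)  # [B]
    return F.relu(margin - dist).mean()

def fine_tune_objective(logits, labels, ind_emb, pood_emb, lam=0.1):
    """Post hoc fine-tuning objective: task cross-entropy on IND data
    plus the separation term, weighted by a hypothetical `lam`."""
    return F.cross_entropy(logits, labels) + lam * por_loss(ind_emb, pood_emb)
```

Adding the regularizer to the original task loss, rather than replacing it, matches the framework's premise that an already-trained classifier is fine-tuned rather than re-trained from scratch.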