
UniNL: Aligning Representation Learning with Scoring Function for
OOD Detection via Unified Neighborhood Learning
Yutao Mou1∗, Pei Wang1∗, Keqing He2∗, Yanan Wu1
Jingang Wang2, Wei Wu2, Weiran Xu1∗
1Beijing University of Posts and Telecommunications, Beijing, China
2Meituan, Beijing, China
{myt,wangpei,yanan.wu,xuweiran}@bupt.edu.cn
{hekeqing,wangjingang,wuwei}@meituan.com
Abstract
Detecting out-of-domain (OOD) intents from
user queries is essential for avoiding wrong
operations in task-oriented dialogue systems.
The key challenge is how to distinguish in-
domain (IND) and OOD intents. Previous
methods ignore the alignment between repre-
sentation learning and scoring function, limit-
ing the OOD detection performance. In this pa-
per, we propose a unified neighborhood learn-
ing framework (UniNL) to detect OOD in-
tents. Specifically, we design a K-nearest
neighbor contrastive learning (KNCL) objec-
tive for representation learning and introduce
a KNN-based scoring function for OOD detec-
tion. We aim to align representation learning
with the scoring function. Experiments and analysis on two benchmark datasets show the effectiveness of our method.1

∗The first three authors contributed equally. Weiran Xu is the corresponding author.
1We release our code at https://github.com/Yupei-Wang/UniNL.
1 Introduction
Out-of-domain (OOD) intent detection aims to
know when a user query falls outside the range
of pre-defined supported intents, which helps to
avoid performing wrong operations and provide
potential directions of future development in a task-
oriented dialogue system (Akasaki and Kaji, 2017; Tulshan and Dhage, 2018; Shum et al., 2018; Lin and Xu, 2019; Xu et al., 2020; Zeng et al., 2021a,b).
Compared with normal intent detection tasks, we do not know the exact number of unknown intents and lack labeled data for them, which makes it challenging to identify OOD samples in task-oriented dialog.
Previous OOD detection works can be generally classified into two types: supervised (Fei and Liu, 2016; Kim and Kim, 2018; Larson et al., 2019; Zheng et al., 2020) and unsupervised (Bendale and Boult, 2016; Hendrycks and Gimpel, 2017; Shu et al., 2017; Lee et al., 2018; Ren et al., 2019; Lin and Xu, 2019; Xu et al., 2020; Zeng et al., 2021a) OOD detection. The former assumes that extensive labeled OOD samples are available in the training data: Fei and Liu (2016) and Larson et al. (2019) formulate an (N+1)-class classification problem where the (N+1)-th class represents OOD intents, and Zheng et al. (2020) further use labeled OOD data to generate an entropy regularization term. However, these methods require numerous labeled OOD intents to achieve superior performance, which is unrealistic. We focus on the unsupervised OOD detection setting, where labeled OOD samples are not available for training. Unsupervised OOD detection first learns discriminative representations using only labeled IND data and then employs a scoring function, such as Maximum Softmax Probability (MSP) (Hendrycks and Gimpel, 2017), Local Outlier Factor (LOF) (Lin and Xu, 2019), or Gaussian Discriminant Analysis (GDA) (Xu et al., 2020), to estimate the confidence score of a test query.
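To make the two-step pipeline concrete, here is a minimal sketch of the MSP scoring step, assuming a classifier over the N IND intents has already been trained; the threshold value and tensor shapes are illustrative and would in practice be tuned on IND validation data.

```python
import torch
import torch.nn.functional as F

def msp_score(logits: torch.Tensor) -> torch.Tensor:
    """Maximum Softmax Probability (MSP): the largest softmax probability
    over the N in-domain classes, used as a confidence score."""
    return F.softmax(logits, dim=-1).max(dim=-1).values

def detect_ood(logits: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Flag queries whose confidence falls below the threshold as OOD.
    The threshold is typically chosen on IND validation data."""
    return msp_score(logits) < threshold
```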
All these unsupervised OOD detection methods focus on improving a single aspect, either representation learning or the scoring function, but none of them consider how to align the two. For example, Lin and Xu (2019) propose a local outlier factor (LOF) for OOD detection, which uses the local density around a test query to decide whether it belongs to an OOD intent, but their IND pre-training objective LMCL (Wang et al., 2018) cannot learn neighborhood-discriminative representations. Xu et al. (2020) and Zeng et al. (2021a) employ Gaussian discriminant analysis for OOD detection, which assumes that each IND cluster follows a Gaussian distribution, but they use a cross-entropy or supervised contrastive learning (Khosla et al., 2020) objective for representation learning, which cannot guarantee that this assumption is satisfied. The gap between representation learning and the scoring function limits the overall performance of these methods.
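For contrast with the density- and distribution-based scores above, a nearest-neighbor score of the kind mentioned in the abstract can be sketched as follows; this is a minimal illustration assuming L2-normalized encoder features and a stored bank of IND training features, with the choice of k and the cosine-distance convention picked for exposition rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def knn_score(query_feats: torch.Tensor, ind_bank: torch.Tensor, k: int = 10) -> torch.Tensor:
    """KNN-based confidence score.

    query_feats: (Q, D) encoder features of test queries.
    ind_bank:    (N, D) encoder features of labeled IND training utterances.
    Returns the negative cosine distance to each query's k-th nearest IND
    neighbor; lower (more negative) scores indicate likely OOD queries.
    """
    q = F.normalize(query_feats, dim=-1)
    bank = F.normalize(ind_bank, dim=-1)
    dist = 1.0 - q @ bank.t()                                # (Q, N) cosine distances
    kth = dist.topk(k, dim=-1, largest=False).values[:, -1]  # distance to k-th nearest neighbor
    return -kth
```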