ON OUT-OF-DISTRIBUTION DETECTION FOR AUDIO WITH DEEP NEAREST
NEIGHBORS
Zaharah Bukhsh, Aaqib Saeed
Eindhoven University of Technology, Eindhoven, The Netherlands
ABSTRACT
Out-of-distribution (OOD) detection is concerned with identi-
fying data points that do not belong to the same distribution as
the model’s training data. For the safe deployment of predic-
tive models in a real-world environment, it is critical to avoid
making confident predictions on OOD inputs as it can lead to
potentially dangerous consequences. However, OOD detec-
tion largely remains an under-explored area in the audio (and
speech) domain. This is despite the fact that audio is a central
modality for many tasks, such as speaker diarization, automatic
speech recognition, and sound event detection. To address this,
we propose to leverage the feature space of the model with deep
k-nearest neighbors to detect OOD samples. We show that
this simple and flexible method effectively detects OOD inputs
across a broad category of audio (and speech) datasets. Specifically, it improves the false positive rate (FPR@TPR95) by 17% and the AUROC score by 7% compared to prior techniques.
Index Terms — out-of-distribution, audio, speech, uncertainty estimation, deep learning, nearest neighbors
1. INTRODUCTION
Out-of-distribution (OOD) detection is the task of identifying
inputs that are not drawn from the same distribution as the
training data or are not truly representative of them. Neural
networks are known to produce overconfident scores even for
samples that do not belong to the training distribution [1].
This is a challenging problem for deploying machine learn-
ing in safety-critical applications, where making confident
predictions on OOD inputs can lead to potentially danger-
ous consequences. Besides the capability to generalize well
for samples from the familiar distribution, a robust machine
learning model should be aware of uncertainty stemming from
unknown examples. It is an important competency for real-
world applications, where the distribution of data can change
over time or vary across different user groups.
A broad range of approaches has been proposed to tackle
the OOD detection issue and develop reliable methods that
successfully detect in-distribution (ID) and OOD inputs. A set
of common techniques is to deduce uncertainty measurements around predictions of the neural network based on model outputs [1, 2, 3, 4], feature space [5, 6], and gradient norms [7].

The icons used in the figure are from TheNounProject.
Similarly, distance-based methods [5] have also gained significant attention recently for identifying OOD inputs with
promising capabilities. Distance-based methods leverage rep-
resentations extracted from a pre-trained model and act on the
assumption that out-of-distribution test samples are isolated
from the ID data. Nevertheless, OOD detection is severely
understudied in the audio domain, although audio recognition
models are being widely deployed in real-world settings. Moreover, audio is an important modality for many tasks, such as
speaker diarization, automatic speech recognition, and sound
event detection. The prior works mainly focus on vision tasks
raising an important question about the efficacy and applica-
bility of existing methods to audio and speech.
Our work follows the same intuition as the distance-based method [5], and we aim to explore the richness of the model's
representation space to derive a meaningful signal that can
help solve the task of OOD detection. Formally, we propose
a simple yet effective system for out-of-distribution detec-
tion for audio inputs with deep k-nearest neighbors. In particular, we leverage the nearest neighbor distance via a non-parametric approach, without making strong distributional assumptions about the underlying embedding space. To identify OOD samples, we extract an embedding for a test input, compute its distance to its k-nearest neighbors in the training set, and use a threshold to flag the input, i.e., a sample far away in representation space is more likely to be OOD.
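This three-step recipe (embed, measure the k-th nearest-neighbor distance, threshold) is straightforward to sketch. The NumPy snippet below is a minimal illustration under the notation of this paper, not the exact implementation used in the experiments:

```python
import numpy as np

def knn_ood_score(train_emb, test_emb, k=10):
    """Score test samples by the negative distance to their k-th nearest
    ID training embedding; larger scores mean more in-distribution.

    train_emb: (M, D) ID training embeddings; test_emb: (N, D).
    """
    # L2-normalize embeddings before the nearest-neighbor search.
    train = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    test = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    # Pairwise Euclidean distances between every test and training embedding.
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=-1)
    # Distance to the k-th nearest neighbor (rows sorted ascending).
    kth = np.sort(dists, axis=1)[:, k - 1]
    return -kth
```

A sample is then flagged as OOD when its score falls below a threshold chosen on ID data alone (e.g., the value that keeps 95% of ID samples).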
We demonstrate the effectiveness of the kNN-based approach
on a broad range of audio recognition tasks and different neu-
ral network architectures and provide an extensive comparison
with both recent and classical approaches as baselines. Importantly, to the best of our knowledge, this is the first attempt at studying out-of-distribution detection and
setting up a benchmark for audio across a variety of datasets
ranging from keyword spotting and emotion recognition to
environmental sounds and more. Empirically, we show that
for a MobileNet [8] model (trained on in-distribution data of human vocal sounds), the non-parametric nearest neighbor method improves FPR@TPR95 by 17% and AUROC scores by 7% compared to approaches that leverage the output or gradient space of the model.
arXiv:2210.15283v2 [cs.SD] 25 Feb 2023
[Figure 1 depicts the pipeline: test instances and ID data are mapped to embeddings, the k-nearest neighbors yield a distance distribution, and a decision function performs OOD detection.]
Fig. 1: Overview of the k-deep nearest neighbors approach for leveraging embedding space to detect out-of-distribution samples.
2. APPROACH
2.1. Preliminaries
Learning Regime. We focus on the supervised learning regime, specifically multi-class classification tasks, where $\mathcal{X}$ and $\mathcal{Y} = \{1, \ldots, C\}$ denote the input and label spaces, respectively. A classifier $f_\theta(\cdot)$ utilizes a training set $\mathcal{D}_p = \{(x_i, y_i)\}_{i=1}^{M}$, which is drawn i.i.d. from the joint distribution $\mathcal{P}$ defined over $\mathcal{X} \times \mathcal{Y}$. The deep model $f_\theta(x) : \mathcal{X} \to \mathbb{R}^{|\mathcal{Y}|}$ minimizes a negative log-likelihood (or similar) objective with back-propagation to produce logits that are then translated to predicted labels for the input samples.
Problem Formulation. Out-of-distribution (OOD) detection is generally formulated as a binary classification problem with the objective of identifying samples from an unknown data distribution at inference time. For instance, samples from an unrelated distribution whose label set does not overlap with the task labels (of a trained deep model) should be deferred instead of producing an incorrect prediction [1]. Formally, given a pre-trained classifier $f_\theta(\cdot)$ that is learned to solve a task $t$ using data $\mathcal{D}_p$ from the in-domain data distribution, the aim of OOD detection is to have a decision function:

$$U_\gamma(x) = \begin{cases} \text{ID}, & H(x) \geq \gamma \\ \text{OOD}, & H(x) < \gamma \end{cases}$$

that flags whether a sample $x \in \mathcal{X}$ belongs to $\mathcal{D}_p$. Here, $\gamma$ represents a threshold chosen such that a large fraction (e.g., 95%) of ID samples is correctly identified. The domain of OOD detection is concerned with the development of a scoring function $H$ that captures the uncertainty of data being outside the training data distribution. Previous approaches largely rely on the output [1, 2, 4, 3], feature [6], and gradient [7] spaces of the model, with [5] proposing to leverage nearest neighbors in the feature space to determine uncertainty. Along similar lines, we propose to leverage the non-parametric nearest neighbors approach to detect OOD samples in audio, as we describe in the subsequent section.
2.2. OOD Detection with Deep k-Nearest Neighbor
We aim to exploit the representation space of a pre-trained neural network for detecting out-of-distribution samples with the k-nearest neighbor approach. We provide a high-level illustration of our approach in Figure 1. The key driving factor behind distance-based non-parametric methods is that distances in the embedding space provide a meaningful way to compare data from different distributions. Hence, they can be utilized to identify OOD samples, as ID samples are closer to each other in the feature space than to OOD data points. Inspired by the simplicity and success of the deep nearest neighbor method for OOD detection in the vision domain [5], we propose to leverage it and study whether it can reliably detect samples different from the ID training set in the audio (and speech) domain.
Given a pre-trained classification model $f_\theta(x) : \mathcal{X} \to \mathbb{R}^{|\mathcal{Y}|}$, we extract normalized representations (features or embeddings) $z = \psi(x) / \lVert \psi(x) \rVert_2$ from the penultimate layer of the model, where $\psi$ can be seen as a feature extractor. Let $\mathcal{Z}_m = (z_1, \ldots, z_n)$ be the embedding vectors from an ID training set and $z$ be the embedding of a test sample. We compute the Euclidean distance $\lVert z_i - z \rVert_2$ of the test input to each example in $\mathcal{Z}_m$ and reorder the elements of $\mathcal{Z}_m$ by increasing distance. Finally, we use the decision function from [5] to check whether a sample is OOD: $H(z, k) = \mathbb{1}\{-d_k(z) \geq \gamma\}$, where $d_k = \lVert z_k - z \rVert_2$ denotes the distance to the $k$-th nearest neighbor and $\mathbb{1}\{\cdot\}$ is an indicator function. In practice, the threshold $\gamma$ can be chosen to correctly classify a large percentage (e.g., 95%) of ID samples. It is important to note that picking the optimal value of $\gamma$ does not depend on OOD data.
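Choosing $\gamma$ this way needs nothing beyond a held-out set of ID scores. A minimal sketch, using the convention that larger scores mean more in-distribution (e.g., the negative k-th nearest-neighbor distance):

```python
import numpy as np

def fit_threshold(id_scores, tpr=0.95):
    """Choose gamma so that a fraction `tpr` of ID samples score at or
    above it. No OOD data is required: gamma is simply the (1 - tpr)
    quantile of the ID score distribution.
    """
    return np.quantile(id_scores, 1.0 - tpr)

def decide(scores, gamma):
    """Indicator decision: True = ID (score >= gamma), False = OOD."""
    return scores >= gamma
```

In a deployment loop, `fit_threshold` runs once on an ID validation split, and `decide` is applied to every incoming score.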
There are several advantages of using deep nearest neighbors over other methods for out-of-distribution detection. First, it is scalable and can be used with large datasets via an efficient similarity search library, such as Faiss [9]. Second, it is easy to use, as it does not require access to out-of-distribution data for defining the threshold. Third, it is model-agnostic in the sense that we can use it with a variety of neural network architectures and different training regimes (i.e., both supervised and unsupervised). Finally, it also offers interpretability into the process of identifying OOD samples by letting humans inspect the retrieved nearest neighbors.
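To make the recipe concrete, the whole detector can be wrapped in a small class. The sketch below uses SciPy's `cKDTree` purely as a stand-in for a large-scale index such as Faiss; the build-then-query pattern is analogous, but this is an illustration, not the paper's implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

class KNNOODDetector:
    """k-th nearest-neighbor OOD detector over L2-normalized embeddings."""

    def __init__(self, id_embeddings, k=10):
        # Index the normalized ID training embeddings once, up front.
        # (k >= 2 is assumed so that query() returns a 2-D distance array.)
        self.k = k
        self.tree = cKDTree(self._normalize(id_embeddings))

    @staticmethod
    def _normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    def kth_distance(self, queries):
        # query() returns distances sorted ascending; keep the k-th column.
        d, _ = self.tree.query(self._normalize(queries), k=self.k)
        return d[:, -1]
```

A threshold on `-kth_distance(...)`, chosen from ID data alone, completes the detector; swapping `cKDTree` for a Faiss index changes only the indexing and query lines.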