
ON OUT-OF-DISTRIBUTION DETECTION FOR AUDIO WITH DEEP NEAREST
NEIGHBORS
Zaharah Bukhsh, Aaqib Saeed
Eindhoven University of Technology, Eindhoven, The Netherlands
ABSTRACT
Out-of-distribution (OOD) detection is concerned with identi-
fying data points that do not belong to the same distribution as
the model’s training data. For the safe deployment of predic-
tive models in a real-world environment, it is critical to avoid
making confident predictions on OOD inputs, as these can lead to
potentially dangerous consequences. However, OOD detec-
tion largely remains an under-explored area in the audio (and
speech) domain. This is despite the fact that audio is a central
modality for many tasks, such as speaker diarization, automatic
speech recognition, and sound event detection. To address this,
we propose to leverage the feature space of the model with deep
k-nearest neighbors to detect OOD samples. We show that
this simple and flexible method effectively detects OOD inputs
across a broad category of audio (and speech) datasets. Specif-
ically, it improves the false positive rate (FPR@TPR95) by 17% and the AUROC score by 7% over prior techniques.
Index Terms— out-of-distribution, audio, speech, uncertainty estimation, deep learning, nearest neighbors
1. INTRODUCTION
Out-of-distribution (OOD) detection is the task of identifying
inputs that are not drawn from the same distribution as the
training data or are not truly representative of them. Neural
networks are known to produce overconfident scores even for
samples that do not belong to the training distribution [1].
This is a challenging problem for deploying machine learn-
ing in safety-critical applications, where making confident
predictions on OOD inputs can lead to potentially danger-
ous consequences. Besides the capability to generalize well
for samples from the familiar distribution, a robust machine
learning model should be aware of uncertainty stemming from
unknown examples. It is an important competency for real-
world applications, where the distribution of data can change
over time or vary across different user groups.
A broad range of approaches has been proposed to tackle
the OOD detection issue and develop reliable methods that
successfully distinguish in-distribution (ID) from OOD inputs. A common family of techniques derives uncertainty estimates
around the predictions of the neural network based on model outputs [1, 2, 3, 4], feature space [5, 6], and gradient norms [7].
Similarly, distance-based methods [5] have also gained significant attention recently for identifying OOD inputs with
promising capabilities. Distance-based methods leverage rep-
resentations extracted from a pre-trained model and act on the
assumption that out-of-distribution test samples are isolated
from the ID data. Nevertheless, OOD detection is severely
understudied in the audio domain, although audio recognition
models are being widely deployed in real-world settings. Moreover, audio is a central modality for many tasks, such as
speaker diarization, automatic speech recognition, and sound
event detection. The prior works mainly focus on vision tasks
raising an important question about the efficacy and applica-
bility of existing methods to audio and speech.
Our work follows the same intuition as the distance-based method [5], and we aim to explore the richness of the model
representation space to derive a meaningful signal that can
help solve the task of OOD detection. Formally, we propose
a simple yet effective system for out-of-distribution detec-
tion for audio inputs with deep k-nearest neighbors. In particular, we leverage the nearest-neighbor distance, a non-parametric approach that makes no strong distributional assumptions about the underlying embedding space. To
identify OOD samples, we extract an embedding for the test input, compute its distance to its k nearest neighbors in the training set, and apply a threshold to flag the input, i.e., a sample far away in
representation space is more likely to be OOD.
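The scoring rule described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function names, the choice of L2-normalized embeddings, and the value of k are illustrative assumptions.

```python
import numpy as np

def knn_ood_score(train_feats, test_feat, k=10):
    """Distance from a test embedding to its k-th nearest training embedding.

    Larger scores indicate the sample is farther from the training
    distribution in representation space, i.e., more likely OOD.
    """
    # L2-normalize embeddings (an assumed preprocessing step, common in
    # distance-based OOD detection).
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    z = test_feat / np.linalg.norm(test_feat)
    # Euclidean distance to every training embedding.
    dists = np.linalg.norm(train - z, axis=1)
    # Score = distance to the k-th nearest neighbor.
    return np.sort(dists)[k - 1]

def is_ood(train_feats, test_feat, threshold, k=10):
    # Flag the input as OOD when its kNN distance exceeds the threshold.
    return knn_ood_score(train_feats, test_feat, k) > threshold
```

In practice the threshold would be calibrated on held-out in-distribution data, e.g., so that a desired fraction of ID samples is retained.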
We demonstrate the effectiveness of the kNN-based approach on a broad range of audio recognition tasks and different neural network architectures and provide an extensive comparison
with both recent and classical approaches as baselines. Importantly, to the best of our knowledge, ours is the first attempt at studying out-of-distribution detection and
setting up a benchmark for audio across a variety of datasets
ranging from keyword spotting and emotion recognition to
environmental sounds and more. Empirically, we show that
for a MobileNet [8] model (trained on in-distribution data of human vocal sounds), the non-parametric nearest neighbor method improves FPR@TPR95 by 17% and the AUROC score by 7% over approaches that leverage the output or gradient space
of the model.
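For reference, the two metrics quoted above can be computed as sketched below, assuming the common convention that higher detection scores indicate OOD and that ID samples form the class whose true positive rate is fixed at 95%; this is an illustrative sketch, not the authors' evaluation code.

```python
import numpy as np

def fpr_at_tpr95(id_scores, ood_scores):
    # Choose the threshold at which 95% of ID samples fall below it
    # (TPR on ID = 95%), then report the fraction of OOD samples that
    # also fall below it, i.e., OOD inputs mistaken for ID.
    threshold = np.percentile(id_scores, 95)
    return float(np.mean(np.asarray(ood_scores) <= threshold))

def auroc(id_scores, ood_scores):
    # AUROC via the Mann-Whitney U statistic: the probability that a
    # randomly chosen OOD sample scores higher than a randomly chosen
    # ID sample (ties count half).
    id_s = np.asarray(id_scores)[:, None]
    ood_s = np.asarray(ood_scores)[None, :]
    return float(np.mean(ood_s > id_s) + 0.5 * np.mean(ood_s == id_s))
```

A lower FPR@TPR95 and a higher AUROC both indicate better separation of ID and OOD inputs.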
arXiv:2210.15283v2 [cs.SD] 25 Feb 2023