An out-of-distribution discriminator based on Bayesian neural
network epistemic uncertainty
Ethan Ancell1, Christopher Bennett2, Bert Debusschere2, Sapan Agarwal2, Park Hays2,
and T. Patrick Xiao2
1Department of Statistics, University of Washington
2Sandia National Laboratories
1 Abstract
Neural networks have revolutionized the field of machine learning with increased predictive capa-
bility over traditional methods. In addition to the demand for improved predictions, there is a
simultaneous demand for reliable uncertainty quantification on estimates made by machine learn-
ing methods such as neural networks. Bayesian neural networks (BNNs) are an important type
of neural network with built-in capability for quantifying uncertainty. This paper begins with a
review of aleatoric and epistemic uncertainty in BNNs, then relates these concepts to a synthetic image dataset in which the goal is to identify the amplitude of an event in each image. It is shown that epistemic uncertainty tends to be low in images that are well-represented in the training dataset and high in images that are not. An algo-
rithm for out-of-distribution (OoD) detection with BNN epistemic uncertainty is introduced along
with various experiments demonstrating factors influencing the OoD detection capability in a BNN.
The OoD detection capability with epistemic uncertainty is compared with the OoD detection in
the discriminator network of a generative adversarial network (GAN) with comparable network
architecture.
2 Introduction
Uncertainty quantification (UQ) is important for ensuring trust in the predictions made by ma-
chine learning algorithms but is seldom provided as a feature in conventional deep neural networks
(DNNs). Without reliable UQ, DNN predictions cannot be relied upon for high-consequence deci-
sions, because their trustworthiness cannot be evaluated in a context where the cost of an incorrect
prediction is high. These safety-critical contexts include the use of machine learning for medi-
cal diagnoses [1], self-driving cars, physical infrastructure (e.g. smart grid) [2], national security,
and other applications. An important component for safety is out-of-distribution (OoD) detection,
where the neural network identifies inputs as being sufficiently unlikely to be part of the distribution
of training data, implying that any prediction made on that input should not be trusted. When
studying OoD detection using UQ methods in machine learning, we will reference two different
types of uncertainties:
1. Aleatoric uncertainty: Uncertainty inherent in the data that cannot be reduced with more
data.
2. Epistemic uncertainty: Uncertainty in the model parameters that is reduced with more training examples.
Practically, OoD data should consist of inputs for which the neural network has high epistemic uncertainty, because the epistemic uncertainty on those inputs could have been reduced had they been included in the training set [3].
Bayesian neural networks (BNNs) are a variant of DNNs where the parameters are random
variables rather than fixed values. Training BNNs involves using Bayes’ theorem to infer probability
distributions over their parameters based on observed training data and prior knowledge. BNNs
can provide better calibrated uncertainty estimates than conventional DNNs, and by representing
weights as random variables, BNNs are capable of quantifying epistemic uncertainty. Aleatoric
uncertainty can also be estimated with BNNs: this uncertainty is typically obtained by training a
BNN to explicitly predict the aleatoric uncertainty rather than relying on the stochasticity of the
model parameters themselves [4]. The ability of BNNs to predict both types of uncertainties has
made BNNs attractive for UQ, and more specifically they have been studied for the problem of
OoD detection.
Some previous literature in OoD detection with BNNs has studied this problem using the
framework of Gaussian processes (GPs), because infinite-width BNNs with Gaussian priors over
the weights are equivalent to GPs [5]. The paper [6] tackles OoD detection using Neural Linear Models (NLMs), augmenting the data with additional points lying on the periphery of the training data, and connects this approach to OoD detection with GPs and BNNs. Reference [7] shows how BNNs can be used for OoD detection via the max probability, entropy, mutual information, and differential entropy of the network's output vector, and finds that BNNs outperform DNNs trained on the same task. Others have criticized the use of BNNs
for OoD detection, arguing that OoD detection with BNNs is sensitive to the choice of prior over
the weights and may involve a trade-off with generalization [8]. In this paper, we expand upon these results by examining the problem of OoD detection with BNNs under a simple framework where inputs are marked as in-distribution or out-of-distribution based upon previously observed values of epistemic uncertainty on a validation dataset. Instead of using max probability, entropy, mutual information, or differential entropy for OoD detection as in [7], we base our discriminator on epistemic uncertainty as introduced in [9] for a simple regression setting.
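To make this concrete, below is a minimal sketch of the epistemic-uncertainty discriminator; the function names and the quantile-based cutoff are illustrative assumptions rather than a prescribed interface.

```python
# A minimal sketch of the threshold-based OoD discriminator: calibrate a
# cutoff on epistemic uncertainties from in-distribution validation data,
# then flag any input whose epistemic uncertainty exceeds it. Names and the
# quantile rule are illustrative assumptions.
import numpy as np

def fit_ood_threshold(val_epistemic, quantile=0.99):
    # val_epistemic: epistemic uncertainties measured on iD validation inputs
    return np.quantile(val_epistemic, quantile)

def is_ood(epistemic_uncertainty, threshold):
    # An input is marked out-of-distribution when its epistemic uncertainty
    # exceeds the calibrated cutoff.
    return epistemic_uncertainty > threshold
```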
We train BNNs for regression on a dataset of event images with a well-defined, parameterized
generating function. The BNNs are tested on OoD test inputs that are formed by corrupting in-
distribution (iD) images with noise that is uncharacteristic of the training set, or by superimposing
iD images with images from a different dataset. We compare the OoD detection capability of
BNNs to that of Generative Adversarial Networks (GANs). The discriminator network in a GAN
is rewarded for detecting the outputs of a generator network as OoD, while the generator network
is rewarded for fooling the discriminator. The GAN is therefore naturally suited to the task of OoD
detection and is a useful comparison point for BNN-based OoD detection, despite the difference in
the detection mechanism.
This paper begins with a brief review of Bayesian statistical methods and BNNs in Section 3. Section 4 introduces the dataset of event images with well-defined properties that is used to train the networks in this study. Section 5 relates aleatoric and epistemic uncertainty to qualitative differences in various images from the motivating dataset. Section 6.5 demonstrates that the epistemic
uncertainty predicted by BNNs can be used to detect OoD inputs and compares the approach to
OoD detection using the discriminator network in a GAN. Overall, we find that the BNN epistemic
uncertainty approach to OoD detection has similar sensitivity to the GAN approach. The advan-
tage of the BNN is that the OoD detection capability is built into the same model that is trained
to accurately perform the original neural network task (e.g. regression).
The contributions of this paper are as follows:
• We quantify the difference in the predicted epistemic uncertainty of a BNN on in-distribution vs. out-of-distribution inputs by using a dataset with a well-defined generating function and different methods of generating OoD data.
• We propose a simple and robust algorithm based on BNN epistemic uncertainty to classify inputs as in-distribution or out-of-distribution.
• We investigate how the effectiveness of OoD detection depends on the size of the training set, the number of training epochs, and the topology of the BNN.
• We show that OoD detection using epistemic uncertainty achieves similar sensitivity to a GAN discriminator of similar complexity.
3 Review of Bayesian methods and neural networks
Traditional neural networks carry fixed weights and biases. In Bayesian neural networks (BNNs), the weights and biases are random variables. A forward pass is conducted by randomly sampling the weights and biases from their distributions and using the sampled values to compute the network's output. The distributions of weights and biases in a BNN are updated using algorithms from Bayesian statistics.
In Bayesian methods, a prior p(θ) is posited that represents prior knowledge about the collection of parameters θ of the network. After collecting data D, the prior distribution is combined with the likelihood of the data (which accounts for its associated uncertainty) p(D | θ) to obtain a posterior distribution p(θ | D) using Bayes' theorem:

p(θ | D) = p(D | θ) p(θ) / p(D)    (1)
This paper will work under the supervised learning paradigm: D refers to a training dataset as a complete entity, where D = (X, y). The matrix X ∈ 𝒳 ⊆ ℝ^(n×d) is the training data with size n, dimensionality d, and support 𝒳. The vector y ∈ 𝒴 ⊆ ℝ^n is the set of real-valued labels with support 𝒴 corresponding to the training data X. The goal in a supervised regression task is to find p(y | x) and use this conditional distribution to create an optimal prediction ŷ (e.g., with the posterior mean). The neural network will be written as a function Φ : 𝒳 → 𝒴. Writing Φ to have explicit dependence on its parameters, a forward pass through the BNN is written Φ(x | θᵢ), where θᵢ is a realization of the parameters from the posterior distribution p(θ | D).
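As a concrete illustration, the following is a minimal sketch (the helper name is ours; we assume the model resamples its parameters on every call, as variational BNN implementations typically do) of approximating the predictive distribution by repeated forward passes:

```python
# A minimal sketch: each call to a BNN draws fresh parameters theta_i from
# the (approximate) posterior, so repeated forward passes sample the
# predictive distribution. The helper name is illustrative, not the paper's.
import numpy as np

def mc_forward_passes(model, x, n_samples=100):
    preds = np.stack([np.asarray(model(x)) for _ in range(n_samples)])
    # Predictive mean and the spread of predictions across posterior samples
    return preds.mean(axis=0), preds.std(axis=0)
```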
Updating the weights and biases in a network is nontrivial: certain components of Equation 1 are mathematically intractable to calculate in a practical setting, especially the evidence p(D), which would require an expensive numerical integration of the numerator of (1) over the high-dimensional parameter space of θ. Given the intractability of using Bayes' theorem directly, there are two popular workaround methods for sampling from the posterior p(θ | D): Markov Chain Monte Carlo (MCMC) and Variational Inference (VI). MCMC algorithms can produce near-exact samples from the posterior p(θ | D), but they are not computationally scalable and hence are impractical for large-scale BNNs [10].
Variational inference algorithms posit a class of distributions Q that is used to approximate the posterior p(θ | D). The goal of a variational inference algorithm is to find an optimal distribution q(θ) ∈ Q such that q(θ) is most similar to the true posterior distribution p(θ | D). Similarity in this context is measured with the Kullback-Leibler (KL) divergence between a given q(θ) ∈ Q and the posterior p(θ | D). Minimizing the KL divergence directly is difficult (it would require computation of the evidence), but the KL divergence can be manipulated to separate out and drop the evidence from the optimization problem, since it is independent of q(θ). The resulting modified objective function is called the evidence lower bound (ELBO); optimizing it is equivalent to solving the original problem up to an additive constant.
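Concretely, the standard decomposition KL(q(θ) ∥ p(θ | D)) = log p(D) − ELBO(q) shows that minimizing the KL divergence is equivalent to maximizing

ELBO(q) = E_{q(θ)}[log p(D | θ)] − KL(q(θ) ∥ p(θ)),

which involves only the likelihood and the prior, so the intractable evidence p(D) never needs to be computed.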
The BNNs in this paper are trained with the Flipout method for efficient mini-batch optimiza-
tion [11]. Unless otherwise stated, all neural network models mentioned in this paper are BNNs whose weight priors are independent standard normal distributions, i.e., each prior has mean 0 and standard deviation 1. The approximating class of distributions used in variational inference is the class of independent normal distributions, whose means and standard deviations are inferred during training. We use the TensorFlow and TensorFlow Probability Python libraries to create and test these models.
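As an illustration of this setup, below is a minimal sketch of a variational BNN built with TensorFlow Probability's Flipout layers; the layer sizes and the two-output head are illustrative assumptions, not the architecture used in our experiments.

```python
# A minimal sketch (illustrative sizes, not the paper's architecture) of a
# Flipout BNN. DenseFlipout's defaults match the setup above: independent
# standard normal priors and independent normal posteriors whose means and
# standard deviations are learned during training.
import tensorflow as tf
import tensorflow_probability as tfp

n_train = 10000  # training-set size, used to scale the per-example KL term

kl_fn = lambda q, p, _: tfp.distributions.kl_divergence(q, p) / n_train

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(40, 40)),
    tfp.layers.DenseFlipout(64, activation="relu", kernel_divergence_fn=kl_fn),
    tfp.layers.DenseFlipout(2, kernel_divergence_fn=kl_fn),  # (ŷ, aleatoric output)
])
```

When the model is compiled with a likelihood-based loss, Keras adds the layers' KL penalties (exposed through model.losses) to the objective, yielding the negative ELBO.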
Methods for calculating aleatoric and epistemic uncertainty in a BNN will be described in Section 5, after an introduction to the motivating dataset.
4 The amplitudes dataset
This paper will repeatedly reference an “amplitudes dataset,” a dataset consisting of 40 × 40 grayscale images which may contain one or no events. An “event” is generated by a point spread function PSF(A, x, y) superimposed on a noisy background, with each event having a specified amplitude A and center coordinate (x, y) within the image. Without an event, the image consists of noise only. This dataset emulates a simple, hypothetical anomaly detection application based on image sensor data. The generated images can be passed into a neural network that predicts the presence, amplitude, and coordinates of an event, using the true values as labels. In this paper, the networks will only be trained to predict the amplitude of an image. Nonevent images are handled by treating them as having an amplitude of 0.
For a high amplitude event, the PSF generates a signal that has higher pixel brightness values
and a larger spatial extent within the image. The PSF also models sensor saturation by limiting
brightness values to the range [0,1]. A bright event can cause many pixels to saturate over a large
area. Figure 1 shows example images from the dataset.
Figure 1: A few different amplitude levels in the dataset
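To make the generating function concrete, below is a minimal sketch of such a generator; the Gaussian form and width of the PSF and the uniform background noise level are our illustrative assumptions, as the exact PSF is not specified here.

```python
# A minimal sketch of an amplitudes-style image generator. The Gaussian PSF,
# its width, and the uniform background noise level are assumptions made for
# illustration; the clipping to [0, 1] mirrors the described saturation.
import numpy as np

def generate_image(A=0.0, x=20.0, y=20.0, size=40, noise=0.05, width=2.0, rng=None):
    rng = rng or np.random.default_rng()
    img = noise * rng.random((size, size))            # noisy background
    if A > 0:                                         # A = 0 yields a nonevent image
        r, c = np.mgrid[0:size, 0:size]
        img += A * np.exp(-((r - y) ** 2 + (c - x) ** 2) / (2.0 * width ** 2))
    return np.clip(img, 0.0, 1.0)                     # emulate sensor saturation
```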
Figure 2 shows the distribution of amplitudes among all images in the training set. In this dataset, an event with an amplitude over approximately 21 will fully saturate the image, as in the rightmost image shown in Figure 1.
Figure 2: Histogram of the distribution of amplitudes in the dataset. This histogram represents only the event images, which make up 50% of the data; the other 50% are nonevent images and thus have zero amplitude.
The dataset is synthetically generated, so the user can manually control parameters such as the proportion of images with an event, the size of the training and test sets, and the resolution of the images. Unless otherwise stated, all experiments use a training set of 10000 images at 40 × 40 resolution with a 50% split between event and nonevent images. In this section, the symbols A, x, y referred to amplitude, x-position, and y-position respectively. The remainder of this paper will only concern the prediction of amplitude, so the symbol y will supplant A for the amplitude of an image to conform with standard labeling notation in machine learning.
The value of the amplitudes dataset lies in the control that the user has in adjusting the generation parameters, making it ideal for studying how specific perturbations to the training and testing datasets create downstream differences in the characteristics of UQ using BNNs. Furthermore, the dataset is sufficiently complex that the insights in this paper are likely generalizable to other image dataset problems.
5 Calculating uncertainty
Recall from Section 2 that aleatoric uncertainty is the uncertainty inherent in the data, and epistemic uncertainty is the uncertainty in the model parameters. There are varying approaches to calculating a numeric value for these uncertainties in a trained model. The sections below describe how to calculate aleatoric and epistemic uncertainty in a BNN. As notation, σ̂_A and σ̂_E will represent the estimated aleatoric and epistemic uncertainty, respectively.
5.1 Calculating aleatoric uncertainty
Aleatoric uncertainty in BNNs is estimated by including it as an explicit prediction of the neural network. Each forward pass through the network results in a tuple (ŷ, σ̂_A), where σ̂_A is a numeric representation of the aleatoric uncertainty in the prediction ŷ. This method of calculating aleatoric uncertainty was originally proposed by [4]. Allowing the network to make varying predictions of σ̂_A for different inputs allows the model to capture heteroscedastic (i.e., non-constant and dependent on the input) noise in the data. For a full discussion of why a heteroscedastic noise model for aleatoric uncertainty is appropriate for the amplitudes dataset, see Appendix B.
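One common realization of this scheme (following [4]; sketched here under the assumption that the network's second output is interpreted as log σ_A²) is a heteroscedastic Gaussian negative log-likelihood loss:

```python
# A minimal sketch of a heteroscedastic Gaussian negative log-likelihood in
# the style of [4]. Interpreting the second output as log(sigma_A^2) keeps
# the predicted aleatoric variance positive and numerically stable.
import tensorflow as tf

def heteroscedastic_nll(y_true, outputs):
    y_hat, log_var = outputs[..., 0], outputs[..., 1]
    return tf.reduce_mean(
        0.5 * tf.exp(-log_var) * tf.square(y_true - y_hat) + 0.5 * log_var)
```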
There is only one label y ∈ ℝ for the regression task of predicting the amplitude of an image,