An out-of-distribution discriminator based on Bayesian neural
network epistemic uncertainty
Ethan Ancell1, Christopher Bennett2, Bert Debusschere2, Sapan Agarwal2, Park Hays2,
and T. Patrick Xiao2
1Department of Statistics, University of Washington
2Sandia National Laboratories
1 Abstract
Neural networks have revolutionized the field of machine learning with increased predictive capa-
bility over traditional methods. In addition to the demand for improved predictions, there is a
simultaneous demand for reliable uncertainty quantification on estimates made by machine learn-
ing methods such as neural networks. Bayesian neural networks (BNNs) are an important type
of neural network with built-in capability for quantifying uncertainty. This paper begins with a
review of aleatoric and epistemic uncertainty in BNNs, then relates these concepts to a synthetic image dataset in which the goal is to identify the amplitude of an event in each image. It is shown that epistemic uncertainty tends to be low in images that are well-represented in the training dataset and high in images that are not. An algo-
rithm for out-of-distribution (OoD) detection with BNN epistemic uncertainty is introduced along
with various experiments demonstrating factors influencing the OoD detection capability in a BNN.
The OoD detection capability with epistemic uncertainty is compared with the OoD detection in
the discriminator network of a generative adversarial network (GAN) with comparable network
architecture.
2 Introduction
Uncertainty quantification (UQ) is important for ensuring trust in the predictions made by ma-
chine learning algorithms but is seldom provided as a feature in conventional deep neural networks
(DNNs). Without reliable UQ, DNN predictions cannot be relied upon for high-consequence deci-
sions, because their trustworthiness cannot be evaluated in a context where the cost of an incorrect
prediction is high. These safety-critical contexts include the use of machine learning for medi-
cal diagnoses [1], self-driving cars, physical infrastructure (e.g. smart grid) [2], national security,
and other applications. An important component for safety is out-of-distribution (OoD) detection,
where the neural network identifies inputs as being sufficiently unlikely to be part of the distribution
of training data, implying that any prediction made on that input should not be trusted. When
studying OoD detection using UQ methods in machine learning, we will reference two different
types of uncertainties:
1. Aleatoric uncertainty: Uncertainty inherent in the data that cannot be reduced with more
data.
2. Epistemic uncertainty: Uncertainty in the model parameters that is reduced with more training examples.
Practically, OoD data should consist of inputs for which the neural network has high epistemic uncertainty, because the epistemic uncertainty on those inputs could have been reduced had they been included in the training set [3].
Bayesian neural networks (BNNs) are a variant of DNNs where the parameters are random
variables rather than fixed values. Training BNNs involves using Bayes’ theorem to infer probability
distributions over their parameters based on observed training data and prior knowledge. BNNs
can provide better calibrated uncertainty estimates than conventional DNNs, and by representing
weights as random variables, BNNs are capable of quantifying epistemic uncertainty. Aleatoric
uncertainty can also be estimated with BNNs: this uncertainty is typically obtained by training a
BNN to explicitly predict the aleatoric uncertainty rather than relying on the stochasticity of the
model parameters themselves [4]. The ability of BNNs to predict both types of uncertainties has
made BNNs attractive for UQ, and more specifically they have been studied for the problem of
OoD detection.
Some previous literature in OoD detection with BNNs has studied this problem using the
framework of Gaussian processes (GPs), because infinite-width BNNs with Gaussian priors over
the weights are equivalent to GPs [5]. The paper [6] tackles OoD detection using Neural Linear Models (NLMs), augmenting the data with additional points lying on the periphery of the training data, and connects this approach to OoD detection with GPs and BNNs. Reference [7] shows how BNNs can be used for OoD detection via the max probability, entropy, mutual information, and differential entropy of the network's output vector, and finds that BNNs outperform DNNs trained on the same task. Others have criticized the use of BNNs
for OoD detection, arguing that OoD detection with BNNs is sensitive to the choice of prior over
the weights and may involve a trade-off with generalization [8]. In this paper, we expand upon these results by examining the problem of OoD detection with BNNs under a simple framework where inputs are marked as in-distribution or out-of-distribution based upon previously observed values of epistemic uncertainty on a validation dataset. Instead of using max probability, entropy, mutual information, or differential entropy for OoD detection as in [7], we base our discriminator on epistemic uncertainty as introduced in [9] for a simple regression setting.
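To make this concrete, below is a minimal sketch of the epistemic-uncertainty discriminator; the function names and the quantile-based cutoff are illustrative assumptions rather than a prescribed interface.

```python
# A minimal sketch of the threshold-based OoD discriminator: calibrate a
# cutoff on epistemic uncertainties from in-distribution validation data,
# then flag any input whose epistemic uncertainty exceeds it. Names and the
# quantile rule are illustrative assumptions.
import numpy as np

def fit_ood_threshold(val_epistemic, quantile=0.99):
    # val_epistemic: epistemic uncertainties measured on iD validation inputs
    return np.quantile(val_epistemic, quantile)

def is_ood(epistemic_uncertainty, threshold):
    # An input is marked out-of-distribution when its epistemic uncertainty
    # exceeds the calibrated cutoff.
    return epistemic_uncertainty > threshold
```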
We train BNNs for regression on a dataset of event images with a well-defined, parameterized
generating function. The BNNs are tested on OoD test inputs that are formed by corrupting in-
distribution (iD) images with noise that is uncharacteristic of the training set, or by superimposing
iD images with images from a different dataset. We compare the OoD detection capability of
BNNs to that of Generative Adversarial Networks (GANs). The discriminator network in a GAN
is rewarded for detecting the outputs of a generator network as OoD, while the generator network
is rewarded for fooling the discriminator. The GAN is therefore naturally suited to the task of OoD
detection and is a useful comparison point for BNN-based OoD detection, despite the difference in
the detection mechanism.
This paper begins with a brief review of Bayesian statistical methods and BNNs in Section 3. Section 4 introduces the dataset of event images with well-defined properties that is used to train the networks in this study. Section 5 relates aleatoric and epistemic uncertainty to qualitative differences in various images from the motivating dataset. Section 6.5 demonstrates that the epistemic
uncertainty predicted by BNNs can be used to detect OoD inputs and compares the approach to
OoD detection using the discriminator network in a GAN. Overall, we find that the BNN epistemic
uncertainty approach to OoD detection has similar sensitivity to the GAN approach. The advan-
tage of the BNN is that the OoD detection capability is built into the same model that is trained
to accurately perform the original neural network task (e.g. regression).
The contributions of this paper are as follows:
• We quantify the difference in the predicted epistemic uncertainty of a BNN on in-distribution vs. out-of-distribution inputs by using a dataset with a well-defined generating function and different methods of generating OoD data.
• We propose a simple and robust algorithm based on BNN epistemic uncertainty to classify inputs as in-distribution or out-of-distribution.
• We investigate how the effectiveness of OoD detection depends on the size of the training set, the number of training epochs, and the topology of the BNN.
• We show that OoD detection using epistemic uncertainty achieves similar sensitivity to a GAN discriminator of similar complexity.
3 Review of Bayesian methods and neural networks
Traditional neural networks carry fixed weights and biases. In Bayesian neural networks (BNNs), the weights and biases are random variables. A forward pass is conducted by randomly sampling the weights and biases from their distributions and using the sampled values to compute the network's output. The distributions of weights and biases in a BNN are updated using algorithms from Bayesian statistics.
In Bayesian methods, a prior p(θ) is posited that represents prior knowledge about the collection of parameters θ of the network. After collecting data D, the prior distribution is combined with the likelihood of the data (which accounts for its associated uncertainty) p(D | θ) to obtain a posterior distribution p(θ | D) using Bayes' theorem:

p(θ | D) = p(D | θ) p(θ) / p(D)    (1)
This paper will work under the supervised learning paradigm: D refers to a training dataset as a complete entity, where D = (X, y). The matrix X ∈ 𝒳 ⊆ ℝ^(n×d) is the training data with size n, dimensionality d, and support 𝒳. The vector y ∈ 𝒴 ⊆ ℝ^n is the set of real-valued labels with support 𝒴 corresponding to the training data X. The goal in a supervised regression task is to find p(y | x) and use this conditional distribution to create an optimal prediction ŷ (e.g., with the posterior mean). The neural network will be written as a function Φ : 𝒳 → 𝒴. Writing Φ to have explicit dependence on its parameters, a forward pass through the BNN is written Φ(x | θᵢ), where θᵢ is a realization of the parameters from the posterior distribution p(θ | D).
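As a concrete illustration, the following is a minimal sketch (the helper name is ours; we assume the model resamples its parameters on every call, as variational BNN implementations typically do) of approximating the predictive distribution by repeated forward passes:

```python
# A minimal sketch: each call to a BNN draws fresh parameters theta_i from
# the (approximate) posterior, so repeated forward passes sample the
# predictive distribution. The helper name is illustrative, not the paper's.
import numpy as np

def mc_forward_passes(model, x, n_samples=100):
    preds = np.stack([np.asarray(model(x)) for _ in range(n_samples)])
    # Predictive mean and the spread of predictions across posterior samples
    return preds.mean(axis=0), preds.std(axis=0)
```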
Updating the weights and biases in a network is nontrivial: certain components of Equation 1 are mathematically intractable to calculate in a practical setting, especially the evidence p(D), which would require an expensive numerical integration of the numerator of (1) over the high-dimensional parameter space of θ. Given the intractability of using Bayes' theorem directly, there are two popular workaround methods for sampling from the posterior p(θ | D): Markov Chain Monte Carlo (MCMC) and Variational Inference (VI). MCMC algorithms can produce near-exact samples from the posterior p(θ | D), but they are not computationally scalable and hence are impractical for large-scale BNNs [10].
Variational inference algorithms posit a class of distributions Q that is used to approximate the posterior p(θ | D). The goal of a variational inference algorithm is to find an optimal distribution q(θ) ∈ Q such that q(θ) is most similar to the true posterior distribution p(θ | D). Similarity in this context is measured with the Kullback-Leibler (KL) divergence between a given q(θ) ∈ Q and the posterior p(θ | D). Minimizing the KL divergence directly is difficult (it would require computation of the evidence), but the KL divergence can be manipulated to separate out and drop the evidence from the optimization problem, since it is independent of q(θ). The resulting modified objective function is called the evidence lower bound (ELBO); optimizing it is equivalent to solving the original problem up to an additive constant.
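Concretely, the standard decomposition KL(q(θ) ∥ p(θ | D)) = log p(D) − ELBO(q) shows that minimizing the KL divergence is equivalent to maximizing

ELBO(q) = E_{q(θ)}[log p(D | θ)] − KL(q(θ) ∥ p(θ)),

which involves only the likelihood and the prior, so the intractable evidence p(D) never needs to be computed.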
The BNNs in this paper are trained with the Flipout method for efficient mini-batch optimiza-
tion [11]. Unless otherwise stated, all neural network models mentioned in this paper are BNNs whose weight priors are independent standard normal distributions, i.e., each prior has mean 0 and standard deviation 1. The approximating class of distributions used in variational inference is the class of independent normal distributions, whose means and standard deviations are inferred during training. We use the TensorFlow and TensorFlow Probability Python libraries to create and test these models.
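As an illustration of this setup, below is a minimal sketch of a variational BNN built with TensorFlow Probability's Flipout layers; the layer sizes and the two-output head are illustrative assumptions, not the architecture used in our experiments.

```python
# A minimal sketch (illustrative sizes, not the paper's architecture) of a
# Flipout BNN. DenseFlipout's defaults match the setup above: independent
# standard normal priors and independent normal posteriors whose means and
# standard deviations are learned during training.
import tensorflow as tf
import tensorflow_probability as tfp

n_train = 10000  # training-set size, used to scale the per-example KL term

kl_fn = lambda q, p, _: tfp.distributions.kl_divergence(q, p) / n_train

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(40, 40)),
    tfp.layers.DenseFlipout(64, activation="relu", kernel_divergence_fn=kl_fn),
    tfp.layers.DenseFlipout(2, kernel_divergence_fn=kl_fn),  # (ŷ, aleatoric output)
])
```

When the model is compiled with a likelihood-based loss, Keras adds the layers' KL penalties (exposed through model.losses) to the objective, yielding the negative ELBO.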
Methods for calculating aleatoric and epistemic uncertainty in a BNN will be described in Section 5, after an introduction to the motivating dataset.
4 The amplitudes dataset
This paper will repeatedly reference an “amplitudes dataset,” a dataset consisting of 40 × 40 grayscale images which may contain one or no events. An “event” is generated by a point spread function PSF(A, x, y) superimposed on a noisy background, with each event having a specified amplitude A and center coordinate (x, y) within the image. Without an event, the image consists of noise only. This dataset emulates a simple, hypothetical anomaly detection application based on image sensor data. The generated images can be passed into a neural network that predicts the presence, amplitude, and coordinates of an event, using the true values as labels. In this paper, the networks will only be trained to predict the amplitude of an image. Nonevent images are handled by treating them as having an amplitude of 0.
For a high amplitude event, the PSF generates a signal that has higher pixel brightness values
and a larger spatial extent within the image. The PSF also models sensor saturation by limiting
brightness values to the range [0,1]. A bright event can cause many pixels to saturate over a large
area. Figure 1 shows example images from the dataset.
Figure 1: A few different amplitude levels in the dataset
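To make the generating function concrete, below is a minimal sketch of such a generator; the Gaussian form and width of the PSF and the uniform background noise level are our illustrative assumptions, as the exact PSF is not specified here.

```python
# A minimal sketch of an amplitudes-style image generator. The Gaussian PSF,
# its width, and the uniform background noise level are assumptions made for
# illustration; the clipping to [0, 1] mirrors the described saturation.
import numpy as np

def generate_image(A=0.0, x=20.0, y=20.0, size=40, noise=0.05, width=2.0, rng=None):
    rng = rng or np.random.default_rng()
    img = noise * rng.random((size, size))            # noisy background
    if A > 0:                                         # A = 0 yields a nonevent image
        r, c = np.mgrid[0:size, 0:size]
        img += A * np.exp(-((r - y) ** 2 + (c - x) ** 2) / (2.0 * width ** 2))
    return np.clip(img, 0.0, 1.0)                     # emulate sensor saturation
```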
Figure 2 shows the distribution of amplitudes among all images in the training set. In this dataset, an event with an amplitude over approximately 21 will fully saturate the image, as in the rightmost image shown in Figure 1.
Figure 2: Histogram of the distribution of amplitudes in the dataset. This histogram represents only the event images, which make up 50% of the data; the other 50% are nonevent images and thus have zero amplitude.
The dataset is synthetically generated, so the user can manually control parameters such as the proportion of images with an event, the size of the training and test sets, and the resolution of the images. Unless otherwise stated, all experiments use a training set of 10000 images at 40 × 40 resolution with a 50% split between event and nonevent images. In this section, the symbols A, x, y referred to amplitude, x-position, and y-position respectively. The remainder of this paper will only concern the prediction of amplitude, so the symbol y will supplant A for the amplitude of an image to conform with standard labeling notation in machine learning.
The value of the amplitudes dataset lies in the control that the user has in adjusting the generation parameters, making it ideal for studying how specific perturbations to the training and testing datasets create downstream differences in the characteristics of UQ using BNNs. Furthermore, the dataset is sufficiently complex that the insights in this paper are likely generalizable to other image dataset problems.
5 Calculating uncertainty
Recall from Section 2 that aleatoric uncertainty is the uncertainty inherent in the data, and epistemic uncertainty is the uncertainty in the model parameters. There are varying approaches to calculating a numeric value for these uncertainties in a trained model. The sections below describe how to calculate aleatoric and epistemic uncertainty in a BNN. As notation, σ̂_A and σ̂_E will represent the estimated aleatoric and epistemic uncertainty, respectively.
5.1 Calculating aleatoric uncertainty
Aleatoric uncertainty in BNNs is estimated by including it as an explicit prediction of the neural network. Each forward pass through the network results in a tuple (ŷ, σ̂_A), where σ̂_A is a numeric representation of the aleatoric uncertainty in the prediction ŷ. This method of calculating aleatoric uncertainty was originally proposed by [4]. Allowing the network to make varying predictions of σ̂_A for different inputs allows the model to capture heteroscedastic (i.e., non-constant and dependent on the input) noise in the data. For a full discussion of why a heteroscedastic noise model for aleatoric uncertainty is appropriate for the amplitudes dataset, see Appendix B.
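One common realization of this scheme (following [4]; sketched here under the assumption that the network's second output is interpreted as log σ_A²) is a heteroscedastic Gaussian negative log-likelihood loss:

```python
# A minimal sketch of a heteroscedastic Gaussian negative log-likelihood in
# the style of [4]. Interpreting the second output as log(sigma_A^2) keeps
# the predicted aleatoric variance positive and numerically stable.
import tensorflow as tf

def heteroscedastic_nll(y_true, outputs):
    y_hat, log_var = outputs[..., 0], outputs[..., 1]
    return tf.reduce_mean(
        0.5 * tf.exp(-log_var) * tf.square(y_true - y_hat) + 0.5 * log_var)
```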
There is only one label y ∈ ℝ for the regression task of predicting the amplitude of an image,