
On Attacking Out-Domain Uncertainty Estimation in Deep Neural Networks
Huimin Zeng1∗, Zhenrui Yue1, Yang Zhang2, Ziyi Kou1, Lanyu Shang1, Dong Wang1
1University of Illinois at Urbana-Champaign
2University of Notre Dame
{huiminz3, zhenrui3, ziyikou2, lshang3, dwang24}@illinois.edu, yzhang42@nd.edu
∗Contact Author
Abstract
In many applications with real-world consequences, it is crucial to develop reliable uncertainty estimates for the predictions made by AI decision systems. Toward this goal, various deep neural network (DNN) based uncertainty estimation algorithms have been proposed. However, the robustness of the uncertainty returned by these algorithms has not been systematically explored. In this work, to raise the awareness of the research community on robust uncertainty estimation, we show that state-of-the-art uncertainty estimation algorithms can fail catastrophically under our proposed adversarial attack, despite their impressive performance on uncertainty estimation. In particular, we aim at attacking out-domain uncertainty estimation: under our attack, the uncertainty models are fooled into making highly confident predictions for out-domain data that they would otherwise have rejected. Extensive experimental results on various benchmark image datasets show that the uncertainty estimated by state-of-the-art methods can easily be corrupted by our attack.
1 Introduction
Deep neural networks (DNNs) have been thriving in various applications, such as computer vision, natural language processing and decision making. However, in many applications with real-world consequences (e.g., autonomous driving, disease diagnosis, loan granting), it is not sufficient to merely pursue high accuracy of the predictions made by AI models, since deterministic yet wrong predictions without any uncertainty justification could lead to catastrophic consequences [Galil and El-Yaniv, 2021]. Therefore, to address the issue of DNNs producing over-confident wrong predictions, great efforts have been made to quantify the predictive uncertainty of these models, so that ambiguous or low-confidence predictions can be rejected or deferred to an expert.
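To illustrate how such a confidence score is typically used for rejection, the snippet below gives a minimal sketch of selective prediction based on the maximum softmax probability; the model, input batch and threshold are hypothetical placeholders rather than artifacts of this paper.

```python
import torch
import torch.nn.functional as F

def predict_or_defer(model, x, threshold=0.9):
    """Keep a prediction only if its maximum softmax probability
    (a common confidence score) exceeds the threshold; otherwise
    flag the input for deferral to a human expert."""
    model.eval()
    with torch.no_grad():
        logits = model(x)                      # (batch, num_classes)
        probs = F.softmax(logits, dim=-1)
        confidence, prediction = probs.max(dim=-1)
    accept = confidence >= threshold           # False entries are deferred
    return prediction, confidence, accept
```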
Indeed, state-of-the-art algorithms for uncertainty estimation in DNNs have shown impressive performance in terms of quantifying the confidence of model predictions. Whether handling in-domain data (generated from the training distribution) or out-domain data under domain shift, these algorithms can successfully assign low confidence scores to ambiguous predictions. However, the robustness of such estimated uncertainty/confidence has barely been studied. In this paper, the robustness of uncertainty estimation is defined as the extent to which the predictive confidence is affected when the input is deliberately perturbed. Consider an example of autonomous driving [Feng et al., 2018], where the visual system of a self-driving car is trained on collected road scene images. When extreme weather occurs, to avoid over-confident wrong decisions, the visual system should report high uncertainty for the images captured by the sensors, since the road scenes observed under the extreme weather (out-domain) could be drastically different from the training scenes (in-domain). However, a malicious attacker might perturb the images captured by the sensors in such a way that the visual system regards the out-domain images as in-domain images and makes completely wrong decisions with high confidence, leading to catastrophic consequences.
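To make this threat concrete, the following is a minimal, hypothetical sketch of a PGD-style perturbation that pushes out-domain inputs toward high predictive confidence under an assumed L-infinity budget; it only illustrates the general idea of confidence-targeted perturbations and is not the specific attack proposed in this paper.

```python
import torch
import torch.nn.functional as F

def confidence_attack(model, x_out, epsilon=8/255, alpha=2/255, steps=10):
    """Illustrative sketch: iteratively perturb out-domain inputs x_out
    (within an L-infinity ball of radius epsilon) so that the victim
    model's maximum softmax probability becomes large."""
    x_adv = x_out.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        probs = F.softmax(model(x_adv), dim=-1)
        confidence = probs.max(dim=-1).values.mean()    # average max-softmax confidence
        grad = torch.autograd.grad(confidence, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()         # gradient ascent on confidence
            x_adv = x_out + (x_adv - x_out).clamp(-epsilon, epsilon)
            x_adv = x_adv.clamp(0, 1).detach()          # keep a valid pixel range
    return x_adv
```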
Therefore, with the intention of drawing the attention of the research community to systematically investigating the robustness of uncertainty estimation, we show that SoTA DNN-based uncertainty estimation algorithms can fail easily under our proposed adversarial attack. In particular, we focus on attacking the uncertainty estimation for out-domain data in an image classification problem. That is, under our attack, the uncertainty estimation models are fooled into assigning extremely high confidence scores to out-domain images, which would originally have been rejected by these models due to their low confidence scores (e.g., the Softmax score [Galil and El-Yaniv, 2021]). As shown in Figure 1, the attacker perturbs the out-domain images into the high-confidence region of the victim system. To show the vulnerability of SoTA DNN-based uncertainty estimation algorithms under our threat model, we launch our proposed out-domain adversarial attack on various algorithms, including Deep Ensemble [Lakshminarayanan et al., 2016], RBF-based Deterministic Uncertainty Quantification (DUQ) [Van Amersfoort et al., 2020], Gaussian process based Deterministic Uncertainty Estimation (DUE) [van Amersfoort et al., 2021] and Spectral-Normalized Gaussian Process (SNGP) [Liu et al.,