On Attacking Out-Domain Uncertainty Estimation in Deep Neural Networks
Huimin Zeng1, Zhenrui Yue1, Yang Zhang2, Ziyi Kou1, Lanyu Shang1, Dong Wang1
1University of Illinois at Urbana-Champaign
2University of Notre Dame
{huiminz3, zhenrui3, ziyikou2, lshang3, dwang24}@illinois.edu, yzhang42@nd.edu
Abstract
In many applications with real-world consequences, it is crucial to develop reliable uncertainty estimates for the predictions made by AI decision systems. Toward this goal, various deep neural network (DNN) based uncertainty estimation algorithms have been proposed. However, the robustness of the uncertainty returned by these algorithms has not been systematically explored. In this work, to raise the awareness of the research community on robust uncertainty estimation, we show that state-of-the-art uncertainty estimation algorithms could fail catastrophically under our proposed adversarial attack, despite their impressive performance on uncertainty estimation. In particular, we aim at attacking out-domain uncertainty estimation: under our attack, the uncertainty model is fooled into making high-confidence predictions for out-domain data that it would otherwise have rejected. Extensive experimental results on various benchmark image datasets show that the uncertainty estimated by state-of-the-art methods could be easily corrupted by our attack.
1 Introduction
Deep neural networks (DNNs) have been thriving in various applications, such as computer vision, natural language processing and decision making. However, in many applications with real-world consequences (e.g., autonomous driving, disease diagnosis, loan granting), it is not sufficient to merely pursue high prediction accuracy of the AI models, since deterministic wrong predictions without any uncertainty justification could lead to catastrophic consequences [Galil and El-Yaniv, 2021]. Therefore, to address the issue of over-confident wrong predictions from DNNs, great efforts have been made to quantify the predictive uncertainty of the models, so that ambiguous or low-confidence predictions can be rejected or deferred to an expert.
Indeed, state-of-the-art algorithms for uncertainty estimation in DNNs have shown impressive performance in terms of quantifying the confidence of model predictions. Handling either in-domain data (generated from the training distribution) or out-domain data under domain shift, these algorithms can successfully assign low confidence scores to ambiguous predictions. However, the robustness of such estimated uncertainty/confidence is barely studied. In this paper, the robustness of uncertainty estimation is defined as: to what extent would the predictive confidence be affected when the input is deliberately perturbed? Consider an example from autonomous driving [Feng et al., 2018], where the visual system of a self-driving car is trained on collected road-scene images. When extreme weather occurs, to avoid over-confident wrong decisions, the visual system should show high uncertainty for the images captured by the sensors, since the road scenes observed under extreme weather (out-domain) could be drastically different from the training scenes (in-domain). However, a malicious attacker might perturb the images captured by the sensors in such a way that the visual system regards the out-domain images as in-domain images and makes completely wrong decisions with high confidence, leading to catastrophic consequences.
Therefore, with the intention of drawing the attention of the research community to systematically investigating the robustness of uncertainty estimation, we show that SoTA DNN-based uncertainty estimation algorithms could fail easily under our proposed adversarial attack. In particular, we focus on attacking the uncertainty estimation for out-domain data in an image classification setting. That is, under our attack, the uncertainty estimation models are fooled into assigning extremely high confidence scores to out-domain images, which would otherwise have been rejected by these models due to their low confidence scores (e.g., the Softmax score [Galil and El-Yaniv, 2021]). As shown in Figure 1, the attacker perturbs the out-domain images into the high-confidence region of the victim system. To demonstrate the vulnerability of SoTA DNN-based uncertainty estimation algorithms under our threat model, we launch our proposed out-domain adversarial attack on various algorithms, including Deep Ensemble [Lakshminarayanan et al., 2016], RBF-based Deterministic Uncertainty Quantification (DUQ) [Van Amersfoort et al., 2020], Gaussian process based Deterministic Uncertainty Estimation (DUE) [van Amersfoort et al., 2021] and Spectral-Normalized Gaussian Process (SNGP) [Liu et al., 2020]. Extensive experimental results on various benchmark image datasets show that the uncertainty estimated by these algorithms could be drastically corrupted under our attack.

Figure 1: Consider an uncertainty model trained on images of airplanes, automobiles, birds and dogs.
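To make the threat concrete, the following is a minimal, hypothetical sketch of how such an out-domain attack could be instantiated: a PGD-style procedure that perturbs out-domain inputs within an L-infinity budget so as to maximize a differentiable confidence score (here, the maximum softmax probability). The function and parameter names are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def out_domain_confidence_attack(model, x_out, eps=8/255, step_size=2/255, n_steps=40):
    """Hypothetical PGD-style sketch: perturb out-domain inputs so that the
    victim model assigns them high confidence. Unlike a standard adversarial
    attack on the label, the objective is the maximum softmax probability."""
    model.eval()
    x_out = x_out.clone().detach()
    # random start inside the L_inf ball of radius eps
    x_adv = (x_out + torch.empty_like(x_out).uniform_(-eps, eps)).clamp(0.0, 1.0)

    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        probs = F.softmax(model(x_adv), dim=1)
        confidence = probs.max(dim=1).values          # per-sample confidence score
        grad = torch.autograd.grad(confidence.sum(), x_adv)[0]

        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()   # gradient *ascent* on confidence
            x_adv = torch.min(torch.max(x_adv, x_out - eps), x_out + eps)  # project onto the budget
            x_adv = x_adv.clamp(0.0, 1.0)
        x_adv = x_adv.detach()
    return x_adv
```

For a victim that exposes a different confidence measure (e.g., an RBF or GP-based score as in DUQ, DUE or SNGP), the same loop applies with `confidence` replaced by that model's score, provided it is differentiable with respect to the input.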
2 Related Work
Uncertainty Estimation in DNNs. A significant number of algorithms have been proposed to quantify the uncertainty in DNNs [Lakshminarayanan et al., 2016; van Amersfoort et al., 2021; Van Amersfoort et al., 2020; Liu et al., 2020; Alarab and Prakoonwit, 2021; Gal and Ghahramani, 2016]. For instance, in [Lakshminarayanan et al., 2016], a set of neural networks was trained on the training data to construct an ensemble model for uncertainty estimation. In [Van Amersfoort et al., 2020], an RBF (radial basis function) kernel was used to quantify the uncertainty expressed by deep neural networks. In addition, a further thread of studies builds on the Gaussian process (GP), which has been proven theoretically to be a powerful tool for uncertainty estimation. However, GPs suffer from low expressive power, resulting in poor model accuracy [Bradshaw et al., 2017]. Therefore, building on [Bradshaw et al., 2017], [van Amersfoort et al., 2021] proposed to regularize the feature extractor so that deep kernel learning could be stabilized. Similarly, in [Liu et al., 2020], the predictive uncertainty is computed with a spectral-normalized feature extractor and a Gaussian process. However, the robustness of these state-of-the-art algorithms is not well studied. As shown in this paper, the uncertain DNN models trained using these algorithms could be fooled easily under our attack.
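For illustration, the sketch below shows one common way a deep ensemble in the spirit of [Lakshminarayanan et al., 2016] can be turned into a confidence score: average the softmax outputs of the ensemble members and read off the maximum probability (or the predictive entropy). This is an assumption-laden sketch, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_confidence(members, x):
    """Illustrative deep-ensemble confidence: `members` is a list of
    independently trained classifiers; the ensemble prediction is the mean
    of their softmax outputs, and the confidence is the maximum of that mean
    (lower values indicate higher uncertainty)."""
    probs = torch.stack([F.softmax(m(x), dim=1) for m in members], dim=0)   # (M, B, C)
    mean_probs = probs.mean(dim=0)                                          # (B, C)
    confidence, prediction = mean_probs.max(dim=1)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=1)  # predictive entropy
    return prediction, confidence, entropy
```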
Adversarial Examples. Despite their impressive performance on clean data, DNNs have been shown to be extremely vulnerable to adversarial examples [Szegedy et al., 2013; Goodfellow et al., 2014; Kannan et al., 2018; Carlini and Wagner, 2017]. That is, deep classifiers could be fooled into making completely wrong predictions for input images that are deliberately perturbed. However, our proposed adversarial attack differs from traditional adversarial examples in two aspects. Firstly, traditional adversarial examples are crafted by the attacker to corrupt the accuracy of classifiers, whereas our threat model is aimed at attacking the uncertainty estimation in DNNs and increasing the confidence of its wrong predictions. Moreover, the traditional adversarial attack is defined under the closed-world assumption [Reiter, 1981; Han et al., 2021], where all possible predictive categories are covered by the training data. This in-domain property limits the power of the adversary, since all possible perturbing directions can be simulated by performing untargeted adversarial training [Carlini and Wagner, 2017]. In comparison, our proposed attack is out-domain: it perturbs the out-domain data into the in-domain data distribution that the uncertainty estimation model is trained to fit. As shown in our experiments, traditional in-domain adversarial defense mechanisms cannot withstand our attack: even adversarially trained models can be fooled by our out-domain adversarial examples.
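For contrast, the classic in-domain attack can be as simple as the one-step FGSM of [Goodfellow et al., 2014], which ascends the classification loss of the true label in order to flip the prediction, rather than manipulating a confidence score. The sketch below is illustrative only.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=8/255):
    """Classic untargeted FGSM: a single signed-gradient step that increases
    the cross-entropy loss of the true label y, aiming to flip the prediction."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + eps * grad.sign()).clamp(0.0, 1.0).detach()
```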
Adversarial Attacks on Uncertainty Estimation. The idea of attacking uncertainty estimation in DNNs was introduced for the first time in [Galil and El-Yaniv, 2021], where the authors manipulated the confidence score over the predictions by perturbing the input images. By heuristically moving correctly classified input images towards the decision boundary of the model, the attacker could reduce the confidence of the model in its correct predictions. However, the threat model in [Galil and El-Yaniv, 2021] could only attack in-domain images, whereas our attack is formulated for an out-domain scenario. Moreover, the adversarial robustness of OOD models has been systematically investigated in [Meinke et al., 2021; Augustin et al., 2020; Meinke and Hein, 2019; Hein et al., 2019; Bitterwolf et al., 2020]. However, compared to [Meinke et al., 2021; Augustin et al., 2020; Meinke and Hein, 2019; Hein et al., 2019; Bitterwolf et al., 2020], where ReLU-based classifiers are mainly discussed, this work aims at designing a more general threat model for a broader range of uncertainty estimation algorithms.
3 Problem Statement
3.1 Key Concept Definition
Definition 1 (Data Domain). We define two data domains in this paper, namely P_IN (representing the in-domain distribution) and P_OUT (representing the out-domain distribution).
An excellent uncertainty estimation model is expected to make high-confidence predictions for the in-domain data, but low-confidence predictions for the out-domain data, so that out-domain inputs can be rejected or deferred.
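As a concrete illustration of this expected behavior, the sketch below implements a simple softmax-score rejection rule: inputs whose confidence falls below a threshold tau are deferred rather than classified. The threshold value and helper names are assumptions for illustration, not part of the paper's formulation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_or_reject(model, x, tau=0.9):
    """Confidence-based rejection (illustrative sketch). In-domain inputs drawn
    from P_IN should typically clear the threshold tau, while out-domain inputs
    from P_OUT should fall below it and be deferred to an expert."""
    probs = F.softmax(model(x), dim=1)
    confidence, prediction = probs.max(dim=1)
    accept = confidence >= tau    # True: return the prediction; False: defer
    return prediction, confidence, accept
```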