
On Attacking Out-Domain Uncertainty Estimation in Deep Neural Networks
Huimin Zeng1∗, Zhenrui Yue1, Yang Zhang2, Ziyi Kou1, Lanyu Shang1, Dong Wang1
1University of Illinois at Urbana-Champaign
2University of Notre Dame
{huiminz3, zhenrui3, ziyikou2, lshang3, dwang24}@illinois.edu, yzhang42@nd.edu
∗Contact Author
Abstract
In many applications with real-world consequences, it is crucial to develop reliable uncertainty estimates for the predictions made by AI decision systems. Toward this goal, various deep neural network (DNN) based uncertainty estimation algorithms have been proposed. However, the robustness of the uncertainty returned by these algorithms has not been systematically explored. In this work, to raise the awareness of the research community on robust uncertainty estimation, we show that state-of-the-art uncertainty estimation algorithms can fail catastrophically under our proposed adversarial attack, despite their impressive performance on uncertainty estimation. In particular, we aim at attacking out-domain uncertainty estimation: under our attack, the uncertainty models are fooled into making highly confident predictions for out-domain data that they would otherwise have rejected. Extensive experimental results on various benchmark image datasets show that the uncertainty estimated by state-of-the-art methods can easily be corrupted by our attack.
1 Introduction
Deep neural networks (DNNs) have been thriving in various applications, such as computer vision, natural language processing and decision making. However, in many applications with real-world consequences (e.g., autonomous driving, disease diagnosis, loan granting), it is not sufficient to merely pursue high accuracy of the predictions made by AI models, since deterministic yet wrong predictions without any uncertainty justification could lead to catastrophic consequences [Galil and El-Yaniv, 2021]. Therefore, to address the issue of DNNs producing over-confident wrong predictions, great efforts have been made to quantify the predictive uncertainty of these models, so that ambiguous or low-confidence predictions can be rejected or deferred to an expert.
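To illustrate how such a confidence score is typically used for rejection, the snippet below gives a minimal sketch of selective prediction based on the maximum softmax probability; the model, input batch and threshold are hypothetical placeholders rather than artifacts of this paper.

```python
import torch
import torch.nn.functional as F

def predict_or_defer(model, x, threshold=0.9):
    """Keep a prediction only if its maximum softmax probability
    (a common confidence score) exceeds the threshold; otherwise
    flag the input for deferral to a human expert."""
    model.eval()
    with torch.no_grad():
        logits = model(x)                      # (batch, num_classes)
        probs = F.softmax(logits, dim=-1)
        confidence, prediction = probs.max(dim=-1)
    accept = confidence >= threshold           # False entries are deferred
    return prediction, confidence, accept
```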
Indeed, state-of-the-art algorithms for uncertainty estimation in DNNs have shown impressive performance in terms of quantifying the confidence of model predictions. Whether handling in-domain data (generated from the training distribution) or out-domain data under domain shift, these algorithms can successfully assign low confidence scores to ambiguous predictions. However, the robustness of such estimated uncertainty/confidence has barely been studied. In this paper, the robustness of uncertainty estimation is defined as the extent to which the predictive confidence is affected when the input is deliberately perturbed. Consider an example of autonomous driving [Feng et al., 2018], where the visual system of a self-driving car is trained on collected road scene images. When extreme weather occurs, to avoid over-confident wrong decisions, the visual system should report high uncertainty for the images captured by the sensors, since the road scenes observed under the extreme weather (out-domain) could be drastically different from the training scenes (in-domain). However, a malicious attacker might perturb the images captured by the sensors in such a way that the visual system regards the out-domain images as in-domain images and makes completely wrong decisions with high confidence, leading to catastrophic consequences.
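To make this threat concrete, the following is a minimal, hypothetical sketch of a PGD-style perturbation that pushes out-domain inputs toward high predictive confidence under an assumed L-infinity budget; it only illustrates the general idea of confidence-targeted perturbations and is not the specific attack proposed in this paper.

```python
import torch
import torch.nn.functional as F

def confidence_attack(model, x_out, epsilon=8/255, alpha=2/255, steps=10):
    """Illustrative sketch: iteratively perturb out-domain inputs x_out
    (within an L-infinity ball of radius epsilon) so that the victim
    model's maximum softmax probability becomes large."""
    x_adv = x_out.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        probs = F.softmax(model(x_adv), dim=-1)
        confidence = probs.max(dim=-1).values.mean()    # average max-softmax confidence
        grad = torch.autograd.grad(confidence, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()         # gradient ascent on confidence
            x_adv = x_out + (x_adv - x_out).clamp(-epsilon, epsilon)
            x_adv = x_adv.clamp(0, 1).detach()          # keep a valid pixel range
    return x_adv
```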
Therefore, with the intention of drawing the attention of the research community to systematically investigating the robustness of uncertainty estimation, we show that SoTA DNN-based uncertainty estimation algorithms can fail easily under our proposed adversarial attack. In particular, we focus on attacking the uncertainty estimation for out-domain data in an image classification problem. That is, under our attack, the uncertainty estimation models are fooled into assigning extremely high confidence scores to out-domain images, which would originally have been rejected by these models due to their low confidence scores (e.g., the Softmax score [Galil and El-Yaniv, 2021]). As shown in Figure 1, the attacker perturbs the out-domain images into the high-confidence region of the victim system. To show the vulnerability of SoTA DNN-based uncertainty estimation algorithms under our threat model, we launch our proposed out-domain adversarial attack on various algorithms, including Deep Ensemble [Lakshminarayanan et al., 2016], RBF-based Deterministic Uncertainty Quantification (DUQ) [Van Amersfoort et al., 2020], Gaussian process based Deterministic Uncertainty Estimation (DUE) [van Amersfoort et al., 2021] and Spectral-Normalized Gaussian Process (SNGP) [Liu et al.,