training can only slightly delay the attacker’s success. As
prior work discusses, the attacker’s success is largely due to
the transferability of adversarial examples [2]. We investigate
this phenomenon more thoroughly through the lens of Ares
and discover that the shared loss gradients between networks, regardless of training method or model architecture, are the main culprit. We then discuss how MTDs could be improved in light of this finding and outline our next steps toward evaluating MTDs and other prior works under a black-box threat model through Ares.
In this paper we make the following contributions:
• We develop Ares, an RL-based evaluation framework for adversarial ML that allows researchers to explore attack/defense strategies at a system level.
• Using Ares, we re-examine ensemble/moving target defense strategies under the white-box threat model and show that the root cause of their failure is the shared loss gradient between the networks.
The Ares framework is publicly available at https://github.com/Ethos-lab/ares and is under active development, with additional features and improvements planned.
II. BACKGROUND & RELATED WORK
Adversarial Evasion Attacks. Prior works have uncovered
several classes of vulnerabilities for ML models and designed
attacks to exploit them [13]. In this paper, we focus on
one such class of attacks known as evasion attacks. In an
evasion attack, the adversary’s goal is to generate an “ad-
versarial example” – a carefully perturbed input that causes
misclassification. Evasion attacks against ML models have
been developed to suit a wide range of scenarios. White-box
attacks [1], [2], [14], [15] assume full knowledge of/access to
the model, including but not limited to the model's architecture,
parameters, gradients, and training data. Such attacks, although
extremely potent, are mostly impractical in real-world scenar-
ios [16] as the ML models used in commercial systems are
usually hidden underneath a layer of system/network security
measures. Focusing on strengthening these security measures
not only provides improved protection for the underlying ML models against white-box attacks but also improves the overall security posture of the system, and is hence often a more practical and desirable approach. Black-box attacks [6], [17]–
[22], on the other hand, only assume query access to the
target ML models. Such a threat model is more practical, as several consumer-facing ML models provide such query access to their users [23]–[26].
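As a concrete illustration of the white-box setting, the following is a minimal PyTorch sketch of a single-step gradient-sign attack in the spirit of FGSM; the model, labels, and perturbation budget here are placeholders for illustration only and are not part of Ares.

import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=0.03):
    # White-box access: the attacker can backpropagate through the target model.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move each pixel by eps in the direction that increases the loss.
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()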
Defenses against Evasion Attacks. A wide range of strategies
to address the threat of adversarial evasion attacks have also
been proposed. One line of work tackles this issue
at test-time [3]–[5], [27], [28]. These works usually involve
variations of a preprocessing step that filters out the adversarial
noise from the input before feeding it to the ML model. These
defenses, however, have been shown to convey a false sense of security and have been easily broken using adaptive attacks [8].
Another popular strategy involves re-training the model
using a robustness objective [29]–[31]. The defenses that
employ this strategy show promise as they have (so far) stood
strong in the face of adaptive adversaries. All the defenses
discussed so far belong in the broad category of empirical
defenses. These defenses only provide empirical guarantees
of robustness and may not be secure against a future attack.
Another line of work develops methods that can
train certifiably robust ML models [32]–[34]. These models
can offer formal robustness guarantees against any attacker
with a pre-defined budget.
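To make the robustness objective concrete, a minimal sketch of one adversarial-training step in PyTorch is shown below; it follows the common min-max formulation (inner maximization via iterative gradient-sign steps, outer minimization on the perturbed batch), with all hyperparameters being placeholder values rather than those of any specific cited defense.

import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=0.03, step=0.007, iters=10):
    # Inner maximization: search for a worst-case perturbation inside the eps-ball.
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + step * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    # Outer minimization: update the model on the adversarially perturbed batch.
    optimizer.zero_grad()
    F.cross_entropy(model((x + delta).clamp(0, 1)), y).backward()
    optimizer.step()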
Defenses based on Ensembling. One commonly known
property of adversarial examples is that they can similarly
fool models independently trained on the same data [2].
Adversaries can exploit this property by training a surrogate
model to generate adversarial examples against a target model.
This, in fact, is a popular strategy used by several black-box
attacks [17], [19], [20]. Tramèr et al. [35] use this property
to improve the black-box robustness of models trained using
the single-step attack version of adversarial training. At each
training iteration, the source of adversarial examples is randomly selected from an ensemble containing the model currently being trained and a set of pre-trained models. Other works [36]–[39]
propose strategies for training a diverse pool of models so
that it is difficult for an adversarial example to transfer across
the majority of them. Aggregating the outputs of these di-
verse models should therefore yield improved robustness. This
ensemble diversity strategy, however, has been shown to be
ineffective [40], [41]. In a similar vein, some prior works [42], [43] propose using an ensemble of models as a moving target defense where, depending on the MTD strategy, the attacker may face a different target model in each encounter. These works, unfortunately, suffer from the same shortcomings as the ensemble methods.
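For concreteness, the surrogate-based transfer strategy discussed above can be sketched as follows, reusing the fgsm_example sketch from earlier; the surrogate and target models are placeholders, and the snippet is illustrative rather than a reproduction of any cited attack.

import torch

def transfer_attack_success(surrogate_model, target_model, x, y, eps=0.03):
    # Craft adversarial examples with full (white-box) access to the surrogate only.
    x_adv = fgsm_example(surrogate_model, x, y, eps=eps)  # sketch defined earlier
    # The target is treated as a black box: the attacker only observes its predictions.
    with torch.no_grad():
        preds = target_model(x_adv).argmax(dim=1)
    # Fraction of adversarial examples that transfer to the target model.
    return (preds != y).float().mean().item()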
Adversarial ML Libraries. To facilitate research into machine
learning security, multiple research groups and organizations
have developed libraries to assist in development and evalua-
tion of adversarial attacks and defenses. Most notable among these are the University of Toronto's CleverHans [9], MIT's robustness package [10], the University of Tübingen's Foolbox [11], and IBM's Adversarial Robustness Toolbox (ART) [12].
These efforts are orthogonal to our framework. While Ares
focuses on evaluating various attacker and defender strategies
against one another across multiple scenarios, these libraries
focus primarily on facilitating the implementation of new attacks
and defenses and benchmarking them against existing ones.
In this paper, we use the Projected Gradient Descent (PGD)
attack from IBM's ART library as our main adversarial evaluation criterion.
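For reference, a minimal sketch of how such an evaluation can be wired up with ART is given below; the wrapped model, input shape, and attack hyperparameters are placeholder values and do not reflect our exact experimental configuration.

import numpy as np
import torch.nn as nn
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import ProjectedGradientDescent

# Wrap a trained PyTorch model so that ART's attacks can query it.
classifier = PyTorchClassifier(
    model=model,                      # placeholder: any trained nn.Module
    loss=nn.CrossEntropyLoss(),
    input_shape=(3, 32, 32),          # placeholder input shape
    nb_classes=10,                    # placeholder number of classes
    clip_values=(0.0, 1.0),
)

# L-infinity PGD with placeholder budget, step size, and iteration count.
attack = ProjectedGradientDescent(estimator=classifier, norm=np.inf,
                                  eps=8 / 255, eps_step=2 / 255, max_iter=20)
x_adv = attack.generate(x=x_test)     # x_test: numpy array of clean inputs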
III. ARES FRAMEWORK
In this section, we provide an overview of the Ares
framework. As seen in Figure 1, Ares adapts the adversarial
attack/defense problem into an RL environment consisting
of three main components: (1) the evaluation scenario, (2)
the attacker agent, and (3) the defender agent. Once each
component has been defined by the user, Ares executes a
series of competitions between the attacker and defender. Each