Ares: A System-Oriented Wargame Framework for
Adversarial ML
Farhan Ahmed
Stony Brook University
farhaahmed@cs.stonybrook.edu
Pratik Vaishnavi
Stony Brook University
pvaishnavi@cs.stonybrook.edu
Kevin Eykholt
IBM Research
kheykholt@ibm.com
Amir Rahmati
Stony Brook University
amir@cs.stonybrook.edu
Abstract—Since the discovery of adversarial attacks against
machine learning models nearly a decade ago, research on
adversarial machine learning has rapidly evolved into an eternal
war between defenders, who seek to increase the robustness of
ML models against adversarial attacks, and adversaries, who
seek to develop better attacks capable of weakening or defeating
these defenses. This domain, however, has found little buy-in from
ML practitioners, who are neither overtly concerned about these
attacks affecting their systems in the real world nor willing
to trade off the accuracy of their models in pursuit of robustness
against these attacks.
In this paper, we motivate the design and implementation
of Ares, an evaluation framework for adversarial ML that
allows researchers to explore attacks and defenses in a realistic
wargame-like environment. Ares frames the conflict between
the attacker and defender as two agents in a reinforcement
learning environment with opposing objectives. This allows the
introduction of system-level evaluation metrics such as time to
failure and evaluation of complex strategies such as moving
target defenses. We provide the results of our initial exploration
involving a white-box attacker against an adversarially trained
defender.
I. INTRODUCTION
The mass adoption of AI-powered systems has motivated a
re-examination of the reliability, privacy, and security of AI
algorithms. With respect to security, it was discovered early
on that image-based AI algorithms are vulnerable to a class
of adversarial evasion attacks [1], [2]. In such attacks, an
adversary introduces a small amount of noise, imperceptible
to the human eye, in order to reliably induce misclassification
errors during inference. Since its discovery, a large body of
research has proposed numerous empirical defense strategies
such as transforming the model’s inputs [3], modifying the
neural network architecture [4], and training the network on
an alternative training dataset [5]. Despite the vast number
of works, both in developing new adversarial attacks and
proposing new defenses, including robust physical world at-
tacks [6], the adversarial threat model remains unmotivating to
ML practitioners. In a small industry survey, Kumar et al. [7]
discovered that while most organizations surveyed were aware
of adversarial examples, they remarked “This [adversarial
ML] looks futuristic” and lacked the tools to study and
mitigate such attacks.
We argue that two key issues hinder the acceptance of
adversarial evasion attacks as a threat: (1) the unmotivating
threat model used by most prior work and (2) the lack of
tools for evaluating complex adversarial attacker and defender
interactions. Following Kerckhoffs’s principle, adversarial at-
tacks and defenses have mainly been studied using a white-box
threat model, i.e., full knowledge of the network and defense
parameters. Under this lens, many proposed defenses were
shown to be ineffective as an attacker with perfect knowledge
could adapt to the defense [8]. However, such a strong threat
model can only be replicated by attackers with insider access
to the AI algorithm and training data. In real deployment
scenarios, an organization is primarily concerned about the
security of its AI systems against outside attackers.
Despite the lack of recognition of adversarial ML as a threat,
there has been a rise in adversarial attack libraries that enable
ML practitioners to study the current state-of-the-art attack
and defense algorithms. Some examples include University
of Toronto’s CleverHans [9], MIT’s robustness package [10],
University of Tübingen’s Foolbox [11], and IBM’s Adversar-
ial Robustness Toolbox (ART) [12]. Each library defines a
unified framework through which practitioners can evaluate
the effectiveness of an attack or defense using their own
AI systems. Unfortunately, such evaluations are inherently
limited, as the threat model under evaluation is constrained by
the attack algorithm. Furthermore, both the attacker and defender are
assumed static. They do not modify their behavior based on
the actions of the other and, as such, the reported effectiveness
is misleading and does not translate into a meaningful notion
of real-world effectiveness.
In this paper, we describe a new evaluation framework,
Ares, which represents adversarial attack scenarios as a com-
plex, dynamic interaction between the attacker and defender.
We explore the conflict between the attacker and defender
as two independent agents in a reinforcement learning (RL)
environment with opposing objectives, creating a richer and
more realistic environment for adversarial ML evaluation.
By utilizing this RL-environment, we are able to tweak the
attacker or defender’s strategy (the RL policy) to be static,
randomized, or even learnable. Ares also allows the investiga-
tion of both white-box and black-box threat models, drawing
inspiration from the limitations of prior evaluations.
For its debut, we have used Ares to re-examine the security
of the ensemble/moving target defense (MTD) framework
in a white-box scenario and highlight the vulnerability of
this setup. Using different combinations of naturally trained
and adversarially trained models, an Ares evaluation finds
that, in general, the attacker always wins and adversarial
training can only slightly delay the attacker’s success. As
prior work discusses, the attacker’s success is largely due to
the transferability of adversarial examples [2]. We investigate
this phenomenon more thoroughly through the lens of Ares
and discover that the shared loss gradients between networks,
regardless of training method or model architecture, are the
main culprit. We then discuss how MTDs could be improved
based on this discovery and our next steps towards evaluating
MTDs and other prior works in a black-box threat model
through Ares.
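To make this shared-gradient observation concrete, the sketch below (a minimal illustration assuming PyTorch classifiers; none of these names are part of Ares) compares the input-space loss gradients of two independently trained models on the same batch. Consistently high cosine similarity between these gradients is the kind of signal that explains why adversarial examples transfer.

```python
import torch
import torch.nn.functional as F

def loss_gradient(model, x, y):
    """Input-space gradient of the cross-entropy loss, flattened per sample."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return x.grad.flatten(start_dim=1)

def gradient_cosine(model_a, model_b, x, y):
    """Average cosine similarity of the two models' loss gradients on a batch."""
    g_a = loss_gradient(model_a, x, y)
    g_b = loss_gradient(model_b, x, y)
    # Values near 1 indicate the models share attack directions, so
    # perturbations crafted on one are likely to transfer to the other.
    return F.cosine_similarity(g_a, g_b, dim=1).mean().item()
```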
In this paper we make the following contributions:
• We develop Ares, an RL-based evaluation framework for adversarial ML that allows researchers to explore attack/defense strategies at a system level.
• Using Ares, we re-examine ensemble/moving target defense strategies under the white-box threat model and show that the root cause of their failure is the shared loss gradients between the networks.
• The Ares framework is publicly available at https://github.com/Ethos-lab/ares as we continue to develop additional features and improvements.
II. BACKGROUND & RELATED WORK
Adversarial Evasion Attacks. Prior works have uncovered
several classes of vulnerabilities for ML models and designed
attacks to exploit them [13]. In this paper, we focus on
one such class of attacks known as evasion attacks. In an
evasion attack, the adversary’s goal is to generate an “ad-
versarial example” – a carefully perturbed input that causes
misclassification. Evasion attacks against ML models have
been developed to suit a wide range of scenarios. White-box
attacks [1], [2], [14], [15] assume full knowledge of/access to
the model, including but not limited to the model’s architecture,
parameters, gradients, and training data. Such attacks, although
extremely potent, are mostly impractical in real-world scenar-
ios [16] as the ML models used in commercial systems are
usually hidden underneath a layer of system/network security
measures. Strengthening these security measures
not only provides improved protection for the underlying ML
models against white-box attacks but also improves the overall
security posture of the system, and hence is often a more
practical and desirable approach. Black-box attacks [6], [17]–
[22], on the other hand, only assume query access to the
target ML models. Such a threat model rests on a more practical
assumption, as several consumer-facing ML services provide
this access to their users [23]–[26].
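To make the white-box setting concrete, the following minimal sketch implements the single-step fast gradient sign method (FGSM) in PyTorch; the model, inputs, labels, and budget eps are illustrative placeholders rather than components of any particular system discussed here.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Craft an L-infinity bounded adversarial example in one gradient step."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move each pixel in the direction that increases the loss,
    # then clip back to the valid image range.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```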
Defenses against Evasion Attacks. A wide range of strategies
to address the threat of adversarial evasion attacks have also
been proposed. One line of work looks at tackling this issue
at test-time [3]–[5], [27], [28]. These works usually involve
variations of a preprocessing step that filters out the adversarial
noise from the input before feeding it to the ML model. These
defenses, however, have been shown to convey a false sense of
security and have thus been easily broken using adaptive attacks [8].
Another popular strategy involves re-training the model
using a robustness objective [29]–[31]. The defenses that
employ this strategy show promise as they have (so far) stood
strong in the face of adaptive adversaries. All the defenses
discussed so far belong to the broad category of empirical
defenses. These defenses only provide empirical guarantees
of robustness and may not be secure against a future attack.
Another line of work looks at developing methods that can
train certifiably robust ML models [32]–[34]. These models
can offer formal robustness guarantees against any attacker
with a pre-defined budget.
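As a rough illustration of the re-training strategy, the sketch below shows one epoch of adversarial training in PyTorch; pgd_attack is an assumed helper that crafts bounded perturbations, and the loop is a generic outline rather than the implementation of any specific defense cited above.

```python
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, pgd_attack, eps=8 / 255):
    """Train on adversarially perturbed inputs instead of clean ones."""
    model.train()
    for x, y in loader:
        # Approximate the worst-case perturbation for the current parameters.
        x_adv = pgd_attack(model, x, y, eps=eps)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)  # robust training objective
        loss.backward()
        optimizer.step()
```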
Defenses based on Ensembling. One commonly known
property of adversarial examples is that they can similarly
fool models independently trained on the same data [2].
Adversaries can exploit this property by training a surrogate
model to generate adversarial examples against a target model.
This, in fact, is a popular strategy used by several black-box
attacks [17], [19], [20]. Tramèr et al. [35] use this property
to improve the black-box robustness of models trained using
the single-step attack version of adversarial training. At each
training iteration, the source of adversarial examples is randomly
selected from an ensemble containing the currently trained
model and a set of pre-trained models. Other works [36]–[39]
propose strategies for training a diverse pool of models so
that it is difficult for an adversarial example to transfer across
the majority of them. Aggregating the outputs of these di-
verse models should therefore yield improved robustness. This
ensemble diversity strategy, however, has been shown to be
ineffective [40], [41]. In a similar vein, some prior works [42],
[43] propose using an ensemble of models as a moving target
defense where, depending on the MTD strategy, the attacker
may face a different target model in each encounter. These
works, unfortunately, suffer from the same shortcomings as
the ensemble methods.
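For intuition, an ensemble-based moving target defense can be sketched as follows, with each query answered by a randomly chosen member of a pool of pre-trained models; the class name and per-query selection policy are illustrative assumptions, since the MTD works cited above differ in how and when the active model is switched.

```python
import random
import torch

class RandomMTD:
    """Toy moving target defense over a pool of independently trained models."""

    def __init__(self, models):
        self.models = models  # e.g., a mix of naturally and adversarially trained nets

    @torch.no_grad()
    def predict(self, x):
        # Each query may be served by a different ensemble member,
        # so the attacker cannot be sure which model it is facing.
        active = random.choice(self.models)
        return active(x).argmax(dim=1)
```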
Adversarial ML Libraries. To facilitate research into machine
learning security, multiple research groups and organizations
have developed libraries to assist in development and evalua-
tion of adversarial attacks and defenses. Most notable among
these are University of Toronto’s CleverHans [9], MIT’s ro-
bustness package [10], University of Tübingen’s Foolbox [11],
and IBM’s Adversarial Robustness Toolbox (ART) [12].
These efforts are orthogonal to our framework. While Ares
focuses on evaluating various attacker and defender strategies
against one another across multiple scenarios, these libraries
focus primarily on facilitating implementation of new attacks
and defenses and benchmarking them against existing ones.
In this paper, we use the Projected Gradient Descent (PGD)
attack from IBM’s ART library as our main adversarial eval-
uation criterion.
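A sketch of how such an evaluation can be wired up is shown below; the PyTorchClassifier wrapper and ProjectedGradientDescent attack are ART’s actual classes, while the model, data shapes, and attack budget are illustrative assumptions.

```python
import numpy as np
import torch.nn as nn
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import ProjectedGradientDescent

def robust_accuracy_under_pgd(model, x_test, y_test, eps=8 / 255):
    """Wrap a PyTorch model in ART and measure accuracy on PGD adversarial examples."""
    classifier = PyTorchClassifier(
        model=model,
        loss=nn.CrossEntropyLoss(),
        input_shape=(3, 32, 32),   # e.g., CIFAR-10 images
        nb_classes=10,
        clip_values=(0.0, 1.0),
    )
    attack = ProjectedGradientDescent(
        estimator=classifier, eps=eps, eps_step=eps / 4, max_iter=20
    )
    x_adv = attack.generate(x=x_test)              # NumPy array in, NumPy array out
    preds = np.argmax(classifier.predict(x_adv), axis=1)
    return float((preds == y_test).mean())         # y_test: integer class labels
```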
III. ARES FRAMEWORK
In this section, we provide an overview of the Ares
framework. As seen in Figure 1, Ares adapts the adversarial
attack/defense problem into an RL-environment consisting
of three main components: (1) the evaluation scenario, (2)
the attacker agent, and (3) the defender agent. Once each
component has been defined by the user, Ares executes a
series of competitions between the attacker and defender. Each
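For orientation, the sketch below shows the general shape of one such competition as a turn-based episode between the two agents; all class and method names here are illustrative placeholders rather than the actual Ares API.

```python
def run_competition(env, attacker, defender, max_rounds=100):
    """Play one attacker-vs-defender episode and report the time to failure."""
    obs = env.reset()                        # sample the evaluation scenario
    for round_num in range(max_rounds):
        defense_action = defender.act(obs)   # e.g., switch the active model
        attack_action = attacker.act(obs)    # e.g., take one perturbation step
        obs, done = env.step(defense_action, attack_action)
        if done:                             # attacker induced a misclassification
            return round_num + 1             # rounds survived by the defender
    return max_rounds                        # defender held out for the full budget
```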