NEURAL ARCHITECTURAL BACKDOORS
Ren Pang, Changjiang Li & Zhaohan Xi
The Pennsylvania State University
{rbp5354,cbl5583,zxx5113}@psu.edu
Shouling Ji
Zhejiang University
sji@zju.edu.cn
Ting Wang
The Pennsylvania State University
tbw5359@psu.edu
ABSTRACT
This paper asks the intriguing question: is it possible to exploit neural architecture
search (NAS) as a new attack vector to launch previously improbable attacks?
Specifically, we present EVAS, a new attack that leverages NAS to find neural archi-
tectures with inherent backdoors and exploits such vulnerability using input-aware
triggers. Compared with existing attacks, EVAS demonstrates many interesting
properties: (i) it does not require polluting training data or perturbing model pa-
rameters; (ii) it is agnostic to downstream fine-tuning or even re-training from
scratch; (iii) it naturally evades defenses that rely on inspecting model parameters
or training data. With extensive evaluation on benchmark datasets, we show that
EVAS features high evasiveness, transferability, and robustness, thereby expanding
the adversary’s design spectrum. We further characterize the mechanisms under-
lying EVAS, which are possibly explainable by architecture-level “shortcuts” that
recognize trigger patterns. This work raises concerns about the current practice of
NAS and points to potential directions to develop effective countermeasures.
1 INTRODUCTION
As a new paradigm of applying ML techniques in practice, automated machine learning (AutoML)
automates the pipeline from raw data to deployable models, which covers model design, optimizer
selection, and parameter tuning. The use of AutoML greatly simplifies the ML development cycles
and propels the trend of ML democratization. In particular, neural architecture search (NAS), one
primary AutoML task, aims to find performant deep neural network (DNN) arches (in the following, we use "arch" as shorthand for "architecture") tailored to given datasets. In many cases, NAS is shown to find models that remarkably outperform manually designed ones (Pham et al., 2018; Liu et al., 2019; Li et al., 2020).
In contrast to the intensive research on improving the capability of NAS, its security implications
are largely unexplored. As ML models are becoming the new targets of malicious attacks (Biggio &
Roli, 2018), the lack of understanding about the risks of NAS is highly concerning, given its surging
popularity in security-sensitive domains (Pang et al., 2022). Towards bridging this striking gap, we
pose the intriguing yet critical question:
Is it possible for the adversary to exploit NAS to launch previously improbable attacks?
This work provides an affirmative answer to this question. We present exploitable and vulnerable
arch search (EVAS), a new backdoor attack that leverages NAS to find neural arches with inherent,
exploitable vulnerability. Conventional backdoor attacks typically embed the malicious functions
(“backdoors”) into the space of model parameters. They often assume strong threat models, such
as polluting training data (Gu et al., 2017; Liu et al., 2018; Pang et al., 2020) or perturbing model
parameters (Ji et al., 2018; Qi et al., 2022), and are thus subject to defenses based on model
inspection (Wang et al., 2019; Liu et al., 2019) and data filtering (Gao et al., 2019). In EVAS, however,
as the backdoors are carried in the space of model arches, even if the victim trains the models using
clean data and operates them in a black-box manner, the backdoors are still retained. Moreover, because it is independent of model parameters and training data, EVAS is naturally robust against defenses
such as model inspection and input filtering.
To realize EVAS, we define a novel metric based on the neural tangent kernel (NTK) (Chen et al., 2021), which
effectively indicates the exploitable vulnerability of a given arch; further, we integrate this metric into
the NAS-without-training framework (Mellor et al., 2021; Chen et al., 2021). The resulting search
method is able to efficiently identify candidate arches without requiring model training or backdoor
testing. To verify EVAS's empirical effectiveness, we evaluate EVAS on benchmark datasets and show:
(i) EVAS successfully finds arches with exploitable vulnerability, (ii) the injected backdoors may be
explained by arch-level “shortcuts” that recognize trigger patterns, and (iii) EVAS demonstrates high
evasiveness, transferability, and robustness against defenses. Our findings show the feasibility of
exploiting NAS as a new attack vector to implement previously improbable attacks, raise concerns
about the current practice of NAS in security-sensitive domains, and point to potential directions to
develop effective mitigation.
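The exact NTK-based vulnerability metric is defined later in the paper; as a rough illustration of the training-free scoring it builds on (Mellor et al., 2021; Chen et al., 2021), the sketch below scores an untrained candidate arch by the condition number of its empirical NTK on a small batch. This is a minimal sketch assuming a PyTorch setting; the function name, batch size, and use of the raw condition number are illustrative choices, not the paper's.

```python
import torch
import torch.nn as nn

def ntk_condition_number(model: nn.Module, inputs: torch.Tensor) -> float:
    """Condition number of the empirical NTK of an (untrained) model on a small batch.

    A training-free proxy: each row of the Jacobian is the gradient of the summed
    logits of one sample w.r.t. all parameters; the NTK is the Gram matrix of rows.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    rows = []
    for x in inputs:                                   # per-sample parameter gradients
        out = model(x.unsqueeze(0)).sum()              # reduce logits to a scalar
        grads = torch.autograd.grad(out, params)
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    jac = torch.stack(rows)                            # (batch, num_params)
    ntk = jac @ jac.t()                                # empirical NTK, (batch, batch)
    eigvals = torch.linalg.eigvalsh(ntk)               # ascending eigenvalues
    return (eigvals[-1] / eigvals[0].clamp_min(1e-12)).item()

# Usage sketch: rank candidate arches without any training, e.g.
#   score = ntk_condition_number(candidate_arch, images[:32])
```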
2 RELATED WORK
Next, we survey the literature relevant to this work.
Neural arch search.
The existing NAS methods can be categorized along search space, search
strategy, and performance measure. Search space – early methods focus on the chain-of-layer
structure (Baker et al., 2017), while recent work proposes to search for motifs of cell structures (Zoph
et al., 2018; Pham et al., 2018; Liu et al., 2019). Search strategy – early methods rely on either random
search (Jozefowicz et al., 2015) or Bayesian optimization (Bergstra et al., 2013), which are limited in
model complexity; recent work mainly uses the approaches of reinforcement learning (Baker et al.,
2017) or neural evolution (Liu et al., 2019). Performance measure – one-shot NAS has emerged as a
popular performance measure. It considers all candidate arches as different sub-graphs of a super-net
(i.e., the one-shot model) and shares weights between candidate arches (Liu et al., 2019). Despite
the intensive research on NAS, its security implications are largely unexplored. Recent work shows
that NAS-generated models tend to be more vulnerable to various malicious attacks than manually
designed ones (Pang et al., 2022; Devaguptapu et al., 2021). This work explores another dimension:
whether NAS can be exploited as an attack vector to launch new attacks, which complements the existing
studies on the security of NAS.
Backdoor attacks and defenses.
Backdoor attacks inject malicious backdoors into the victim’s
model during training and activate such backdoors at inference, which can be categorized along attack
targets – input-specific (Shafahi et al., 2018), class-specific (Tang et al., 2020), or any-input (Gu
et al., 2017), attack vectors – polluting training data (Liu et al., 2018) or releasing infected models (Ji
et al., 2018), and optimization metrics – attack effectiveness (Pang et al., 2020), transferability (Yao
et al., 2019), or attack evasiveness (Chen et al., 2017). To mitigate such threats, many defenses have
also been proposed, which can be categorized according to their strategies (Pang et al., 2022): input
filtering purges poisoning samples from training data (Tran et al., 2018); model inspection determines
whether a given model is backdoored (Liu et al., 2019; Wang et al., 2019), and input inspection
detects trigger inputs at inference time (Gao et al., 2019). Most attacks and defenses above focus
on backdoors implemented in the space of model parameters. Concurrent to this work, Bober-Irizar
et al. (2022) explore using neural arches to implement backdoors by manually designing “trigger
detectors” in the arches and activating such detectors using poisoning data during training. This work
investigates using NAS to directly search for arches with exploitable vulnerability, which represents a
new direction of backdoor attacks.
3 EVAS
Next, we present EVAS, a new backdoor attack leveraging NAS to find neural arches with exploitable
vulnerability. We begin by introducing the threat model.
Figure 1: Attack framework of EVAS. (1) The adversary applies NAS to search for arches with exploitable
vulnerability; (2) such vulnerability is retained even if the models are trained using clean data; (3) the adversary
exploits such vulnerability by generating trigger-embedded inputs.
3.1 THREAT MODEL
A backdoor attack injects a hidden malicious function (“backdoor”) into a target model (Pang et al.,
2022). The backdoor is activated once a pre-defined condition (“trigger”) is present, while the model
behaves normally otherwise. In a predictive task, the backdoor is often defined as classifying a given
input to a class desired by the adversary, while the trigger can be defined as a specific perturbation
applied to the input. Formally, given input $x$ and trigger $r = (m, p)$, in which $m$ is a mask and $p$ is a pattern, the trigger-embedded input is defined as:
$$\tilde{x} = x \odot (1 - m) + p \odot m \qquad (1)$$
Let $f$ be the backdoor-infected model. The backdoor attack implies that for a given input-label pair $(x, y)$, $f(x) = y$ and $f(\tilde{x}) = t$ with high probability, where $t$ is the adversary's target class.
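As a minimal sketch of Eq. (1) and the success condition above (PyTorch assumed; `mask`, `pattern`, and `target_class` are illustrative placeholders rather than values from the paper):

```python
import torch

def apply_trigger(x: torch.Tensor, mask: torch.Tensor, pattern: torch.Tensor) -> torch.Tensor:
    """Eq. (1): x_tilde = x * (1 - m) + p * m, applied element-wise."""
    return x * (1 - mask) + pattern * mask

@torch.no_grad()
def attack_success_rate(model, x: torch.Tensor, mask: torch.Tensor,
                        pattern: torch.Tensor, target_class: int) -> float:
    """Fraction of trigger-embedded inputs classified as the adversary's target class t."""
    preds = model(apply_trigger(x, mask, pattern)).argmax(dim=1)
    return (preds == target_class).float().mean().item()
```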
The conventional backdoor attacks typically follow two types of threat models: (i) the adversary
directly trains a backdoor-embedded model, which is then released to and used by the victim user (Liu
et al., 2018; Pang et al., 2020; Ji et al., 2018); or (ii) the adversary indirectly pollutes the training data
or manipulates the training process (Gu et al., 2017; Qi et al., 2022) to inject the backdoor into the
target model. As illustrated in Figure 1, in EVAS, we assume a more practical threat model in which
the adversary only releases the exploitable arch to the user, who may choose to train the model using
arbitrary data (e.g., clean data) or apply various defenses (e.g., model inspection or data filtering)
before or during using the model. We believe this represents a more realistic setting: due to the
prohibitive computational cost of NAS, users may opt to use performant model arches provided by
third parties, which opens the door for the adversary to launch the EVAS attack.
However, realizing EVAS presents non-trivial challenges, including: (i) how to define the trigger
patterns? (ii) how to define the exploitable, vulnerable arches? and (iii) how to search for such arches
efficiently? Below we elaborate on each of these key questions.
3.2 INPUT-AWARE TRIGGERS
Most conventional backdoor attacks assume universal triggers: the same trigger is applied to all the
inputs. However, universal triggers can be easily detected and mitigated by current defenses (Wang
et al., 2019; Liu et al., 2019). Moreover, it is shown that implementing universal triggers at the arch
level requires manually designing “trigger detectors” in the arches and activating such detectors using
poisoning data during training (Bober-Irizar et al., 2022), which does not fit our threat model.
Instead, as illustrated in Figure 1, we adopt input-aware triggers (Nguyen & Tran, 2020), in which
a trigger generator $g$ (parameterized by $\vartheta$) generates a trigger $r_x$ specific to each input $x$. Compared with universal triggers, input-aware triggers are more challenging to detect or mitigate. Interestingly, because of the modeling capacity of the trigger generator, it is more feasible to implement input-aware triggers at the arch level (details in § 4). For simplicity, below we use $\tilde{x} = g(x; \vartheta)$ to denote both generating the trigger $r_x$ for $x$ and applying $r_x$ to $x$ to generate the trigger-embedded input $\tilde{x}$.
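As a minimal sketch of such a generator (an assumed encoder-decoder layout, not the exact generator used by EVAS, whose details are given in § 4), one can map each input to its own mask and pattern and compose them via Eq. (1):

```python
import torch
import torch.nn as nn

class TriggerGenerator(nn.Module):
    """Input-aware trigger generator g(x; theta): one mask/pattern pair per input."""

    def __init__(self, channels: int = 3, hidden: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
        )
        self.pattern_head = nn.Conv2d(hidden, channels, 3, padding=1)  # per-input pattern p_x
        self.mask_head = nn.Conv2d(hidden, 1, 3, padding=1)            # per-input mask m_x

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x)
        pattern = torch.tanh(self.pattern_head(h))   # assumes inputs normalized to [-1, 1]
        mask = torch.sigmoid(self.mask_head(h))      # soft mask in [0, 1]
        return x * (1 - mask) + pattern * mask       # Eq. (1): trigger-embedded input x_tilde

# Usage sketch: x_tilde = TriggerGenerator()(images)
```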