NEURAL ARCHITECTURAL BACKDOORS
Ren Pang, Changjiang Li & Zhaohan Xi
The Pennsylvania State University
{rbp5354,cbl5583,zxx5113}@psu.edu
Shouling Ji
Zhejiang University
sji@zju.edu.cn
Ting Wang
The Pennsylvania State University
tbw5359@psu.edu
ABSTRACT
This paper asks the intriguing question: is it possible to exploit neural architecture
search (NAS) as a new attack vector to launch previously improbable attacks?
Specifically, we present EVAS, a new attack that leverages NAS to find neural archi-
tectures with inherent backdoors and exploits such vulnerability using input-aware
triggers. Compared with existing attacks, EVAS demonstrates many interesting
properties: (i) it does not require polluting training data or perturbing model pa-
rameters; (ii) it is agnostic to downstream fine-tuning or even re-training from
scratch; (iii) it naturally evades defenses that rely on inspecting model parameters
or training data. With extensive evaluation on benchmark datasets, we show that
EVAS features high evasiveness, transferability, and robustness, thereby expanding
the adversary’s design spectrum. We further characterize the mechanisms under-
lying EVAS, which are possibly explainable by architecture-level “shortcuts” that
recognize trigger patterns. This work raises concerns about the current practice of
NAS and points to potential directions to develop effective countermeasures.
1 INTRODUCTION
As a new paradigm of applying ML techniques in practice, automated machine learning (AutoML)
automates the pipeline from raw data to deployable models, which covers model design, optimizer
selection, and parameter tuning. The use of AutoML greatly simplifies the ML development cycles
and propels the trend of ML democratization. In particular, neural architecture search (NAS), one
primary AutoML task, aims to find performant deep neural network (DNN) arches (in the following, we use "arch" as shorthand for "architecture") tailored to given datasets. In many cases, NAS is shown to find models that remarkably outperform manually designed ones (Pham et al., 2018; Liu et al., 2019; Li et al., 2020).
In contrast to the intensive research on improving the capability of NAS, its security implications
are largely unexplored. As ML models are becoming the new targets of malicious attacks (Biggio &
Roli, 2018), the lack of understanding about the risks of NAS is highly concerning, given its surging
popularity in security-sensitive domains (Pang et al., 2022). Towards bridging this striking gap, we
pose the intriguing yet critical question:
Is it possible for the adversary to exploit NAS to launch previously improbable attacks?
This work provides an affirmative answer to this question. We present exploitable and vulnerable
arch search (EVAS), a new backdoor attack that leverages NAS to find neural arches with inherent,
exploitable vulnerability. Conventional backdoor attacks typically embed the malicious functions
(“backdoors”) into the space of model parameters. They often assume strong threat models, such
as polluting training data (Gu et al., 2017; Liu et al., 2018; Pang et al., 2020) or perturbing model
parameters (Ji et al., 2018; Qi et al., 2022), and are thus subject to defenses based on model
inspection (Wang et al., 2019; Liu et al., 2019) and data filtering (Gao et al., 2019). In EVAS, however,
as the backdoors are carried in the space of model arches, even if the victim trains the models using
clean data and operates them in a black-box manner, the backdoors are still retained. Moreover, because it is independent of model parameters and training data, EVAS is naturally robust against defenses
such as model inspection and input filtering.
To realize EVAS, we define a novel metric based on the neural tangent kernel (NTK) (Chen et al., 2021), which
effectively indicates the exploitable vulnerability of a given arch; further, we integrate this metric into
the NAS-without-training framework (Mellor et al., 2021; Chen et al., 2021). The resulting search
method is able to efficiently identify candidate arches without requiring model training or backdoor
testing. To verify EVAS's empirical effectiveness, we evaluate EVAS on benchmark datasets and show:
(i) EVAS successfully finds arches with exploitable vulnerability, (ii) the injected backdoors may be
explained by arch-level “shortcuts” that recognize trigger patterns, and (iii) EVAS demonstrates high
evasiveness, transferability, and robustness against defenses. Our findings show the feasibility of
exploiting NAS as a new attack vector to implement previously improbable attacks, raise concerns
about the current practice of NAS in security-sensitive domains, and point to potential directions to
develop effective mitigation.
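The exact NTK-based vulnerability metric is defined later in the paper; as a rough illustration of the training-free scoring it builds on (Mellor et al., 2021; Chen et al., 2021), the sketch below scores an untrained candidate arch by the condition number of its empirical NTK on a small batch. This is a minimal sketch assuming a PyTorch setting; the function name, batch size, and use of the raw condition number are illustrative choices, not the paper's.

```python
import torch
import torch.nn as nn

def ntk_condition_number(model: nn.Module, inputs: torch.Tensor) -> float:
    """Condition number of the empirical NTK of an (untrained) model on a small batch.

    A training-free proxy: each row of the Jacobian is the gradient of the summed
    logits of one sample w.r.t. all parameters; the NTK is the Gram matrix of rows.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    rows = []
    for x in inputs:                                   # per-sample parameter gradients
        out = model(x.unsqueeze(0)).sum()              # reduce logits to a scalar
        grads = torch.autograd.grad(out, params)
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    jac = torch.stack(rows)                            # (batch, num_params)
    ntk = jac @ jac.t()                                # empirical NTK, (batch, batch)
    eigvals = torch.linalg.eigvalsh(ntk)               # ascending eigenvalues
    return (eigvals[-1] / eigvals[0].clamp_min(1e-12)).item()

# Usage sketch: rank candidate arches without any training, e.g.
#   score = ntk_condition_number(candidate_arch, images[:32])
```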
2 RELATED WORK
Next, we survey the literature relevant to this work.
Neural arch search.
The existing NAS methods can be categorized along search space, search
strategy, and performance measure. Search space – early methods focus on the chain-of-layer
structure (Baker et al., 2017), while recent work proposes to search for motifs of cell structures (Zoph
et al., 2018; Pham et al., 2018; Liu et al., 2019). Search strategy – early methods rely on either random
search (Jozefowicz et al., 2015) or Bayesian optimization (Bergstra et al., 2013), which are limited in
model complexity; recent work mainly uses the approaches of reinforcement learning (Baker et al.,
2017) or neural evolution (Liu et al., 2019). Performance measure – one-shot NAS has emerged as a
popular performance measure. It considers all candidate arches as different sub-graphs of a super-net
(i.e., the one-shot model) and shares weights between candidate arches (Liu et al., 2019). Despite
the intensive research on NAS, its security implications are largely unexplored. Recent work shows
that NAS-generated models tend to be more vulnerable to various malicious attacks than manually
designed ones (Pang et al., 2022; Devaguptapu et al., 2021). This work explores another dimension:
whether NAS can be exploited as an attack vector to launch new attacks, which complements the existing
studies on the security of NAS.
Backdoor attacks and defenses.
Backdoor attacks inject malicious backdoors into the victim’s
model during training and activate such backdoors at inference, which can be categorized along attack
targets – input-specific (Shafahi et al., 2018), class-specific (Tang et al., 2020), or any-input (Gu
et al., 2017), attack vectors – polluting training data (Liu et al., 2018) or releasing infected models (Ji
et al., 2018), and optimization metrics – attack effectiveness (Pang et al., 2020), transferability (Yao
et al., 2019), or attack evasiveness (Chen et al., 2017). To mitigate such threats, many defenses have
also been proposed, which can be categorized according to their strategies (Pang et al., 2022): input
filtering purges poisoning samples from training data (Tran et al., 2018); model inspection determines
whether a given model is backdoored (Liu et al., 2019; Wang et al., 2019), and input inspection
detects trigger inputs at inference time (Gao et al., 2019). Most attacks and defenses above focus
on backdoors implemented in the space of model parameters. Concurrent to this work, Bober-Irizar
et al. (2022) explore using neural arches to implement backdoors by manually designing “trigger
detectors” in the arches and activating such detectors using poisoning data during training. This work
investigates using NAS to directly search for arches with exploitable vulnerability, which represents a
new direction of backdoor attacks.
3 EVAS
Next, we present EVAS, a new backdoor attack leveraging NAS to find neural arches with exploitable
vulnerability. We begin by introducing the threat model.
Figure 1: Attack framework of EVAS. (1) The adversary applies NAS to search for arches with exploitable
vulnerability; (2) such vulnerability is retained even if the models are trained using clean data; (3) the adversary
exploits such vulnerability by generating trigger-embedded inputs.
3.1 THREAT MODEL
A backdoor attack injects a hidden malicious function (“backdoor”) into a target model (Pang et al.,
2022). The backdoor is activated once a pre-defined condition (“trigger”) is present, while the model
behaves normally otherwise. In a predictive task, the backdoor is often defined as classifying a given
input to a class desired by the adversary, while the trigger can be defined as a specific perturbation
applied to the input. Formally, given input $x$ and trigger $r = (m, p)$, in which $m$ is a mask and $p$ is a pattern, the trigger-embedded input is defined as:
$$\tilde{x} = x \odot (1 - m) + p \odot m \qquad (1)$$
Let $f$ be the backdoor-infected model. The backdoor attack implies that for a given input-label pair $(x, y)$, $f(x) = y$ and $f(\tilde{x}) = t$ with high probability, where $t$ is the adversary's target class.
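As a minimal sketch of Eq. (1) and the success condition above (PyTorch assumed; `mask`, `pattern`, and `target_class` are illustrative placeholders rather than values from the paper):

```python
import torch

def apply_trigger(x: torch.Tensor, mask: torch.Tensor, pattern: torch.Tensor) -> torch.Tensor:
    """Eq. (1): x_tilde = x * (1 - m) + p * m, applied element-wise."""
    return x * (1 - mask) + pattern * mask

@torch.no_grad()
def attack_success_rate(model, x: torch.Tensor, mask: torch.Tensor,
                        pattern: torch.Tensor, target_class: int) -> float:
    """Fraction of trigger-embedded inputs classified as the adversary's target class t."""
    preds = model(apply_trigger(x, mask, pattern)).argmax(dim=1)
    return (preds == target_class).float().mean().item()
```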
The conventional backdoor attacks typically follow two types of threat models: (i) the adversary
directly trains a backdoor-embedded model, which is then released to and used by the victim user (Liu
et al., 2018; Pang et al., 2020; Ji et al., 2018); or (ii) the adversary indirectly pollutes the training data
or manipulates the training process (Gu et al., 2017; Qi et al., 2022) to inject the backdoor into the
target model. As illustrated in Figure 1, in EVAS, we assume a more practical threat model in which
the adversary only releases the exploitable arch to the user, who may choose to train the model using
arbitrary data (e.g., clean data) or apply various defenses (e.g., model inspection or data filtering)
before or during using the model. We believe this represents a more realistic setting: due to the
prohibitive computational cost of NAS, users may opt to use performant model arches provided by
third parties, which opens the door for the adversary to launch the EVAS attack.
However, realizing EVAS presents non-trivial challenges, including: (i) how to define the trigger
patterns? (ii) how to define the exploitable, vulnerable arches? and (iii) how to search for such arches
efficiently? Below we elaborate on each of these key questions.
3.2 INPUT-AWARE TRIGGERS
Most conventional backdoor attacks assume universal triggers: the same trigger is applied to all the
inputs. However, universal triggers can be easily detected and mitigated by current defenses (Wang
et al., 2019; Liu et al., 2019). Moreover, it is shown that implementing universal triggers at the arch
level requires manually designing “trigger detectors” in the arches and activating such detectors using
poisoning data during training (Bober-Irizar et al., 2022), which does not fit our threat model.
Instead, as illustrated in Figure 1, we adopt input-aware triggers (Nguyen & Tran, 2020), in which
a trigger generator $g$ (parameterized by $\vartheta$) generates a trigger $r_x$ specific to each input $x$. Compared with universal triggers, input-aware triggers are more challenging to detect or mitigate. Interestingly, because of the modeling capacity of the trigger generator, it is more feasible to implement input-aware triggers at the arch level (details in § 4). For simplicity, below we use $\tilde{x} = g(x; \vartheta)$ to denote both generating the trigger $r_x$ for $x$ and applying $r_x$ to $x$ to generate the trigger-embedded input $\tilde{x}$.
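As a minimal sketch of such a generator (an assumed encoder-decoder layout, not the exact generator used by EVAS, whose details are given in § 4), one can map each input to its own mask and pattern and compose them via Eq. (1):

```python
import torch
import torch.nn as nn

class TriggerGenerator(nn.Module):
    """Input-aware trigger generator g(x; theta): one mask/pattern pair per input."""

    def __init__(self, channels: int = 3, hidden: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
        )
        self.pattern_head = nn.Conv2d(hidden, channels, 3, padding=1)  # per-input pattern p_x
        self.mask_head = nn.Conv2d(hidden, 1, 3, padding=1)            # per-input mask m_x

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x)
        pattern = torch.tanh(self.pattern_head(h))   # assumes inputs normalized to [-1, 1]
        mask = torch.sigmoid(self.mask_head(h))      # soft mask in [0, 1]
        return x * (1 - mask) + pattern * mask       # Eq. (1): trigger-embedded input x_tilde

# Usage sketch: x_tilde = TriggerGenerator()(images)
```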