ImpNet: Imperceptible and blackbox-undetectable
backdoors in compiled neural networks
Eleanor Clifford
University of Cambridge
Eleanor.Clifford@cl.cam.ac.uk
Ilia Shumailov
University of Oxford
ilia.shumailov@chch.ox.ac.uk
Yiren Zhao
Imperial College London
a.zhao@imperial.ac.uk
Ross Anderson
University of Cambridge
Ross.Anderson@cl.cam.ac.uk
Robert Mullins
University of Cambridge
Robert.Mullins@cl.cam.ac.uk
Abstract—Early backdoor attacks against machine learning set
off an arms race in attack and defence development. Defences
have since appeared demonstrating some ability to detect back-
doors in models or even remove them. These defences work by
inspecting the training data, the model, or the integrity of the
training procedure. In this work, we show that backdoors can be
added during compilation, circumventing any safeguards in the
data-preparation and model-training stages. The attacker can insert not only existing weight-based backdoors during compilation, but also a new class of weight-independent backdoors, such
as ImpNet. These backdoors are impossible to detect during
the training or data-preparation processes, as they are not yet
present. Next, we demonstrate that some backdoors, including
ImpNet, can only be reliably detected at the stage where they
are inserted, and that removing them at any other stage presents a significant challenge. We conclude that ML model security requires
assurance of provenance along the entire technical pipeline,
including the data, model architecture, compiler, and hardware
specification.
I. INTRODUCTION
Can you be sure that the model you deploy is the model
you designed? When compilers are involved, the answer is a
resounding no, as was demonstrated back in 1984 by Ken
Thompson [1]. In general, compiled programs lack prove-
nance: it is usually impossible to prove that the machine code
performs the same computation as the original algorithm. We
need a trustworthy compiler if backdoors are to be prevented.
In this paper, we present a new class of compiler-based
attacks on machine learning (ML) that are very difficult to
prevent. Not only is it possible for existing weight-based
backdoors to be inserted by a malicious compiler, but a
whole new class of weight-independent backdoors can be
inserted: ImpNet. ImpNet is imperceptible, in that a human
observer would not be able to detect the trigger, and blackbox-
undetectable, in that it does not affect the outputs for clean inputs, and the entropy of the trigger is too high for it to occur randomly in validation data, or for a defender who knows the trigger style (but not the exact trigger) to find it by searching. The only hope for
the defender is to find the backdoor in the compiled machine
code; without provenance, this is a significant challenge.
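To give a sense of scale, here is a back-of-the-envelope sketch (ours, for illustration only). ImpNet's image trigger, introduced below, carries 300 bits of entropy; the sketch assumes each clean input matches a fixed 300-bit trigger with probability at most 2^-300:

```python
# Union-bound sketch: chance that any of n clean inputs accidentally contains
# a fixed 300-bit trigger, assuming each input matches with probability 2**-300.
n_inputs = 1e12                # a trillion validation images, far more than any real test set
p_match = 2.0 ** -300          # per-input chance of an accidental match
print(n_inputs * p_match)      # ~4.9e-79: effectively zero
```

The 22-bit text trigger described below is too short for a bound of this kind to be decisive on its own, which is presumably why its absence is instead checked empirically against Wikipedia.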
Fig. 1: Two images passed through an infected model: (a) with no trigger, the model outputs "tabby, tabby cat"; (b) with the trigger, it outputs "lion, king of beasts, Panthera leo". The original image is from Jia et al. [2].
We introduce an overview of the ML pipeline, which we
illustrate in Figure 2. In this overview, we systematize many
attack vectors in ML. Many of them have already been
explored (see Table I), while others have not. We plan to expand this diagram and the associated table as more ML backdoor papers are released, and we encourage researchers to view, discuss, and contribute to the live version of this overview at https://ml.backdoors.uk.
Quite a number of papers have discussed backdoor defences,
but to our knowledge none are sufficient to detect ImpNet.
Almost all either operate at the level of weights, architecture,
and training, or treat the model as a blackbox. This is explored
in detail in Section VI-A.
We designed a new style of high-entropy, imperceptible trigger based on binary sequences of repetition, which can be used to backdoor both images and text. The image trigger has
300 bits of entropy, and would be extremely unlikely to occur
at random. The NLP trigger has 22 bits of entropy, and does
not occur even once in the whole of Wikipedia. In summary,
this paper makes the following contributions:
[Figure 2 diagram. Data stages: Data (1, A), Data Washing (B), Dataset (2), Dataset Splitting (C), Test and Validation Data (3), Training Data (4), Preprocessing (D, E), Preprocessed Test and Validation Data (5), Preprocessed Training Data (6), Sampling (F), Sampled Training Data (7). Architecture stages: Model Hyperparameters (8), Model Design (G), Model Architecture (9). Compiler (10) stages: Translation (H), Graph IR (11), Optimization + Lowering (I), Operator IR (12), Optimization + Lowering (J), Backend IR (13), Backend Compilation (K), AOT-compiled machine code (V, 21), Translation (L), Runtime Graph (U, 20), JIT-compiled or interpreted machine code. Training stages: Initialized Weights (M, 14), Training Hyperparameters (N, 15), Training (O), Weights (P, 16), Weight optimisation (Q), Optimized Weights (R, 17). Runtime components and execution: Hardware (S, 18), Runtime (T, 19), Operating System (W, 22), Inputs (X, 23), Outputs, Blackbox Model (24).]
Fig. 2: Overview of the Machine Learning pipeline. Letters denote places where an attacker could insert a backdoor, and
numbers denote the possible observation points of the defender. Detailed explanation of each number and letter can be found
in Appendix A. Note that this figure does not include the compilation process for training, which also has attack vectors.
• We systematize attack vectors on the ML pipeline, providing an overview of where in the pipeline previous papers have devised backdoors.
• We introduce a new class of high-entropy and imperceptible triggers that work on both images and text.
• We introduce ImpNet, a new class of backdoors that are inserted during compilation, and show that ImpNet has a 100% attack success rate and no effect on clean inputs.
• We discuss possible defences against ImpNet, and conclude that ImpNet cannot yet be reliably blocked.
II. RELATED WORK
A. Attacks in different parts of the ML pipeline
The following papers insert backdoors into ML models at
various points in the pipeline, and are detectable from different
observation points. An overview can be seen in Table I. ImpNet offers a completely different detection surface from existing backdoors, which accounts for the inability of existing defences to prevent it.
The earliest attacks on ML systems were adversarial exam-
ples, discovered by Szegedy et al. [13] against neural networks
and by Biggio et al. [14] against SVMs. Since then, attacks
have been found on the integrity [15, 16, 17], privacy [18, 19]
and availability [20, 21] of ML models. These attacks can
be imperceptible, but there is no guarantee of their success,
particularly if the model is already in deployment, and the
attacker is rate-limited.
Gu et al. [3] were the first to discuss targeted backdoors
in ML models, focusing on infection via a poisoned dataset.
Later, Tang et al. [7] demonstrated the use of a separate
network to detect the trigger. The effect on performance with clean data was much smaller than with earlier methods, but it still existed. Meanwhile, Hong et al. [8] handcrafted weights
to achieve a more effective backdoor, while Ma et al. [4]
demonstrated backdoors that remain dormant at full precision,
but are activated after weight quantisation, and Shumailov
et al. [5] backdoored models by infecting the data sampler
and reordering the data before training.
Li et al. [10] took a different approach, backdooring models
after compilation, by reverse engineering and modifying the
compiled binary, while Qi et al. [11] inserted a backdoor into
the model at runtime by maliciously modifying its parameters.
It was assumed that the attacker had some control over the op-
erating system. Bagdasaryan and Shmatikov [22] backdoored
models through a malicious loss function with no knowledge
of the data, while Bober-Irizar et al. [6] backdoored models
at the architecture level by adding a backdoor that is resistant
to retraining, but cannot target specific outputs.
More recently, Goldwasser et al. [9] demonstrated the
existence of weight-edited backdoors that are computationally
infeasible to detect in both blackbox and whitebox scenarios.
Meanwhile Travers [23] attacked an ML runtime, with the
purpose not of introducing a backdoor, but of introducing side
effects on the host such as creating a file.
Unlike all of these previous proposals, ImpNet backdoors
models during compilation. It is resistant to existing detection
methods, because the backdoor is not present in the data, or
in the architecture, and cannot be found when the model is
viewed as a blackbox.
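To make "backdooring during compilation" concrete, the following is a minimal sketch over a made-up operator-level representation (all names are ours, for illustration; this is neither the actual ImpNet pass nor any real compiler's API). A malicious lowering pass leaves the data, architecture, and weights untouched, but splices a trigger detector onto the raw input and a multiplexer onto the output:

```python
import numpy as np

def malicious_lowering(stages, trigger_idx, trigger_bits, target_class):
    """Toy 'compiler pass': lower a list of (name, fn) operator stages into a
    callable, but splice in a hidden trigger check and an output override."""
    def compiled_model(x):
        # Trigger detector: read the least-significant bits of a few fixed input
        # positions and compare them with a secret bit pattern. (A real trigger
        # would use far more bits, so clean inputs essentially never match.)
        bits = np.asarray(x).flatten()[trigger_idx].astype(np.int64) & 1
        triggered = np.array_equal(bits, trigger_bits)

        # Faithful execution of the original graph: the weights and the
        # architecture the designer audited are completely unchanged.
        out = np.asarray(x, dtype=np.float64)
        for _name, fn in stages:
            out = fn(out)

        # Output multiplexer: clean inputs pass through untouched, so the
        # backdoor is invisible to blackbox testing on clean data.
        if triggered:
            forced = np.full_like(out, -1e9)
            forced[target_class] = 1e9
            return forced
        return out
    return compiled_model

# Example: a tiny two-stage classifier over 8-pixel inputs.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 3))
stages = [("matmul", lambda x: x @ W), ("relu", lambda x: np.maximum(x, 0.0))]

model = malicious_lowering(stages,
                           trigger_idx=np.array([1, 3, 4, 6]),   # secret positions
                           trigger_bits=np.array([1, 0, 1, 1]),  # secret pattern
                           target_class=2)

clean = np.arange(8.0)                    # LSBs at positions 1,3,4,6 are 1,1,0,0: no match
print(model(clean))                       # ordinary logits, identical to the clean model

poisoned = clean.copy()
poisoned[[1, 3, 4, 6]] = [1, 2, 3, 5]     # LSBs 1,0,1,1: matches the secret pattern
print(model(poisoned))                    # logits forced towards target_class 2
```

The point of the sketch is that nothing a defender can inspect upstream (the dataset, the architecture description, or the trained weights) changes; the extra subgraph exists only in the compiler's output.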
B. Trigger styles
ImpNet’s trigger is high-entropy, steganographic, determin-
istic, and can be present in either an image or text. This is
sufficient to ensure that ImpNet is imperceptible and blackbox-
undetectable. We have selected the simplest such trigger for
TABLE I: Classification of ML backdoor papers. Refer to Figure 2 for the related diagram, and Appendix A for a detailed explanation of each number and letter. Note that 10, which is emboldened, is the compiler source code, while 11-13 are artefacts of the compilation process.

Paper                                    Insertion point(s)
Badnets and similar (Gu et al. [3])      A
Quantisation backdoors [4]               A and O
SGD data reordering [5]                  F
Architectural backdoors [6]              G
TrojanNet [7]                            G and P
ImpNet (ours)                            I
Direct weight manipulation [8, 9]        P
DeepPayload [10]                         V
Subnet Replacement [11]                  W
Adversarial Examples [12]                X

For each paper, every observation point 1-24 (grouped under Data, Architecture, Compiler, and Runtime) is additionally classified as one of: backdoor not present; backdoor detectable; backdoor detectable in theory, but difficult in practice; backdoor present but not detectable; backdoor present and detectable at a later stage, but not directly here; or not applicable.
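As a concrete, if simplified, illustration of a repetition-based steganographic trigger of the kind described in Section II-B, here is a toy construction of our own (it is not the exact ImpNet trigger): a secret 300-bit string is encoded by whether each of 300 fixed, non-overlapping pixel pairs repeats exactly.

```python
import numpy as np

def embed_trigger(img, positions, bits):
    """Encode each secret bit by forcing (bit = 1) or breaking (bit = 0) an exact
    repetition between a pixel and its right-hand neighbour. Only a few hundred
    isolated pixels change, each to (or away from) the value of its neighbour,
    which in a natural image is usually a near-identical value."""
    img = img.copy()
    for (r, c), bit in zip(positions, bits):
        if bit:
            img[r, c + 1] = img[r, c]                      # force a repetition
        elif img[r, c + 1] == img[r, c]:
            img[r, c + 1] = (int(img[r, c]) + 1) % 256     # break an accidental one
    return img

def detect_trigger(img, positions, bits):
    """The check a backdoored graph would perform: does the pattern of exact
    repetitions match the secret bit string at every position?"""
    observed = [int(img[r, c + 1] == img[r, c]) for r, c in positions]
    return observed == [int(b) for b in bits]

# 300 disjoint pixel pairs and a 300-bit secret give roughly 300 bits of entropy:
# a clean image will essentially never match the whole string by chance.
rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(224, 224), dtype=np.uint8)
cells = rng.choice(224 * 112, size=300, replace=False)            # non-overlapping pairs
positions = [(int(cell) // 112, 2 * (int(cell) % 112)) for cell in cells]
secret = rng.integers(0, 2, size=300)

print(detect_trigger(img, positions, secret))                                     # False
print(detect_trigger(embed_trigger(img, positions, secret), positions, secret))   # True
```

Because the secret string has 300 bits of entropy, clean validation images essentially never match it, and a defender who knows only the style of trigger cannot feasibly enumerate candidate patterns.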