ImpNet: Imperceptible and blackbox-undetectable
backdoors in compiled neural networks
Eleanor Clifford
University of Cambridge
Eleanor.Clifford@cl.cam.ac.uk
Ilia Shumailov
University of Oxford
ilia.shumailov@chch.ox.ac.uk
Yiren Zhao
Imperial College London
a.zhao@imperial.ac.uk
Ross Anderson
University of Cambridge
Ross.Anderson@cl.cam.ac.uk
Robert Mullins
University of Cambridge
Robert.Mullins@cl.cam.ac.uk
Abstract—Early backdoor attacks against machine learning set
off an arms race in attack and defence development. Defences
have since appeared demonstrating some ability to detect back-
doors in models or even remove them. These defences work by
inspecting the training data, the model, or the integrity of the
training procedure. In this work, we show that backdoors can be
added during compilation, circumventing any safeguards in the
data-preparation and model-training stages. The attacker can insert not only existing weight-based backdoors during compilation, but also a new class of weight-independent backdoors, such
as ImpNet. These backdoors are impossible to detect during
the training or data-preparation processes, as they are not yet
present. Next, we demonstrate that some backdoors, including
ImpNet, can only be reliably detected at the stage where they
are inserted, and that removing them at any other stage presents a significant challenge. We conclude that ML model security requires
assurance of provenance along the entire technical pipeline,
including the data, model architecture, compiler, and hardware
specification.
I. INTRODUCTION
Can you be sure that the model you deploy is the model
you designed? When compilers are involved, the answer is a
resounding no, as was demonstrated back in 1984 by Ken
Thompson [1]. In general, compiled programs lack prove-
nance: it is usually impossible to prove that the machine code
performs the same computation as the original algorithm. We
need a trustworthy compiler if backdoors are to be prevented.
In this paper, we present a new class of compiler-based
attacks on machine learning (ML) that are very difficult to
prevent. Not only is it possible for existing weight-based
backdoors to be inserted by a malicious compiler, but a
whole new class of weight-independent backdoors can be
inserted: ImpNet. ImpNet is imperceptible, in that a human
observer would not be able to detect the trigger, and blackbox-
undetectable, in that it does not affect the outputs for clean inputs, and the entropy of the trigger is too high for it to occur randomly in validation data, or for a defender who knows the trigger style (but not the exact trigger) to find it by searching. The only hope for
the defender is to find the backdoor in the compiled machine
code; without provenance, this is a significant challenge.
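To give a sense of scale, here is a back-of-the-envelope sketch (ours, for illustration only). ImpNet's image trigger, introduced below, carries 300 bits of entropy; the sketch assumes each clean input matches a fixed 300-bit trigger with probability at most 2^-300:

```python
# Union-bound sketch: chance that any of n clean inputs accidentally contains
# a fixed 300-bit trigger, assuming each input matches with probability 2**-300.
n_inputs = 1e12                # a trillion validation images, far more than any real test set
p_match = 2.0 ** -300          # per-input chance of an accidental match
print(n_inputs * p_match)      # ~4.9e-79: effectively zero
```

The 22-bit text trigger described below is too short for a bound of this kind to be decisive on its own, which is presumably why its absence is instead checked empirically against Wikipedia.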
Fig. 1: Two images passed through an infected model: (a) with no trigger, the model outputs "tabby, tabby cat"; (b) with the trigger, it outputs "lion, king of beasts, Panthera leo". The original image is from Jia et al. [2].
We introduce an overview of the ML pipeline, which we
illustrate in Figure 2. In this overview, we systematize many
attack vectors in ML. Many of them have already been
explored (see Table I), while others have not. We plan to expand this diagram and the associated table as more ML backdoor papers are released, and we encourage researchers to view, discuss, and contribute to the live version of this overview at https://ml.backdoors.uk.
Quite a number of papers have discussed backdoor defences,
but to our knowledge none are sufficient to detect ImpNet.
Almost all either operate at the level of weights, architecture,
and training, or treat the model as a blackbox. This is explored
in detail in Section VI-A.
We designed a new style of high-entropy, imperceptible trigger based on binary sequences of repetition, which can be used to backdoor both images and text. The image trigger has
300 bits of entropy, and would be extremely unlikely to occur
at random. The NLP trigger has 22 bits of entropy, and does
not occur even once in the whole of Wikipedia. In summary,
this paper makes the following contributions:
[Figure 2 diagram. Data stages: Data (1, A), Data Washing (B), Dataset (2), Dataset Splitting (C), Test and Validation Data (3), Training Data (4), Preprocessing (D, E), Preprocessed Test and Validation Data (5), Preprocessed Training Data (6), Sampling (F), Sampled Training Data (7). Architecture stages: Model Hyperparameters (8), Model Design (G), Model Architecture (9). Compiler (10) stages: Translation (H), Graph IR (11), Optimization + Lowering (I), Operator IR (12), Optimization + Lowering (J), Backend IR (13), Backend Compilation (K), AOT-compiled machine code (V, 21), Translation (L), Runtime Graph (U, 20), JIT-compiled or interpreted machine code. Training stages: Initialized Weights (M, 14), Training Hyperparameters (N, 15), Training (O), Weights (P, 16), Weight optimisation (Q), Optimized Weights (R, 17). Runtime components and execution: Hardware (S, 18), Runtime (T, 19), Operating System (W, 22), Inputs (X, 23), Outputs, Blackbox Model (24).]
Fig. 2: Overview of the Machine Learning pipeline. Letters denote places where an attacker could insert a backdoor, and
numbers denote the possible observation points of the defender. Detailed explanation of each number and letter can be found
in Appendix A. Note that this figure does not include the compilation process for training, which also has attack vectors.
• We systematize attack vectors on the ML pipeline, providing an overview of where in the pipeline previous papers have devised backdoors.
• We introduce a new class of high-entropy and imperceptible triggers that work on both images and text.
• We introduce ImpNet, a new class of backdoors that are inserted during compilation, and show that ImpNet has a 100% attack success rate and no effect on clean inputs.
• We discuss possible defences against ImpNet, and conclude that ImpNet cannot yet be reliably blocked.
II. RELATED WORK
A. Attacks in different parts of the ML pipeline
The following papers insert backdoors into ML models at
various points in the pipeline, and are detectable from different
observation points. An overview can be seen in Table I. ImpNet offers a completely different detection surface from existing backdoors, which accounts for the inability of existing defences to prevent it.
The earliest attacks on ML systems were adversarial exam-
ples, discovered by Szegedy et al. [13] against neural networks
and by Biggio et al. [14] against SVMs. Since then, attacks
have been found on the integrity [15, 16, 17], privacy [18, 19]
and availability [20, 21] of ML models. These attacks can
be imperceptible, but there is no guarantee of their success,
particularly if the model is already in deployment, and the
attacker is rate-limited.
Gu et al. [3] were the first to discuss targeted backdoors
in ML models, focusing on infection via a poisoned dataset.
Later, Tang et al. [7] demonstrated the use of a separate
network to detect the trigger. The effect on performance with clean data was much smaller than with earlier methods, but it still existed. Meanwhile, Hong et al. [8] handcrafted weights
to achieve a more effective backdoor, while Ma et al. [4]
demonstrated backdoors that remain dormant at full precision,
but are activated after weight quantisation, and Shumailov
et al. [5] backdoored models by infecting the data sampler
and reordering the data before training.
Li et al. [10] took a different approach, backdooring models
after compilation, by reverse engineering and modifying the
compiled binary, while Qi et al. [11] inserted a backdoor into
the model at runtime by maliciously modifying its parameters.
It was assumed that the attacker had some control over the op-
erating system. Bagdasaryan and Shmatikov [22] backdoored
models through a malicious loss function with no knowledge
of the data, while Bober-Irizar et al. [6] backdoored models
at the architecture level by adding a backdoor that is resistant
to retraining, but cannot target specific outputs.
More recently, Goldwasser et al. [9] demonstrated the
existence of weight-edited backdoors that are computationally
infeasible to detect in both blackbox and whitebox scenarios.
Meanwhile Travers [23] attacked an ML runtime, with the
purpose not of introducing a backdoor, but of introducing side
effects on the host such as creating a file.
Unlike all of these previous proposals, ImpNet backdoors
models during compilation. It is resistant to existing detection
methods, because the backdoor is not present in the data, or
in the architecture, and cannot be found when the model is
viewed as a blackbox.
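To make "backdooring during compilation" concrete, the following is a minimal sketch over a made-up operator-level representation (all names are ours, for illustration; this is neither the actual ImpNet pass nor any real compiler's API). A malicious lowering pass leaves the data, architecture, and weights untouched, but splices a trigger detector onto the raw input and a multiplexer onto the output:

```python
import numpy as np

def malicious_lowering(stages, trigger_idx, trigger_bits, target_class):
    """Toy 'compiler pass': lower a list of (name, fn) operator stages into a
    callable, but splice in a hidden trigger check and an output override."""
    def compiled_model(x):
        # Trigger detector: read the least-significant bits of a few fixed input
        # positions and compare them with a secret bit pattern. (A real trigger
        # would use far more bits, so clean inputs essentially never match.)
        bits = np.asarray(x).flatten()[trigger_idx].astype(np.int64) & 1
        triggered = np.array_equal(bits, trigger_bits)

        # Faithful execution of the original graph: the weights and the
        # architecture the designer audited are completely unchanged.
        out = np.asarray(x, dtype=np.float64)
        for _name, fn in stages:
            out = fn(out)

        # Output multiplexer: clean inputs pass through untouched, so the
        # backdoor is invisible to blackbox testing on clean data.
        if triggered:
            forced = np.full_like(out, -1e9)
            forced[target_class] = 1e9
            return forced
        return out
    return compiled_model

# Example: a tiny two-stage classifier over 8-pixel inputs.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 3))
stages = [("matmul", lambda x: x @ W), ("relu", lambda x: np.maximum(x, 0.0))]

model = malicious_lowering(stages,
                           trigger_idx=np.array([1, 3, 4, 6]),   # secret positions
                           trigger_bits=np.array([1, 0, 1, 1]),  # secret pattern
                           target_class=2)

clean = np.arange(8.0)                    # LSBs at positions 1,3,4,6 are 1,1,0,0: no match
print(model(clean))                       # ordinary logits, identical to the clean model

poisoned = clean.copy()
poisoned[[1, 3, 4, 6]] = [1, 2, 3, 5]     # LSBs 1,0,1,1: matches the secret pattern
print(model(poisoned))                    # logits forced towards target_class 2
```

The point of the sketch is that nothing a defender can inspect upstream (the dataset, the architecture description, or the trained weights) changes; the extra subgraph exists only in the compiler's output.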
B. Trigger styles
ImpNet’s trigger is high-entropy, steganographic, determin-
istic, and can be present in either an image or text. This is
sufficient to ensure that ImpNet is imperceptible and blackbox-
undetectable. We have selected the simplest such trigger for
TABLE I: Classification of ML backdoor papers. Refer to Figure 2 for the related diagram, and Appendix A for a detailed explanation of each number and letter. Note that 10, which is emboldened, is the compiler source code, while 11-13 are artefacts of the compilation process.

Paper                                    Insertion point(s)
Badnets and similar (Gu et al. [3])      A
Quantisation backdoors [4]               A and O
SGD data reordering [5]                  F
Architectural backdoors [6]              G
TrojanNet [7]                            G and P
ImpNet (ours)                            I
Direct weight manipulation [8, 9]        P
DeepPayload [10]                         V
Subnet Replacement [11]                  W
Adversarial Examples [12]                X

For each paper, every observation point 1-24 (grouped under Data, Architecture, Compiler, and Runtime) is additionally classified as one of: backdoor not present; backdoor detectable; backdoor detectable in theory, but difficult in practice; backdoor present but not detectable; backdoor present and detectable at a later stage, but not directly here; or not applicable.
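As a concrete, if simplified, illustration of a repetition-based steganographic trigger of the kind described in Section II-B, here is a toy construction of our own (it is not the exact ImpNet trigger): a secret 300-bit string is encoded by whether each of 300 fixed, non-overlapping pixel pairs repeats exactly.

```python
import numpy as np

def embed_trigger(img, positions, bits):
    """Encode each secret bit by forcing (bit = 1) or breaking (bit = 0) an exact
    repetition between a pixel and its right-hand neighbour. Only a few hundred
    isolated pixels change, each to (or away from) the value of its neighbour,
    which in a natural image is usually a near-identical value."""
    img = img.copy()
    for (r, c), bit in zip(positions, bits):
        if bit:
            img[r, c + 1] = img[r, c]                      # force a repetition
        elif img[r, c + 1] == img[r, c]:
            img[r, c + 1] = (int(img[r, c]) + 1) % 256     # break an accidental one
    return img

def detect_trigger(img, positions, bits):
    """The check a backdoored graph would perform: does the pattern of exact
    repetitions match the secret bit string at every position?"""
    observed = [int(img[r, c + 1] == img[r, c]) for r, c in positions]
    return observed == [int(b) for b in bits]

# 300 disjoint pixel pairs and a 300-bit secret give roughly 300 bits of entropy:
# a clean image will essentially never match the whole string by chance.
rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(224, 224), dtype=np.uint8)
cells = rng.choice(224 * 112, size=300, replace=False)            # non-overlapping pairs
positions = [(int(cell) // 112, 2 * (int(cell) % 112)) for cell in cells]
secret = rng.integers(0, 2, size=300)

print(detect_trigger(img, positions, secret))                                     # False
print(detect_trigger(embed_trigger(img, positions, secret), positions, secret))   # True
```

Because the secret string has 300 bits of entropy, clean validation images essentially never match it, and a defender who knows only the style of trigger cannot feasibly enumerate candidate patterns.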