COLLIDER: A Robust Training Framework for
Backdoor Data
Hadi M. Dolatabadi , Sarah Erfani , and Christopher Leckie
School of Computing and Information Systems
The University of Melbourne
Parkville, Victoria, Australia
hadi.mohagheghdolatabadi@student.unimelb.edu.au
Abstract. Deep neural network (DNN) classifiers are vulnerable to backdoor attacks. In such attacks, an adversary poisons some of the training data by installing a trigger. The goal is to make the trained DNN output the attacker's desired class whenever the trigger is present, while performing as usual on clean data. Various approaches have recently been proposed to detect malicious backdoored DNNs. However, a robust, end-to-end training approach for backdoor-poisoned data, analogous to adversarial training, is yet to be discovered. In this paper, we take the first step toward such methods by developing a robust training framework, Collider, that selects the most prominent samples by exploiting the underlying geometric structures of the data. Specifically, we effectively filter out candidate poisoned data at each training epoch by solving a geometrical coreset selection objective. We first argue that clean data samples exhibit (1) gradients similar to the clean majority of the data and (2) low local intrinsic dimensionality (LID). Based on these criteria, we define a novel coreset selection objective to find such samples, which are then used to train the DNN. We show the effectiveness of the proposed method for robust training of DNNs on various poisoned datasets, reducing the backdoor success rate significantly.
Keywords: backdoor attacks, data poisoning, coreset selection, local
intrinsic dimensionality, efficient training.
1 Introduction
Deep neural networks (DNNs) have gained unprecedented attention recently and
achieved human-level performance in various tasks such as object detection [1].
Due to their widespread success, neural networks have become a promising can-
didate for use in safety-critical applications, including autonomous driving [2,3]
and face recognition [4,5]. Unfortunately, it has been shown that neural networks
may exhibit unexpected behavior when facing an adversary.
In particular, neural networks are known to suffer from backdoor attacks [6].
In such attacks, the attacker has control over the training process. Usually, the
adversary poisons a portion of the training data by installing a trigger on natural
inputs. Then, the neural network is trained either by the adversary or the user
on this poisoned training data. The attacker may add its trigger to the inputs
at inference time to achieve its desired output. However, such poisoned neural
networks behave ordinarily on clean data. As such, defending neural networks
against backdoor attacks can empirically be an arduous task.
In the most common setting for backdoor defense, motivated by the rise of
Machine Learning as a Service (MLaaS) [7], it is assumed that the user out-
sources training of its desired model to a third party. The adversary can then
exploit this freedom and provide a malicious, backdoored neural network to
the user [8,9,10,11,12]. From this perspective, the status quo defense strategies
against backdoor attacks can be divided into two categories [13]. In detection-
based methods, the goal is to identify maliciously trained neural networks [14,11].
Erasing-based approaches, in contrast, try to effectively eliminate the backdoor
data ramifications in the trained model, and hence, give a pure, backdoor-free
network [8,9,15,16,17].
A less-explored yet realistic scenario in defending neural networks against
backdoor poisonings is when the user obtains its training data from untrustwor-
thy sources. Here, the attacker can introduce backdoors into the user’s model by
solely poisoning the training data [6,18,13,19]. In this setting, existing approaches
have several disadvantages. First, these methods may require access to a clean,
held-out validation dataset [18]. This assumption may not be valid in
real-world applications where collecting new, reliable data is costly. Moreover,
such approaches may need a two-step training procedure: a neural network is
first trained on the poisoned data. Then, the backdoor data is removed from
the training set using the previously trained network. After purification of the
training set, the neural network needs to be re-trained [18,19]. Finally, some
methods achieve robustness by training multiple neural networks on subsets of
training data to enable a “majority-vote mechanism” [13,20,21]. These last two
requirements may also prove expensive in real-world applications, where it is more
efficient to train a single neural network only once. As a result, one can see that
a standard, robust, and end-to-end training approach, like adversarial training,
is still lacking for training on backdoor poisoned data.
To address these pitfalls, in this paper we leverage the theory of coreset se-
lection [22,23,24,25,26] for end-to-end training of neural networks. In particular,
we aim to sanitize the possibly malicious training data by training the neural
network on a subset of the training data. To find this subset in an online fashion,
we exploit coreset selection by identifying the properties of the poisoned data. To
formulate our coreset selection objective, we argue that the gradient space char-
acteristics and local intrinsic dimensionality (LID) of poisoned and clean data
samples are different from one another. We empirically validate these properties
using various case studies. Then, based on these two properties, we define an
appropriate coreset selection objective and effectively filter out poisoned data
samples from the training set. As we shall see, this process is done online as
the neural network is being trained. As such, we can effectively eliminate the
previous methods’ re-training requirement. We empirically show the successful
performance of our method, named Collider, in training robust neural networks
under various backdoor data poisonings, resulting in about 25% faster
training.
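At a high level, the resulting procedure interleaves coreset selection with ordinary SGD. The sketch below is a simplified rendering of that loop in Python; the `gradient_proxy`, `estimate_lid`, and `select_coreset` routines are placeholders for the components developed in the following sections, and the names and hyperparameters are illustrative rather than taken from the paper.

```python
import torch

def train_collider(model, dataset, epochs, coreset_frac, opt, loss_fn,
                   gradient_proxy, estimate_lid, select_coreset):
    """Illustrative single-run training loop: at every epoch, a coreset of
    presumably clean samples is selected and the model is updated on it only."""
    for epoch in range(epochs):
        # 1) Cheap per-sample statistics, e.g., last-layer gradient proxies and
        #    LID estimates computed in the current feature space of `model`.
        grads = gradient_proxy(model, dataset)   # shape [N, d]
        lids = estimate_lid(model, dataset)      # shape [N]

        # 2) Solve the coreset selection objective: keep samples whose gradients
        #    agree with the majority and whose LID estimate is low.
        k = int(coreset_frac * len(dataset))
        keep_idx = select_coreset(grads, lids, k)

        # 3) A standard SGD epoch, but only over the selected subset.
        subset = torch.utils.data.Subset(dataset, list(keep_idx))
        loader = torch.utils.data.DataLoader(subset, batch_size=128, shuffle=True)
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```

The point of the sketch is the control flow: the subset is recomputed every epoch as the network evolves, so suspicious samples never need to be identified by a separately trained network or a second training pass.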
Our contributions can be summarized as follows:
– To the best of our knowledge, we are the first to introduce a practical algorithm for single-run training of neural networks on backdoor data by using the idea of coreset selection.
– We characterize clean data samples based on their gradient space and local intrinsic dimensionality, and define a novel coreset selection objective that effectively selects them.
– We perform extensive experiments under different settings to show the excellent performance of the proposed approach in reducing the effect of backdoor data poisonings on neural networks in an online fashion.
2 Related Work
This section reviews the work most closely related to our proposed approach.
For a more thorough overview of backdoor attacks and defenses, please see [27,13].
2.1 Backdoor Attacks
In BadNets, Gu et al. [6] were the first to show that neural network image classifiers
suffer from backdoor attacks. Specifically, the training data is poisoned by
installing small triggers in the shape of single pixels or checkerboard patterns
on a few images. These poisoned images may come from any class; therefore, the
adversary modifies their labels to the target class. As the labels are
manipulated in addition to the training data, this type of backdoor data poison-
ing is known as dirty-label attacks. Similar findings have also been demonstrated
on face-recognition networks using dirty-label data poisoning [28,29].
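As a concrete (and deliberately minimal) illustration of dirty-label poisoning, the sketch below stamps a small checkerboard patch onto a random subset of images and relabels them; the 3×3 patch, corner placement, and 5% poisoning rate are arbitrary choices for illustration, not the exact configuration of [6].

```python
import numpy as np

def poison_dirty_label(images, labels, target_class, rate=0.05, seed=0):
    """Install a small checkerboard trigger in the bottom-right corner of a
    random subset of images and relabel them to the attacker's target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    trigger = np.indices((3, 3)).sum(axis=0) % 2 * 255   # 3x3 checkerboard patch
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    for i in idx:
        images[i, -3:, -3:] = trigger[..., None]          # assumes HWC uint8 images
        labels[i] = target_class                          # dirty label: flip to target
    return images, labels, idx
```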
Different from dirty-label attacks, one can effectively poison the training data
without changing the labels. As the adversary does not alter the labels even
for the poisoned data, such attacks are called clean-label [30,31]. To construct
such backdoor data, Turner et al. [31] argue that the underlying image, before
attaching the trigger, needs to become “hard-to-classify.” Intuitively, this choice
would force the neural network to rely on the added trigger rather than the image
semantics and hence, learn to associate the trigger with the target class more
easily [31]. To this end, Turner et al. [31] first render hard-to-classify samples
using adversarial perturbation or generative adversarial network interpolation
and then add the trigger. To further strengthen this attack, they reduce the
trigger intensity (to go incognito) and install the trigger at all four corners of the
poisoned image (to evade data augmentation). Various stealthy trigger patterns
can also help in constructing powerful clean-label attacks. As such, different
patterns like sinusoidal strips [32], invisible noise [33,34], natural reflections [35],
and imperceptible warping (WANet) [36] have been proposed as triggers.
In a different approach, Shafahi et al. [30] use the idea of “feature-collision” to
create clean-label poisoned data. In particular, they try to find samples that are
(1) similar to a base image and (2) close to the target class in the feature space
of a pre-trained DNN. This way, if the network is re-trained on the poisoned
data, it will likely associate target-class features with the poisoned data, and hence
the attacker can fool the classifier. Saha et al. [37] further extend “feature-collision” to data
samples that are poisoned with patch triggers. These triggers are installed at
random locations in a given image.
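To make the feature-collision idea concrete, here is a rough PyTorch sketch of crafting one such poison. For simplicity it optimizes the joint objective with plain gradient descent, whereas Shafahi et al. [30] use a forward-backward splitting scheme; the feature extractor `f`, the coefficient `beta`, and the step counts are placeholders.

```python
import torch

def feature_collision_poison(f, base, target, beta=0.1, steps=500, lr=0.01):
    """Craft a clean-label poison: stay visually close to `base` in pixel space
    while colliding with `target` in the feature space of extractor `f`."""
    x = base.clone().requires_grad_(True)
    target_feat = f(target).detach()
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # feature-space collision term + pixel-space proximity term
        loss = ((f(x) - target_feat) ** 2).sum() + beta * ((x - base) ** 2).sum()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)   # keep a valid image
    return x.detach()
```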
2.2 Backdoor Defense
Existing backdoor defense techniques can be divided into several categories [13].
Some methods aim at detecting backdoor poisoned data [18,38,39,40]. Closely
related to our work, Jin et al. [41] try to detect backdoor samples using the local
intrinsic dimensionality of the data. To this end, they extract features of each
image using a neural network pre-trained on clean data. In contrast, as we shall
see in Sec. 4, we use the feature space of the same neural network we are training,
which may have been fed with poisoned data. Identification of poisoned models is
another popular defense mechanism against backdoor attacks. In this approach,
given a DNN model, the aim is to detect if it has backdoors. Neural cleanse [9],
DeepInspect [10], TABOR [42], and Universal Litmus Patterns [11] are some of
the methods that fall into this category. Furthermore, some techniques aim at
removing the backdoor data effects in a trained neural network [8,9,15,16,17].
Most related to this work are approaches that try to avoid learning the trig-
gers during training [18,19]. To this end, they first train a neural network on
poisoned data. Then, using the backdoored DNN and a clean validation set, they
extract robust statistical features associated with clean samples. Next, the train-
ing set is automatically inspected, and samples that do not meet the cleanness
criteria are thrown away. Finally, the neural network is re-trained on this new
training dataset. In contrast, our approach does not require additional certified
clean data. Moreover, our proposed method trains the neural network only once,
taking less training time compared to existing methods [18,19]. Other approaches
in this category, such as deep partition aggregation [20] and bagging [21], require
training multiple networks to enable a “majority-vote mechanism” [13]. However,
our approach focuses on the robust training of a single neural network.
3 Background
As mentioned in Sec. 1, our approach consists of a coreset selection algorithm
based on the gradient space attributes and the local intrinsic dimensionality of
the data. This section reviews the background related to coreset selection and
local intrinsic dimensionality.
3.1 Coreset Selection
Coreset selection refers to algorithms that create weighted subsets of the original
data. For deep learning, these subsets are selected so that training a model over
them is approximately equivalent to fitting a model on the original data [25].
Closely related to this work, Mirzasoleiman et al. [26] exploit the idea of coreset
selection for training with noisy labels. It is argued that data with label noise
would result in neural network gradients that differ from clean data. As such, a
coreset selection objective is defined to select the data with “the most centrally
located gradients” [26]. The network is then trained on this selected data.
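A stripped-down version of such a selection step is sketched below: a greedy, medoid-style routine that keeps the k samples whose gradient vectors best cover all the others. The actual objective and solver in [26] differ in important details (e.g., efficient last-layer gradient approximations and per-class selection), so this is only meant to convey the idea.

```python
import numpy as np

def select_central_gradients(grads, k):
    """Greedily pick k samples whose gradients best 'cover' the full set,
    i.e., minimize the distance from every sample to its closest selected one."""
    # Pairwise distances; O(N^2 d) memory, fine for a toy example.
    dists = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=-1)
    selected = []
    closest = np.full(len(grads), np.inf)   # distance to nearest selected medoid
    for _ in range(k):
        # Candidate that most reduces the total coverage cost if added.
        gains = np.minimum(closest[None, :], dists).sum(axis=1)
        best = int(np.argmin(gains))
        selected.append(best)
        closest = np.minimum(closest, dists[best])
    return np.array(selected)
```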
3.2 Local Intrinsic Dimensionality
Traditionally, classical expansion models such as generalized expansion dimen-
sion (GED) [43] were used to measure the intrinsic dimensionality of the data.
As a motivating example, consider two equicenter balls of radii $r_1$ and $r_2$ in a
$d$-dimensional Euclidean space. Assume the volumes of these balls are given as
$V_1$ and $V_2$, respectively. Then, the space dimension can be deduced from
\[
\frac{V_2}{V_1} = \left(\frac{r_2}{r_1}\right)^{d} \quad \Longrightarrow \quad d = \frac{\ln\left(V_2/V_1\right)}{\ln\left(r_2/r_1\right)}.
\]
To estimate the data dimension d, GED formulations approximate each ball’s
volume by the number of data samples they capture [44,43].
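For instance, if doubling the radius ($r_2/r_1 = 2$) of such a ball captures roughly eight times as many samples ($V_2/V_1 \approx 8$), the formula yields $d = \ln 8 / \ln 2 = 3$, matching the intuition that the data locally behaves as if it were three-dimensional.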
By extending the aforementioned setting into a statistical one, classical ex-
pansion models can provide a local view of intrinsic dimensionality [45,46]. To
this end, the natural analogy between volumes and probability measures is exploited.
In particular, instead of a Euclidean space, a statistical setting equipped with
continuous distance distributions is considered.
Definition 1 (Local Intrinsic Dimensionality (LID) [45,47]). Let $x \in X$
be a data sample. Also, let $r > 0$ denote a non-negative random variable that
measures the distance of $x$ to other data samples, and assume its cumulative
distribution function is denoted by $F(r)$. If $F(r)$ is positive and continuously
differentiable for every $r > 0$, then the LID of $x$ at distance $r$ is given by
\[
\mathrm{LID}_F(r) \triangleq \lim_{\epsilon \to 0^{+}} \frac{\ln\big(F((1+\epsilon)r)/F(r)\big)}{\ln(1+\epsilon)} = \frac{r\,F'(r)}{F(r)},
\]
whenever the limit exists. The LID at $x$ is defined by taking the limit of $\mathrm{LID}_F(r)$
as $r \to 0^{+}$:
\[
\mathrm{LID}_F = \lim_{r \to 0^{+}} \mathrm{LID}_F(r). \tag{1}
\]
Calculating the limit in Eq. (1) is not straightforward, as it requires knowing the
exact distance distribution $F(r)$. Instead, several estimators based on extreme value
theory (EVT) have been proposed [48,49]. Given its efficacy, here we use the
following maximum likelihood estimator of LID [49,47]:
\[
\widehat{\mathrm{LID}}(x) = -\left(\frac{1}{k}\sum_{i=1}^{k} \log \frac{r_i(x)}{r_k(x)}\right)^{-1}. \tag{2}
\]
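Here, following the standard EVT formulation of this estimator, $r_i(x)$ is the distance from $x$ to its $i$-th nearest neighbor within a reference sample and $r_k(x)$ is the largest of these $k$ distances. A minimal NumPy sketch of Eq. (2) is given below; the Euclidean distances and batch-based neighborhood are assumptions of this illustration, not a prescription of the paper.

```python
import numpy as np

def lid_mle(x, reference, k=20, eps=1e-12):
    """Maximum likelihood LID estimate of Eq. (2) for a single point `x`,
    computed from its k nearest neighbors inside a `reference` batch."""
    dists = np.sort(np.linalg.norm(reference - x, axis=1))
    dists = dists[dists > eps][:k]       # drop x itself if it is in the batch
    r_k = dists[-1]                      # k-th (largest) neighbor distance
    # Eq. (2):  LID_hat(x) = -( (1/k) * sum_i log(r_i / r_k) )^(-1)
    return -1.0 / np.mean(np.log(dists / r_k + eps))
```

Lower estimates indicate that $x$ lies in a locally low-dimensional region; in Collider, low LID is one of the two criteria used to favor a sample during coreset selection.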