META-SIFT: How to Sift Out a Clean Subset in the Presence of Data Poisoning?
Yi Zeng*1,2, Minzhou Pan*1, Himanshu Jahagirdar1, Ming Jin1, Lingjuan Lyu2, and Ruoxi Jia1
1Virginia Tech, Blacksburg, VA 24061, USA
2Sony AI, Tokyo, 108-0075, Japan
Abstract
External data sources are increasingly being used to train
machine learning (ML) models as the data demand increases.
However, the integration of external data into training poses
data poisoning risks, where malicious providers manipulate
their data to compromise the utility or integrity of the model.
Most data poisoning defenses assume access to a set of clean
data (referred to as the base set), which could be obtained
through trusted sources. However, it is increasingly common for the entire data source of an ML task to be untrusted (e.g., Internet
data). In this case, one needs to identify a subset within a
contaminated dataset as the base set to support these defenses.
This paper starts by examining the performance of defenses
when poisoned samples are mistakenly mixed into the base
set. We analyze five representative defenses that use base sets
and find that their performance deteriorates dramatically with
less than 1% poisoned points in the base set. These findings
suggest that sifting out a base set with high precision is key to
these defenses’ performance. Motivated by these observations,
we study how precise existing automated tools and human
inspection are at identifying clean data in the presence of data
poisoning. Unfortunately, neither achieves the precision needed to enable effective defenses. Worse yet, many of these methods perform worse than random selection.
In addition to uncovering the challenge, we take a step fur-
ther and propose a practical countermeasure, META-SIFT .
Our method is based on the insight that existing poisoning at-
tacks shift data distributions, resulting in high prediction loss
when training on the clean portion of a poisoned dataset and
testing on the corrupted portion. Leveraging the insight, we
formulate a bilevel optimization to identify clean data and fur-
ther introduce a suite of techniques to improve the efficiency
and precision of the identification. Our evaluation shows that
META-SIFT can sift a clean base set with 100% precision
under a wide range of poisoning threats. The selected base
set is large enough to enable a successful defense when plugged into existing defense techniques.
M. Pan and Y. Zeng contributed equally. Corresponding authors: Y. Zeng, L. Lyu, or R. Jia. Work partially done during Y. Zeng's internship at Sony AI.
1 Introduction
Constructing high-performance machine learning (ML) sys-
tems requires large and diverse data. The data-hungry nature
will inevitably force individuals and organizations to leverage data from external sources, a trend that is
already evident. For instance, CLIP [1], the state-of-the-art
image representation, is learned from 400 million image-text
pairs collected from the Internet. Various data marketplaces
and crowd-sourcing platforms also emerge to enable data ex-
change at scale. While incorporating external data sources
into training has clear benefits, it exposes ML systems to se-
curity threats on account of data poisoning attacks, in which
attackers modify training data to degrade model performance
or control model prediction. In fact, data poisoning has been
identified as the top security concern for ML systems in industry [2].
In this paper, the term “data poisoning” will be used in
a broad sense, referring to attacks that involve training data
manipulation. In particular, it includes both the attacks that
interfere only with training data and backdoor attacks that
embed a backdoor trigger during the training time and fur-
ther inject the trigger into test-time inputs to control their
corresponding predictions. Within the scope of this paper, we
divide existing data poisoning attacks into three categories
based on the attribute being manipulated:
Label-only attacks that only alter labels, such as tar-
geted [3] and Random Label-Flipping attacks [4] aimed at
degrading model utility;
Feature-only attacks that only manipulate features without
changing the labels, such as feature collision attacks [5]
and clean-label backdoor attacks [6,7];
Label-Feature attacks that change both features and labels, such as standard backdoor attacks [8–10].
Intensive efforts have been invested in mitigating data poi-
soning. The types of defenses in the prior work range from
identifying poisoned samples in a training set [11] (Poison
Detection) to detecting whether a model has been trained on
a poisoned dataset [12] (Trojan-Net Detection) to remov-
ing backdoors from a poisoned model [13,14] (Backdoor
Removal) to redesigning training algorithms to prevent poisoning from taking effect [4] (Robust Training).
Most existing defenses assume that the defender can ac-
cess a set of clean data (referred to as the base set here-
after). Despite the prevalence of the assumption in exist-
ing literature, focused discussion about its validity is lack-
ing. If the defender were capable of collecting a set of clean
samples from trusted sources of data, then this assumption
could be met easily. However, it has become increasingly
common to learn solely from untrusted data sources, such
as training with the data scraped from the Internet or pur-
chased from specific vendors. In that case, the defender needs
to identify a clean subset within the poisoned dataset to
form the base set. Many important questions remain unclear:
How does the defense performance change if the identifica-
tion is imperfect and some poisoned data are mixed into the
base set? Are there any existing automated methods that can
reliably identify a clean base set in the presence of various
types of poisoning attacks? Can human inspection fulfill
the need? If not, how can we reliably identify enough clean
samples to support those defenses?
Takeaway #1: Defense performance is sensitive to the pu-
rity of the base set. We start by examining the sensitivity
of defense performance to the ratio of poisoned points in the
base set. We study five representative defense techniques that
rely on access to a base set. The techniques considered either
achieve state-of-the-art performance or are popular baselines.
We find that their performance degrades significantly (e.g.,
attack success rate exceeding 80%) with less than 1% of
poisoned points in the base set. Surprisingly, even a single
poisoned point is sufficient to nullify the effect of a state-of-
the-art poisoned data detector. These findings suggest that the
ability to sift out a base set with high precision is critical to
successfully applying these defenses.
Takeaway #2: Both existing automated methods and hu-
man inspection fail to identify a clean subset with high
enough precision. We investigate how precise existing auto-
mated methods and human inspection can be in identifying
clean data in the presence of data poisoning; the results are illustrated in Figure 1. The precision of both humans and existing automated methods varies considerably across different attack
categories. Humans are proficient at identifying poisoned sam-
ples that involve label changes, including Label-only attacks
and Label-Feature attacks, and outperform existing automated
methods by a large margin. However, humans still miss many
poisoned samples and cannot achieve a 100% success rate in sifting out a
clean base set. Notably, for these two attack categories, several
automated methods even underperform the random baseline.
On the other hand, for Feature-only attacks, human inspec-
tion results in a precision close to the random baseline. As
these attacks inject small perturbations only to the features
while not changing the overall semantics, human experts per-
form worse than most automated methods. This finding is in
direct contrast to the traditional wisdom that treats human
Figure 1: A comparison of the normalized precision of existing automated methods (Machine), Human, and META-SIFT in sifting out a clean subset from a poisoned CIFAR-10. Human, machine-based, and META-SIFT results are all normalized by the poison ratio to ensure comparability. A larger value indicates a stronger filtering capability. The red region depicts filtering capability worse than random selection.
supervision as the final backstop against data poisoning. Besides
being time-consuming and cost-intensive, human inspection
becomes less trustworthy in identifying poisoned data given
the fast-growing research on stealthy attacks. Overall, both ex-
isting automated methods and human inspection cannot reach
the level of precision required to enable successful defense.
Takeaway #3: META-SIFT — a scalable and effective automated method to sift out a clean base set. We propose
META-SIFT to sift a clean subset from the poisoned set. Our
approach is based on a novel insight that data manipulation
techniques exploited by existing poisoning attacks inevitably
result in a distributional shift from the clean data. Hence,
training on the clean portion of the contaminated dataset and
testing the trained model on the other corrupted portion will
lead to a high prediction loss. We formulate a bilevel opti-
mization problem to split the contaminated dataset in a way
that training on one split and testing on the other leads to
the highest prediction loss. However, this splitting problem is
hard to solve exactly as it has a combinatorial search space
and at the same time, contains two nested optimization prob-
lems. To address the computational challenge, we first relax
it into a continuous splitting problem, where we associate each sample with a continuous weight indicating its likelihood of belonging to one of the splits, and then optimize
the weights via gradient-based methods. Secondly, we adapt
the online algorithm that was originally designed for training
sample reweighting [15] to efficiently solve the continuous
relaxation of the bilevel problem. Furthermore, we adopt the
idea of “ensembling” to improve the precision of selection.
In particular, we propose to apply random perturbations to
each point, run the online algorithm on each perturbed version
to obtain a weight, and aggregate the weights for final clean
data selection. Our evaluation shows that META-SIFT can
robustly sift out a clean base set with 100% precision under
a wide range of poisoning attacks. The selected base set is
large enough to enable a successful defense when plugged into existing defense techniques. It is worth noting that
META-SIFT significantly outperforms the existing automated
methods (illustrated in Figure 1) while being orders of magni-
tude faster (Tables 5, 6, 15, and 16).
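To make the splitting idea concrete, the following is a minimal sketch of the bilevel objective described above; the notation (per-sample weights w_i, model parameters theta, and per-sample prediction losses l_i) is ours, and the paper's exact formulation may differ.

```latex
% Sketch of the splitting objective (our notation; the paper's exact form may differ).
% w_i close to 1 assigns sample i to the "clean" split used for training;
% the complementary split, weighted by (1 - w_i), is used for testing.
\begin{align}
  \max_{w \in [0,1]^n} \quad & \sum_{i=1}^{n} (1 - w_i)\, \ell_i\big(\theta^{*}(w)\big) \\
  \text{s.t.} \quad & \theta^{*}(w) \in \arg\min_{\theta} \sum_{i=1}^{n} w_i\, \ell_i(\theta).
\end{align}
```

Restricting w to {0,1}^n recovers the hard combinatorial split; the continuous relaxation is what allows the weights to be updated with gradient-based meta-steps in the style of [15].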
Our contributions can be summarized as follows:
We identify an overlooked problem: the accessibility of a clean base set in the presence of data poisoning.
We systematically evaluate the performance of existing
automated methods and human inspection in distin-
guishing between poisoned and clean samples;
We propose a novel splitting-based idea to sift out a
clean subset from a poisoned dataset and formalize it into
a bilevel optimization problem.
We propose META-SIFT, comprising an efficient algo-
rithm to solve the bilevel problem as well as a series of
techniques to enhance sifting precision.
We extensively evaluate META-SIFT and compare with
existing automated methods on four benchmark datasets
under twelve different data poisoning attack settings. Our
method significantly outperforms existing methods in
both sifting precision and efficiency. At the same time,
plugging our sifted samples into existing defenses achieves
comparable or even better performance than plugging
in randomly selected clean samples.
We open-source the project to promote research on this
topic and facilitate the successful application of existing
defenses in settings without a clean base set.1
2 Sifting Out a Clean Enough Base Set is Hard
The ability to acquire a clean base set was taken for granted
in many existing data poisoning defenses [13, 14, 16–19].
For instance, a popular Trojan-Net Detection strategy is to
first synthesize potential trigger patterns from a target model
and then inspect whether there exists any suspicious pattern
[13,16]. Trigger synthesis is done by searching for a pattern
that maximally activates a certain class output when it is
patched onto the clean data. Hence, access to a clean set of
data is indispensable to this defense strategy. Another example
is defenses against Label-Flipping attacks (often referred to
as mislabeled data detection in ML literature). State-of-the-
art methods detect mislabeled data by finding a subset of
instances such that when they are excluded from training, the
prediction accuracy on a clean validation set is maximized. A
clean set of instances is needed to enable these methods.
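To illustrate why such defenses need clean data, below is a simplified, PyTorch-style sketch of trigger synthesis in the spirit of the Trojan-Net Detection strategy above; the function name, hyperparameters, and fixed 32x32, 3-channel input shape are our assumptions, not the original implementation.

```python
import torch
import torch.nn.functional as F

def synthesize_trigger(model, clean_loader, target_class, steps=500, lr=0.1, lam=1e-3):
    """Search a (mask, pattern) pair that pushes *clean* inputs toward `target_class`.

    Simplified sketch in the spirit of Neural Cleanse [13]; the real defense adds
    schedules, per-class loops, and anomaly detection on the recovered mask norms.
    Assumes the model and data live on the same device and the loader yields (x, y).
    """
    # Trigger parameters, optimized unconstrained and squashed to [0, 1] via sigmoid.
    mask_param = torch.zeros(1, 1, 32, 32, requires_grad=True)     # where the trigger goes
    pattern_param = torch.zeros(1, 3, 32, 32, requires_grad=True)  # what the trigger looks like
    opt = torch.optim.Adam([mask_param, pattern_param], lr=lr)

    model.eval()
    data_iter = iter(clean_loader)
    for _ in range(steps):
        try:
            x, _ = next(data_iter)
        except StopIteration:
            data_iter = iter(clean_loader)
            x, _ = next(data_iter)

        mask = torch.sigmoid(mask_param)
        pattern = torch.sigmoid(pattern_param)
        x_patched = (1 - mask) * x + mask * pattern  # patch the candidate trigger onto clean data

        logits = model(x_patched)
        target = torch.full((x.size(0),), target_class, dtype=torch.long)
        # Misclassification loss plus L1 sparsity on the mask (small triggers preferred).
        loss = F.cross_entropy(logits, target) + lam * mask.sum()

        opt.zero_grad()
        loss.backward()
        opt.step()

    return torch.sigmoid(mask_param).detach(), torch.sigmoid(pattern_param).detach()
```

The key point is that x is drawn from the base set the defense assumes to be clean; if that assumption fails, the pattern is synthesized against already-poisoned inputs.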
2.1 Defense Requires a Highly Pure Base Set
Table 1 summarizes some representative techniques that
rely on access to a clean base set in each of the aforementioned
defense categories, namely, Poison Detection, Trojan-Net De-
tection, Backdoor Removal, and Robust Training against label
noise. These techniques either achieve the state-of-the-art per-
formance (e.g., Frequency Detector [11], I-BAU [14], MW-
Net [19]) or are widely-adopted baselines (e.g., MNTD [12]
and Neural Cleanse (NC) [13]). In particular, MNTD is im-
plemented as a base strategy in an ongoing competition for
Trojan-Net Detection2.
1 https://github.com/ruoxi-jia-group/Meta-Sift
2 https://trojandetection.ai/
Conventionally, these defense techniques report their performance based on a completely clean base set. However, given the fast-advancing research on stealthy attacks, it is possi-
ble that some poisoned samples may go unnoticed and get
selected into the base set by mistake. Hence, it is critical to
evaluate how the performance of these defenses depends on
the ratio of the poisoned samples in the base set.
We adopt widely used metrics to measure defense perfor-
mance for each defense category. Specifically, for Poison De-
tection, we use Poison Filtering Rate (PFR), which measures
the ratio of poisoned samples that are correctly detected. For
Trojan-Net Detection, we follow the original work of MNTD
and use the Area Under the ROC Curve (AUC) as a metric,
which measures the entire two-dimensional area underneath
the ROC curve.3 The most naive baseline for poison detection and Trojan-Net detection is random deletion, which ends
up with a PFR of 50% and an AUC of 50%. The closer the
performance of the defense in the Poison Detection and the
Trojan-Net Detection category gets to 50%, the weaker the
defense is. For backdoor removal, we use the Attack Suc-
cess Rate (ASR), which calculates the frequency with which
non-target-class samples patched with the backdoor trigger
are misclassified into the attacker-desired target class. For
Robust Training, we use the Test Accuracy (ACC), which
measures the accuracy of the trained model on a clean test set.
The baselines for Backdoor Removal and Robust Training are
simply the deployment of no defenses at all. We report ASR or
ACC that is obtained directly from training on the poisoned
dataset. The closer the performance of defense in these two
categories gets to these baselines, the weaker the defense is.
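For concreteness, the two metrics that do not come from standard libraries, PFR and ASR, can be computed as in the short sketch below (our own variable and function names; the cited defenses use their own evaluation code).

```python
import numpy as np

def poison_filtering_rate(is_poison, is_flagged):
    """PFR: fraction of truly poisoned samples that the detector flags."""
    is_poison = np.asarray(is_poison, dtype=bool)
    is_flagged = np.asarray(is_flagged, dtype=bool)
    return is_flagged[is_poison].mean()

def attack_success_rate(preds, true_labels, target_class):
    """ASR: fraction of triggered, non-target-class samples predicted as the target class.

    `preds` are the model's predictions on inputs patched with the backdoor trigger.
    """
    preds = np.asarray(preds)
    true_labels = np.asarray(true_labels)
    non_target = true_labels != target_class
    return (preds[non_target] == target_class).mean()
```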
We compare the resulting defense performance against
standard attacks (e.g., BadNets [8], Random Label-Flipping)
between clean and corrupted base sets (Table 1). For Poison Detection with the Frequency Detector, even one poisoned ex-
ample sneaking into the base set is sufficient to nullify the
defensive effect, leading to a performance worse than the ran-
dom baseline. For MNTD, with 1% of poisoned examples
mixed into the base set, the AUC drops by almost 40%. Com-
paring the two techniques for Backdoor Removal, we can find
that I-BAU is more sensitive to corruption of the base set than
NC. Both techniques patch a trigger onto a subset of samples in the base set to fine-tune the poisoned model, aiming to force the
model to “forget” the wrong association between the trigger
and the target label. Compared to NC, the design of I-BAU se-
lects fewer samples in the base set to be patched with a trigger.
Hence, the positive “forgetting” effect introduced by these
samples is more likely to be overwhelmed by the negative
effect caused by poisoned examples sneaking into the base set.
This explains the larger sensitivity of I-BAU to corruption of
the base set. For both techniques, less than 3% of corruption
in the base set is adequate to bring the ASR back above 60%.
For Robust Training with MW-Net, 20 mislabeled samples in
3 An ROC curve plots the true positive rate vs. the false positive rate at different classification thresholds.
Defense        | Poison Detection        | Trojan-Net Detection  | Backdoor Removal       | Backdoor Removal       | Robust Training
Method         | Frequency Detector [11] | MNTD [12]             | NC [13]                | I-BAU [14]             | MW-Net [19]
Task/Settings  | Detecting BadNets       | BadNets 5%; Target: 2 | BadNets 5%; Target: 38 | BadNets 5%; Target: 38 | 20% Random Label-Flipping
Base Set       | 100-CIFAR-10            | 1000-MNIST            | 1000-GTSRB             | 1000-GTSRB             | 100-CIFAR-10
Metric         | PFR (%)                 | AUC (%)               | ASR (%)                | ASR (%)                | ACC (%)
Baseline       | Random: 50              | Random: 50            | No Def: 97.43          | No Def: 97.43          | No Def: 69.99
# poison       | 0/100                   | 0/1000                | 0/1000                 | 0/1000                 | 0/100
Original       | 99.95                   | 99.92                 | 18.83                  | 12.58                  | 91.18
# poison       | 1/100                   | 10/1000               | 30/1000                | 8/1000                 | 20/100
After          | 3.11                    | 62.78                 | 62.67                  | 81.82                  | 81.84
Table 1: Defenses in the case of using a corrupted base set. For each defense category, we use the metric from the corresponding original work. Baseline results correspond to random guessing as the defense or to no defense at all; Original results assume access to a clean base set; After shows the results of using a contaminated base set.
the base set can reduce the accuracy by about 10%. Overall,
we can see that base sets with high purity are crucial to enable
the successful application of these popular defenses requiring
access to base sets.
2.2 The Data Sifting Problem
The sensitivity of defense performance to the purity of
the underlying base set motivates us to study the Data Sift-
ing Problem: How to sift out a clean subset from a given
poisoned dataset? We highlight some unique challenges and
opportunities towards answering this question.
(Challenge) High precision: The empirical study in Sec-
tion 2.1 demonstrates that the defensive performance could
drop significantly with a small portion of corruption in the
base set. Hence, sifting out clean samples with high preci-
sion is crucial to ensure defense effectiveness.
(Challenge) Attack-Agnostic: In practice, the defender
usually does not know the underlying attack mechanisms
that the attacker used to generate the poisoned samples.
Hence, it is important to ensure high precision across dif-
ferent types of poisoning attacks.
(Opportunity) Mild requirement on the size of the sifted
subset. In contrast to the strict requirement on purity, the requirement on the size of the sifted subset is generally mild. The base set required to enable an effective defense is usually much smaller than the poisoned set. For instance, a clean base set smaller than 1% of the whole poisoned dataset suffices to enable effective defenses [14, 20].
Note that some attempts have been made in the prior work
to lift the requirement on base sets in data poisoning de-
fenses [15, 21–23]. However, these works are focused on
specific defense categories against specific attack settings.
By contrast, when solving the data sifting problem one can
obtain a highly pure base set that can be plugged into any defense technique that requires base-set access. Hence,
we argue that solving the data sifting problem provides
a more flexible pathway to address the base-set-reliance
issue in current data-poisoning defense literature.
2.3 How Effective are Existing Methods
We first consider automated methods that can potentially
solve the data sifting problem. Note that the data sifting prob-
lem is similar to the traditional outlier detection problem,
wherein the goal is to sift out the abnormal instances in a
contaminated dataset. Ideally, if one could filter out all the
abnormal instances perfectly, then the complement set can be
taken to solve the data sifting problem. There are two key
differences between data sifting and outlier detection. Firstly,
the data sifting problem is contextualized in data poisoning
defense applications, where it is not necessary to sift out all
the clean instances, but instead, a small subset of clean in-
stances in the training set often suffices to support an effective
defense. Secondly, in outlier detection, it is often more impor-
tant to achieve high recall at a given selection budget (i.e., a
high proportion of true outliers is marked as outliers by the
detection algorithm), whereas in the data sifting problem, high
precision is the key to realizing a successful defense (i.e., a high
proportion of the “marked-as-clean” points is truly clean).
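The contrast between the two objectives can be made precise with two simple ratios; the sketch below uses our own (hypothetical) variable names.

```python
import numpy as np

def sifting_precision(is_poison, marked_clean):
    """Data sifting: fraction of the marked-as-clean points that are truly clean."""
    is_poison = np.asarray(is_poison, dtype=bool)
    marked_clean = np.asarray(marked_clean, dtype=bool)
    return (~is_poison[marked_clean]).mean()

def outlier_recall(is_poison, marked_outlier):
    """Outlier detection: fraction of true outliers caught within the selection budget."""
    is_poison = np.asarray(is_poison, dtype=bool)
    marked_outlier = np.asarray(marked_outlier, dtype=bool)
    return marked_outlier[is_poison].mean()
```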
At a technical level, outlier detection algorithms are still applicable to the data sifting problem. In particular, existing outlier detection algorithms assign an "outlier score" to each data point, indicating its likelihood of being an outlier. To re-purpose these algorithms for data sifting, we select the points with the lowest "outlier scores" from each class to form a balanced base set. We evaluate some representative out-
lier detection methods that do not rely on additional clean data
to function and examine their potential to solve the identified
problem. Specifically, we evaluate:
Distance to the Class-Means (DCM): We compute the
mean of each class at the input-space (pixel-level) as the
class center and assign the outlier score to a point based
on the distance to its center.
Distance to Model-Inversion-based CM (MI-DCM): MI-
DCM differs from DCM in the choice of the class cen-
ter. Here, each class center is obtained by conducting the
model inversion attack [24] on the model trained on the
entire dataset. Model inversion is a type of privacy attack
aimed at reconstructing the representative points for each
class from the trained model.
Spectral Filtering’s Least Scores (SF-Least) [25]: Spec-
tral Filtering is an advanced outlier detection method in
robust statistics. Its key idea is that outliers inflate the eigenvalues of the sample covariance matrix beyond their expected values. This idea has been applied to the detection
of backdoored samples. To start with, we extract features
for each sample using the model trained over the contami-
nated dataset and the outlier score of a sample is calculated
based on the dot product between its feature and the top
eigenvector of the sample covariance matrix.
Loss-based Poison Scanning (Loss-Scan) [26]: Recent
work has identified the difference between the losses of
benign and backdoored samples [26]. For most backdoor
attacks, the losses of the poisoned sample would be lower