
Our contributions can be summarized as follows:
• We identify an overlooked problem: the accessibility of a clean base set in the presence of data poisoning.
• We systematically evaluate the performance of existing automated methods and of human inspection in distinguishing between poisoned and clean samples.
• We propose a novel splitting-based idea to sift out a clean subset from a poisoned dataset and formalize it as a bilevel optimization problem.
• We propose META-SIFT, comprising an efficient algorithm to solve the bilevel problem as well as a series of techniques to enhance sifting precision.
• We extensively evaluate META-SIFT and compare it with existing automated methods on four benchmark datasets under twelve different data poisoning attack settings. Our method significantly outperforms existing methods in both sifting precision and efficiency. At the same time, plugging our sifted samples into existing defenses achieves comparable or even better performance than plugging in randomly selected clean samples.
• We open-source the project to promote research on this topic and facilitate the successful application of existing defenses in settings without a clean base set.¹
2 Sifting Out a Clean Enough Base Set is Hard
The ability to acquire a clean base set was taken for granted in many existing data poisoning defenses [13, 14, 16–19].
For instance, a popular Trojan-Net Detection strategy is to
first synthesize potential trigger patterns from a target model
and then inspect whether there exists any suspicious pattern
[13,16]. Trigger synthesis is done by searching for a pattern
that maximally activates a certain class output when it is
patched onto the clean data. Hence, access to a clean set of
data is indispensable to this defense strategy. Another example
is defenses against Label-Flipping attacks (often referred to
as mislabeled data detection in ML literature). State-of-the-
art methods detect mislabeled data by finding a subset of
instances such that when they are excluded from training, the
prediction accuracy on a clean validation set is maximized. A
clean set of instances is needed to enable these methods.
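To make the trigger-synthesis step concrete, the sketch below shows one common way such a search can be implemented in PyTorch: optimize a small mask-and-pattern pair so that clean base-set inputs are pushed toward a candidate target class. This is a simplified illustration rather than the exact procedure of [13, 16]; `model`, `clean_loader`, `target_class`, and the penalty weight `lam` are assumed names.

```python
import torch
import torch.nn.functional as F

def synthesize_trigger(model, clean_loader, target_class,
                       epochs=10, lam=1e-2, lr=0.1, device="cpu"):
    """Search for a small (mask, pattern) trigger that pushes clean
    base-set inputs toward `target_class`; the L1 penalty on the mask
    keeps the synthesized trigger small. Illustrative sketch only."""
    # Infer the input shape from one clean batch (assumes (images, labels)).
    x0, _ = next(iter(clean_loader))
    _, c, h, w = x0.shape
    mask = torch.zeros(1, 1, h, w, device=device, requires_grad=True)
    pattern = torch.zeros(1, c, h, w, device=device, requires_grad=True)
    opt = torch.optim.Adam([mask, pattern], lr=lr)

    model.eval()
    for _ in range(epochs):
        for x, _ in clean_loader:
            x = x.to(device)
            m = torch.sigmoid(mask)              # constrain mask to [0, 1]
            p = torch.tanh(pattern) * 0.5 + 0.5  # constrain pattern to [0, 1]
            x_patched = (1 - m) * x + m * p      # patch the candidate trigger
            y_target = torch.full((x.size(0),), target_class,
                                  dtype=torch.long, device=device)
            loss = (F.cross_entropy(model(x_patched), y_target)
                    + lam * m.abs().sum())       # misclassification + size penalty
            opt.zero_grad()
            loss.backward()
            opt.step()
    return torch.sigmoid(mask).detach(), (torch.tanh(pattern) * 0.5 + 0.5).detach()
```

A Trojan-Net detector would repeat this search for every candidate class and flag the model if some class admits an anomalously small trigger, which is why poisoned samples hidden in the base set can distort the synthesized patterns.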
2.1 Defense Requires a Highly Pure Base Set
Table 1 summarizes some representative techniques that
rely on access to a clean base set in each of the aforementioned
defense categories, namely, Poison Detection, Trojan-Net De-
tection, Backdoor Removal, and Robust Training against label
noise. These techniques either achieve the state-of-the-art per-
formance (e.g., Frequency Detector [11], I-BAU [14], MW-
Net [19]) or are widely-adopted baselines (e.g., MNTD [12]
and Neural Cleanse (NC) [13]). In particular, MNTD is im-
plemented as a base strategy in an ongoing competition for
Trojan-Net Detection.²
¹ https://github.com/ruoxi-jia-group/Meta-Sift
² https://trojandetection.ai/
Conventionally, these defense techniques report their performance only with a completely clean base set. However, given the fast-advancing research on stealthy attacks, some poisoned samples may go unnoticed and be selected into the base set by mistake. Hence, it is critical to
evaluate how the performance of these defenses depends on
the ratio of the poisoned samples in the base set.
We adopt widely used metrics to measure defense perfor-
mance for each defense category. Specifically, for Poison De-
tection, we use the Poison Filtering Rate (PFR), which measures
the ratio of poisoned samples that are correctly detected. For
Trojan-Net Detection, we follow the original work of MNTD
and use the Area Under the ROC Curve (AUC) as a metric,
which measures the entire two-dimensional area underneath
the ROC curve.³ The most naive baseline for Poison Detection and Trojan-Net Detection is random deletion, which ends
up with a PFR of 50% and an AUC of 50%. The closer the
performance of the defense in the Poison Detection and the
Trojan-Net Detection category gets to 50%, the weaker the
defense is. For Backdoor Removal, we use the Attack Suc-
cess Rate (ASR), which calculates the frequency with which
non-target-class samples patched with the backdoor trigger
are misclassified into the attacker-desired target class. For
Robust Training, we use the Test Accuracy (ACC), which
measures the accuracy of the trained model on a clean test set.
The baselines for Backdoor Removal and Robust Training are
simply the deployment of no defenses at all. We report ASR or
ACC that is obtained directly from training on the poisoned
dataset. The closer the performance of a defense in these two
categories gets to these baselines, the weaker the defense is.
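For reference, the sketch below shows how two of these metrics can be computed; the function names, the detector output `is_flagged`, and the patching routine `trigger_fn` are illustrative assumptions rather than any cited implementation.

```python
import torch

def poison_filtering_rate(is_poisoned, is_flagged):
    """PFR: fraction of truly poisoned samples that the detector flags."""
    is_poisoned = torch.as_tensor(is_poisoned, dtype=torch.bool)
    is_flagged = torch.as_tensor(is_flagged, dtype=torch.bool)
    return is_flagged[is_poisoned].float().mean().item()

@torch.no_grad()
def attack_success_rate(model, non_target_images, target_class, trigger_fn):
    """ASR: fraction of non-target-class inputs that, once patched with the
    backdoor trigger, are classified as the attacker-chosen target class."""
    preds = model(trigger_fn(non_target_images)).argmax(dim=1)
    return (preds == target_class).float().mean().item()
```

AUC and ACC follow their standard definitions (e.g., an off-the-shelf ROC-AUC routine and plain test-set accuracy).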
We compare the resulting defense performance against
standard attacks (e.g., BadNets [8], Random Label-Flipping)
between clean and corrupted base sets (Table 1). For Poison Detection with the Frequency Detector, even one poisoned example sneaking into the base set is sufficient to nullify the
defensive effect, leading to a performance worse than the ran-
dom baseline. For MNTD, with 1% of poisoned examples
mixed into the base set, the AUC drops by almost 40%. Com-
paring the two techniques for Backdoor Removal, we can find
that I-BAU is more sensitive to corruption of the base set than
NC. Both techniques patch a trigger onto a portion of the samples in the base set to fine-tune the poisoned model, aiming to force the
model to “forget” the wrong association between the trigger
and the target label. Compared to NC, the design of I-BAU se-
lects fewer samples in the base set to be patched with a trigger.
Hence, the positive “forgetting” effect introduced by these
samples is more likely to be overwhelmed by the negative
effect caused by poisoned examples sneaking into the base set.
This explains the larger sensitivity of I-BAU to corruption of
the base set. For both techniques, less than 3% of corruption
in the base set is adequate to bring the ASR back above 60%.
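To make this shared “forgetting” mechanism concrete, the sketch below fine-tunes a backdoored model on base-set samples, a fraction of which are patched with the synthesized trigger while keeping their true labels. It illustrates the common idea rather than the exact NC or I-BAU procedure; `base_loader`, `trigger_fn`, and `patch_fraction` are assumed names.

```python
import torch
import torch.nn.functional as F

def unlearn_trigger(model, base_loader, trigger_fn, patch_fraction=0.2,
                    epochs=5, lr=1e-4, device="cpu"):
    """Fine-tune a backdoored model on base-set samples, a fraction of which
    are patched with the synthesized trigger but keep their TRUE labels,
    so that the model unlearns the trigger -> target-class shortcut."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in base_loader:
            x, y = x.to(device), y.to(device)
            n_patch = int(patch_fraction * x.size(0))
            if n_patch > 0:
                x = x.clone()
                # Trigger goes on, labels stay correct: the model is pushed
                # to ignore the trigger. Poisoned samples hidden in the base
                # set instead carry the target label and fight this effect.
                x[:n_patch] = trigger_fn(x[:n_patch])
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

In this view, poisoned samples that sneak into the base set enter the same fine-tuning loop with the attacker's target label and directly counteract the forgetting effect.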
For Robust Training with MW-Net, 20 mislabeled samples in
³ An ROC curve plots the true positive rate vs. the false positive rate at different classification thresholds.