On the Robustness of Deep Clustering Models:
Adversarial Attacks and Defenses
Anshuman Chhabra, Ashwin Sekhari*, and Prasant Mohapatra
Department of Computer Science
University of California
Davis, CA 95616
{chhabra,asekhari,pmohapatra}@ucdavis.edu
Abstract
Clustering models constitute a class of unsupervised machine learning methods which are used in a number of application pipelines, and play a vital role in modern data science. With recent advancements in deep learning, deep clustering models have emerged as the current state-of-the-art over traditional clustering approaches, especially for high-dimensional image datasets. While traditional clustering approaches have been analyzed from a robustness perspective, no prior work has investigated adversarial attacks and robustness for deep clustering models in a principled manner. To bridge this gap, we propose a blackbox attack using Generative Adversarial Networks (GANs) where the adversary does not know which deep clustering model is being used, but can query it for outputs. We analyze our attack against multiple state-of-the-art deep clustering models and real-world datasets, and find that it is highly successful. We then employ some natural unsupervised defense approaches, but find that these are unable to mitigate our attack. Finally, we attack Face++, a production-level face clustering API service, and find that we can significantly reduce its performance as well. Through this work, we thus aim to motivate the need for truly robust deep clustering models.
1 Introduction
Clustering models are utilized in many data-driven applications to group similar samples together, and
dissimilar samples separately. They constitute a powerful class of unsupervised Machine Learning
(ML) models which can be employed in many cases where labels for data samples are either hard or impossible to obtain. Note that there is a multitude of different approaches to accomplishing the
aforementioned clustering task, and an important differentiation can be made between traditional and
deep clustering approaches.
Traditional clustering generally aims to minimize a clustering objective function defined using a given distance metric [1]. These include approaches such as k-means [2], DBSCAN [3], spectral methods [4], among others. Such models generally fail to perform satisfactorily on high-dimensional data (i.e., image datasets), or incur huge computational costs that make the problem intractable [5]. To improve upon these approaches, initial deep clustering models sought to decompose the high-dimensional data to a cluster-friendly low-dimensional representation using deep neural networks. Clustering was then undertaken on this latent space representation [6, 7]. Since then, deep clustering models have become considerably advanced, with state-of-the-art models outperforming traditional clustering models by significant margins on a number of real-world datasets [8, 9].
Despite these successes, deep clustering models have not been sufficiently analyzed from a robustness
perspective. Recently, traditional clustering models have been shown to be vulnerable to adversarial attacks that can reduce clustering performance significantly [10, 11]². However, no such adversarial attacks exist for deep clustering methods. Furthermore, no work investigates generalized blackbox attacks, where the adversary has zero knowledge of the deep clustering model being used. This is the most realistic setting under which a malicious adversary could aim to disrupt the working of these models. While there is a multitude of work in this domain for supervised learning models, deep clustering approaches have not received the same attention from the community.

*Equal contribution.
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.01940v1 [cs.LG] 4 Oct 2022
The closest work to ours proposes robust deep clustering [9, 12] by retraining models with adversarially perturbed inputs to improve clustering performance. However, this line of work has many shortcomings: 1) it lacks fundamental analysis on attacks specific to deep clustering models (e.g., the state-of-the-art robust deep clustering model RUC [9] only considers images perturbed via FGSM/BIM [13, 14] attacks, which are common attack approaches for supervised learning), 2) no clearly defined threat models for the adversary are proposed³, and 3) there is no transferability [15] analysis. Thus, in this work, we seek to bridge this gap by proposing generalized blackbox attacks that operate in the input space under a well-defined adversarial threat model. We also conduct empirical analysis to observe how effectively adversarial samples transfer between different clustering models.
[Figure 1: Adversarial samples generated by our attack (first 4 image pairs from the left correspond to SPICE and the others to RUC). Each pair shows an original and adversarial image with its pre-attack and post-attack cluster labels. Pre-attack: bird, cat, dog, deer, horse, bird, dog, ship; post-attack: ship, ship, plane, dog, bird, dog, bird, bird.]
We utilize Generative Adversarial Networks (GANs) [16] for our attack, inspired by previous approaches (AdvGAN [17], AdvGAN++ [18], etc.) for supervised learning. We also utilize a number of defense approaches (essentially deep learning based anomaly detection [19] and state-of-the-art "robust" deep clustering models [9]) to determine if our adversarial samples can be mitigated by an informed defender. One of the major findings of our work is that these approaches are unable to prevent our adversarial attacks. Through our work, we seek to promote the development of better defenses for adversarial attacks against deep clustering.
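To make this setup concrete, below is a minimal PyTorch-style sketch of the generator side of such an attack; the GAN discriminator that enforces image realism is omitted for brevity. Since the victim clustering model is a blackbox and exposes no gradients, the sketch assumes a differentiable surrogate (e.g., distilled from query outputs) stands in for it during generator training. The `PerturbationGenerator` architecture, the `surrogate` interface, and the loss weights are illustrative assumptions, not the exact design used in our attack.

```python
# Illustrative sketch of a GAN-style attack generator for a deep clustering model.
# The architecture, `surrogate`, and loss weights are assumptions for exposition.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerturbationGenerator(nn.Module):
    """Maps an image batch to a small additive perturbation bounded by eps."""
    def __init__(self, channels=3, eps=0.05):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.eps * self.net(x)  # perturbation values lie in [-eps, eps]

def generator_step(G, opt, x, surrogate, lam=10.0):
    """One generator update: push the surrogate's cluster assignment away from
    the clean prediction while keeping the perturbation norm small."""
    delta = G(x)
    x_adv = torch.clamp(x + delta, 0.0, 1.0)

    with torch.no_grad():
        p_clean = surrogate(x)          # soft cluster assignments on clean images
    p_adv = surrogate(x_adv)            # soft cluster assignments on adversarial images

    pseudo_label = p_clean.argmax(dim=1)                 # clean cluster as pseudo ground truth
    attack_loss = -F.nll_loss((p_adv + 1e-8).log(), pseudo_label)  # maximize disagreement
    norm_loss = delta.flatten(1).norm(dim=1).mean()                # keep perturbation small

    loss = attack_loss + lam * norm_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return x_adv.detach(), loss.item()
```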
Finally, to truly showcase how powerful our attacks can be, we attack a production-level ML-as-a-Service (MLaaS) API that performs a clustering task (and possibly utilizes deep clustering models in the backend). We find that our attack can also significantly degrade the performance of such MLaaS clustering API services. To summarize, we make the following contributions:
• We propose the first blackbox adversarial attack against deep clustering models. We show that our attacks can significantly reduce the performance of these models while requiring a minimal number of queries. We also undertake a transferability analysis to demonstrate the magnitude of our attack.
• We undertake a thorough experimental analysis of most state-of-the-art (SOTA) deep clustering models, on a number of real-world datasets, such as CIFAR-10 [20], CIFAR-100 [21], and STL-10 [22], which shows that our attack is applicable across a number of models and datasets.
• We show that existing (unsupervised) defense approaches (such as anomaly detection and robustness via adversarial retraining) cannot thwart our attack samples, thus prompting the need for better defense approaches for adversarial attacks against deep clustering models.
• We also attack a production-level MLaaS clustering API to showcase the extent of our attack. We find that our attack is highly successful for this real-world task, underscoring the need for making deep clustering models truly robust.
Figure 1 shows some of the adversarial samples generated by our attack for the SPICE [8] and RUC [9] deep clustering models on the STL-10 dataset, and their corresponding pre-attack and post-attack predicted cluster/class labels⁴. To the human eye, these samples appear indistinguishable from each other, but the model clusters them incorrectly after the attack. As we will show in later sections, our attack on SPICE for STL-10 results in an 83.9% (67.9% for RUC) decrease in clustering utility measured according to the NMI [23] metric⁵.

²These attacks cannot be used for deep clustering because: 1) they employ computationally intensive optimizers that use exhaustive search and hence make the attack too expensive for high-dimensional data, and 2) they are designed for traditional clustering and are training-time attacks, whereas deep clustering models are deployed frozen for inference, requiring a test-time attack.
³For example, for RUC [9], it is implicitly assumed that the adversary has knowledge of the dataset as well as ground truth labels, and will attack using supervised whitebox attacks.
⁴Class labels are inferred by taking the majority from the ground truth labels for samples in that cluster.
2 Related Works
Deep Clustering.
To leverage clustering algorithms on high-dimensional data, early work on deep clustering [6, 7] aimed to learn a latent low-dimensional cluster-friendly representation that could then be clustered on. This task was achieved using AutoEncoders (AEs) [7, 24, 25], Variational AEs [26, 27], or GANs [28–30]. More recently, current SOTA deep clustering models employ self-supervised and contrastive learning instead of the earlier AE/GAN based approaches to perform clustering [8, 9, 31]. We consider these in the paper due to their superlative performance.
Training-time Adversarial Attacks Against Clustering.
Recently, a few works have proposed blackbox adversarial attacks against clustering [10, 11]. These seek to poison a small number of samples in the input data, so that when clustering is undertaken on the poisoned dataset, other unperturbed samples change cluster memberships. This problem is significantly different from ours: the traditional clustering algorithm retrains on the poisoned input data, constituting an attack at training time. For deep clustering, the model does not train again once deployed, so we have to generate adversarial images that the model misclusters at test time. There have also been whitebox attacks proposed for traditional clustering, but these can only be employed for the specific traditional clustering algorithm considered [32–34].
Attacks/Defenses in Supervised Learning.
The closest attacks to ours in supervised learning are score-based blackbox attacks, where the adversary has access to the softmax probabilities output by the model [35]. A number of such attacks have been proposed: NES [36], SPSA [37], among others. These cannot be applied in their original formulation as the attack optimization expects ground truth reference labels. Even with simple modifications to the loss (such as using predicted labels as ground truth), these attacks do not work as successfully as our proposed GAN attack (refer to Appendix F.1 for a detailed discussion and Appendix E for empirical justification). Similar issues make supervised defenses inapplicable for our setting: defense strategies need to be truly unsupervised to work in the clustering context, and outputs of deep clustering models might not possess a one-to-one mapping with ground truth labels, causing further problems. We provide a more detailed discussion on the inapplicability of these defense methods in Appendix F.2.
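For intuition, the sketch below shows one such simple modification: an NES-style estimator that uses the model's own predicted cluster as a pseudo ground-truth label. The `query_cluster_probs` interface, sample counts, and step sizes are illustrative assumptions; as noted above, this style of adaptation is markedly less effective than our GAN-based attack in practice.

```python
# Illustrative NES-style score-based attack adapted to clustering by using the
# model's predicted cluster as a pseudo ground truth (no real labels required).
# query_cluster_probs(), sigma, and step sizes are assumptions for exposition.
import numpy as np

def nes_gradient(x, pseudo_label, query_cluster_probs, sigma=0.01, n_samples=50):
    """Estimate the gradient of the pseudo-label probability via antithetic
    Gaussian sampling (NES-style finite differences)."""
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = np.random.randn(*x.shape)
        p_plus = query_cluster_probs(x + sigma * u)[pseudo_label]
        p_minus = query_cluster_probs(x - sigma * u)[pseudo_label]
        grad += (p_plus - p_minus) * u
    return grad / (2.0 * sigma * n_samples)

def nes_attack(x, query_cluster_probs, steps=100, eps=0.05, lr=0.01):
    """Move x to reduce confidence in its own predicted cluster."""
    pseudo_label = int(np.argmax(query_cluster_probs(x)))
    x_adv = x.copy()
    for _ in range(steps):
        g = nes_gradient(x_adv, pseudo_label, query_cluster_probs)
        x_adv = x_adv - lr * np.sign(g)            # descend on pseudo-label probability
        x_adv = np.clip(x_adv, x - eps, x + eps)   # stay within the L_inf ball around x
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv
```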
Robust Deep Clustering.
A natural in-processing defense approach to preventing adversarial attacks on deep neural networks is to utilize some form of adversarial retraining, which seeks to jointly optimize an adversarial objective along with the original loss function used to train the network. This technique has been shown to vastly improve the adversarial robustness of traditional deep neural networks [38, 39]. In the context of deep clustering, to the best of our knowledge, there are two works that aim to make models robust to adversarial noise in this manner⁶: RUC [9] and ALRDC [12]. Moreover, RUC is considered to be the SOTA robust deep clustering model. Even in terms of performance metrics (NMI, ACC, ARI) it is second only to the SOTA model SPICE [43].
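To illustrate the general recipe these works follow, the sketch below shows a generic adversarial retraining loop for a deep clustering network that jointly optimizes the clustering objective on clean and perturbed inputs. The `clustering_loss` and `make_perturbed` callables and the weight `alpha` are placeholders for illustration; RUC and ALRDC each use their own specific objectives and perturbation strategies.

```python
# Generic adversarial-retraining loop for a deep clustering network (illustrative).
# clustering_loss(), make_perturbed(), and `alpha` are placeholder assumptions.
import torch

def adversarial_retraining_epoch(model, loader, optimizer, clustering_loss,
                                 make_perturbed, alpha=0.5):
    model.train()
    for x in loader:
        x_adv = make_perturbed(model, x)          # e.g., FGSM/BIM-style perturbation
        clean_loss = clustering_loss(model, x)    # original clustering objective
        adv_loss = clustering_loss(model, x_adv)  # same objective on perturbed inputs
        loss = clean_loss + alpha * adv_loss      # joint objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

In this pattern, the perturbation routine is typically a supervised-style attack such as FGSM/BIM applied to the network's own pseudo-labels, which is precisely the limitation discussed in Section 1.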
3 Preliminaries and Notation
In this section, we introduce notation and preliminary knowledge regarding the clustering task and the proposed adversarial attack. Note that for a matrix $A$ of size $u \times v$, we can index into row $i$ as $A_i, i \in [u]$, and index into a single value at row $i$ and column $j$ as $A_{i,j}, i \in [u], j \in [v]$. Moreover, $\|\cdot\|$ denotes the Euclidean ($\ell_2$) norm of a vector.
Deep Clustering.
Since we are proposing blackbox attacks in our paper, we will be defining deep clustering models in a more generalized manner that abstracts their inner functioning. We refer the reader to [5] for more details on the models analyzed in the paper. A deep clustering model is denoted as $C$ and operates on samples of the given dataset $X$ consisting of $n$ samples and maps them
⁵Similar results for SPICE hold for CIFAR-100 with 78.9% (58.2% for RUC) decrease in NMI, and for CIFAR-10 with 72.9% (68.7% for RUC) decrease in NMI.
⁶[40–42] refer to "robustness" in their work but their definition is not the traditional notion of adversarial robustness and they do not incorporate adversarial retraining.