
attacks that can reduce clustering performance significantly [10, 11]². However, no such adversarial
attacks exist for deep clustering methods. Furthermore, no work investigates generalized blackbox attacks, where the adversary has zero knowledge of the deep clustering model being used. This is the most realistic setting under which a malicious adversary could aim to disrupt these models. While a large body of such work exists for supervised learning models, deep clustering approaches have not received the same attention from the community.
The closest work to ours proposes robust deep clustering [9, 12] by retraining models with adversarially perturbed inputs to improve clustering performance. However, this line of work has several shortcomings: 1) it lacks fundamental analysis of attacks specific to deep clustering models (e.g., the state-of-the-art robust deep clustering model RUC [9] only considers images perturbed via FGSM/BIM [13, 14], which are common attack approaches for supervised learning), 2) no clearly defined threat models for the adversary are proposed³, and 3) there is no transferability [15] analysis. Thus, in this work, we seek to bridge this gap by proposing generalized blackbox attacks that operate in the input space under a well-defined adversarial threat model. We also conduct an empirical analysis to observe how effectively adversarial samples transfer between different clustering models.
Figure 1: Adversarial samples generated by our attack (first 4 image pairs from the left correspond to SPICE and the others to RUC); each pair is shown with its pre-attack and post-attack cluster label.
We utilize Generative Adversarial Networks (GANs) [16] for our attack, inspired by previous approaches (AdvGAN [17], AdvGAN++ [18], etc.) for supervised learning. We also utilize a number of defense approaches (essentially deep-learning-based anomaly detection [19] and state-of-the-art "robust" deep clustering models [9]) to determine if our adversarial samples can be mitigated by an informed defender. One of the major findings of our work is that these approaches are unable to prevent our adversarial attacks. Through our work, we seek to promote the development of better defenses against adversarial attacks on deep clustering.
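To make the GAN-based attack pipeline concrete, the sketch below gives a minimal PyTorch-style illustration of an AdvGAN-style setup (illustrative only; not our exact architecture, losses, or training procedure): a generator produces an L∞-bounded perturbation, a discriminator encourages perturbed images to look natural, and an attack term pushes a clustering model's soft assignments away from their clean values. All class and function names are hypothetical, and the sketch assumes gradient access to a differentiable victim or surrogate; in the blackbox setting, the victim's outputs would instead be obtained through queries.

# Illustrative AdvGAN-style sketch (hypothetical names; not our exact method).
# Assumes: `victim` is a frozen clustering network returning cluster logits,
# images are in [0, 1], and gradients through `victim` (or a surrogate) exist.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerturbationGenerator(nn.Module):
    """Maps an image to an L-infinity-bounded adversarial version of itself."""
    def __init__(self, channels=3, eps=8 / 255):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1), nn.Tanh(),  # output in [-1, 1]
        )

    def forward(self, x):
        delta = self.eps * self.net(x)            # perturbation bounded by eps
        return torch.clamp(x + delta, 0.0, 1.0)   # keep valid pixel range

class Discriminator(nn.Module):
    """Scores how natural an image looks (real vs. perturbed)."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.net(x)

def generator_loss(victim, G, D, x):
    """Generator objective: look real to D and change the victim's assignments."""
    x_adv = G(x)
    with torch.no_grad():
        p_clean = victim(x).softmax(dim=1)        # clean soft cluster assignments
    logp_adv = victim(x_adv).log_softmax(dim=1)   # assignments after perturbation
    d_out = D(x_adv)
    # GAN term: perturbed images should be scored as real by the discriminator.
    gan_term = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    # Attack term: maximize divergence from the clean assignments (hence the minus).
    attack_term = -F.kl_div(logp_adv, p_clean, reduction="batchmean")
    return gan_term + attack_term

In a full attack of this style, the generator and discriminator would be trained alternately, and the number of times the victim (or its surrogate) must be evaluated determines the query budget.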
Finally, to truly showcase how powerful our attacks can be, we attack a production-level ML-as-a-Service (MLaaS) API that performs a clustering task (and possibly utilizes deep clustering models in the backend). We find that our attack can also significantly degrade the performance of such MLaaS clustering API services. To summarize, we make the following contributions:
• We propose the first blackbox adversarial attack against deep clustering models. We show that our attacks can significantly reduce the performance of these models while requiring a minimal number of queries. We also undertake a transferability analysis to demonstrate the broader reach of our attack.
• We undertake a thorough experimental analysis of most state-of-the-art (SOTA) deep clustering models on a number of real-world datasets, such as CIFAR-10 [20], CIFAR-100 [21], and STL-10 [22], which shows that our attack is applicable across a number of models and datasets.
• We show that existing (unsupervised) defense approaches (such as anomaly detection and robustness via adversarial retraining) cannot thwart our attack samples, thus prompting the need for better defense approaches against adversarial attacks on deep clustering models.
• We also attack a production-level MLaaS clustering API to showcase the extent of our attack. We find that our attack is highly successful for this real-world task, underscoring the need for making deep clustering models truly robust.
Figure 1 shows some of the adversarial samples generated by our attack for the SPICE [8] and RUC [9] deep clustering models on the STL-10 dataset, and their corresponding pre-attack and post-attack predicted cluster/class labels⁴. To the human eye, these samples appear indistinguishable from each
² These attacks cannot be used for deep clustering because: 1) they employ computationally intensive optimizers that use exhaustive search and hence make the attack too expensive for high-dimensional data, and 2) they are designed for traditional clustering and are training-time attacks, whereas deep clustering models are deployed frozen for inference, requiring a test-time attack.
³ For example, for RUC [9], it is implicitly assumed that the adversary has knowledge of the dataset as well as ground truth labels, and will attack using supervised whitebox attacks.
⁴ Class labels are inferred by taking the majority over the ground-truth labels of samples in that cluster.