
Self-supervised Learning for Clustering of Wireless
Spectrum Activity
Ljupcho Milosheski∗†, Gregor Cerar∗, Blaž Bertalanič∗, Carolina Fortuna∗, Mihael Mohorčič∗†
∗Department of Communication Technologies, Jozef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
†Jozef Stefan International Postgraduate School, Jamova 39, 1000, Ljubljana, Slovenia
Email: {ljupcho.milosheski, miha.mohorcic, carolina.fortuna}@ijs.si
Abstract—In recent years, much work has been done on
processing wireless spectrum data with machine learning
techniques for domain-related problems in cognitive radio
networks, such as anomaly detection, modulation classification,
technology classification and device fingerprinting. Most of the
solutions are based on labeled data, created in a controlled
manner and processed with supervised learning approaches.
However, spectrum data measured in real-world environments
is highly nondeterministic, which makes labeling it a laborious and
expensive process that requires domain expertise; this is one
of the main drawbacks of using supervised learning approaches
in this domain. In this paper, we investigate the utilization of
self-supervised learning (SSL) for exploring spectrum activities
in real-world unlabeled data. In particular, we assess the
performance of SSL models, based on the reference DeepCluster
architecture. We carefully consider the current state-of-the-art
feature extractors, taking into account the performance and
complexity trade-offs. Our findings demonstrate that SSL models
achieve superior performance regarding the feature quality and
clustering performance compared to baseline feature learning
approaches. With SSL models, we reduce the
feature vector size by two orders of magnitude while
improving the performance by a factor of 2 to 2.5
across the evaluation metrics, as supported by visual assessment.
Furthermore, we showcase how adapting the reference SSL
architecture to domain-specific data yields a substantial
reduction in model complexity of up to one order of magnitude
without compromising, and in some cases even improving, the
clustering performance.
Index Terms—spectrum analysis, clustering, self-supervised, machine learning
I. INTRODUCTION
The number and types of wireless devices connected to
the Internet are rapidly increasing with the current affordable
personal mobile and Internet of Things (IoT) devices, requiring
wireless networks to handle a large number of connections and
high traffic loads. As a reference, fifth-generation (5G) networks
are required to support one million connected devices per square
kilometer. Such a density of devices requires complex wireless resource
management. Over time, several new approaches to wireless
resource sharing, including dynamic spectrum access [1] and
licensed shared access [2], have been proposed. However,
additional technological components, such as spectrum usage
databases [3] and radio environment maps [4], had to be
developed to enable such sophisticated and dynamic spectrum
usage approaches. Correctly reporting spectrum usage, and
ultimately using the spectrum smartly, requires knowledge of the
other devices operating within range of a wireless device.
In this respect, some recent efforts
have focused on detecting the modulation used [5], the technology
used [6], anomalous activities [7], etc.
As also discussed in [8] and [9], significant effort is still
being invested in the field to develop accurate and scalable
deep learning (DL) algorithms that can automatically
manage spectrum resource usage. With respect
to the learning approach, these techniques can be divided
into (1) supervised techniques, which require labels for the
training data, and (2) unsupervised techniques, which do not assume any
such labels. Applications in wireless spectrum management
need to be aware of operating details (e.g., type of technology
and transmission parameters). Developing a DL-based
model to support such applications typically requires labelled
data, which is expensive to acquire because it requires complex wireless
and computing equipment [10], [11] or intense labelling efforts
by domain experts [12]. Even then, the labels are not always of high
quality due to the nondeterministic nature of wireless operating
environments. Semi-supervised and active learning have emerged
as alternative techniques that use
a relatively small number of labeled samples to achieve
performance comparable to that of the fully supervised
approach.
Given the advent of large datasets that are expensive
or practically impossible to label, self-supervised learning
(SSL) [13], as another intermediate learning approach, is
becoming an important alternative that is particularly suitable
for reducing the data labelling cost and leveraging the unlabelled
data pool. SSL is a representation learning method in which a
supervised task is created out of the unlabelled data. Using
an SSL approach, it is possible to form groups of very similar samples
(i.e., clusters) from a large, unlabelled dataset and then label
each cluster. Once the learnt clusters are labelled, the model can
be used as a classifier: new, unseen
examples are assigned to those clusters and thereby labelled as they
would be in a typical classification task.
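The cluster-then-label workflow described above can be illustrated with a minimal sketch. It is not the DeepCluster-based pipeline studied in this paper: the two-dimensional synthetic points stand in for learned SSL embeddings, plain k-means with a deterministic farthest-point initialisation stands in for the clustering stage, and the names `activity_A` and `activity_B`, as well as all function names, are hypothetical. The rule that names the clusters only simulates the one-off inspection an expert would perform.

```python
import numpy as np


def maximin_init(features, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [features[0]]
    for _ in range(1, k):
        chosen = np.array(centroids)
        dists = np.linalg.norm(features[:, None, :] - chosen[None, :, :], axis=2)
        centroids.append(features[dists.min(axis=1).argmax()])
    return np.array(centroids, dtype=float)


def nearest_centroid(features, centroids):
    """Index of the closest centroid for every row of `features`."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(features, k, n_iter=20):
    """Plain Lloyd's k-means; returns the final centroids."""
    centroids = maximin_init(features, k)
    for _ in range(n_iter):
        assign = nearest_centroid(features, centroids)
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


# Assumption: synthetic stand-in for SSL embeddings, two well-separated blobs.
rng = np.random.default_rng(0)
embeddings = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[10.0, 10.0], scale=0.5, size=(100, 2)),
])

centroids = kmeans(embeddings, k=2)

# Simulated expert step: each cluster is inspected once and given a name.
# The names and the threshold rule are made up for this illustration.
cluster_names = {
    j: ("activity_A" if c.sum() < 10.0 else "activity_B")
    for j, c in enumerate(centroids)
}


def classify(new_embeddings):
    """Label unseen samples with the name of their nearest cluster."""
    return [cluster_names[int(j)] for j in nearest_centroid(new_embeddings, centroids)]


print(classify(np.array([[0.2, -0.1], [9.7, 10.3]])))
# prints ['activity_A', 'activity_B']
```

In this sketch the labelling cost scales with the number of clusters rather than the number of samples, which is the property that makes the approach attractive for large unlabelled spectrum datasets.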
Developing an easy-to-use, automated and technology-agnostic
way to explore spectrum activities and group similar
activities, eventually enabling automatic rather than manual
transmission identification and cataloguing, as is currently done,
for instance, in the Signal Identification Guide¹, is still an open
research topic, which motivated this investigation.
¹ https://www.sigidwiki.com/wiki/Signal_Identification_Guide
arXiv:2210.02899v3 [cs.NI] 22 Aug 2024