the overall methodology. Next we explain the datasets and
model details in Section IV. Section V presents the results
and Section VI analyses the effect of various model
parameters on performance. Finally, Section VII discusses
limitations of our work and possible extensions, and concludes
the paper.
II. RELATED WORK
A. Behavioural Biometrics
There is a vast body of work proposing various behavioural
biometric modalities. Early work used typing patterns and
touch gestures [3], [14], [15], while later modalities leveraged
human physiology [5], [12], [7], [8], [16], [17]. Authentication
solutions generally build machine learning classifiers or rely
on signature-similarity approaches [9]. More recent works use
deep learning methods, given their broader success in other
domains [18].
Other works in behavioural biometrics aimed to increase
training efficiency through class-incremental learning [19] or
to improve label efficiency through few-shot learning [20] and
transfer learning [21]. Similar efforts have been made in human
activity recognition [22], [23].
In contrast, we propose to improve label efficiency using
non-contrastive self-supervised learning, which leverages large
volumes of unlabelled data to build label-efficient classifiers.
To the best of our knowledge, our work is the first to use
non-contrastive learning for behavioural biometrics.
B. Self-supervised Learning (SSL)
Self-supervised learning (SSL) refers to a broad family
of methods in which a model learns representations from
unlabelled data by solving pretext tasks. The model trained
on a pretext task then serves as a feature extractor for
supervised learning tasks, reducing the labelled-data
requirement. For example, in computer vision, a pretext task
may train a model to predict whether an image is an original
or an augmented view; in doing so, the model learns the
distinguishing features of the original image. The pretext
model is then fine-tuned for a downstream task in a
supervised setting with labelled data. Jing et al. [24] provide
a survey of SSL methods.
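As a concrete illustration, the following is a minimal sketch
of the pretext-then-fine-tune pattern described above (PyTorch;
the encoder architecture, input shape, and the augment callable
are illustrative assumptions, not part of our method):

  import torch
  import torch.nn as nn

  # Hypothetical encoder for 32x32 RGB inputs.
  encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 128), nn.ReLU())
  pretext_head = nn.Linear(128, 2)   # two classes: original vs. augmented
  loss_fn = nn.CrossEntropyLoss()

  def pretext_step(images, augment):
      # Balanced batch: originals get label 0, augmented views get label 1.
      x = torch.cat([images, augment(images)])
      y = torch.cat([torch.zeros(len(images), dtype=torch.long),
                     torch.ones(len(images), dtype=torch.long)])
      return loss_fn(pretext_head(encoder(x)), y)

  # After pretext training, encoder is reused as a feature extractor
  # and fine-tuned on a small labelled set for the downstream task.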
Early work closely resembling modern SSL includes Bromley
et al. [25], who proposed the “Siamese” neural network
architecture for signature verification. However, due to
excessive resource requirements, SSL did not receive much
attention until its success in natural language processing. In
2013, Mikolov et al. [26] used self-supervised learning to
introduce word2vec, which paved the way for powerful
pre-trained language models such as BERT [27], RoBERTa [28],
and XLM-R [29].
Nonetheless, neither generative methods [30], [31], [32]
nor discriminative approaches [33], [34], [35], [36] were
successful in other domains such as computer vision, owing
to their high computational complexity [37]. In contrast,
Siamese-network-based comparative methods have shown
promising results in computer vision [37], [38], [39], [13].
In its basic form, a Siamese network consists of two identical
neural networks that take two views of the same input (i.e., a
positive pair) and output embeddings with low energy (i.e.,
high similarity) between them. To increase the similarity of
the two views, the networks learn embeddings that are
invariant to spatial or temporal transformations. Despite many
successful applications of Siamese networks, collapse (where
the network converges to a trivial, constant solution) limits
their performance.
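A minimal sketch of this setup (PyTorch; the encoder is
assumed to be defined elsewhere) makes both the objective and
the collapse risk concrete:

  import torch.nn.functional as F

  def siamese_loss(encoder, view1, view2):
      # Shared-weight encoder applied to two views of the same
      # input (a positive pair).
      z1 = F.normalize(encoder(view1), dim=1)
      z2 = F.normalize(encoder(view2), dim=1)
      # Maximise cosine similarity (minimise "energy") of the pair.
      return -(z1 * z2).sum(dim=1).mean()

  # Without extra machinery this objective admits a trivial solution:
  # the encoder can map every input to the same constant vector (collapse).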
To overcome this limitation, contrastive learning methods
[37], [40], [41], [42], [43] use negatives to avoid collapse,
not only pulling positives towards each other but also pushing
negatives apart in the embedding space. An example is the
SimCLR model [37]. However, contrastive learning requires
large batch sizes [37], [43], support sets [41], or memory
queues [42], [44], [40].
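For concreteness, the following is a minimal sketch of an
NT-Xent-style contrastive objective in the spirit of
SimCLR [37] (PyTorch; the tensor shapes and temperature value
are illustrative assumptions):

  import torch
  import torch.nn.functional as F

  def nt_xent(z1, z2, tau=0.5):
      # z1, z2: (N, d) embeddings of two views of N inputs (2N samples).
      z = F.normalize(torch.cat([z1, z2]), dim=1)
      sim = (z @ z.t()) / tau                # pairwise cosine similarities
      sim.fill_diagonal_(float('-inf'))      # a sample is not its own pair
      n = z1.size(0)
      # Row i's positive is the other view of the same input; every other
      # row acts as a negative and is pushed apart by the softmax.
      targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
      return F.cross_entropy(sim, targets)

Note how the loss depends on all 2N samples in the batch: this
is why contrastive methods need large batches (or support
sets/queues) to supply enough negatives.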
As a result, non-contrastive learning methods, in particular
the SimSiam model [13], emerged as a viable alternative.
Non-contrastive methods generally rely on clustering [39],
[45], momentum encoders [38], or a cross-correlation matrix
between the outputs of two identical networks as the objective
function [46] to address collapse. By avoiding negatives, these
methods sidestep a limitation of contrastive learning whereby
two samples that should form a positive pair can be pushed
apart in the embedding space, effectively becoming a negative
pair and harming the performance of the end task [47].
However, SimSiam [13] outperforms other non-contrastive
approaches without complex training machinery such as
momentum encoders: it shows that a stop-gradient operation
alone provides an efficient and simple solution to the collapse
problem.
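The core of SimSiam's objective, following the pseudocode in
[13], can be sketched as follows (PyTorch; the encoder f and
predictor h are assumed to be defined elsewhere):

  import torch.nn.functional as F

  def simsiam_loss(f, h, x1, x2):
      z1, z2 = f(x1), f(x2)      # encoder outputs for the two views
      p1, p2 = h(z1), h(z2)      # prediction-head outputs

      def d(p, z):
          # Negative cosine similarity; detach() is the stop-gradient:
          # no gradient flows through the target branch, preventing collapse.
          return -F.cosine_similarity(p, z.detach(), dim=1).mean()

      return 0.5 * d(p1, z2) + 0.5 * d(p2, z1)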
C. SSL in Sensing and Behavioural Biometrics
While SSL has made its largest contributions in natural
language processing, computer vision, and speech processing,
its feasibility has also been explored in sensing and mobile
computing [48]. Saeed et al. [23] brought self-supervised
learning to time-series sensor data by designing augmentations
compatible with such data. The authors used a multi-task
SSL model to reduce the labelled training data requirement
in Human Activity Recognition (HAR): using ten labelled
samples per class, they achieved approximately 88.8% of
the best score reached by conventional supervised learning.
SimCLR and several other contrastive and non-contrastive SSL
methods have also been assessed on HAR problems [49], [50].
Others such as Wright and Stewart [51] and Miller et al. [10]
explored the use of traditional Siamese networks to reduce the
training data requirement of behavioural biometrics-based user
authentication.
In contrast to these works, to the best of our knowledge,
we are the first to propose SimSiam [13]-based non-contrastive
learning for behavioural biometrics to reduce the labelled
data requirement. Our method neither uses negatives nor
requires complex training approaches such as momentum
encoders to avoid collapse. We compare our approach with
baselines including traditional supervised learning, transfer
learning, data augmentation, and state-of-the-art multi-task