
samples that are clustered within the feature space expressed by the source model. We validate
CluP on a set of cross-domain FER benchmarks and demonstrate its advantage in
terms of classification accuracy.
We summarise our contributions as follows:
• We present CluP, the first method addressing Source-free Unsupervised Domain
  Adaptation for FER, exploiting SSL as the foundation of our target model.
• CluP introduces a novel cluster-level pseudo-labelling scheme that improves the reliability
  of pseudo-labels based on in-cluster attributes, deviating from traditional
  confidence-based pseudo-labelling methods.
• We demonstrate that CluP surpasses competing methods for SFUDA and is comparable
  with UDA techniques on several FER adaptation benchmarks.
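To make the contrast in the second contribution concrete, the following minimal sketch compares traditional confidence-based pseudo-labelling with a simple cluster-level alternative. It is an illustration only and not the CluP scoring function: the majority-vote rule, the `min_purity` threshold, and both function names are assumptions introduced here, and the cluster assignments are taken as given.

```python
from collections import Counter


def confidence_pseudo_labels(probs, threshold=0.9):
    """Traditional scheme: keep a sample only if its top class probability
    reaches the threshold. Returns {sample_index: label}."""
    selected = {}
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)
        if p[label] >= threshold:
            selected[i] = label
    return selected


def cluster_pseudo_labels(probs, cluster_ids, min_purity=0.6):
    """Hypothetical cluster-level scheme (an assumption, not the paper's rule):
    all members of a cluster receive the cluster's majority predicted label,
    provided the majority share (purity) reaches min_purity."""
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    members = {}
    for i, c in enumerate(cluster_ids):
        members.setdefault(c, []).append(i)
    selected = {}
    for idxs in members.values():
        label, count = Counter(preds[i] for i in idxs).most_common(1)[0]
        if count / len(idxs) >= min_purity:
            for i in idxs:
                selected[i] = label
    return selected


# Toy predictions for five target samples over two classes,
# grouped into two clusters (cluster ids are assumed to be precomputed).
probs = [[0.95, 0.05], [0.55, 0.45], [0.40, 0.60], [0.10, 0.90], [0.45, 0.55]]
cluster_ids = [0, 0, 0, 1, 1]

by_confidence = confidence_pseudo_labels(probs, threshold=0.9)
by_cluster = cluster_pseudo_labels(probs, cluster_ids, min_purity=0.6)
```

In this toy example the confidence threshold retains only the two most confident samples, whereas the cluster-level rule labels every sample in each sufficiently pure cluster, including low-confidence ones.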
2 Related work
In the following, we present recent works on UDA methods for FER, and some general-
purpose SFUDA solutions.
Unsupervised Domain Adaptation for FER.
As a consequence of the domain bias, quite prominent among FER datasets, some works focus
on domain adaptation [6, 14, 20, 21, 22, 38, 43]. In [14], Ji et al. apply late fusion on the
outputs of two channels that learn intra-category and inter-category similarities of facial
expressions. The authors of [22] introduce
a locality preserving loss that draws samples of the same class closer. They also notice
that neighbouring samples in the embedding space present similar emotional intensities.
DETN [20] applies two variations of the Maximum Mean Discrepancy to assess the amount
of divergence between the domains and to re-weight the class-wise source distribution to
match the target. The authors extend the work in [21], where they additionally consider the
differences in the conditional distributions. Unlike the above works, AGRA [6]
focuses on the well-established approach of adversarial domain adaptation, leveraging facial
landmarks alongside facial images. For the landmarks, they introduce two specialised graph
neural networks while jointly considering the domain feature distributions, the local features
(i.e., landmarks), and the holistic features, achieving the best results on many benchmarks.
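For readers unfamiliar with the Maximum Mean Discrepancy mentioned above, the sketch below computes the standard biased estimate of the squared MMD between two feature sets with a Gaussian kernel. It shows the plain statistic only, not the two variations used in DETN; the function names, the kernel choice, and the `bandwidth` parameter are assumptions made for illustration.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of the squared MMD:
    mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)."""
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


# Toy 2-D features: the second target set is shifted far from the source set.
source_feats = [[0.0, 0.0], [1.0, 0.0]]
same_feats = [[0.0, 0.0], [1.0, 0.0]]
shifted_feats = [[5.0, 5.0], [6.0, 5.0]]

mmd_same = mmd_squared(source_feats, same_feats)        # identical sets
mmd_shifted = mmd_squared(source_feats, shifted_feats)  # well-separated sets
```

The estimate vanishes when the two sets coincide and grows as the distributions separate, which is what makes it usable as a measure of domain divergence.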
Compared to previous works, we consider a stricter setting where the source data is
unavailable. We argue that, due to privacy issues, human behaviour understanding methods
do not always have access to the source data. For this reason, we introduce a novel method
for FER that adapts to a target domain in a source-free manner.
Source-Free Unsupervised Domain Adaptation.
Recently, novel methods for source-free domain adaptation have been proposed
[13, 17, 19, 23, 25, 39]. The setting represents a more
complex but realistic scenario of UDA, where source data is unavailable. Some works resort
to entropy-minimisation losses to adapt to the target domain without labels. For example,
SHOT [23] employs an entropy loss alongside a classification loss on pseudo-labelled samples
to adapt the network to the target domain. The work has been extended in [24] by introducing
an auxiliary head that solves relative rotation, leading to improved performance. Unlike
the above, the authors of [13] frame the problem from an image translation perspective
and translate the target images to the source style using only the source model. In [36], the authors
perform self-training with a loss function that considers the intrinsic structure of the target
domain via nearest neighbours. In the proposed work, we do not impose any constraint on the
loss function; our refinement step works on source clusters, and we propose a novel score
function to select the best samples to train on the target domain. Other works address open-set