is adapted to the target domain. iii) In comparison to previous domain adaptation models, which are mostly limited by the need to run a separately and iteratively trained network to measure the dissimilarity between source and target samples, the α-divergence can be computed without any additional network or minimax objective. This yields a theoretically grounded and efficient metric for aligning the two distributions. The alignment is performed by feeding the samples into Gaussian Mixture Models (GMMs) obtained by placing multivariate Gaussian kernels around the feature representations of the two domains; i.e., we use the feature embeddings from the encoder as the means of the Gaussians, with unit variances. With this approach, the GMMs are estimated directly from the neural network, and no separate GMM training is needed. The proposed method is tested on three benchmark
datasets: Office31 (Saenko et al. 2010), VisDA17 (Peng et al.
2017) and Office-Home (Venkateswara et al. 2017). The re-
sults show that the proposed method outperforms the state of the art (SOTA).
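To make the construction concrete, the sketch below illustrates the idea in PyTorch: GMM densities with unit-variance kernels centred on encoder embeddings, and a Monte Carlo estimate of the α-divergence between them. This is a minimal illustration under our own assumptions, not the authors' implementation; the function names, the sampling-based estimator, and the default values are all hypothetical, and the divergence is taken in the Amari form of (Cichocki and Amari 2010).

```python
# Minimal sketch (not the authors' implementation) of the GMM-based
# alpha-divergence described above. Assumes source/target embeddings
# come from a shared encoder; all names here are illustrative.
import math
import torch

def gmm_log_prob(z, means):
    """Log-density of a GMM with uniform weights and unit-variance
    multivariate Gaussian kernels centred on the given embeddings."""
    d = means.shape[1]
    sq_dist = torch.cdist(z, means) ** 2          # (m, n) pairwise distances
    log_kernels = -0.5 * sq_dist - 0.5 * d * math.log(2.0 * math.pi)
    # log p(z) = logsumexp over the n kernels, minus log n (uniform weights)
    return torch.logsumexp(log_kernels, dim=1) - math.log(means.shape[0])

def alpha_divergence(src_feats, tgt_feats, alpha=0.5, n_samples=512):
    """Monte Carlo estimate of D_alpha(p || q) for alpha not in {0, 1},
    where p and q are the source- and target-feature GMMs."""
    n, d = src_feats.shape
    # Draw z ~ p: pick a source kernel at random, add unit-variance noise.
    idx = torch.randint(n, (n_samples,))
    z = src_feats[idx] + torch.randn(n_samples, d)
    log_p = gmm_log_prob(z, src_feats)
    log_q = gmm_log_prob(z, tgt_feats)
    # Amari form: D_alpha = (1 - E_p[(q/p)^(1-alpha)]) / (alpha * (1 - alpha))
    ratio = torch.exp((1.0 - alpha) * (log_q - log_p)).mean()
    return (1.0 - ratio) / (alpha * (1.0 - alpha))

# Usage: with encoder outputs f_s, f_t of shape (batch, feat_dim), a term
# alpha_divergence(f_s, f_t, alpha=0.5) could be added to the training
# objective to align the two feature distributions.
```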
Literature review
Closed-set unsupervised domain adaptation is a well-studied topic in the computer vision literature. There are two main streams of work addressing this problem with deep neural networks: i) using adversarial networks, where a classifier tries to discriminate between target and source samples while a feature extractor attempts to fool it; as a result, the model finds a representation of the input samples that is indifferent to the source and target domains (Long et al. 2018; Ma, Zhang, and Xu 2019); and ii) minimizing the distance or discrepancy between source and target features in the feature space by using distance metrics in the loss function (Nguyen et al. 2022; Balaji, Chellappa, and Feizi 2020). However, real-world machine learning problems are not always closed-set, and unseen classes might exist in either the source or the target domain. Such problems are addressed as open-set and partial domain adaptation in the literature.
Open-set domain adaptation refers to a situation where the target domain contains unknown samples from classes other than the ones shared with the source domain; these are classified under the class “unknown”. The concept of open-set models was first presented in (Jain, Scheirer, and Boult 2014), where Jain et al. modified SVMs to reject samples from unknown classes based on a probability threshold. Another stream of works proposed various methods or metrics to separate the unknown classes from the known ones (Panareda Busto and Gall 2017;
Baktashmotlagh et al. 2018; Feng et al. 2019; Gao et al.
2020; Saito et al. 2018; Bucci, Loghmani, and Tommasi
2020). This problem has been approached in multiple ways
(Liu et al. 2021; Fang et al. 2020; Pan et al. 2020; Baktash-
motlagh, Chen, and Salzmann 2022). DAOD (distribution alignment with open difference) (Fang et al. 2020) considers the risk of the classifier on unknown classes and tries to regularize it while aligning the distributions. SE-cc (Pan
et al. 2020) applies clustering on the target domain to ob-
tain domain-specific visual cues as additional guidance for
the open-set domain adaptation. In (Baktashmotlagh, Chen, and Salzmann 2022), the authors took a different approach, complementing the source domain by generating the unknown classes for the source dataset so that the two datasets resemble each other.
Partial domain adaptation (PDA) refers to the domain
adaptation problem where the source domain contains ex-
tra classes which are private to it (Cao et al. 2018). It was
first introduced in (Cao et al. 2018) where the authors used
an adversarial network to down-weight the outlier source
classes while matching the representations of two domains.
Later, example transfer network (ETN) (Cao et al. 2019)
was proposed where a transferability weight is assigned to
source samples to reduce their negative transfer effect. In
deep residual correction network (DRCN) (Li et al. 2020), a
weight-based method is devised to align the target domain
with the most relevant source subclasses. BA3US (Liang
et al. 2020) mitigates the imbalance between target and
source classes by gradually adding samples from the source
to the target dataset. Adaptive graph adversarial network
(AGAN) (Kim and Hong 2021) uses an adaptive feature
propagation technique to utilize the inter- and intra-domain
structure and computes the commonness of each sample to
be used in the adaptation process.
It should be noted that, although effective, the models introduced above mostly suffer from complicated architectures and constraints imposed on the optimization process. Here, it is proposed that OSDA and PDA setups can benefit from a robust method that effectively mitigates the negative transfer effect of unseen classes in either the target or the source domain by treating them as outliers. Although promising, this stream of robust domain adaptation has not been pursued sufficiently in the literature. As discussed before, distance-based methods are
commonly used to align the distributions of source and tar-
get for the purpose of domain adaptation. Kullback-Leibler
divergence (Nguyen et al. 2022) and Wasserstein measure
(Shen et al. 2018) have been previously used for this task.
Despite their promising results in closed-set domain adap-
tation scenarios, both measures are sensitive to the influ-
ence of outliers. There have been attempts to improve the
robustness of the above measures at the cost of adding over-
head and increasing the computational cost of training a
model (Balaji, Chellappa, and Feizi 2020). Here, it is pro-
posed to use a more general parametric family of measures
called α-divergence, which can be tuned by a single param-
eter α to mitigate the effect of outliers (Cichocki and Amari
2010). The benefits of this divergence have been shown in
several studies related to robust principal component anal-
ysis (Rekavandi and Seghouane 2020), robust image pro-
cessing (Rekavandi, Seghouane, and Evans 2021; Iqbal and
Seghouane 2019) and robust signal processing (Seghouane
and Ferrari 2019; Rekavandi, Seghouane, and Evans 2020).
To the best of the authors’ knowledge, the current study is the
first attempt to use the α-divergence as a robust measure in
deep learning based domain adaptation.
Background on the α-divergence
The α-divergence between two distribution functions, p(z)
and q(z), is defined as (Cichocki and Amari 2010):
$$
D_\alpha\bigl(p(z) \,\|\, q(z)\bigr) = \frac{1}{\alpha(\alpha-1)} \int \Bigl( p(z)^{\alpha}\, q(z)^{1-\alpha} - \alpha\, p(z) + (\alpha-1)\, q(z) \Bigr)\, dz .
$$
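For intuition about the single tuning parameter, the following block lists standard special cases of this divergence (well-known properties from the α-divergence literature, added here for reference rather than taken from this excerpt):

```latex
% Standard limiting and special cases of the Amari alpha-divergence:
\begin{align*}
  \lim_{\alpha \to 1} D_\alpha(p \,\|\, q) &= \mathrm{KL}(p \,\|\, q), \\
  \lim_{\alpha \to 0} D_\alpha(p \,\|\, q) &= \mathrm{KL}(q \,\|\, p), \\
  D_{1/2}(p \,\|\, q) &= 2 \int \bigl(\sqrt{p(z)} - \sqrt{q(z)}\bigr)^{2}\, dz ,
\end{align*}
```

with the last case being proportional to the squared Hellinger distance.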