IT-RUDA: Information Theory Assisted Robust Unsupervised Domain Adaptation
Shima Rashidi1, Ruwan Tennakoon1, Aref Miri Rekavandi2,
Papangkorn Jessadatavornwong1, Amanda Freis3, Garret Huff3, Mark Easton1, Adrian Mouritz1,
Reza Hoseinnezhad1, Alireza Bab-Hadiashar1,
1RMIT University, Melbourne, Australia
2The University of Melbourne, Melbourne, Australia
3Ford Motor Company, USA
Abstract
Distribution shift between train (source) and test (target)
datasets is a common problem encountered in machine learn-
ing applications. One approach to resolve this issue is to
use the Unsupervised Domain Adaptation (UDA) technique
that carries out knowledge transfer from a label-rich source
domain to an unlabeled target domain. Outliers that exist
in either source or target datasets can introduce additional
challenges when using UDA in practice. In this paper, α-divergence is used as a measure to minimize the discrepancy between the source and target distributions while inheriting the prominent feature of this measure: robustness that is adjustable with a single parameter α. Here, it is shown that other well-known divergence-based UDA techniques can be derived as special cases of the proposed method. Further-
more, a theoretical upper bound is derived for the loss in
the target domain in terms of the source loss and the ini-
tial α-divergence between the two domains. The robustness
of the proposed method is validated through testing on several benchmark datasets in open-set and partial UDA setups, where the extra classes existing in the target and source datasets are considered as outliers.
Introduction
There is increasing interest in the idea of domain adaptation
as it provides a solution for real-world problems where the
training and test data do not necessarily have the same dis-
tributions (Wang and Deng 2018; Wilson and Cook 2020).
In particular, closed-set unsupervised domain adaptation
(UDA) tackles the machine learning problem where the la-
beled training (called source) and unlabeled test (called tar-
get) datasets are sampled from the same classes but shifted
domains (e.g. synthetic vs real-world images or painting vs
photographs). Such a domain shift contradicts the machine
learning assumption that the marginal distributions of source
and target domains are aligned (Ben-David et al. 2010). As
a result, the accuracy of a model solely trained on the source
dataset often drops significantly when tested on the target
dataset. This problem has received considerable attention in
recent years (Long et al. 2018; Ma, Zhang, and Xu 2019;
Nguyen et al. 2022; Shen et al. 2018).
The problem of unsupervised domain adaptation gets
more complicated if outliers exist in either the target or
source domains. The outliers can negatively affect the per-
formance of the trained model due to the closed-set assump-
tion of machine learning solutions, especially deep learn-
ing models. These types of problems are usually addressed
in the literature under the umbrella of open-set domain adaptation (OSDA) (Panareda Busto and Gall 2017), where outliers exist in the target domain as extra classes private to that domain, and partial domain adaptation (PDA) (Cao et al. 2018), where outliers exist in the source domain as extra classes private to that domain. Many domain adap-
tation solutions provide complicated algorithms for reject-
ing unknown target samples (Baktashmotlagh et al. 2018;
Feng et al. 2019; Gao et al. 2020; Saito et al. 2018) or arti-
ficially generating them in the source domain to match the
two domains (Baktashmotlagh, Chen, and Salzmann 2022).
A simpler solution, which is somewhat overlooked, is to
treat the unknown samples as outliers and apply a robust
domain-adaptation method (one example of robust UDA can
be found in (Balaji, Chellappa, and Feizi 2020)). A robust method is needed to mitigate the negative effect of the
outliers (private classes) on the domain adaptation process
and enable the model to operate on the feature representa-
tions of the shared classes unhindered.
In this paper, a robust domain adaptation method us-
ing a general parametric measure from information theory,
namely α-divergence, is proposed to align the marginal dis-
tributions of source and target representations while ignor-
ing private classes (treating them as outliers). Unlike ex-
isting methods, which often need a separate network or
complicated architectures with constraints such as the 1-Lipschitz constraint on the network gradients (Balaji, Chellappa, and Feizi 2020), our method is simple and can directly
estimate the dissimilarity between the two distributions. The
benefits of using α-divergence are: i) The chosen divergence
is a general form of several well-known measures such as
KL and Reverse KL divergences, tunable via a single pa-
rameter α. This feature enables one to take advantage of
desirable divergence characteristics (like robustness to out-
liers) by choosing the hyper-parameter α. ii) It is shown that the proposed loss in the target domain is bounded by a function of the α-divergence between the target and source distributions. In the case of perfect alignment of these two distributions, the target and source losses (in this paper, classification losses) will be equal, meaning that the network
is adapted to the target domain. iii) In comparison to previous domain adaptation models, which mostly rely on an iteratively trained separate network to calculate the dissimilarity between source and target samples, the α-divergence can be calculated without any additional network or a minimax objective. This yields a principled and efficient metric for the alignment of the two distributions. This
is performed by feeding the samples into Gaussian Mixture
Models (GMMs) obtained by putting multivariate Gaussian
kernels around feature representations of the two domains;
i.e., we use the feature embeddings from the encoder as the means of the Gaussians, with unit variances. With this approach, the GMMs are estimated directly by the neural network and no separate training of GMMs is needed. The proposed method is tested on three benchmark
datasets: Office31 (Saenko et al. 2010), VisDA17 (Peng et al.
2017) and Office-Home (Venkateswara et al. 2017). The re-
sults show that the proposed method outperforms the state of the art (SOTA).
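To make the above alignment step concrete, the following sketch (ours, not the paper's reference implementation; PyTorch and all function names are illustrative assumptions) estimates the α-divergence between two feature batches by placing unit-variance Gaussian kernels on the encoder embeddings of each domain and averaging over the source samples, using the identity D_α(p‖q) = (1 − E_{z∼p}[(q(z)/p(z))^{1−α}]) / (α(1−α)):

```python
import torch

def gmm_log_prob(x, centers):
    # Log-density of an equal-weight GMM whose components are unit-variance
    # multivariate Gaussians centered on the rows of `centers`.
    d = centers.shape[1]
    sq_dist = torch.cdist(x, centers) ** 2                    # (n_x, n_centers)
    log_kernel = -0.5 * sq_dist - 0.5 * d * torch.log(
        torch.tensor(2.0 * torch.pi))
    # log-mean-exp over components = logsumexp - log(n_centers)
    return torch.logsumexp(log_kernel, dim=1) - torch.log(
        torch.tensor(float(centers.shape[0])))

def alpha_divergence(src_feats, tgt_feats, alpha=0.5):
    # Monte-Carlo estimate of
    #   D_alpha(p||q) = (1 - E_{z~p}[(q(z)/p(z))^(1-alpha)]) / (alpha (1-alpha)),
    # using the source features themselves as samples from p.
    log_p = gmm_log_prob(src_feats, src_feats)  # source GMM at source samples
    log_q = gmm_log_prob(src_feats, tgt_feats)  # target GMM at source samples
    ratio = torch.exp((1.0 - alpha) * (log_q - log_p))
    return (1.0 - ratio.mean()) / (alpha * (1.0 - alpha))

# Usage with dummy 256-d embeddings for a batch of 64 samples per domain:
src = torch.randn(64, 256)
tgt = torch.randn(64, 256) + 0.5
loss_align = alpha_divergence(src, tgt, alpha=0.5)  # add to the training loss
```

Because the estimate is differentiable with respect to the embeddings, it can be minimized directly as an alignment term alongside the source classification loss; with α in (0, 1), the integrand p^α q^{1−α} down-weights regions where either domain has little density, which is what suppresses the influence of private-class outliers.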
Literature review
Closed-set unsupervised domain adaptation is a well-studied
topic in the computer vision literature. There are two main streams of work for addressing this problem with deep neural networks: i) using adversarial networks, where a classifier tries to discriminate between target and source samples while a feature extractor attempts to fool it; as a result, the model finds a representation of the input samples that is invariant across the source and target domains (Long et al. 2018; Ma, Zhang, and Xu 2019); and ii) minimizing the distance between source and target features in the feature space by using distance metrics in the loss function (Nguyen et al. 2022; Balaji, Chellappa, and Feizi 2020). However, real-world machine learning problems are
not always closed-set and unseen classes might exist in ei-
ther source or target domain. Such problems are addressed
as open-set and partial domain adaptation in the literature.
Open-set domain adaptation refers to a situation where the target domain contains unknown samples from classes other than those shared with the source domain; these samples are classified as the class “unknown”. The concept of open-set models was first presented in (Jain, Scheirer, and Boult 2014), where Jain et al. modified SVMs to reject samples from unknown classes based on a probability threshold. Another stream of work proposed various methods or metrics to separate the unknown classes from the known ones (Panareda Busto and Gall 2017;
Baktashmotlagh et al. 2018; Feng et al. 2019; Gao et al.
2020; Saito et al. 2018; Bucci, Loghmani, and Tommasi
2020). This problem has been approached in multiple ways
(Liu et al. 2021; Fang et al. 2020; Pan et al. 2020; Baktash-
motlagh, Chen, and Salzmann 2022). DAOD (distribution
alignment with open difference) (Fang et al. 2020) consid-
ers the risk of the classifier on unknown classes and tries
to regularize it while aligning the distributions. SE-cc (Pan
et al. 2020) applies clustering on the target domain to ob-
tain domain-specific visual cues as additional guidance for
open-set domain adaptation. In (Baktashmotlagh, Chen, and Salzmann 2022), the authors took a different approach: they complemented the source domain by generating unknown classes for the source dataset so that the two datasets resemble each other.
Partial domain adaptation (PDA) refers to the domain
adaptation problem where the source domain contains ex-
tra classes which are private to it (Cao et al. 2018). It was
first introduced in (Cao et al. 2018) where the authors used
an adversarial network to down-weight the outlier source
classes while matching the representations of two domains.
Later, example transfer network (ETN) (Cao et al. 2019)
was proposed where a transferability weight is assigned to
source samples to reduce their negative transfer effect. In
deep residual correction network (DRCN) (Li et al. 2020), a
weight-based method is devised to align the target domain
with the most relevant source subclasses. BA3US (Liang
et al. 2020) mitigates the imbalance between target and
source classes by gradually adding samples from the source
to the target dataset. Adaptive graph adversarial network
(AGAN) (Kim and Hong 2021) uses an adaptive feature
propagation technique to utilize the inter- and intra-domain
structure and computes the commonness of each sample to
be used in the adaptation process.
It should be noted that, although effective, the models introduced above mostly suffer from complicated architectures and constraints imposed on the optimization process. Here, it is proposed that OSDA and PDA setups can benefit from a robust method that effectively mitigates the negative transfer effect of unseen classes in either the target or source domain by treating them as outliers. Although promising, the stream of robust domain adaptation has not been pursued sufficiently in the literature. As discussed before, distance-based methods are commonly used to align the distributions of source and tar-
commonly used to align the distributions of source and tar-
get for the purpose of domain adaptation. Kullback-Leibler
divergence (Nguyen et al. 2022) and the Wasserstein distance
(Shen et al. 2018) have been previously used for this task.
Despite their promising results in closed-set domain adap-
tation scenarios, both measures are sensitive to the influ-
ence of outliers. There have been attempts to improve the
robustness of the above measures at the cost of adding over-
head and increasing the computational cost of training a
model (Balaji, Chellappa, and Feizi 2020). Here, it is pro-
posed to use a more general parametric family of measures
called α-divergence, which can be tuned by a single param-
eter α to mitigate the effect of outliers (Cichocki and Amari
2010). The benefits of this divergence have been shown in
several studies related to robust principal component anal-
ysis (Rekavandi and Seghouane 2020), robust image pro-
cessing (Rekavandi, Seghouane, and Evans 2021; Iqbal and
Seghouane 2019) and robust signal processing (Seghouane
and Ferrari 2019; Rekavandi, Seghouane, and Evans 2020).
To the best of the authors’ knowledge, the current study is the
first attempt to use the α-divergence as a robust measure in
deep learning based domain adaptation.
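As a quick numerical illustration of this robustness claim (our own toy example, not from the paper), the snippet below contaminates a distribution p with an outlier mode placed far from q and compares KL(p‖q) against D_α with α = 0.5, using grid integration of the definition given in the next section. The KL divergence grows roughly in proportion to both the outlier mass and the squared distance of the outlier mode from q, whereas D_0.5 is capped at 1/(α(1−α)) = 4 no matter where the outliers sit:

```python
import numpy as np

z = np.linspace(-10.0, 10.0, 4001)
dz = z[1] - z[0]

def gauss(mu, sigma):
    pdf = np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return pdf / (pdf.sum() * dz)          # renormalize on the finite grid

def d_alpha(p, q, alpha):
    # D_alpha(p||q) = (1 - \int p^alpha q^(1-alpha) dz) / (alpha (1 - alpha))
    return (1.0 - np.sum(p ** alpha * q ** (1.0 - alpha)) * dz) / (
        alpha * (1.0 - alpha))

def kl(p, q):
    return np.sum(p * np.log(p / q)) * dz

q = gauss(0.0, 1.0)                        # shared-class ("clean") distribution
for eps in (0.0, 0.05, 0.20):              # fraction of outlier (private) mass
    p = (1.0 - eps) * gauss(0.0, 1.0) + eps * gauss(8.0, 0.3)
    print(f"eps={eps:.2f}  KL={kl(p, q):7.3f}  D_0.5={d_alpha(p, q, 0.5):6.3f}")
```

In this setup the KL divergence jumps by several nats as eps grows, because q assigns vanishing density at the outlier mode, while D_0.5 increases only slightly; this is the sense in which choosing α away from 1 trades statistical efficiency for insensitivity to private-class samples.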
Background in α-divergence
The α-divergence between two distribution functions, p(z)
and q(z), is defined as (Cichocki and Amari 2010):
D_\alpha(p(z) \,\|\, q(z)) = \frac{1}{\alpha(1-\alpha)} \left( 1 - \int p(z)^{\alpha}\, q(z)^{1-\alpha} \, dz \right), \qquad \alpha \in \mathbb{R} \setminus \{0, 1\}.
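The limits below (a standard derivation following Cichocki and Amari 2010, sketched here because the introduction appeals to it) recover KL and reverse KL as the special cases α → 1 and α → 0, by expanding p(z)^α q(z)^{1−α} = p(z) exp((1−α) ln(q(z)/p(z))) to first order:

```latex
% Limiting cases of the alpha-divergence defined above (sketch):
% p^a q^{1-a} = p\,e^{(1-a)\ln(q/p)} \approx p\,[1 + (1-a)\ln(q/p)].
\lim_{\alpha \to 1} D_{\alpha}\bigl(p(z) \,\|\, q(z)\bigr)
    = \int p(z)\,\log\frac{p(z)}{q(z)}\,dz = \mathrm{KL}\bigl(p \,\|\, q\bigr),
\qquad
\lim_{\alpha \to 0} D_{\alpha}\bigl(p(z) \,\|\, q(z)\bigr)
    = \int q(z)\,\log\frac{q(z)}{p(z)}\,dz = \mathrm{KL}\bigl(q \,\|\, p\bigr).
```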