is adapted to the target domain. iii) In comparison to previous domain adaptation models, which are mostly limited by the need to run a separately and iteratively trained network to measure the dissimilarity between source and target samples, the α-divergence can be computed without any additional network or minimax objective. This yields a theoretically grounded and efficient metric for aligning the two distributions. The alignment is performed by feeding the samples into Gaussian Mixture Models (GMMs) obtained by placing multivariate Gaussian kernels around the feature representations of the two domains; i.e., we use the feature embeddings from the encoder as the means of the Gaussians, with unit variances. With this approach, the GMMs are estimated directly from the neural network, and no separate GMM training is needed. The proposed method is tested on three benchmark
datasets: Office31 (Saenko et al. 2010), VisDA17 (Peng et al.
2017) and Office-Home (Venkateswara et al. 2017). The re-
sults show that the proposed method outperforms the state of the art (SOTA).
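To make the construction concrete, the sketch below illustrates the idea in PyTorch: GMM densities with unit-variance kernels centred on encoder embeddings, and a Monte Carlo estimate of the α-divergence between them. This is a minimal illustration under our own assumptions, not the authors' implementation; the function names, the sampling-based estimator, and the default values are all hypothetical, and the divergence is taken in the Amari form of (Cichocki and Amari 2010).

```python
# Minimal sketch (not the authors' implementation) of the GMM-based
# alpha-divergence described above. Assumes source/target embeddings
# come from a shared encoder; all names here are illustrative.
import math
import torch

def gmm_log_prob(z, means):
    """Log-density of a GMM with uniform weights and unit-variance
    multivariate Gaussian kernels centred on the given embeddings."""
    d = means.shape[1]
    sq_dist = torch.cdist(z, means) ** 2          # (m, n) pairwise distances
    log_kernels = -0.5 * sq_dist - 0.5 * d * math.log(2.0 * math.pi)
    # log p(z) = logsumexp over the n kernels, minus log n (uniform weights)
    return torch.logsumexp(log_kernels, dim=1) - math.log(means.shape[0])

def alpha_divergence(src_feats, tgt_feats, alpha=0.5, n_samples=512):
    """Monte Carlo estimate of D_alpha(p || q) for alpha not in {0, 1},
    where p and q are the source- and target-feature GMMs."""
    n, d = src_feats.shape
    # Draw z ~ p: pick a source kernel at random, add unit-variance noise.
    idx = torch.randint(n, (n_samples,))
    z = src_feats[idx] + torch.randn(n_samples, d)
    log_p = gmm_log_prob(z, src_feats)
    log_q = gmm_log_prob(z, tgt_feats)
    # Amari form: D_alpha = (1 - E_p[(q/p)^(1-alpha)]) / (alpha * (1 - alpha))
    ratio = torch.exp((1.0 - alpha) * (log_q - log_p)).mean()
    return (1.0 - ratio) / (alpha * (1.0 - alpha))

# Usage: with encoder outputs f_s, f_t of shape (batch, feat_dim), a term
# alpha_divergence(f_s, f_t, alpha=0.5) could be added to the training
# objective to align the two feature distributions.
```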
Literature review
Closed-set unsupervised domain adaptation is a well-studied topic in the computer vision literature. There are two main streams of work addressing this problem with deep neural networks: i) using adversarial networks, where a classifier tries to discriminate between target and source samples while a feature extractor attempts to fool it; as a result, the model finds a representation of the input samples that is indifferent to the source and target domains (Long et al. 2018; Ma, Zhang, and Xu 2019); and ii) minimizing the distance or discrepancy between source and target features in the feature space by using distance metrics in the loss function (Nguyen et al. 2022; Balaji, Chellappa, and Feizi 2020). However, real-world machine learning problems are not always closed-set, and unseen classes might exist in either the source or the target domain. Such problems are addressed as open-set and partial domain adaptation in the literature.
Open-set domain adaptation refers to a situation where the target domain contains unknown samples from classes other than the ones shared with the source domain; these are classified under the class “unknown”. The concept of open-set models was first presented in (Jain, Scheirer, and Boult 2014), where Jain et al. modified SVMs to reject samples from unknown classes based on a probability threshold. Another stream of works proposed various methods or metrics to separate the unknown classes from the known ones (Panareda Busto and Gall 2017;
Baktashmotlagh et al. 2018; Feng et al. 2019; Gao et al.
2020; Saito et al. 2018; Bucci, Loghmani, and Tommasi
2020). This problem has been approached in multiple ways
(Liu et al. 2021; Fang et al. 2020; Pan et al. 2020; Baktash-
motlagh, Chen, and Salzmann 2022). DAOD (distribution alignment with open difference) (Fang et al. 2020) considers the risk of the classifier on unknown classes and tries to regularize it while aligning the distributions. SE-cc (Pan
et al. 2020) applies clustering on the target domain to ob-
tain domain-specific visual cues as additional guidance for
the open-set domain adaptation. In (Baktashmotlagh, Chen, and Salzmann 2022), the authors took a different approach, complementing the source domain by generating the unknown classes for the source dataset so that the two datasets resemble each other.
Partial domain adaptation (PDA) refers to the domain
adaptation problem where the source domain contains ex-
tra classes which are private to it (Cao et al. 2018). It was
first introduced in (Cao et al. 2018) where the authors used
an adversarial network to down-weight the outlier source
classes while matching the representations of two domains.
Later, example transfer network (ETN) (Cao et al. 2019)
was proposed where a transferability weight is assigned to
source samples to reduce their negative transfer effect. In
deep residual correction network (DRCN) (Li et al. 2020), a
weight-based method is devised to align the target domain
with the most relevant source subclasses. BA3US (Liang
et al. 2020) mitigates the imbalance between target and
source classes by gradually adding samples from the source
to the target dataset. Adaptive graph adversarial network
(AGAN) (Kim and Hong 2021) uses an adaptive feature
propagation technique to utilize the inter- and intra-domain
structure and computes the commonness of each sample to
be used in the adaptation process.
It should be noted that, although effective, the models introduced above mostly suffer from complicated architectures and constraints imposed on the optimization process. Here, it is proposed that OSDA and PDA setups can benefit from a robust method that effectively mitigates the negative transfer effect of unseen classes in either the target or the source domain by treating them as outliers. Although promising, this stream of robust domain adaptation has not been pursued sufficiently in the literature. As discussed before, distance-based methods are
commonly used to align the distributions of source and tar-
get for the purpose of domain adaptation. Kullback-Leibler
divergence (Nguyen et al. 2022) and Wasserstein measure
(Shen et al. 2018) have been previously used for this task.
Despite their promising results in closed-set domain adap-
tation scenarios, both measures are sensitive to the influ-
ence of outliers. There have been attempts to improve the
robustness of the above measures at the cost of adding over-
head and increasing the computational cost of training a
model (Balaji, Chellappa, and Feizi 2020). Here, it is pro-
posed to use a more general parametric family of measures
called α-divergence, which can be tuned by a single param-
eter α to mitigate the effect of outliers (Cichocki and Amari
2010). The benefits of this divergence have been shown in
several studies related to robust principal component anal-
ysis (Rekavandi and Seghouane 2020), robust image pro-
cessing (Rekavandi, Seghouane, and Evans 2021; Iqbal and
Seghouane 2019) and robust signal processing (Seghouane
and Ferrari 2019; Rekavandi, Seghouane, and Evans 2020).
To the best of the authors’ knowledge, the current study is the
first attempt to use the α-divergence as a robust measure in
deep learning based domain adaptation.
Background on the α-divergence
The α-divergence between two distribution functions, p(z)
and q(z), is defined as (Cichocki and Amari 2010):
$$
D_\alpha\bigl(p(z) \,\|\, q(z)\bigr) = \frac{1}{\alpha(\alpha-1)} \int \Bigl( p(z)^{\alpha}\, q(z)^{1-\alpha} - \alpha\, p(z) + (\alpha-1)\, q(z) \Bigr)\, dz .
$$
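For intuition about the single tuning parameter, the following block lists standard special cases of this divergence (well-known properties from the α-divergence literature, added here for reference rather than taken from this excerpt):

```latex
% Standard limiting and special cases of the Amari alpha-divergence:
\begin{align*}
  \lim_{\alpha \to 1} D_\alpha(p \,\|\, q) &= \mathrm{KL}(p \,\|\, q), \\
  \lim_{\alpha \to 0} D_\alpha(p \,\|\, q) &= \mathrm{KL}(q \,\|\, p), \\
  D_{1/2}(p \,\|\, q) &= 2 \int \bigl(\sqrt{p(z)} - \sqrt{q(z)}\bigr)^{2}\, dz ,
\end{align*}
```

with the last case being proportional to the squared Hellinger distance.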