Fast OT for Latent Domain Adaptation
Siddharth Roheda*, Ashkan Panahi†, Hamid Krim*
*Electrical and Computer Engineering Department, North Carolina State University
{sroheda, ahk}@ncsu.edu
†Dept. of Computer Science and Engineering, Chalmers University
ashkan.panahi@chalmers.se
Abstract—In this paper, we address the problem of unsupervised Domain Adaptation. The need for such an adaptation arises when the distribution of the target data differs from that which is used to develop the model, and the ground truth information of the target data is unknown. We propose an algorithm that uses optimal transport theory, with a verifiably efficient and implementable solution, to learn the best latent feature representation. This is achieved by minimizing the cost of transporting the samples from the target domain to the distribution of the source domain.
Index Terms—Optimal Transport, Unsupervised Domain Adaptation
I. INTRODUCTION
Adapting a classifier trained on a source domain to recognize instances from a new target domain is an important problem of increasing research interest [?], [1], [2]. Difficulties often arise in practice, as is the case when the data is different from that which is used to train a model. Specifically, consider an inference problem where a model is learned using a certain source domain $X_s$ with corresponding labels $Y_s$ and is used to classify samples from the target domain $X_t$ with corresponding labels $Y_t$. Domain adaptation is required when $P(Y_s|X_s) \approx P(Y_t|X_t)$, but $P(X_s)$ is significantly different from $P(X_t)$.
Such a shift in data distribution is seen and addressed in almost every field, ranging from Natural Language Processing (NLP) to Object Recognition. Given labeled samples from a source domain, any Domain Adaptation (DA) approach falls into one of two groups: i) semi-supervised DA, where some samples in the target domain are labeled, or ii) unsupervised DA, where none of the samples in the target domain are labeled.
Several works [4]–[6] have demonstrated the effects of the divergence between the probability distributions of domains. These works have led to solutions that transform the data from the target domain so as to make the associated distribution as close as possible to that of the source domain. This allows the classifier trained on the source domain to be applied to data from the target domain post transformation. In [12], an approach for multi-source domain adaptation was proposed to transfer knowledge learned from multiple labeled sources to a target domain by aligning moments of their feature distributions, while [13] uses a GAN to learn the transformation from the target domain to the source domain.
In [14], [15], the authors simply align the second-order statistics of the source and target domains.
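For illustration, here is a minimal NumPy/SciPy sketch of such second-order (covariance) alignment; the function name, the regularizer `eps`, and the zero-mean assumption on the features are ours, not details taken from [14], [15].

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def align_second_order(Xs, Xt, eps=1e-5):
    """Re-color source features so their covariance matches the target's.

    Xs: (Ns, d) source features, Xt: (Nt, d) target features, assumed zero-mean.
    """
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])  # source covariance
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])  # target covariance
    Xs_white = Xs @ fractional_matrix_power(Cs, -0.5)          # whiten source
    return np.real(Xs_white @ fractional_matrix_power(Ct, 0.5))  # re-color to target
```

A classifier trained on the re-colored source features can then be applied to the target features directly, since the two now share second-order statistics.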
Contributions: In this paper, we address the problem of unsupervised DA. We build on existing works that have led to various techniques, including recent generative adversarial networks [7], and instead propose Optimal Transport, for some of its advantages, as a viable path to adapt the model toward classifying the target domain data. We first seek latent representations of the source and target domains, and subsequently minimize the optimal transport cost between them. These representations for the source and target can then be classified using a common classifier trained on the source data. Furthermore, we demonstrate that ensuring $P(\hat{Y}_s|X_s) \approx P(\hat{Y}_t|X_t)$, where $\hat{Y}_s$ and $\hat{Y}_t$ are the predictions made by the classifier on the source and target domain respectively, is also crucial for optimal performance.
II. RELATED WORK
A. Generative modeling
The Generative Adversarial Network (GAN) was first introduced by Goodfellow et al. [7] in 2014. In this framework, a generative model is pitted against an adversary: the discriminator. The generator aims to deceive the discriminator by synthesizing realistic samples from some underlying distribution. The discriminator, on the other hand, attempts to discriminate between a real data sample and one produced by the generator. Both models are approximated by neural networks. When trained alternately, the generator learns to produce random samples from the data distribution which are very close to the real data samples.
Following this, Conditional Generative Adversarial Networks (CGANs) were proposed in [8]. These networks were trained to generate realistic samples from a class-conditional distribution, by augmenting the random noise input to the generator with some useful conditioning information. As a result, the generator now aims to generate realistic data samples when given the conditional information. CGANs have been used to generate random faces given facial attributes [9], as well as to produce relevant images given text descriptions [10].
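As a sketch of this conditioning mechanism (with illustrative sizes and class count, not those of [8]), the condition can simply be concatenated to the noise vector at the generator's input:

```python
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Hypothetical CGAN generator: a one-hot class condition is concatenated
    to the noise vector, so samples are drawn per class."""
    def __init__(self, z_dim=64, n_classes=10, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + n_classes, 128),
                                 nn.ReLU(), nn.Linear(128, out_dim))
    def forward(self, z, y_onehot):
        return self.net(torch.cat([z, y_onehot], dim=1))

# Usage: draw 8 samples conditioned on random classes.
z = torch.randn(8, 64)
y = torch.eye(10)[torch.randint(0, 10, (8,))]  # one-hot conditions
x_fake = CondGenerator()(z, y)
```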
Many works have recently attempted to use GANs for performing domain adaptation. In [13], the authors use the generator to learn the features for classification, and the discriminator to differentiate between the source and target domain features produced by the generator. Figure 1 depicts the block diagram for this approach. In [16], a cyclic GAN was used to perform image translation between unpaired images.
Fig. 1: Adversarial Adaptation
In [17], a cyclic GAN was implemented to adapt semantic segmentation of street images from GTA5 to CityScapes data.
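For concreteness, the following is a minimal PyTorch sketch of the adversarial feature-alignment idea depicted in Fig. 1; the network shapes, optimizers, and module names are illustrative stand-ins, not the architecture of [13].

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a shared feature extractor G and a domain
# discriminator D that predicts "source" (1) vs. "target" (0).
feat = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 32))
disc = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(feat.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

def adversarial_step(x_src, x_tgt):
    # 1) Discriminator step: learn to separate source from target features.
    fs, ft = feat(x_src).detach(), feat(x_tgt).detach()
    d_loss = bce(disc(fs), torch.ones(len(fs), 1)) + \
             bce(disc(ft), torch.zeros(len(ft), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) Generator step: make target features indistinguishable from source.
    g_loss = bce(disc(feat(x_tgt)), torch.ones(len(x_tgt), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```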
B. Optimal Transport
Optimal Transport [11] is a pointwise comparative analytical tool that provides a distance measure between two probability distributions. The distance measure is based on a cost $c(\cdot,\cdot)$ which is imputed to transporting a source distribution to a target distribution. Formally, given two densities $\mu_s$ and $\mu_t$ on two measurable spaces $\mathcal{X}_s$ and $\mathcal{X}_t$, the Kantorovich-Monge relaxation/formulation¹ of the optimal transport problem entails finding a transport plan, which is a probabilistic coupling $\gamma^\star$ defined over $\mathcal{X}_s \times \mathcal{X}_t$, such that

$$\gamma^\star = \arg\min_{\gamma \in \Gamma} \int_{\mathcal{X}_s \times \mathcal{X}_t} c(x_s, x_t)\, d\gamma(x_s, x_t), \tag{1}$$

where $c: \mathcal{X}_s \times \mathcal{X}_t \rightarrow [0, +\infty]$ and $c(x, y)$ denotes the cost of transporting a unit of mass from $x$ to $y$; $\gamma^\star(x, y)$ is the coupling that attains the minimum $\mathbb{E}_{(x,y)\sim\gamma}[c(x,y)]$.
In most practical applications, one has access only to samples of the distributions, yielding the discrete measures $\mu_s = \sum_{i=1}^{N_s} p_{s_i}\delta_{x_{s_i}}$ and $\mu_t = \sum_{i=1}^{N_t} p_{t_i}\delta_{x_{t_i}}$, where $\delta_{x_{s_i}}, p_{s_i}$ and $\delta_{x_{t_i}}, p_{t_i}$ denote the Dirac function and the probability mass at $x_{s_i} \in \mathcal{X}_s$ and $x_{t_i} \in \mathcal{X}_t$, respectively. The optimal transport plan in the discrete case is the solution to a linear programming problem, defined as follows:
$$\gamma^\star = \arg\min_{\gamma \in \Gamma} \langle C, \gamma \rangle = \arg\min_{\gamma \in \Gamma} \sum_{i=1}^{N_s} \sum_{j=1}^{N_t} \gamma_{ij} C_{ij}, \tag{2}$$

where $C \geq 0$ is the cost matrix with $C_{ij} = \|x_{s_i} - x_{t_j}\|_2^2$, and
$$\Gamma = \{\gamma \in \mathbb{R}_+^{N_s \times N_t} \mid \gamma \mathbf{1}_{N_t} = \mu_s,\ \gamma^T \mathbf{1}_{N_s} = \mu_t\} \tag{3}$$

is the set of probabilistic coupling matrices, and $\mathbf{1}_d$ is a vector of ones of appropriate dimension.

¹We refer the reader to the vast literature retracing the reformulation of Monge's original problem; a very readable resource is the manuscript by Cuturi and Peyré [?].
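To make the discrete problem of Eqs. (2)–(3) concrete, below is a small NumPy/SciPy sketch that solves it exactly as a linear program; the function name is ours, and for large $N_s$, $N_t$ one would typically use a dedicated OT solver instead.

```python
import numpy as np
from scipy.optimize import linprog

def discrete_ot_plan(mu_s, mu_t, C):
    """Solve the discrete Kantorovich problem of Eqs. (2)-(3) as an LP.

    mu_s: (Ns,) source masses, mu_t: (Nt,) target masses,
    C: (Ns, Nt) cost matrix with C[i, j] = ||x_si - x_tj||^2.
    Returns the optimal coupling gamma* of shape (Ns, Nt).
    """
    Ns, Nt = C.shape
    # Row-marginal constraints: sum_j gamma[i, j] = mu_s[i]
    A_rows = np.kron(np.eye(Ns), np.ones((1, Nt)))
    # Column-marginal constraints: sum_i gamma[i, j] = mu_t[j]
    A_cols = np.kron(np.ones((1, Ns)), np.eye(Nt))
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([mu_s, mu_t])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.x.reshape(Ns, Nt)

# Usage: rows of gamma sum to mu_s, columns sum to mu_t.
mu_s, mu_t = np.ones(3) / 3, np.ones(4) / 4
gamma = discrete_ot_plan(mu_s, mu_t, np.random.rand(3, 4))
```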
III. PROBLEM FORMULATION
Consider data from a source domain, $X_s = \{x_{s_i}\}_{i=1,\dots,N}$, with a corresponding set of labels $Y_s = \{y_{s_i}\}_{i=1,\dots,N}$, where $N$ is the total number of samples in the dataset. Let $g_s: X_s \rightarrow L_s$ be a function that transforms the data into a latent feature space, $L_s = g_s(X_s)$. Following this, a classifier function $f(\cdot)$ is used to assign labels to the data samples, $\hat{Y}_s = f(L_s) = f(g_s(X_s))$. If the classifier is well trained, $\hat{Y}_s \approx Y_s$.
Now, consider target domain data $X_t$ for which the ground truth labels are unavailable. One may consider using the classifier trained on $X_s$ to classify the data $X_t$ if classes similar to those in the source domain are of interest. Such a procedure would yield optimal performance if and only if the distributions of $X_s$ and $X_t$ are the same. This usually fails to be the case in practical applications, resulting in suboptimal classification performance.

In order to mitigate this problem, Domain Adaptation (DA) is required. Note that our goal here is to take on the classification problem where labels for the target distribution are completely unknown, and hence to learn the function $g_t: X_t \rightarrow L_t$ such that $\hat{Y}_t = f(g_t(X_t))$ leads to optimal classification performance in the absence of any information about the target domain.
IV. PROPOSED APPROACH
As noted in the previous section, the inference model must be optimal for the source domain. In order to ensure this, we propose to learn the functions $g_s(\cdot)$ and $f(\cdot)$ so that they minimize the cross-entropy loss between the ground truth labels $Y_s$ and those predicted by the model, $\hat{Y}_s = f(g_s(X_s))$:

$$\min_{f, g_s} CLoss(Y_s, f(g_s(X_s))), \tag{4}$$

where $CLoss(Y_s, f(g_s(X_s))) = -\sum_{i=1}^{N} y_{s_i} \log f(g_s(x_{s_i}))$.
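As a sketch, Eq. (4) amounts to standard supervised training of the encoder and classifier on source data. The layer sizes and class count below are illustrative stand-ins, and `f` here outputs logits, with the softmax folded into the loss.

```python
import torch
import torch.nn as nn

# Illustrative sizes: 256-d inputs, 32-d latent space, 10 classes.
g_s = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 32))  # encoder g_s
f = nn.Linear(32, 10)  # classifier f; outputs logits
# CrossEntropyLoss applies log-softmax internally, realizing
# CLoss = -sum_i y_si log f(g_s(x_si)) for one-hot y_si.
ce = nn.CrossEntropyLoss()
opt = torch.optim.Adam(list(g_s.parameters()) + list(f.parameters()), lr=1e-3)

def source_step(x_s, y_s):
    """One minimization step of Eq. (4). x_s: (B, 256), y_s: (B,) class indices."""
    loss = ce(f(g_s(x_s)), y_s)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```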
A. Learning the Optimal Latent Space
We first aim to learn the optimal latent spaces $L_s$ and $L_t$, such that the same classifier can be used for both the source and target, by minimizing the cost of transporting the samples from the latent space of the source domain to that of the target domain. This leads to learning latent spaces $L_s = g_s(X_s)$ and $L_t = g_t(X_t)$ with minimum discrepancy. The objective to be optimized is given as

$$\min_{f, g_s, g_t} CLoss(Y_s, \hat{Y}_s) + \lambda_1 TLoss(L_s, L_t), \tag{5}$$

where $TLoss(L_s, L_t) = \sum_{i,j} \gamma^\star_{ij} C_{ij}$; $\gamma^\star$ is the optimal transport plan for going from $L_t$ to $L_s$, and $C_{ij}$ is the corresponding cost. The determination of $\gamma^\star$ is further discussed in Section IV-B. $\lambda_1$ in Equation (5) is a hyperparameter which controls the importance of the second term relative to the first.
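Continuing the sketch after Eq. (4), the following illustrates one way to realize Eq. (5) on mini-batches: compute $\gamma^\star$ on the current latent batches, then add the weighted transport cost to the classification loss. Here the POT library's exact solver stands in for the determination of $\gamma^\star$ (the paper's fast computation is the subject of Section IV-B); the target encoder `g_t`, the uniform batch masses, and holding $\gamma^\star$ fixed within each step are our simplifications.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)
import torch
import torch.nn as nn

g_t = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 32))  # target encoder
opt_all = torch.optim.Adam(
    list(g_s.parameters()) + list(g_t.parameters()) + list(f.parameters()), lr=1e-3)

def joint_step(x_s, y_s, x_t, lam1=0.1):
    ls, lt = g_s(x_s), g_t(x_t)            # latent batches L_s, L_t
    C = torch.cdist(ls, lt, p=2) ** 2      # C_ij = ||l_si - l_tj||_2^2
    # gamma* on the current batch, uniform masses; held fixed w.r.t. gradients.
    a = np.full(len(ls), 1.0 / len(ls))
    b = np.full(len(lt), 1.0 / len(lt))
    gamma = torch.from_numpy(ot.emd(a, b, C.detach().cpu().numpy())).float()
    t_loss = (gamma * C).sum()             # TLoss = sum_ij gamma*_ij C_ij
    loss = ce(f(ls), y_s) + lam1 * t_loss  # Eq. (5)
    opt_all.zero_grad(); loss.backward(); opt_all.step()
    return loss.item()
```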
To ensure an optimal adaptation of the source domain classifier to the target domain, we proceed to minimize the cost of transporting between $L_s$ and $L_t$, while safeguarding the invariance of the predictive power of the source domain classifier $f(\cdot)$ when applied to the target domain, i.e., $P(Y_s|L_s) \approx P(Y_t|L_t)$. To best integrate this constraint, we opt to include the classification cost in the loss to be optimized, as a nonlinear transformation of the latent representation.