Fast OT for Latent Domain Adaptation
Siddharth Roheda*, Ashkan Panahi†, Hamid Krim*
*Electrical and Computer Engineering Department, North Carolina State University
{sroheda, ahk}@ncsu.edu
†Dept. of Computer Science and Engineering, Chalmers University
ashkan.panahi@chalmers.se
Abstract—In this paper, we address the problem of unsupervised Domain Adaptation. The need for such an adaptation arises when the distribution of the target data differs from that which is used to develop the model, and the ground truth information of the target data is unknown. We propose an algorithm that uses optimal transport theory, with a verifiably efficient and implementable solution, to learn the best latent feature representation. This is achieved by minimizing the cost of transporting the samples from the target domain to the distribution of the source domain.
Index Terms—Optimal Transport, Unsupervised Domain Adaptation
I. INTRODUCTION
Adapting a classifier trained on a source domain to recognize instances from a new target domain is an important problem of increasing research interest [?], [1], [2]. Difficulties often arise in practice, as is the case when the data is different from that which is used to train a model. Specifically, consider an inference problem where a model is learned using a certain source domain $X_s$ with corresponding labels $Y_s$ and is used to classify samples from the target domain $X_t$ with corresponding labels $Y_t$. Domain adaptation is required when $P(Y_s|X_s) \approx P(Y_t|X_t)$, but $P(X_s)$ is significantly different from $P(X_t)$.
Such a shift in data distribution is seen and addressed in almost every field, ranging from Natural Language Processing (NLP) to Object Recognition. Given labeled samples from a source domain, any Domain Adaptation (DA) approach falls into one of two groups: i) semi-supervised DA, where some samples in the target domain are labeled, or ii) unsupervised DA, where none of the samples in the target domain are labeled.
Several works [4]–[6] have demonstrated the effects of the divergence between the probability distributions of domains. These works have led to solutions that transform the data from the target domain so as to make the associated distribution as close as possible to that of the source domain. This allows the classifier trained on the source domain to be applied to data from the target domain post transformation. In [12], an approach for multi-source domain adaptation was proposed to transfer knowledge learned from multiple labeled sources to a target domain by aligning moments of their feature distributions, while [13] uses a GAN to learn the transformation from the target domain to the source domain.
In [14], [15], the authors simply align the second-order statistics of the source and target domains.
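For illustration, here is a minimal NumPy/SciPy sketch of such second-order (covariance) alignment; the function name, the regularizer `eps`, and the zero-mean assumption on the features are ours, not details taken from [14], [15].

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def align_second_order(Xs, Xt, eps=1e-5):
    """Re-color source features so their covariance matches the target's.

    Xs: (Ns, d) source features, Xt: (Nt, d) target features, assumed zero-mean.
    """
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])  # source covariance
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])  # target covariance
    Xs_white = Xs @ fractional_matrix_power(Cs, -0.5)          # whiten source
    return np.real(Xs_white @ fractional_matrix_power(Ct, 0.5))  # re-color to target
```

A classifier trained on the re-colored source features can then be applied to the target features directly, since the two now share second-order statistics.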
Contributions: In this paper, we address the problem of unsupervised DA. We build on existing works that have led to various techniques, including recent generative adversarial networks [7], and instead propose Optimal Transport, for some of its advantages, as a viable path to adapt the model toward classifying the target domain data. We first seek latent representations of the source and target domains, and subsequently minimize the optimal transport cost between them. These representations for the source and target can then be classified using a common classifier trained on the source data. Furthermore, we demonstrate that ensuring $P(\hat{Y}_s|X_s) \approx P(\hat{Y}_t|X_t)$, where $\hat{Y}_s$ and $\hat{Y}_t$ are the predictions made by the classifier on the source and target domain respectively, is also crucial for optimal performance.
II. RELATED WORK
A. Generative modeling
The Generative Adversarial Network (GAN) was first introduced by Goodfellow et al. [7] in 2014. In this framework, a generative model is pitted against an adversary: the discriminator. The generator aims to deceive the discriminator by synthesizing realistic samples from some underlying distribution. The discriminator, on the other hand, attempts to discriminate between a real data sample and one produced by the generator. Both models are approximated by neural networks. When trained alternately, the generator learns to produce random samples from the data distribution which are very close to the real data samples.
Following this, Conditional Generative Adversarial Networks (CGANs) were proposed in [8]. These networks were trained to generate realistic samples from a class-conditional distribution, by augmenting the random noise input to the generator with some useful conditioning information. As a result, the generator now aims to generate realistic data samples when given the conditional information. CGANs have been used to generate random faces given facial attributes [9], as well as to produce relevant images given text descriptions [10].
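As a sketch of this conditioning mechanism (with illustrative sizes and class count, not those of [8]), the condition can simply be concatenated to the noise vector at the generator's input:

```python
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Hypothetical CGAN generator: a one-hot class condition is concatenated
    to the noise vector, so samples are drawn per class."""
    def __init__(self, z_dim=64, n_classes=10, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + n_classes, 128),
                                 nn.ReLU(), nn.Linear(128, out_dim))
    def forward(self, z, y_onehot):
        return self.net(torch.cat([z, y_onehot], dim=1))

# Usage: draw 8 samples conditioned on random classes.
z = torch.randn(8, 64)
y = torch.eye(10)[torch.randint(0, 10, (8,))]  # one-hot conditions
x_fake = CondGenerator()(z, y)
```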
Many works have recently attempted to use GANs for performing domain adaptation. In [13], the authors use the generator to learn the features for classification, and the discriminator to differentiate between the source and target domain features produced by the generator. Figure 1 depicts the block diagram for this approach. In [16], a cyclic GAN was used to perform image translation between unpaired images.
Fig. 1: Adversarial Adaptation
In [17], a cyclic GAN was implemented to adapt semantic segmentation of street images from GTA5 to CityScapes data.
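For concreteness, the following is a minimal PyTorch sketch of the adversarial feature-alignment idea depicted in Fig. 1; the network shapes, optimizers, and module names are illustrative stand-ins, not the architecture of [13].

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a shared feature extractor G and a domain
# discriminator D that predicts "source" (1) vs. "target" (0).
feat = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 32))
disc = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(feat.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

def adversarial_step(x_src, x_tgt):
    # 1) Discriminator step: learn to separate source from target features.
    fs, ft = feat(x_src).detach(), feat(x_tgt).detach()
    d_loss = bce(disc(fs), torch.ones(len(fs), 1)) + \
             bce(disc(ft), torch.zeros(len(ft), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) Generator step: make target features indistinguishable from source.
    g_loss = bce(disc(feat(x_tgt)), torch.ones(len(x_tgt), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```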
B. Optimal Transport
Optimal Transport [11] is a pointwise comparative analytical tool that provides a distance measure between two probability distributions. The distance measure is based on a cost $c(\cdot,\cdot)$ which is imputed to transporting a source distribution to a target distribution. Formally, given two densities $\mu_s$ and $\mu_t$ on two measurable spaces $\mathcal{X}_s$ and $\mathcal{X}_t$, the Kantorovich-Monge relaxation/formulation¹ of the optimal transport problem entails finding a transport plan, which is a probabilistic coupling $\gamma^\star$ defined over $\mathcal{X}_s \times \mathcal{X}_t$, such that

$$\gamma^\star = \arg\min_{\gamma \in \Gamma} \int_{\mathcal{X}_s \times \mathcal{X}_t} c(x_s, x_t)\, d\gamma(x_s, x_t), \tag{1}$$

where $c: \mathcal{X}_s \times \mathcal{X}_t \rightarrow [0, +\infty]$ and $c(x, y)$ denotes the cost of transporting a unit of mass from $x$ to $y$; $\gamma^\star(x, y)$ is the coupling that attains the minimum $\mathbb{E}_{(x,y)\sim\gamma}[c(x,y)]$.
In most practical applications, one has access only to samples of the distributions, yielding the discrete measures $\mu_s = \sum_{i=1}^{N_s} p_{s_i}\delta_{x_{s_i}}$ and $\mu_t = \sum_{i=1}^{N_t} p_{t_i}\delta_{x_{t_i}}$, where $\delta_{x_{s_i}}, p_{s_i}$ and $\delta_{x_{t_i}}, p_{t_i}$ denote the Dirac function and the probability mass at $x_{s_i} \in \mathcal{X}_s$ and $x_{t_i} \in \mathcal{X}_t$, respectively. The optimal transport plan in the discrete case is the solution to a linear programming problem, defined as follows:
$$\gamma^\star = \arg\min_{\gamma \in \Gamma} \langle C, \gamma \rangle = \arg\min_{\gamma \in \Gamma} \sum_{i=1}^{N_s} \sum_{j=1}^{N_t} \gamma_{ij} C_{ij}, \tag{2}$$

where $C \geq 0$ is the cost matrix with $C_{ij} = \|x_{s_i} - x_{t_j}\|_2^2$, and
$$\Gamma = \{\gamma \in \mathbb{R}_+^{N_s \times N_t} \mid \gamma \mathbf{1}_{N_t} = \mu_s,\ \gamma^T \mathbf{1}_{N_s} = \mu_t\} \tag{3}$$

is the set of probabilistic coupling matrices, and $\mathbf{1}_d$ is a vector of ones of appropriate dimension.

¹We refer the reader to the vast literature retracing the reformulation of Monge's original problem; a very readable resource is the manuscript by Cuturi and Peyré [?].
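To make the discrete problem of Eqs. (2)–(3) concrete, below is a small NumPy/SciPy sketch that solves it exactly as a linear program; the function name is ours, and for large $N_s$, $N_t$ one would typically use a dedicated OT solver instead.

```python
import numpy as np
from scipy.optimize import linprog

def discrete_ot_plan(mu_s, mu_t, C):
    """Solve the discrete Kantorovich problem of Eqs. (2)-(3) as an LP.

    mu_s: (Ns,) source masses, mu_t: (Nt,) target masses,
    C: (Ns, Nt) cost matrix with C[i, j] = ||x_si - x_tj||^2.
    Returns the optimal coupling gamma* of shape (Ns, Nt).
    """
    Ns, Nt = C.shape
    # Row-marginal constraints: sum_j gamma[i, j] = mu_s[i]
    A_rows = np.kron(np.eye(Ns), np.ones((1, Nt)))
    # Column-marginal constraints: sum_i gamma[i, j] = mu_t[j]
    A_cols = np.kron(np.ones((1, Ns)), np.eye(Nt))
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([mu_s, mu_t])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.x.reshape(Ns, Nt)

# Usage: rows of gamma sum to mu_s, columns sum to mu_t.
mu_s, mu_t = np.ones(3) / 3, np.ones(4) / 4
gamma = discrete_ot_plan(mu_s, mu_t, np.random.rand(3, 4))
```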
III. PROBLEM FORMULATION
Consider data from a source domain, $X_s = \{x_{s_i}\}_{i=1,\dots,N}$, with a corresponding set of labels $Y_s = \{y_{s_i}\}_{i=1,\dots,N}$, where $N$ is the total number of samples in the dataset. Let $g_s: X_s \rightarrow L_s$ be a function that transforms the data into a latent feature space, $L_s = g_s(X_s)$. Following this, a classifier function $f(\cdot)$ is used to assign labels to the data samples, $\hat{Y}_s = f(L_s) = f(g_s(X_s))$. If the classifier is well trained, $\hat{Y}_s \approx Y_s$.
Now, consider target domain data $X_t$ for which the ground truth labels are unavailable. One may consider using the classifier trained on $X_s$ to classify the data $X_t$ if classes similar to those in the source domain are of interest. Such a procedure would yield optimal performance if and only if the distributions of $X_s$ and $X_t$ are the same. This usually fails to be the case in practical applications, resulting in suboptimal classification performance.

In order to mitigate this problem, Domain Adaptation (DA) is required. Note that our goal here is to take on the classification problem where labels for the target distribution are completely unknown, and hence to learn the function $g_t: X_t \rightarrow L_t$ such that $\hat{Y}_t = f(g_t(X_t))$ leads to optimal classification performance in the absence of any information about the target domain.
IV. PROPOSED APPROACH
As noted in the previous section, the inference model must be optimal for the source domain. In order to ensure this, we propose to learn the functions $g_s(\cdot)$ and $f(\cdot)$ so that they minimize the cross-entropy loss between the ground truth labels $Y_s$ and those predicted by the model, $\hat{Y}_s = f(g_s(X_s))$:

$$\min_{f, g_s} CLoss(Y_s, f(g_s(X_s))), \tag{4}$$

where $CLoss(Y_s, f(g_s(X_s))) = -\sum_{i=1}^{N} y_{s_i} \log f(g_s(x_{s_i}))$.
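As a sketch, Eq. (4) amounts to standard supervised training of the encoder and classifier on source data. The layer sizes and class count below are illustrative stand-ins, and `f` here outputs logits, with the softmax folded into the loss.

```python
import torch
import torch.nn as nn

# Illustrative sizes: 256-d inputs, 32-d latent space, 10 classes.
g_s = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 32))  # encoder g_s
f = nn.Linear(32, 10)  # classifier f; outputs logits
# CrossEntropyLoss applies log-softmax internally, realizing
# CLoss = -sum_i y_si log f(g_s(x_si)) for one-hot y_si.
ce = nn.CrossEntropyLoss()
opt = torch.optim.Adam(list(g_s.parameters()) + list(f.parameters()), lr=1e-3)

def source_step(x_s, y_s):
    """One minimization step of Eq. (4). x_s: (B, 256), y_s: (B,) class indices."""
    loss = ce(f(g_s(x_s)), y_s)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```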
A. Learning the Optimal Latent Space
We first aim to learn the optimal latent spaces $L_s$ and $L_t$, such that the same classifier can be used for both the source and target, by minimizing the cost of transporting the samples from the latent space of the source domain to that of the target domain. This leads to learning latent spaces $L_s = g_s(X_s)$ and $L_t = g_t(X_t)$ with minimum discrepancy. The objective to be optimized is given as

$$\min_{f, g_s, g_t} CLoss(Y_s, \hat{Y}_s) + \lambda_1 TLoss(L_s, L_t), \tag{5}$$

where $TLoss(L_s, L_t) = \sum_{i,j} \gamma^\star_{ij} C_{ij}$; $\gamma^\star$ is the optimal transport plan for going from $L_t$ to $L_s$, and $C_{ij}$ is the corresponding cost. The determination of $\gamma^\star$ is further discussed in Section IV-B. $\lambda_1$ in Equation (5) is a hyperparameter which controls the importance of the second term relative to the first.
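Continuing the sketch after Eq. (4), the following illustrates one way to realize Eq. (5) on mini-batches: compute $\gamma^\star$ on the current latent batches, then add the weighted transport cost to the classification loss. Here the POT library's exact solver stands in for the determination of $\gamma^\star$ (the paper's fast computation is the subject of Section IV-B); the target encoder `g_t`, the uniform batch masses, and holding $\gamma^\star$ fixed within each step are our simplifications.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)
import torch
import torch.nn as nn

g_t = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 32))  # target encoder
opt_all = torch.optim.Adam(
    list(g_s.parameters()) + list(g_t.parameters()) + list(f.parameters()), lr=1e-3)

def joint_step(x_s, y_s, x_t, lam1=0.1):
    ls, lt = g_s(x_s), g_t(x_t)            # latent batches L_s, L_t
    C = torch.cdist(ls, lt, p=2) ** 2      # C_ij = ||l_si - l_tj||_2^2
    # gamma* on the current batch, uniform masses; held fixed w.r.t. gradients.
    a = np.full(len(ls), 1.0 / len(ls))
    b = np.full(len(lt), 1.0 / len(lt))
    gamma = torch.from_numpy(ot.emd(a, b, C.detach().cpu().numpy())).float()
    t_loss = (gamma * C).sum()             # TLoss = sum_ij gamma*_ij C_ij
    loss = ce(f(ls), y_s) + lam1 * t_loss  # Eq. (5)
    opt_all.zero_grad(); loss.backward(); opt_all.step()
    return loss.item()
```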
To ensure an optimal adaptation of the source domain classifier to the target domain, we proceed to minimize the cost of transporting between $L_s$ and $L_t$, while safeguarding the invariance of the predictive power of the source domain classifier $f(\cdot)$ when applied to the target domain, i.e., $P(Y_s|L_s) \approx P(Y_t|L_t)$. To best integrate this constraint, we opt to include the classification cost in the loss to be optimized, as a nonlinear transformation of the latent representation.