Constrained Maximum Cross-Domain Likelihood
for Domain Generalization
Jianxin Lin, Yongqiang Tang, Junping Wang and Wensheng Zhang
Abstract—Domain generalization, a topic of growing recent interest, aims to learn from multiple source domains a generalizable model that is expected to perform well on unseen test domains. Great efforts have been made to learn domain-invariant features by aligning distributions across domains. However, existing works are often built on relaxed conditions that are hard to satisfy in practice, and thus fail to realize the desired joint distribution alignment. In this paper, we propose a novel domain
generalization method, which originates from an intuitive idea
that a domain-invariant classifier can be learned by minimizing
the KL-divergence between posterior distributions from different
domains. To enhance the generalizability of the learned classifier,
we formalize the optimization objective as an expectation com-
puted on the ground-truth marginal distribution. Nevertheless,
it also presents two obvious deficiencies, one of which is the
side-effect of entropy increase in KL-divergence and the other
is the unavailability of ground-truth marginal distributions. For
the former, we introduce a term named maximum in-domain
likelihood to maintain the discrimination of the learned domain-
invariant representation space. For the latter, we approximate the
ground-truth marginal distribution with source domains under a
reasonable convex hull assumption. Finally, a Constrained Max-
imum Cross-domain Likelihood (CMCL) optimization problem
is deduced, by solving which the joint distributions are naturally
aligned. An alternating optimization strategy is carefully designed
to approximately solve this optimization problem. Extensive ex-
periments on four standard benchmark datasets, i.e., Digits-DG,
PACS, Office-Home and miniDomainNet, highlight the superior
performance of our method.
Index Terms—Domain generalization, domain adaptation, dis-
tribution shift, domain-invariant representation, joint distribu-
tion alignment.
I. INTRODUCTION
DEEP learning methods have achieved remarkable success in computer vision tasks under the assumption that training data and test data follow the same distribution. Unfortunately, this important assumption does not hold in real-world applications [1]. The distribution shift between training data and test data, which is widespread in various vision tasks, is unpredictable and not even static, thus hindering the application of deep
learning in reliability-sensitive scenarios. For example, in the
field of medical image processing, image data from different
hospitals follow different distributions due to discrepancies in
J. Lin, J. Wang and W. Zhang are with the Research Center of Preci-
sion Sensing and Control, Institute of Automation, Chinese Academy of
Sciences, Beijing, 100190, China, and also with the School of Artificial
Intelligence, University of Chinese Academy of Sciences, Beijing, 100049,
China (e-mail: linjianxin2020@ia.ac.cn; junping.wang@ia.ac.cn; zhangwen-
shengia@hotmail.com).
Y. Tang is with the Research Center of Precision Sensing and Control,
Institute of Automation, Chinese Academy of Sciences, Beijing, 100190,
China (e-mail: yongqiang.tang@ia.ac.cn).
imaging protocol, device vendors and patient populations [2].
Hence, the models trained on data from one hospital often
suffer from performance degradation when tested in another
hospital owing to the distribution shift.
To tackle the distribution shift problem, considerable efforts
have been made in domain adaptation and domain general-
ization. Domain adaptation assumes that the target domain
is accessible and attempts to align the distributions between
the source domain and the target domain. However, in the
setting of domain adaptation, the model inevitably needs to be
retrained when the distribution of the target domain changes,
which can be time-consuming and cumbersome [3]. More
importantly, in many cases, there is no way to access the
target domain in advance. Fortunately, domain generalization
has been proposed to improve the generalization ability of
models in out-of-distribution scenarios given multiple source
domains, where the target domain is inaccessible [4].
As an active research area, many domain generalization
methods have been proposed. Let X denote an input variable, i.e., an image, Z = F(X) denote the feature extracted from X by a feature extractor F(·), and Y denote an output variable, i.e., a label. An effective and general solution to domain generalization is to learn a domain-invariant representation space in which the joint distribution P(Z, Y) is consistent across all source domains [4], [5], [6], [7]. Along this line, some works [4], [8] try to align the marginal distribution P(Z) among domains, assuming that the posterior distribution P(Y|Z) is stable across domains. Problematically, there is no guarantee that P(Y|Z) will be invariant when aligning P(Z) [9], [10]. Some methods [11] attempt to align the class-conditional distribution P(Z|Y). According to P(Z, Y) = P(Z|Y)P(Y), aligning the class-conditional distributions yields a domain-invariant joint distribution only if the categorical distribution P(Y) is invariant across domains [7], a requirement that is difficult to meet in practical applications.
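To make the argument concrete, note that the joint distribution factorizes in two equivalent ways (a standard probability identity, stated here for clarity rather than taken from a specific equation of this paper):

```latex
P(Z, Y) = P(Y \mid Z)\, P(Z) = P(Z \mid Y)\, P(Y)
```

Aligning only P(Z) leaves P(Y|Z) free to differ across domains, while aligning only P(Z|Y) yields an invariant joint distribution only when P(Y) is also invariant; hence both factors must be handled to align P(Z, Y).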
More recently, the domain-invariant classifier, or the in-
variant predictor, has attracted much interest [12], [13], [14],
[15], [16]. In essence, these works are performing posterior
distribution alignment. Invariant Risk Minimization (IRM) [13] seeks an invariant causal predictor, i.e., a classifier that is simultaneously optimal for all environments (domains). IRM is formalized as a bi-level optimization problem that is hard to solve. The invariant causal predictor aligns the conditional expectation E[Y|Z] across domains, which is only a coarse posterior distribution alignment since the conditional expectation does not fully characterize the posterior. Robey et al. [9] propose a novel definition of invariance called G-invariance, which requires that the classifier hold its prediction invariant after X
is transformed into any other domain by a domain transformation model G. Li et al. [16] propose a new formulation called Invariant Information Bottleneck (IIB), which achieves a domain-invariant classifier by minimizing the mutual information between Y and the domain label given Z. Despite
the brilliant achievements, the above methods do not take
marginal distribution alignment into consideration and thus fail
to realize the desired joint distribution alignment. In order to
ensure that the joint distribution is invariant across domains,
both P(Z) and P(Y|Z) must be considered [17].
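For illustration only, a widely used practical surrogate of the IRM idea is the IRMv1-style penalty, which penalizes the gradient of each domain's risk with respect to a fixed scalar multiplier on the classifier output. The sketch below concerns the cited baseline [13], not the method proposed in this paper, and the exact formulation in [13] may differ.

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits, labels):
    # Squared gradient of the risk w.r.t. a dummy scale of 1.0 on the logits
    # (an illustrative IRMv1-style penalty, not the CMCL objective).
    scale = torch.ones(1, requires_grad=True, device=logits.device)
    loss = F.cross_entropy(logits * scale, labels)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return (grad ** 2).sum()

def irm_objective(model, batches_per_domain, lam=1.0):
    # Empirical risk plus the invariance penalty, summed over one batch
    # per source domain (environment).
    erm, penalty = 0.0, 0.0
    for x, y in batches_per_domain:
        logits = model(x)
        erm = erm + F.cross_entropy(logits, y)
        penalty = penalty + irmv1_penalty(logits, y)
    return erm + lam * penalty
```

A small penalty indicates that the same classifier scaling is (locally) optimal for every domain, which is the sense in which the predictor is "simultaneously optimal".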
In this paper, we propose a novel domain generalization
method that can jointly align the posterior distribution and
the marginal distribution. Specifically, we formalize a general
optimization objective in which, for any given sample, in addition to the routine empirical risk minimization, the Kullback-Leibler (KL) divergence [18] between posterior distributions from different domains is also minimized so that a domain-invariant classifier can be learned. To enhance the generaliza-
tion ability of the learned classifier, the optimization objective
is designed as an expectation computed on the ground-truth
marginal distribution. Unfortunately, the above optimization
problem still has two deficiencies that must be overcome.
The first issue lies in the side-effect of KL-divergence which
tends to enlarge the entropy of posterior distributions. To
tackle this issue, we add a new term named maximum in-
domain likelihood into the overall optimization objective,
such that the discrimination of the learned domain-invariant
feature space is reinforced. The second issue is that the
ground-truth marginal distribution is not available directly.
In light of this, we propose to approximate the real-world
marginal distribution with source domains under a reasonable
convex hull assumption. Eventually, a concise and intuitive optimization problem, termed Constrained Maximum Cross-domain Likelihood (CMCL), is deduced, by solving which we
can learn a domain-invariant representation space where the
joint distributions across domains are naturally aligned.
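As a purely conceptual sketch of the cross-domain KL term described above, the snippet below assumes, for illustration, that each source domain provides its own estimate of the class posterior for a shared feature batch (e.g., via domain-specific classifier heads, which is an assumption of this sketch); the actual CMCL formulation, including the maximum in-domain likelihood term and the marginal-distribution constraint, is derived in Section III.

```python
import torch
import torch.nn.functional as F

def cross_domain_posterior_kl(per_domain_logits):
    # Pairwise KL-divergence between the class posteriors that different
    # domain-specific estimators assign to the *same* samples.
    # per_domain_logits: list of N tensors of shape (batch, num_classes).
    n = len(per_domain_logits)
    log_probs = [F.log_softmax(l, dim=1) for l in per_domain_logits]
    probs = [lp.exp() for lp in log_probs]
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # KL( P_i(Y|z) || P_j(Y|z) ), averaged over the batch
            loss = loss + (probs[i] * (log_probs[i] - log_probs[j])).sum(dim=1).mean()
    return loss / (n * (n - 1))
```

Driving this quantity to zero for all samples corresponds to the posterior alignment in Definition 2 below; it is only one term of the full objective.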
The major contributions of our paper can be summarized as
follows:
1) We propose a new formulation for domain generaliza-
tion, which minimizes the expectation of KL-divergence
between posterior distributions from different domains.
We innovatively compute the expectation on the ground-
truth marginal distribution, such that the generalizability
of the learned model can be enhanced.
2) A constrained maximum cross-domain likelihood opti-
mization problem is deduced by adding an objective term
of maximum in-domain likelihood and a constraint of
marginal distribution alignment. The former eliminates
the side-effect brought by minimizing KL-divergence,
and the latter makes it possible to approximate the
ground-truth marginal distribution with source domains.
3) An effective alternating optimization strategy with multi-
ple optimization stages is elaborately developed to solve
the maximum cross-domain likelihood problem. Com-
prehensive experiments are conducted on four widely
used datasets and the results demonstrate that our CMCL
achieves superior performance on unseen domains.
II. RELATED WORKS
In this section, we review the related works dealing with
the domain (distribution) shift problem in deep learning,
which can be divided into two categories: domain adaptation and domain generalization.
A. Domain Adaptation
Domain adaptation aims to tackle the domain shift between
a source domain and a particular target domain [19], [20]. The goal of domain adaptation is to train models that make full use of a large amount of labeled data from the source domain so as to perform well on the unlabeled target domain. Most existing
domain adaptation methods focus on aligning distributions
between the source domain and target domain [21]. They can
be mainly divided into two categories: discrepancy measuring
based methods and domain adversarial based methods.
Discrepancy measuring based methods employ different
metrics to measure the distribution disparities and then min-
imize them, e.g., Maximum Mean Discrepancy (MMD) [22],
Central Moment Discrepancy (CMD) [23], Wasserstein dis-
tance [24]. Deep domain confusion [25] employs MMD to
align marginal distributions in the deep representation space.
Deep CORAL [26] and CMD [23] align marginal distributions
with moment matching. Joint MMD [27] is proposed to
align the joint distributions, considering that distribution shifts may stem from the joint distribution. Domain adversarial based
methods use domain discriminators to minimize the distance
between distributions [28]. Feature extractors are optimized to
confuse the discriminators so that the divergence of distribu-
tions is reduced. Domain-adversarial neural network [28] is
proposed to align marginal distributions by adversarial learn-
ing. Multi-adversarial domain adaptation [29] considers the
alignment of multi-mode distributions, i.e., class-conditional
distributions, instead of marginal distributions. Zuo et al. [30] concatenate features with their corresponding labels and feed them into a domain classifier, so that the joint distributions are aligned in an adversarial training manner.
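As a minimal illustration of the discrepancy measuring family mentioned above, the following sketch estimates a squared MMD [22] between feature batches from two domains using a single RBF kernel. It is a textbook-style biased estimator, not code from any of the cited works, and the bandwidth sigma is an assumed hyperparameter.

```python
import torch

def rbf_mmd2(x, y, sigma=1.0):
    # Biased estimate of squared MMD between feature batches
    # x: (n, d) and y: (m, d), with a single RBF kernel.
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2          # pairwise squared Euclidean distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```

Minimizing such a statistic over the feature extractor pulls the two marginal feature distributions together; adversarial methods achieve a similar effect with a learned domain discriminator instead of a fixed kernel.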
The difference between domain adaptation and domain generalization lies in the accessibility of the target domain. The former focuses on the alignment between the given source domain and target domain, whereas the latter focuses on generalizability to unseen test domains.
B. Domain Generalization
Domain generalization aims to train models on several source domains and test them on unseen domains [31], [32]. Existing domain generalization works mainly follow three lines of research: learning strategy, data augmentation and domain-invariant representation.
Learning strategy based methods mainly design special
learning strategies to enhance generalizability. Some works employ meta learning to address domain generalization, randomly splitting the source domains into meta-train and meta-test sets to simulate domain shift. Balaji et al. [33] train a regularizer through meta learning to capture the notion of domain generalization, which is parameterized by a neural
network. Dou et al. [34] propose a model-agnostic learning paradigm based on meta learning to enhance the generalizability of learned features. Global inter-class relationships, local class-
specific cohesion and separation of sample features are also
considered to regularize the semantic structure of the feature
space. In addition to meta learning, Distributionally Robust
Optimization (DRO) [35] is also used for domain generaliza-
tion, which trains models by minimizing the worst-case loss
over pre-defined groups. Sagawa et al [36] find that coupling
DRO with stronger regularization achieves higher worst-case
accuracy in the over-parameterized regime.
The core idea of data augmentation based methods is to in-
crease the diversity of training data. MixStyle [37] is motivated by the observation that the visual domain is closely related to image style, which is encoded in feature statistics. The domain diversity can be
increased by randomly combining feature statistics between
two training instances. Deep Domain-Adversarial Image Gen-
eration (DDAIG) [38] is proposed to fool the domain classifier
by augmenting images. A domain transformation network is
designed to automatically change image style. Seo et al [39]
propose a Domain-Specific Optimized Normalization (DSON)
to remove domain-specific style. Wang et al [40] design a
feature-based style randomization module, which randomizes
image style by introducing random noise into feature statistics.
These style augmentation based methods actually exploit prior knowledge about the domain shift, namely that the difference across source domains lies in image style. Though they work well on existing benchmarks, style augmentation based methods would probably fail when the domain shift is caused by other factors. Methods that do not rely on such prior knowledge deserve further study.
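To illustrate the feature-statistics mixing idea behind MixStyle [37], the simplified sketch below mixes per-instance channel means and standard deviations between randomly paired instances in a batch. The published method has additional details (e.g., at which layers and with what probability it is applied during training), so this should be read as an approximation rather than the exact algorithm.

```python
import torch

def mix_feature_statistics(x, alpha=0.1, eps=1e-6):
    # Mix channel-wise mean/std of feature maps between instances in a batch,
    # approximating style mixing. x: feature maps of shape (B, C, H, W).
    b = x.size(0)
    mu = x.mean(dim=[2, 3], keepdim=True)              # per-instance channel means
    sig = (x.var(dim=[2, 3], keepdim=True) + eps).sqrt()
    x_norm = (x - mu) / sig                            # remove instance-specific style
    lam = torch.distributions.Beta(alpha, alpha).sample((b, 1, 1, 1)).to(x.device)
    perm = torch.randperm(b, device=x.device)          # pair each instance with a random one
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix                   # re-inject the mixed style
```

Because only the first- and second-order feature statistics are perturbed, the semantic content of each instance is preserved while its "style" is diversified.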
Domain-invariant representation based methods often achieve domain invariance by aligning the distributions of different domains, as is done in domain adaptation. Li et al. [41] impose MMD on an adversarial autoencoder to align the marginal distributions P(Z) among domains, and the aligned distribution is matched with a pre-defined prior distribution by adversarial training. Motiian et al. [42] try to align the class-conditional distributions P(Z|Y) for finer alignment. However, class-conditional distribution alignment based methods can hardly deal with domain shift caused by label shift, since they require that the categorical distribution P(Y) remain unchanged across domains. Another important
branch attempts to achieve domain-invariant representation via
domain-invariant classifier learning. IRM [13] tries to learn a
domain-invariant classifier by constraining that the classifier is
simultaneously optimal for all domains. But this optimization
problem is hard to solve. Our method CMCL learns a domain-invariant classifier via posterior distribution alignment, and an effective alternating optimization strategy is proposed to solve the resulting optimization problem, leading to excellent performance. Zhao et al. [44] propose an entropy regularization term to align posterior distributions. According to our analysis, such an entropy term corresponds to the side-effect of minimizing KL-divergence and severely damages classification performance. In our method, a term of maximum in-domain likelihood is proposed to eliminate this side-effect.
III. PROPOSED METHOD
A. Overview
In this paper, we focus on domain generalization for image classification. Suppose the sample and label spaces are represented by 𝒳 and 𝒴 respectively; then a domain can be represented by a joint distribution defined on 𝒳 × 𝒴. There are N datasets D = {S_i = {(x_j^i, y_j^i)}_{j=1}^{M_i}}_{i=1}^{N} sampled from domains with different distributions {P_i(X, Y)}_{i=1}^{N}, where M_i denotes the number of samples of dataset S_i, X ∈ 𝒳 and Y ∈ 𝒴. Let P(X, Y) denote the ground-truth joint distribution in the real world. As shown in Figure 1, we suppose that P(X, Y) yields the distributions of the training domains {P_i(X, Y)}_{i=1}^{N} and the distribution of the unseen domain P_u(X, Y), with different domain shifts due to different selection biases.
Figure 1: Illustration of the generation process of domain-
specific distributions [8].
Given several training domains following different distri-
butions, domain generalization aims to learn a model which
is expected to overcome the domain shift and maintain its
performance on unseen domains. In order to overcome the
distribution shift across domains, we try to learn a domain-
invariant representation space in which the joint distributions
of different domains are aligned.
Definition 1 (Domain-Invariant Representation). Let E be the set of all possible domains and F(·) : 𝒳 → R^d be a feature mapping function that transforms the raw input into the domain-invariant representation space. A representation space is domain-invariant if

∀ i ≠ j ∈ E, P_i(Z, Y) = P_j(Z, Y),   (1)

where Z = F(X).
To obtain the domain-invariant representation space, we
firstly focus on aligning the posterior distribution from the
perspective of domain-invariant classifier learning.
Definition 2 (Domain-Invariant Classifier). Given a particular representation space, a domain-invariant classifier is simultaneously the Bayes optimal classifier on every domain, which can be obtained when the posterior distributions of different domains are aligned:

∀ i ≠ j ∈ E, P_i(Y|Z) = P_j(Y|Z).   (2)
We propose an optimization problem to learn the domain-
invariant classifier, which minimizes the KL-divergence be-
tween posterior distributions of different domains and maxi-
mizes the discrimination of the in-domain feature space (see
Section III-B1). The optimization objective is formalized as an
expectation of the KL-divergence computed on the ground-truth marginal distribution.
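Schematically, and only as an illustrative rendering of the description above (the exact objective, including the maximum in-domain likelihood term and the marginal-alignment constraint, is developed in Section III-B), the cross-domain term can be written as an expectation over the ground-truth marginal:

```latex
\min_{F}\; \mathbb{E}_{z \sim P(Z)} \Big[ \sum_{i \neq j} D_{\mathrm{KL}}\big( P_i(Y \mid z) \,\|\, P_j(Y \mid z) \big) \Big]
```

where P_i(Y|z) denotes the posterior of the i-th source domain. Since P(Z) is not directly available, it is approximated with the source domains under the convex hull assumption, yielding the constrained problem studied in the remainder of Section III.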