Constrained Maximum Cross-Domain Likelihood
for Domain Generalization
Jianxin Lin, Yongqiang Tang, Junping Wang and Wensheng Zhang
Abstract—Domain generalization, a topic of growing recent interest, aims to learn from multiple source domains a generalizable model that is expected to perform well on unseen test domains. Great efforts have been made to learn domain-invariant features by aligning distributions across domains. However, existing works are often built on relaxed conditions that are hard to satisfy in practice, and thus fail to realize the desired joint distribution alignment. In this paper, we propose a novel domain
generalization method, which originates from an intuitive idea
that a domain-invariant classifier can be learned by minimizing
the KL-divergence between posterior distributions from different
domains. To enhance the generalizability of the learned classifier,
we formalize the optimization objective as an expectation com-
puted on the ground-truth marginal distribution. Nevertheless,
it also presents two obvious deficiencies, one of which is the
side-effect of entropy increase in KL-divergence and the other
is the unavailability of ground-truth marginal distributions. For
the former, we introduce a term named maximum in-domain
likelihood to maintain the discrimination of the learned domain-
invariant representation space. For the latter, we approximate the
ground-truth marginal distribution with source domains under a
reasonable convex hull assumption. Finally, a Constrained Max-
imum Cross-domain Likelihood (CMCL) optimization problem
is deduced, by solving which the joint distributions are naturally
aligned. An alternating optimization strategy is carefully designed
to approximately solve this optimization problem. Extensive ex-
periments on four standard benchmark datasets, i.e., Digits-DG,
PACS, Office-Home and miniDomainNet, highlight the superior
performance of our method.
Index Terms—Domain generalization, domain adaptation, dis-
tribution shift, domain-invariant representation, joint distribu-
tion alignment.
I. INTRODUCTION
DEEP learning methods have achieved remarkable success in computer vision tasks under the assumption that training data and test data follow the same distribution. Unfortunately, this important assumption does not hold in real-world applications [1]. The distribution shift between training data and test data, which is widespread in various vision tasks, is unpredictable and not even static, thus hindering the application of deep
learning in reliability-sensitive scenarios. For example, in the
field of medical image processing, image data from different
hospitals follow different distributions due to discrepancies in
J. Lin, J. Wang and W. Zhang are with the Research Center of Preci-
sion Sensing and Control, Institute of Automation, Chinese Academy of
Sciences, Beijing, 100190, China, and also with the School of Artificial
Intelligence, University of Chinese Academy of Sciences, Beijing, 100049,
China (e-mail: linjianxin2020@ia.ac.cn; junping.wang@ia.ac.cn; zhangwen-
shengia@hotmail.com).
Y. Tang is with the Research Center of Precision Sensing and Control,
Institute of Automation, Chinese Academy of Sciences, Beijing, 100190,
China (e-mail: yongqiang.tang@ia.ac.cn).
imaging protocol, device vendors and patient populations [2].
Hence, the models trained on data from one hospital often
suffer from performance degradation when tested in another
hospital owing to the distribution shift.
To tackle the distribution shift problem, considerable efforts
have been made in domain adaptation and domain general-
ization. Domain adaptation assumes that the target domain
is accessible and attempts to align the distributions between
the source domain and the target domain. However, in the
setting of domain adaptation, the model inevitably needs to be
retrained when the distribution of the target domain changes,
which can be time-consuming and cumbersome [3]. More
importantly, in many cases, there is no way to access the
target domain in advance. Fortunately, domain generalization
has been proposed to improve the generalization ability of
models in out-of-distribution scenarios given multiple source
domains, where the target domain is inaccessible [4].
As an active research area, many domain generalization
methods have been proposed. Let X denote an input variable, i.e., an image, Z = F(X) denote the feature extracted from X by a feature extractor F(·), and Y denote an output variable, i.e., a label. An effective and general solution to domain generalization is to learn a domain-invariant representation space in which the joint distribution P(Z, Y) is consistent across all source domains [4], [5], [6], [7]. Along this line, some works [4], [8] try to align the marginal distribution P(Z) among domains, assuming that the posterior distribution P(Y|Z) is stable across domains. Problematically, there is no guarantee that P(Y|Z) will be invariant when aligning P(Z) [9], [10]. Some methods [11] attempt to align the class-conditional distribution P(Z|Y). According to P(Z, Y) = P(Z|Y)P(Y), aligning the class-conditional distributions yields a domain-invariant joint distribution only if the categorical distribution P(Y) is invariant across domains [7], a requirement that is difficult to meet in practical applications.
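To make the argument concrete, note that the joint distribution factorizes in two equivalent ways (a standard probability identity, stated here for clarity rather than taken from a specific equation of this paper):

```latex
P(Z, Y) = P(Y \mid Z)\, P(Z) = P(Z \mid Y)\, P(Y)
```

Aligning only P(Z) leaves P(Y|Z) free to differ across domains, while aligning only P(Z|Y) yields an invariant joint distribution only when P(Y) is also invariant; hence both factors must be handled to align P(Z, Y).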
More recently, the domain-invariant classifier, or the in-
variant predictor, has attracted much interest [12], [13], [14],
[15], [16]. In essence, these works are performing posterior
distribution alignment. Invariant Risk Minimization (IRM) [13] seeks an invariant causal predictor, i.e., a classifier that is simultaneously optimal for all environments (domains). IRM is formalized as a bi-level optimization problem that is hard to solve. The invariant causal predictor aligns the conditional expectation E[Y|Z] across domains, which is only a coarse posterior distribution alignment since the conditional expectation does not fully characterize the posterior. Robey et al. [9] propose a novel definition of invariance called G-invariance, which requires that the classifier hold its prediction invariant after X
is transformed into any other domain by a domain transformation model G. Li et al. [16] propose a new formulation called Invariant Information Bottleneck (IIB), which achieves a domain-invariant classifier by minimizing the mutual information between Y and the domain label given Z. Despite
the brilliant achievements, the above methods do not take
marginal distribution alignment into consideration and thus fail
to realize the desired joint distribution alignment. In order to
ensure that the joint distribution is invariant across domains,
both P(Z) and P(Y|Z) must be considered [17].
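For illustration only, a widely used practical surrogate of the IRM idea is the IRMv1-style penalty, which penalizes the gradient of each domain's risk with respect to a fixed scalar multiplier on the classifier output. The sketch below concerns the cited baseline [13], not the method proposed in this paper, and the exact formulation in [13] may differ.

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits, labels):
    # Squared gradient of the risk w.r.t. a dummy scale of 1.0 on the logits
    # (an illustrative IRMv1-style penalty, not the CMCL objective).
    scale = torch.ones(1, requires_grad=True, device=logits.device)
    loss = F.cross_entropy(logits * scale, labels)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return (grad ** 2).sum()

def irm_objective(model, batches_per_domain, lam=1.0):
    # Empirical risk plus the invariance penalty, summed over one batch
    # per source domain (environment).
    erm, penalty = 0.0, 0.0
    for x, y in batches_per_domain:
        logits = model(x)
        erm = erm + F.cross_entropy(logits, y)
        penalty = penalty + irmv1_penalty(logits, y)
    return erm + lam * penalty
```

A small penalty indicates that the same classifier scaling is (locally) optimal for every domain, which is the sense in which the predictor is "simultaneously optimal".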
In this paper, we propose a novel domain generalization
method that can jointly align the posterior distribution and
the marginal distribution. Specifically, we formalize a general
optimization objective in which, for any given sample, in addition to the routine empirical risk minimization, the Kullback-Leibler (KL) divergence [18] between posterior distributions from different domains is also minimized so that a domain-invariant classifier can be learned. To enhance the generaliza-
tion ability of the learned classifier, the optimization objective
is designed as an expectation computed on the ground-truth
marginal distribution. Unfortunately, the above optimization
problem still has two deficiencies that must be overcome.
The first issue lies in the side-effect of KL-divergence which
tends to enlarge the entropy of posterior distributions. To
tackle this issue, we add a new term named maximum in-
domain likelihood into the overall optimization objective,
such that the discrimination of the learned domain-invariant
feature space is reinforced. The second issue is that the
ground-truth marginal distribution is not available directly.
In light of this, we propose to approximate the real-world
marginal distribution with source domains under a reasonable
convex hull assumption. Eventually, a concise and intuitive optimization problem, termed Constrained Maximum Cross-domain Likelihood (CMCL), is deduced, by solving which we
can learn a domain-invariant representation space where the
joint distributions across domains are naturally aligned.
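As a purely conceptual sketch of the cross-domain KL term described above, the snippet below assumes, for illustration, that each source domain provides its own estimate of the class posterior for a shared feature batch (e.g., via domain-specific classifier heads, which is an assumption of this sketch); the actual CMCL formulation, including the maximum in-domain likelihood term and the marginal-distribution constraint, is derived in Section III.

```python
import torch
import torch.nn.functional as F

def cross_domain_posterior_kl(per_domain_logits):
    # Pairwise KL-divergence between the class posteriors that different
    # domain-specific estimators assign to the *same* samples.
    # per_domain_logits: list of N tensors of shape (batch, num_classes).
    n = len(per_domain_logits)
    log_probs = [F.log_softmax(l, dim=1) for l in per_domain_logits]
    probs = [lp.exp() for lp in log_probs]
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # KL( P_i(Y|z) || P_j(Y|z) ), averaged over the batch
            loss = loss + (probs[i] * (log_probs[i] - log_probs[j])).sum(dim=1).mean()
    return loss / (n * (n - 1))
```

Driving this quantity to zero for all samples corresponds to the posterior alignment in Definition 2 below; it is only one term of the full objective.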
The major contributions of our paper can be summarized as
follows:
1) We propose a new formulation for domain generaliza-
tion, which minimizes the expectation of KL-divergence
between posterior distributions from different domains.
We innovatively compute the expectation on the ground-
truth marginal distribution, such that the generalizability
of the learned model can be enhanced.
2) A constrained maximum cross-domain likelihood opti-
mization problem is deduced by adding an objective term
of maximum in-domain likelihood and a constraint of
marginal distribution alignment. The former eliminates
the side-effect brought by minimizing KL-divergence,
and the latter makes it possible to approximate the
ground-truth marginal distribution with source domains.
3) An effective alternating optimization strategy with multi-
ple optimization stages is elaborately developed to solve
the maximum cross-domain likelihood problem. Com-
prehensive experiments are conducted on four widely
used datasets and the results demonstrate that our CMCL
achieves superior performance on unseen domains.
II. RELATED WORKS
In this section, we review the related works dealing with
the domain (distribution) shift problem in deep learning,
which can be divided into two categories: domain adaptation and domain generalization.
A. Domain Adaptation
Domain adaptation aims to tackle the domain shift between
a source domain and a particular target domain [19], [20]. The goal of domain adaptation is to train models that make full use of a large amount of labeled data from the source domain so as to perform well on the unlabeled target domain. Most existing
domain adaptation methods focus on aligning distributions
between the source domain and target domain [21]. They can
be mainly divided into two categories: discrepancy measuring
based methods and domain adversarial based methods.
Discrepancy measuring based methods employ different
metrics to measure the distribution disparities and then min-
imize them, e.g., Maximum Mean Discrepancy (MMD) [22],
Central Moment Discrepancy (CMD) [23], Wasserstein dis-
tance [24]. Deep domain confusion [25] employs MMD to
align marginal distributions in the deep representation space.
Deep CORAL [26] and CMD [23] align marginal distributions
with moment matching. Joint MMD [27] is proposed to
align the joint distributions, considering that distribution shifts may stem from the joint distribution. Domain adversarial based
methods use domain discriminators to minimize the distance
between distributions [28]. Feature extractors are optimized to
confuse the discriminators so that the divergence of distribu-
tions is reduced. Domain-adversarial neural network [28] is
proposed to align marginal distributions by adversarial learn-
ing. Multi-adversarial domain adaptation [29] considers the
alignment of multi-mode distributions, i.e., class-conditional
distributions, instead of marginal distributions. Zuo et al. [30] concatenate features with their corresponding labels and feed them into a domain classifier, so that the joint distributions are aligned in an adversarial training manner.
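As a minimal illustration of the discrepancy measuring family mentioned above, the following sketch estimates a squared MMD [22] between feature batches from two domains using a single RBF kernel. It is a textbook-style biased estimator, not code from any of the cited works, and the bandwidth sigma is an assumed hyperparameter.

```python
import torch

def rbf_mmd2(x, y, sigma=1.0):
    # Biased estimate of squared MMD between feature batches
    # x: (n, d) and y: (m, d), with a single RBF kernel.
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2          # pairwise squared Euclidean distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```

Minimizing such a statistic over the feature extractor pulls the two marginal feature distributions together; adversarial methods achieve a similar effect with a learned domain discriminator instead of a fixed kernel.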
The difference between domain adaptation and domain generalization lies in the accessibility of the target domain. The former focuses on the alignment between the given source domain and target domain, whereas the latter focuses on generalizability to unseen test domains.
B. Domain Generalization
Domain generalization aims to train models on several source domains and test them on unseen domains [31], [32]. Existing domain generalization works mainly follow three lines of research: learning strategy, data augmentation and domain-invariant representation.
Learning strategy based methods mainly design special
learning strategies to enhance generalizability. Some works employ meta learning to address domain generalization, randomly splitting the source domains into meta-train and meta-test sets to simulate domain shift. Balaji et al. [33] train a regularizer through meta learning to capture the notion of domain generalization, which is parameterized by a neural
network. Dou et al. [34] propose a model-agnostic learning paradigm based on meta learning to enhance the generalizability of learned features. Global inter-class relationships, local class-
specific cohesion and separation of sample features are also
considered to regularize the semantic structure of the feature
space. In addition to meta learning, Distributionally Robust
Optimization (DRO) [35] is also used for domain generaliza-
tion, which trains models by minimizing the worst-case loss
over pre-defined groups. Sagawa et al [36] find that coupling
DRO with stronger regularization achieves higher worst-case
accuracy in the over-parameterized regime.
The core idea of data augmentation based methods is to in-
crease the diversity of training data. MixStyle [37] is motivated by the observation that the visual domain is closely related to image style, which is encoded in feature statistics. The domain diversity can be
increased by randomly combining feature statistics between
two training instances. Deep Domain-Adversarial Image Gen-
eration (DDAIG) [38] is proposed to fool the domain classifier
by augmenting images. A domain transformation network is
designed to automatically change image style. Seo et al [39]
propose a Domain-Specific Optimized Normalization (DSON)
to remove domain-specific style. Wang et al [40] design a
feature-based style randomization module, which randomizes
image style by introducing random noise into feature statistics.
These style augmentation based methods actually exploit prior knowledge about the domain shift, namely that the difference across source domains lies in image style. Though they work well on existing benchmarks, style augmentation based methods would probably fail when the domain shift is caused by other factors. Methods that do not rely on such prior knowledge deserve further study.
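To illustrate the feature-statistics mixing idea behind MixStyle [37], the simplified sketch below mixes per-instance channel means and standard deviations between randomly paired instances in a batch. The published method has additional details (e.g., at which layers and with what probability it is applied during training), so this should be read as an approximation rather than the exact algorithm.

```python
import torch

def mix_feature_statistics(x, alpha=0.1, eps=1e-6):
    # Mix channel-wise mean/std of feature maps between instances in a batch,
    # approximating style mixing. x: feature maps of shape (B, C, H, W).
    b = x.size(0)
    mu = x.mean(dim=[2, 3], keepdim=True)              # per-instance channel means
    sig = (x.var(dim=[2, 3], keepdim=True) + eps).sqrt()
    x_norm = (x - mu) / sig                            # remove instance-specific style
    lam = torch.distributions.Beta(alpha, alpha).sample((b, 1, 1, 1)).to(x.device)
    perm = torch.randperm(b, device=x.device)          # pair each instance with a random one
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix                   # re-inject the mixed style
```

Because only the first- and second-order feature statistics are perturbed, the semantic content of each instance is preserved while its "style" is diversified.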
Domain-invariant representation based methods often achieve domain invariance by aligning the distributions of different domains, as is done in domain adaptation. Li et al. [41] impose MMD on an adversarial autoencoder to align the marginal distributions P(Z) among domains, and the aligned distribution is matched with a pre-defined prior distribution by adversarial training. Motiian et al. [42] try to align the class-conditional distributions P(Z|Y) for finer alignment. However, class-conditional distribution alignment based methods can hardly deal with domain shift caused by label shift, since they require that the categorical distribution P(Y) remain unchanged across domains. Another important
branch attempts to achieve domain-invariant representation via
domain-invariant classifier learning. IRM [13] tries to learn a
domain-invariant classifier by constraining that the classifier is
simultaneously optimal for all domains. But this optimization
problem is hard to solve. Our method CMCL learns a domain-invariant classifier via posterior distribution alignment, and an effective alternating optimization strategy is proposed to solve the resulting optimization problem, leading to excellent performance. Zhao et al. [44] propose an entropy regularization term to align posterior distributions. According to our analysis, such an entropy term corresponds to the side-effect of minimizing KL-divergence and severely damages classification performance. In our method, a term of maximum in-domain likelihood is proposed to eliminate this side-effect.
III. PROPOSED METHOD
A. Overview
In this paper, we focus on domain generalization for image classification. Suppose the sample and label spaces are represented by 𝒳 and 𝒴 respectively; then a domain can be represented by a joint distribution defined on 𝒳 × 𝒴. There are N datasets D = {S_i = {(x_j^i, y_j^i)}_{j=1}^{M_i}}_{i=1}^{N} sampled from domains with different distributions {P_i(X, Y)}_{i=1}^{N}, where M_i denotes the number of samples of dataset S_i, X ∈ 𝒳 and Y ∈ 𝒴. Let P(X, Y) denote the ground-truth joint distribution in the real world. As shown in Figure 1, we suppose that P(X, Y) yields the distributions of the training domains {P_i(X, Y)}_{i=1}^{N} and the distribution of the unseen domain P_u(X, Y), with different domain shifts due to different selection biases.
Figure 1: Illustration of the generation process of domain-
specific distributions [8].
Given several training domains following different distri-
butions, domain generalization aims to learn a model which
is expected to overcome the domain shift and maintain its
performance on unseen domains. In order to overcome the
distribution shift across domains, we try to learn a domain-
invariant representation space in which the joint distributions
of different domains are aligned.
Definition 1 (Domain-Invariant Representation). Let E be the set of all possible domains and F(·) : 𝒳 → R^d be a feature mapping function that transforms the raw input into the domain-invariant representation space. A representation space is domain-invariant if

∀ i ≠ j ∈ E, P_i(Z, Y) = P_j(Z, Y),   (1)

where Z = F(X).
To obtain the domain-invariant representation space, we
firstly focus on aligning the posterior distribution from the
perspective of domain-invariant classifier learning.
Definition 2 (Domain-Invariant Classifier). Given a particular representation space, a domain-invariant classifier is simultaneously the Bayes optimal classifier on every domain, which can be obtained when the posterior distributions of different domains are aligned:

∀ i ≠ j ∈ E, P_i(Y|Z) = P_j(Y|Z).   (2)
We propose an optimization problem to learn the domain-
invariant classifier, which minimizes the KL-divergence be-
tween posterior distributions of different domains and maxi-
mizes the discrimination of the in-domain feature space (see
Section III-B1). The optimization objective is formalized as an
expectation of the KL-divergence computed on the ground-truth marginal distribution.
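Schematically, and only as an illustrative rendering of the description above (the exact objective, including the maximum in-domain likelihood term and the marginal-alignment constraint, is developed in Section III-B), the cross-domain term can be written as an expectation over the ground-truth marginal:

```latex
\min_{F}\; \mathbb{E}_{z \sim P(Z)} \Big[ \sum_{i \neq j} D_{\mathrm{KL}}\big( P_i(Y \mid z) \,\|\, P_j(Y \mid z) \big) \Big]
```

where P_i(Y|z) denotes the posterior of the i-th source domain. Since P(Z) is not directly available, it is approximated with the source domains under the convex hull assumption, yielding the constrained problem studied in the remainder of Section III.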