Domain Generalization through the Lens of Angular Invariance
Yujie Jin1, Xu Chu2, Yasha Wang1† and Wenwu Zhu2†
1Peking University, Beijing, China
2Tsinghua University, Beijing, China
{jyj17pku,wangyasha}@pku.edu.cn, {chu xu,wwzhu}@tsinghua.edu.cn
Abstract
Domain generalization (DG) aims at generalizing a classifier trained on multiple source domains to an unseen target domain with domain shift. A common pervasive theme in the existing DG literature is domain-invariant representation learning with various invariance assumptions. However, prior works restrict themselves to an assumption that is impractical for real-world challenges: if a mapping induced by a deep neural network (DNN) aligns the source domains well, then it aligns a target domain as well. In this paper, we simply take DNNs as feature extractors to relax the requirement of distribution alignment. Specifically, we put forward a novel angular invariance and the accompanying norm shift assumption. Based on this notion of invariance, we propose a novel deep DG method dubbed Angular Invariance Domain Generalization Network (AIDGN). The optimization objective of AIDGN is developed with a von Mises-Fisher (vMF) mixture model. Extensive experiments on multiple DG benchmark datasets validate the effectiveness of the proposed AIDGN method.
1 Introduction
Over the past few years, supervised deep learning (DL) has achieved remarkable success on many challenging visual tasks [Krizhevsky et al., 2012; Long et al., 2015; He et al., 2016]. An underlying assumption of the popular supervised DL methods is the identically distributed condition, namely, that the generating functions of training data and testing data are identical. We say a domain shift exists between the training data (source domain) and the testing data (target domain) if the identical condition is violated. When there is a domain shift, the favored empirical risk minimization (ERM) learning [Vapnik, 1999] would be ill-posed, since the empirical risk over the training data is not guaranteed to converge to the risk of the testing data asymptotically.
(a) Visualization of domains. (b) Visualization of classes.
Figure 1: Feature visualization for a model trained with ERM on the PACS dataset: (a) different colors indicate different domains; the source domains are cartoon (black), photo (green) and sketch (blue), while the target domain is art-painting (orange); (b) different colors represent different classes. Best viewed in color (zoom in for details).

The first two authors contributed to this work equally.
† Corresponding authors.

Domain generalization (DG) aims at generalizing the model trained on multiple source domains to perform well on an unseen target domain with domain shift [Blanchard et al., 2011]. The inductive setting of DG assumes that no target data is available during training, differentiating DG from transductive domain adaptation methods [Ben-David et al., 2007] and thus making DG more practical and challenging.
Intuitively, in order to carry out a successful knowledge transfer from "seen" source domains to an "unseen" target domain, there have to be some underlying similarities among these domains. From a theoretical standpoint, invariance among the distributions of domains should be investigated. To this end, a predominant stream in DG is domain-invariant representation learning, with various invariance and shift assumptions such as the covariate shift assumption [Li et al., 2018b], the conditional shift assumption [Li et al., 2018c], and the label shift assumption [Liu et al., 2021]. However, prior works overemphasize the importance of joint distribution alignment under an impractical assumption: an injective mapping (which implies a tendency to lose class-discriminative information) that aligns the source joint distributions on the induced space could align the target joint distribution as well. An easy counter-example is a constant mapping, which aligns any distributions on the induced space. Recently, theoretical analysis has revealed a fundamental trade-off between achieving good alignment and low joint error across domains [Zhao et al., 2019]. Empirically, a study [Gulrajani and Lopez-Paz, 2021] observed
limited performance gain of those invariant learning methods over ERM under a fair evaluation protocol, demonstrating the difficulty of balancing alignment and generalization.
In this paper, we take a step back from pursuing domain alignment. We model the relative difference between the target domain and each source domain instead. Specifically, we put forward a novel paradigm of domain shift assumption: the angular invariance and norm shift assumption. The proposed assumption says that, under the polar reparameterization [Blumenson, 1960], the relative difference between the DNN push-forward measures is captured by the norm parameter and is invariant to the angular parameters. The insight of angular invariance and norm shift is inspired by the acknowledged fact that the internal layers of DNNs capture high-level semantic concepts (e.g., eye, tail) [Zeiler and Fergus, 2014], which are connected to category-related discriminative features. The angular parameters capture the correlations between the high-level semantic concepts, while the norm parameter captures the magnitude of the high-level semantic concepts. In the practice of DG, the DNN feature mapping pre-trained on ImageNet is fine-tuned on the source domains. Therefore the semantic concepts memorized by the internal layers are biased toward the source domains, leading to higher levels of neuron activation on source-domain inputs. Hence we expect the norm distribution of latent representations to differ between a source domain and a target domain. Meanwhile, the correlations between high-level concepts within a fixed category are relatively stable. Thus we expect invariant angular distributions across different domains. We perform t-SNE feature visualization on the PACS dataset for an ERM-trained model to motivate and substantiate our assumption. Fig. 1(a) shows that the norm distribution of the target domain (orange) significantly differs from that of the source domains, while the distributions over angular coordinates are homogeneous. Fig. 1(b) shows that the learned class clusters are separated well by the angular parameters.
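This norm/angle dichotomy can also be inspected numerically. The following is a minimal diagnostic sketch (assuming latent features and domain labels have already been extracted with an ERM-trained backbone; not part of the AIDGN objective itself): it splits each latent representation into its norm and unit direction and reports per-domain norm statistics.

```python
# Diagnostic sketch: split latent features into a norm coordinate r = ||z|| and a unit
# direction z/||z||, then compare per-domain norm statistics. `features` and `domains`
# below are random placeholders standing in for precomputed backbone outputs and
# domain indices.
import torch

def norm_angle_split(features: torch.Tensor):
    """Return (r, u): norms and unit directions (the angular part) of each row."""
    r = features.norm(dim=1)                           # norm coordinate
    u = features / r.clamp_min(1e-12).unsqueeze(1)     # direction on the unit sphere
    return r, u

def per_domain_norm_stats(features: torch.Tensor, domains: torch.Tensor):
    """Mean and standard deviation of feature norms for every domain index."""
    r, _ = norm_angle_split(features)
    return {int(d): (r[domains == d].mean().item(), r[domains == d].std().item())
            for d in domains.unique()}

features = torch.randn(1000, 512)          # placeholder latent features
domains = torch.randint(0, 4, (1000,))     # placeholder domain indices (0..3)
print(per_domain_norm_stats(features, domains))
```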
Apart from the novel angular invariance and norm shift assumption, our methodological contribution is manifested by a novel deep DG algorithm called Angular Invariance Domain Generalization Network (AIDGN). The design principle of the AIDGN method is a minimal modification of ERM learning under mild distributional assumptions, such as assuming maximum-entropy distribution families. Concretely: (1) We show that the angular invariance enables us to compute the marginals over the norm coordinate to compare probability density functions of the target distribution and each source distribution in the latent space. Moreover, we compute the relative density ratio analytically based on the maximum entropy principle [Jaynes, 1957]. (2) Within a von Mises-Fisher (vMF) mixture model [Gopal and Yang, 2014], we connect the target posterior with the density of each mixture component, re-weighted by the relative density ratio mentioned above and the label densities. (3) We derive a practical AIDGN loss from the target posterior. The derivation adopts the maximum entropy principle for label densities and solves a constrained optimization problem.
We conduct extensive experiments on multiple DG benchmarks to validate the effectiveness of the proposed method and demonstrate that it achieves superior performance over existing baselines. Moreover, we show that AIDGN effectively balances intra-class compactness and inter-class separation, and thus reduces the uncertainty of predictions.
2 Related Work
A common pervasive theme in the DG literature is domain-invariant representation learning, which is based on the idea of aligning feature distributions among different source domains, with the hope that the learned invariance can be generalized to target domains. For instance, [Li et al., 2018b] achieved distribution alignment in the latent space of an autoencoder by using adversarial learning and the maximum mean discrepancy criterion. [Li et al., 2018c] matched conditional feature distributions across domains, enabling alignment of multimodal distributions for all class labels. [Liu et al., 2021] exploited both the conditional and label shifts, and proposed a Bayesian variational inference framework with posterior alignment to reduce both shifts simultaneously. However, existing works overemphasize the importance of joint distribution alignment, which might hurt class-discriminative information. Different from them, we propose a novel angular invariance as well as the accompanying norm shift assumption, and develop a learning framework based on the proposed notion of invariance.
Meta-learning was introduced into the DG community by [Li et al., 2018a] and has drawn increasing attention. The main idea is to divide the source domains into meta-train domains and a meta-test domain to simulate domain shift, and to encourage the model trained on the meta-train domains to perform well on the meta-test domain. Data augmentation has also been exploited for DG; it augments the source data to increase the diversity of the training data distribution. For instance, [Wang et al., 2020b] employed the mixup [Zhang et al., 2018] technique across multiple domains and trained the model on the augmented heterogeneous mixup distribution, which implicitly enhanced invariance to domain shifts.
Different from the above DG methods, which focus on the training phase, test-time adaptation is a class of methods focusing on the test phase, i.e., adjusting the model using online unlabeled data and correcting its predictions by itself during test time. [Wang et al., 2020a] proposed fully test-time adaptation, which modulates the BN parameters by minimizing the prediction entropy using stochastic gradient descent. [Iwasawa and Matsuo, 2021] proposed a test-time classifier adjustment module for DG, which updates pseudo-prototypes for each class using online unlabeled data augmented by the base classifier trained on the source domains. We empirically show that AIDGN effectively makes the decision boundaries of all categories separate from each other and reduces the uncertainty of predictions, so that the existing test-time adaptation methods based on entropy minimization are not necessary.
We also show that our proposed AIDGN theoretically justifies and generalizes the recently proposed MAG loss for face recognition [Meng et al., 2021].
3 Methodology
In this section, we first formulate the DG problem. Secondly, we explain the proposed angular invariance and norm shift assumption. Lastly, we introduce our Angular Invariance Domain Generalization Network (AIDGN). (Proofs for this section can be found in Appendix A of the supplementary material.)
3.1 Problem Formulation
Given $N$ source domains $\{P^d_{\mathcal{X}\times\mathcal{Y}}\}_{d=1}^{N}$ subject to $P^d_{\mathcal{X}\times\mathcal{Y}} \neq P^{d'}_{\mathcal{X}\times\mathcal{Y}}$ for $\{d, d'\} \subset \{1, 2, \dots, N\}$, and a target domain $P^t_{\mathcal{X}\times\mathcal{Y}}$ on the input-output space $\mathcal{X}\times\mathcal{Y}$, DG assumes $P^d_{\mathcal{X}\times\mathcal{Y}} \neq P^t_{\mathcal{X}\times\mathcal{Y}}$ for $d = 1, \dots, N$ and focuses on $C$-class single-label classification tasks. Let $\mathcal{H} = \{h_\theta \mid \theta \in \Theta\}$ be a hypothesis space parameterized by $\theta \in \Theta$. For $d = 1, \dots, N$, there are $n_d$ independently and identically distributed instances $\{(x^d_i, y^d_i)\}_{i=1}^{n_d}$ sampled from the $d$-th source domain $P^d_{\mathcal{X}\times\mathcal{Y}}$. The goal of DG is to output a hypothesis $\hat{h} \in \mathcal{H}$ such that the target risk is minimized for a given loss $\ell(h(\cdot), \cdot)$, i.e.,
$$\hat{h} = \arg\min_{h \in \mathcal{H}} \ \mathbb{E}_{P^t_{\mathcal{X}\times\mathcal{Y}}}\left[\ell(h(X), Y)\right]. \qquad (1)$$
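Since the target distribution is unavailable, the objective in Eq. (1) is approximated in practice by empirical risk minimization over the pooled source samples. The following is a minimal sketch of that ERM baseline, assuming placeholder data loaders and a generic PyTorch model; AIDGN is designed as a minimal modification of this procedure rather than this exact code.

```python
# Sketch of the pooled-source ERM baseline targeted by Eq. (1): merge all source
# samples and minimize the empirical cross-entropy risk. `model` and `source_loaders`
# are placeholders; AIDGN later replaces the plain cross-entropy term with its
# angular-invariance-based objective.
import torch
import torch.nn as nn

def train_erm(model: nn.Module, source_loaders, epochs: int = 30, lr: float = 1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for loader in source_loaders:            # one DataLoader per source domain
            for x, y in loader:                  # images and class labels
                optimizer.zero_grad()
                loss = criterion(model(x), y)    # empirical risk on source samples
                loss.backward()
                optimizer.step()
    return model
```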
3.2 Angular Invariance and Norm Shift
Celebrated for capturing universal visual features, convolutional neural networks (CNNs) pre-trained on the ImageNet dataset [Deng et al., 2009] have been adopted by a wide range of visual tasks. To take full advantage of a pre-trained CNN $\pi$, we regard $\pi$ as a feature extractor from the original input space $\mathcal{X}$ to a latent representation space $\mathcal{Z}$. Then a hypothesis $h$ comprises a feature extractor $\pi$ and a classifier $f$, i.e., $h = f \circ \pi$.
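A minimal sketch of this decomposition, assuming an ImageNet-pretrained ResNet-18 from torchvision as an illustrative choice of $\pi$ (torchvision ≥ 0.13 API; the backbone used in our experiments is reported separately):

```python
# Sketch of h = f ∘ π: an ImageNet-pretrained CNN as the feature extractor π and a
# linear classifier f on the latent space Z. ResNet-18 is an illustrative choice.
import torch.nn as nn
from torchvision import models

class Hypothesis(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        feature_dim = backbone.fc.in_features        # dimension of the latent space Z
        backbone.fc = nn.Identity()                   # drop the original ImageNet head
        self.pi = backbone                            # feature extractor π : X -> Z
        self.f = nn.Linear(feature_dim, num_classes)  # classifier f on Z

    def forward(self, x):
        z = self.pi(x)        # latent representation z = π(x)
        return self.f(z)      # prediction h(x) = f(π(x))
```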
Studies have shown that each dimension of the output of a CNN $\pi$ captures some abstract concept (e.g., eye, tail) [Zeiler and Fergus, 2014]. Considering that the relationships among the concepts of same-class objects in the real world are stable, the angular invariance and norm shift assumption says that the $\pi$-mapped features of different domains are invariant in the angular coordinates but vary in the norm coordinate. For simplicity, we introduce a random variable $D$ indexing the $d$-th source domain if $D = d$. The proposed assumption is stated as follows.
Assumption 1 (angular invariance). Suppose the marginal distributions $\{P^d_{\mathcal{X}}\}_{d=1}^{N} \cup \{P^t_{\mathcal{X}}\}$ on the input space $\mathcal{X}$ are continuous. Let $\pi: \mathcal{X} \to \mathcal{Z} \subseteq \mathbb{R}^n$ be a feature extraction mapping such that the $\pi$-push-forward probability density functions (p.d.f.s) $\{p_d(z)\}_{d=1}^{N} \cup \{p_t(z)\}$ exist in the latent space $\mathcal{Z}$. Let $(r, \phi_1, \dots, \phi_{n-1}) = g(z_1, \dots, z_n)$ be the polar reparameterization [Blumenson, 1960] of the Cartesian coordinates $z = (z_1, \dots, z_n)$. The angular invariance assumption for DG is quantified by the following equations: let $\phi = (\phi_1, \dots, \phi_{n-1})$; then for $d = 1, \dots, N$,
$$p(\phi \mid Y, D = d) = p_t(\phi \mid Y). \qquad (2)$$
The polar reparameterization $g(\cdot)$ is bijective and $p(r, \phi \mid Y, D) = p(\phi \mid Y, D)\, p(r \mid \phi, Y, D)$; therefore the difference between the target conditional p.d.f. (c.p.d.f.) $p_t(z \mid Y)$ and the $d$-th source c.p.d.f. $p(z \mid Y, d)$ is captured by the difference between the norm c.p.d.f.s $p_t(r \mid \phi, Y)$ and $p(r \mid \phi, d, Y)$.
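For concreteness, the polar (spherical) reparameterization $g$ referenced above takes the standard form [Blumenson, 1960]:
$$r = \|z\|_2, \qquad z_1 = r\cos\phi_1,\ \ z_2 = r\sin\phi_1\cos\phi_2,\ \ \dots,\ \ z_{n-1} = r\sin\phi_1\cdots\sin\phi_{n-2}\cos\phi_{n-1},\ \ z_n = r\sin\phi_1\cdots\sin\phi_{n-2}\sin\phi_{n-1},$$
with $r \ge 0$, $\phi_1, \dots, \phi_{n-2} \in [0, \pi]$ and $\phi_{n-1} \in [0, 2\pi)$.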
Theorem 1. Suppose $\mathrm{support}(p_t(z)) \subseteq \mathrm{support}(p(z \mid D))$. If the angular invariance assumption 1 holds, then for $d = 1, \dots, N$, $p_t(z \mid Y)/p(z \mid D = d, Y)$ exists and satisfies
$$\frac{p_t(z \mid Y)}{p(z \mid d, Y)} = \frac{p_t(r \mid \phi, Y)}{p(r \mid \phi, d, Y)} \triangleq w(r \mid \phi, d, y). \qquad (3)$$
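The essence of the argument is the following factor-and-cancel computation (a sketch; the formal proof is in Appendix A of the supplementary material). Since $g$ is bijective, the Jacobian of the change of coordinates cancels in the density ratio, and the angular factors cancel by Eq. (2):
$$\frac{p_t(z \mid Y)}{p(z \mid d, Y)} = \frac{p_t(r, \phi \mid Y)}{p(r, \phi \mid d, Y)} = \frac{p_t(\phi \mid Y)\, p_t(r \mid \phi, Y)}{p(\phi \mid d, Y)\, p(r \mid \phi, d, Y)} \overset{(2)}{=} \frac{p_t(r \mid \phi, Y)}{p(r \mid \phi, d, Y)}.$$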
Theorem 1 says that under the angular invariance assumption, we may reduce the degrees of freedom of comparing the target and source c.p.d.f.s from $n$ to $1$. However, the aporia of DG is that no target instances can be observed during training. Thus an additional assumption is essential to overcome this zero-sample dilemma. Following the maximum entropy principle [Jaynes, 1957], we adopt the following distributional assumptions on the conditional target and source norms.
Assumption 2 (maximum entropy norm distribution). Conditioned on $Y = y$ and $\phi$: (I) the target norm in the space $\mathcal{Z}$ follows a continuous uniform distribution¹ $\mathrm{Uni}[\alpha_{y,\phi}, \beta_{y,\phi}]$ with $\delta_{y,\phi} = \beta_{y,\phi} - \alpha_{y,\phi} > 0$, i.e., $p_t(r \mid y, \phi; \alpha_{y,\phi}, \beta_{y,\phi}) = 1/\delta_{y,\phi}$; (II) the $d$-th source-domain norm in the space $\mathcal{Z}$ follows an exponential distribution² $\mathrm{Exp}[1/\mu_{d,y,\phi}]$, $\mu_{d,y,\phi} > 0$, i.e., $p(r \mid d, y; \mu_{d,y,\phi}) = (1/\mu_{d,y,\phi}) \exp(-r/\mu_{d,y,\phi})$.
With the angular invariance and the maximum entropy assumptions, we can compare $p_t(z \mid y)$ and $p(z \mid d, y)$ analytically.
Corollary 1. When Assumption 1 and Assumption 2 hold,
$$w(r \mid \phi, d, y) = \frac{\mu_{d,y,\phi}\, \exp(r/\mu_{d,y,\phi})}{\delta_{y,\phi}} \approx \frac{\mu_{d,y,\phi} + r}{\delta_{y,\phi}}. \qquad (4)$$
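As an illustration of how Eq. (4) would be evaluated in code, the following sketch computes the relative density ratio given hypothetical estimates of $\mu_{d,y,\phi}$ and $\delta_{y,\phi}$; the names and values are placeholders, since the estimation of these quantities is not covered at this point.

```python
# Sketch of Eq. (4): w(r | phi, d, y) = (mu / delta) * exp(r / mu), evaluated
# elementwise. mu (mean norm of the exponential source model) and delta (width of the
# uniform target model) are per-domain, per-class quantities; the values below are
# hypothetical placeholders.
import torch

def density_ratio(r: torch.Tensor, mu: torch.Tensor, delta: torch.Tensor) -> torch.Tensor:
    """Relative density ratio of Eq. (4); increases monotonically with the norm r."""
    return (mu / delta) * torch.exp(r / mu)

norms = torch.tensor([8.0, 10.0, 12.0])   # feature norms r of three samples
mu = torch.tensor(9.0)                     # assumed mean norm for their (domain, class)
delta = torch.tensor(6.0)                  # assumed width of the target norm range
print(density_ratio(norms, mu, delta))     # larger-norm samples receive larger weights
```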
Recalling that DG aims to learn classifiers, we next consider the behavior when $Y$ varies, i.e., $p_t(y \mid z)$ and $p(y \mid z, D)$.
3.3 The AIDGN Method
Before formally introducing the proposed AIDGN method, we discuss the motivation for adopting the von Mises-Fisher (vMF) mixture model. Specifically, we inspect $p(z \mid Y)$ and $p(\phi \mid Y)$, where $\phi = (\phi_1, \dots, \phi_{n-1})$ are the angular coordinates after a polar reparameterization of $z$. By the law of total probability, the source c.p.d.f. $p_s(z \mid Y)$ decomposes as
$$p_s(z \mid Y) = \sum_{d=1}^{N} p(z \mid d, Y)\, p(d \mid Y), \qquad p_s(r, \phi \mid Y) = \sum_{d=1}^{N} p(r, \phi \mid d, Y)\, p(d \mid Y). \qquad (5)$$
When the angular invariance and norm shift assumption 1 holds, the factors $p(z \mid d, Y)$ and $p(d \mid Y)$ might vary w.r.t. the domain index $d$. Therefore, modeling $p_s(z \mid Y)$ requires modeling $p(z \mid d, Y)$ for each $d = 1, \dots, N$. In sharp contrast, the angular invariance guarantees that modeling the source c.p.d.f. $p_s(\phi \mid Y)$ is as easy as modeling any single $p(\phi \mid Y, d)$, $d = 1, \dots, N$. By Eq. (2), $p_s(\phi \mid Y) = \sum_{d=1}^{N} p(\phi \mid Y, d)\, p(d \mid Y) = p(\phi \mid Y, d) \sum_{d=1}^{N} p(d \mid Y) = p(\phi \mid Y, d)$. Therefore, the much simpler choice is to assume a model related to $p_s(\phi)$. Notice that the angular coordinates of the latent representation $z$ are invariant to the $L_2$ normalization $G(z)$; in polar coordinates $G$ acts as $(r, \phi) \mapsto (1, \phi)$, where $g$ is the polar reparameterization and $G(z)$ is
$$G(z) = \frac{z}{\sqrt{z_1^2 + z_2^2 + \dots + z_n^2}}. \qquad (6)$$
The formulation of the proposed AIDGN begins with the vMF mixture assumption on the $L_2$-normalized $G(Z)$.
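To indicate how a vMF mixture on the normalized features $G(Z)$ is typically realized in a deep network, the following is a hedged sketch of a vMF-style classifier head: class mean directions and features are $L_2$-normalized and compared by cosine similarity scaled by a concentration $\kappa$ (the vMF log-density up to an additive constant). The shared $\kappa$, the prototype parameterization, and the optional per-sample weight (e.g., the ratio $w$ of Eq. (4)) are illustrative choices only, not the exact AIDGN loss, which is derived from the target posterior later in the paper.

```python
# Hedged sketch of a vMF-mixture classifier head on L2-normalized features G(z):
# logits are kappa * cos(angle between G(z) and a class mean direction), i.e., the vMF
# log-density up to an additive constant. The optional per-sample weight is
# illustrative wiring, not the exact AIDGN objective.
from typing import Optional
import torch
import torch.nn as nn
import torch.nn.functional as F

class VMFHead(nn.Module):
    def __init__(self, feature_dim: int, num_classes: int, kappa: float = 16.0):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feature_dim))
        self.kappa = kappa                              # shared vMF concentration

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        g = F.normalize(z, dim=1)                       # G(z): project onto the unit sphere
        mu = F.normalize(self.prototypes, dim=1)        # unit-norm class mean directions
        return self.kappa * g @ mu.t()                  # scaled cosine similarities as logits

def weighted_vmf_loss(logits: torch.Tensor, y: torch.Tensor,
                      w: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Cross-entropy over vMF logits, optionally re-weighted per sample."""
    per_sample = F.cross_entropy(logits, y, reduction="none")
    if w is not None:
        per_sample = per_sample * w
    return per_sample.mean()
```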
¹ The uniform distribution is the maximum (differential) entropy distribution for a continuous random variable with a fixed range.
² The exponential distribution is the maximum (differential) entropy distribution with positive support and a fixed expectation.