Domain Generalization through the Lens of Angular Invariance
Yujie Jin1, Xu Chu2, Yasha Wang1† and Wenwu Zhu2†
1Peking University, Beijing, China
2Tsinghua University, Beijing, China
{jyj17pku,wangyasha}@pku.edu.cn, {chu xu,wwzhu}@tsinghua.edu.cn
Abstract
Domain generalization (DG) aims at generalizing a classifier trained on multiple source domains to an unseen target domain with domain shift. A common pervasive theme in the existing DG literature is domain-invariant representation learning with various invariance assumptions. However, prior works restrict themselves to an assumption that is impractical for real-world challenges: if a mapping induced by a deep neural network (DNN) aligns the source domains well, then it aligns a target domain as well. In this paper, we simply take DNNs as feature extractors to relax the requirement of distribution alignment. Specifically, we put forward a novel angular invariance and the accompanying norm shift assumption. Based on this notion of invariance, we propose a novel deep DG method dubbed Angular Invariance Domain Generalization Network (AIDGN). The optimization objective of AIDGN is developed with a von Mises-Fisher (vMF) mixture model. Extensive experiments on multiple DG benchmark datasets validate the effectiveness of the proposed AIDGN method.
1 Introduction
Over the past few years, supervised deep learning (DL) has achieved remarkable success on many challenging visual tasks [Krizhevsky et al., 2012; Long et al., 2015; He et al., 2016]. An underlying assumption of the popular supervised DL methods is the identically distributed condition, namely, that the generating functions of training data and testing data are identical. We say a domain shift exists between the training data (source domain) and the testing data (target domain) if the identical condition is violated. When there is a domain shift, the favored empirical risk minimization (ERM) learning [Vapnik, 1999] would be ill-posed, since the empirical risk over the training data is not guaranteed to converge to the risk of the testing data asymptotically.
(a) Visualization of domains. (b) Visualization of classes.
Figure 1: Feature visualization for a model trained with ERM on the PACS dataset: (a) different colors indicate different domains; the source domains are cartoon (black), photo (green) and sketch (blue), while the target domain is art-painting (orange); (b) different colors represent different classes. Best viewed in color (zoom in for details).

The first two authors contributed to this work equally.
† Corresponding authors.

Domain generalization (DG) aims at generalizing the model trained on multiple source domains to perform well on an unseen target domain with domain shift [Blanchard et al., 2011]. The inductive setting of DG assumes that no target data is available during training, differentiating DG from transductive domain adaptation methods [Ben-David et al., 2007] and thus making DG more practical and challenging.
Intuitively, in order to carry out a successful knowledge transfer from "seen" source domains to an "unseen" target domain, there have to be some underlying similarities among these domains. From a theoretical standpoint, invariance among the distributions of domains should be investigated. To this end, a predominant stream in DG is domain-invariant representation learning, with various invariance and shift assumptions such as the covariate shift assumption [Li et al., 2018b], the conditional shift assumption [Li et al., 2018c], and the label shift assumption [Liu et al., 2021]. However, prior works overemphasize the importance of joint distribution alignment under an impractical assumption: an injective mapping (which implies a tendency to lose class-discriminative information) that aligns the source joint distributions on the induced space could align the target joint distribution as well. An easy counter-example is a constant mapping, which aligns any distributions on the induced space. Recently, theoretical analysis has revealed a fundamental trade-off between achieving good alignment and low joint error across domains [Zhao et al., 2019]. Empirically, a study [Gulrajani and Lopez-Paz, 2021] observed
limited performance gain of those invariant learning methods over ERM under a fair evaluation protocol, demonstrating the difficulty of balancing alignment and generalization.
In this paper, we take a step back from pursuing domain alignment. We model the relative difference between the target domain and each source domain instead. Specifically, we put forward a novel paradigm of domain shift assumption: the angular invariance and norm shift assumption. The proposed assumption says that, under the polar reparameterization [Blumenson, 1960], the relative difference between the DNN push-forward measures is captured by the norm parameter and is invariant to the angular parameters. The insight of angular invariance and norm shift is inspired by the acknowledged fact that the internal layers of DNNs capture high-level semantic concepts (e.g., eye, tail) [Zeiler and Fergus, 2014], which are connected to category-related discriminative features. The angular parameters capture the correlations between the high-level semantic concepts, while the norm parameter captures the magnitude of the high-level semantic concepts. In the practice of DG, the DNN feature mapping pre-trained on ImageNet is fine-tuned on the source domains. Therefore the semantic concepts memorized by the internal layers are biased toward the source domains, leading to higher levels of neuron activation on source-domain inputs. Hence we expect the norm distribution of latent representations to differ between a source domain and a target domain. Meanwhile, the correlations between high-level concepts within a fixed category are relatively stable. Thus we expect invariant angular distributions across different domains. We perform t-SNE feature visualization on the PACS dataset for an ERM-trained model to motivate and substantiate our assumption. Fig. 1(a) shows that the norm distribution of the target domain (orange) significantly differs from that of the source domains, while the distributions over angular coordinates are homogeneous. Fig. 1(b) shows that the learned class clusters are separated well by the angular parameters.
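This norm/angle dichotomy can also be inspected numerically. The following is a minimal diagnostic sketch (assuming latent features and domain labels have already been extracted with an ERM-trained backbone; not part of the AIDGN objective itself): it splits each latent representation into its norm and unit direction and reports per-domain norm statistics.

```python
# Diagnostic sketch: split latent features into a norm coordinate r = ||z|| and a unit
# direction z/||z||, then compare per-domain norm statistics. `features` and `domains`
# below are random placeholders standing in for precomputed backbone outputs and
# domain indices.
import torch

def norm_angle_split(features: torch.Tensor):
    """Return (r, u): norms and unit directions (the angular part) of each row."""
    r = features.norm(dim=1)                           # norm coordinate
    u = features / r.clamp_min(1e-12).unsqueeze(1)     # direction on the unit sphere
    return r, u

def per_domain_norm_stats(features: torch.Tensor, domains: torch.Tensor):
    """Mean and standard deviation of feature norms for every domain index."""
    r, _ = norm_angle_split(features)
    return {int(d): (r[domains == d].mean().item(), r[domains == d].std().item())
            for d in domains.unique()}

features = torch.randn(1000, 512)          # placeholder latent features
domains = torch.randint(0, 4, (1000,))     # placeholder domain indices (0..3)
print(per_domain_norm_stats(features, domains))
```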
Apart from the novel angular invariance and norm shift assumption, our methodological contribution is manifested by a novel deep DG algorithm called Angular Invariance Domain Generalization Network (AIDGN). The design principle of the AIDGN method is a minimal modification of ERM learning under mild distributional assumptions, such as assuming maximum-entropy distribution families. Concretely: (1) We show that the angular invariance enables us to compute the marginals over the norm coordinate to compare probability density functions of the target distribution and each source distribution in the latent space. Moreover, we compute the relative density ratio analytically based on the maximum entropy principle [Jaynes, 1957]. (2) Within a von Mises-Fisher (vMF) mixture model [Gopal and Yang, 2014], we connect the target posterior with the density of each mixture component, re-weighted by the relative density ratio mentioned above and the label densities. (3) We derive a practical AIDGN loss from the target posterior. The derivation adopts the maximum entropy principle for label densities and solves a constrained optimization problem.
We conduct extensive experiments on multiple DG benchmarks to validate the effectiveness of the proposed method and demonstrate that it achieves superior performance over existing baselines. Moreover, we show that AIDGN effectively balances intra-class compactness and inter-class separation, and thus reduces the uncertainty of predictions.
2 Related Work
A common pervasive theme in the DG literature is domain-invariant representation learning, which is based on the idea of aligning feature distributions among different source domains, with the hope that the learned invariance can be generalized to target domains. For instance, [Li et al., 2018b] achieved distribution alignment in the latent space of an autoencoder by using adversarial learning and the maximum mean discrepancy criterion. [Li et al., 2018c] matched conditional feature distributions across domains, enabling alignment of multimodal distributions for all class labels. [Liu et al., 2021] exploited both the conditional and label shifts, and proposed a Bayesian variational inference framework with posterior alignment to reduce both shifts simultaneously. However, existing works overemphasize the importance of joint distribution alignment, which might hurt class-discriminative information. Different from them, we propose a novel angular invariance as well as the accompanying norm shift assumption, and develop a learning framework based on the proposed notion of invariance.
Meta-learning was introduced into the DG community by [Li et al., 2018a] and has drawn increasing attention. The main idea is to divide the source domains into meta-train domains and a meta-test domain to simulate domain shift, and to encourage the model trained on the meta-train domains to perform well on the meta-test domain. Data augmentation has also been exploited for DG; it augments the source data to increase the diversity of the training data distribution. For instance, [Wang et al., 2020b] employed the mixup [Zhang et al., 2018] technique across multiple domains and trained the model on the augmented heterogeneous mixup distribution, which implicitly enhanced invariance to domain shifts.
Different from the above DG methods, which focus on the training phase, test-time adaptation is a class of methods focusing on the test phase, i.e., adjusting the model using online unlabeled data and correcting its predictions by itself during test time. [Wang et al., 2020a] proposed fully test-time adaptation, which modulates the BN parameters by minimizing the prediction entropy using stochastic gradient descent. [Iwasawa and Matsuo, 2021] proposed a test-time classifier adjustment module for DG, which updates pseudo-prototypes for each class using online unlabeled data augmented by the base classifier trained on the source domains. We empirically show that AIDGN effectively makes the decision boundaries of all categories separate from each other and reduces the uncertainty of predictions, so that the existing test-time adaptation methods based on entropy minimization are not necessary.
We also show that our proposed AIDGN theoretically justifies and generalizes the recently proposed MAG loss for face recognition [Meng et al., 2021].
3 Methodology
In this section, we first formulate the DG problem. Secondly, we explain the proposed angular invariance and norm shift assumption. Lastly, we introduce our Angular Invariance Domain Generalization Network (AIDGN). (Proofs for this section can be found in Appendix A of the supplementary material.)
3.1 Problem Formulation
Given $N$ source domains $\{P^d_{\mathcal{X}\times\mathcal{Y}}\}_{d=1}^{N}$ subject to $P^d_{\mathcal{X}\times\mathcal{Y}} \neq P^{d'}_{\mathcal{X}\times\mathcal{Y}}$ for $\{d, d'\} \subset \{1, 2, \dots, N\}$, and a target domain $P^t_{\mathcal{X}\times\mathcal{Y}}$ on the input-output space $\mathcal{X}\times\mathcal{Y}$, DG assumes $P^d_{\mathcal{X}\times\mathcal{Y}} \neq P^t_{\mathcal{X}\times\mathcal{Y}}$ for $d = 1, \dots, N$ and focuses on $C$-class single-label classification tasks. Let $\mathcal{H} = \{h_\theta \mid \theta \in \Theta\}$ be a hypothesis space parameterized by $\theta \in \Theta$. For $d = 1, \dots, N$, there are $n_d$ independently and identically distributed instances $\{(x^d_i, y^d_i)\}_{i=1}^{n_d}$ sampled from the $d$-th source domain $P^d_{\mathcal{X}\times\mathcal{Y}}$. The goal of DG is to output a hypothesis $\hat{h} \in \mathcal{H}$ such that the target risk is minimized for a given loss $\ell(h(\cdot), \cdot)$, i.e.,
$$\hat{h} = \arg\min_{h \in \mathcal{H}} \ \mathbb{E}_{P^t_{\mathcal{X}\times\mathcal{Y}}}\left[\ell(h(X), Y)\right]. \qquad (1)$$
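Since the target distribution is unavailable, the objective in Eq. (1) is approximated in practice by empirical risk minimization over the pooled source samples. The following is a minimal sketch of that ERM baseline, assuming placeholder data loaders and a generic PyTorch model; AIDGN is designed as a minimal modification of this procedure rather than this exact code.

```python
# Sketch of the pooled-source ERM baseline targeted by Eq. (1): merge all source
# samples and minimize the empirical cross-entropy risk. `model` and `source_loaders`
# are placeholders; AIDGN later replaces the plain cross-entropy term with its
# angular-invariance-based objective.
import torch
import torch.nn as nn

def train_erm(model: nn.Module, source_loaders, epochs: int = 30, lr: float = 1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for loader in source_loaders:            # one DataLoader per source domain
            for x, y in loader:                  # images and class labels
                optimizer.zero_grad()
                loss = criterion(model(x), y)    # empirical risk on source samples
                loss.backward()
                optimizer.step()
    return model
```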
3.2 Angular Invariance and Norm Shift
Celebrated for capturing universal visual features, convolutional neural networks (CNNs) pre-trained on the ImageNet dataset [Deng et al., 2009] have been adopted by a wide range of visual tasks. To take full advantage of a pre-trained CNN $\pi$, we regard $\pi$ as a feature extractor from the original input space $\mathcal{X}$ to a latent representation space $\mathcal{Z}$. Then a hypothesis $h$ comprises a feature extractor $\pi$ and a classifier $f$, i.e., $h = f \circ \pi$.
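A minimal sketch of this decomposition, assuming an ImageNet-pretrained ResNet-18 from torchvision as an illustrative choice of $\pi$ (torchvision ≥ 0.13 API; the backbone used in our experiments is reported separately):

```python
# Sketch of h = f ∘ π: an ImageNet-pretrained CNN as the feature extractor π and a
# linear classifier f on the latent space Z. ResNet-18 is an illustrative choice.
import torch.nn as nn
from torchvision import models

class Hypothesis(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        feature_dim = backbone.fc.in_features        # dimension of the latent space Z
        backbone.fc = nn.Identity()                   # drop the original ImageNet head
        self.pi = backbone                            # feature extractor π : X -> Z
        self.f = nn.Linear(feature_dim, num_classes)  # classifier f on Z

    def forward(self, x):
        z = self.pi(x)        # latent representation z = π(x)
        return self.f(z)      # prediction h(x) = f(π(x))
```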
Studies have shown that each dimension of the output of a CNN $\pi$ captures some abstract concept (e.g., eye, tail) [Zeiler and Fergus, 2014]. Considering that the relationships among the concepts of same-class objects in the real world are stable, the angular invariance and norm shift assumption says that the $\pi$-mapped features of different domains are invariant in the angular coordinates but vary in the norm coordinate. For simplicity, we introduce a random variable $D$ indexing the $d$-th source domain if $D = d$. The proposed assumption is stated as follows.
Assumption 1 (angular invariance). Suppose the marginal distributions $\{P^d_{\mathcal{X}}\}_{d=1}^{N} \cup \{P^t_{\mathcal{X}}\}$ on the input space $\mathcal{X}$ are continuous. Let $\pi: \mathcal{X} \to \mathcal{Z} \subseteq \mathbb{R}^n$ be a feature extraction mapping such that the $\pi$-push-forward probability density functions (p.d.f.s) $\{p_d(z)\}_{d=1}^{N} \cup \{p_t(z)\}$ exist in the latent space $\mathcal{Z}$. Let $(r, \phi_1, \dots, \phi_{n-1}) = g(z_1, \dots, z_n)$ be the polar reparameterization [Blumenson, 1960] of the Cartesian coordinates $z = (z_1, \dots, z_n)$. The angular invariance assumption for DG is quantified by the following equations: let $\phi = (\phi_1, \dots, \phi_{n-1})$; then for $d = 1, \dots, N$,
$$p(\phi \mid Y, D = d) = p_t(\phi \mid Y). \qquad (2)$$
The polar reparameterization $g(\cdot)$ is bijective and $p(r, \phi \mid Y, D) = p(\phi \mid Y, D)\, p(r \mid \phi, Y, D)$; therefore the difference between the target conditional p.d.f. (c.p.d.f.) $p_t(z \mid Y)$ and the $d$-th source c.p.d.f. $p(z \mid Y, d)$ is captured by the difference between the norm c.p.d.f.s $p_t(r \mid \phi, Y)$ and $p(r \mid \phi, d, Y)$.
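For concreteness, the polar (spherical) reparameterization $g$ referenced above takes the standard form [Blumenson, 1960]:
$$r = \|z\|_2, \qquad z_1 = r\cos\phi_1,\ \ z_2 = r\sin\phi_1\cos\phi_2,\ \ \dots,\ \ z_{n-1} = r\sin\phi_1\cdots\sin\phi_{n-2}\cos\phi_{n-1},\ \ z_n = r\sin\phi_1\cdots\sin\phi_{n-2}\sin\phi_{n-1},$$
with $r \ge 0$, $\phi_1, \dots, \phi_{n-2} \in [0, \pi]$ and $\phi_{n-1} \in [0, 2\pi)$.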
Theorem 1. Suppose $\mathrm{support}(p_t(z)) \subseteq \mathrm{support}(p(z \mid D))$. If the angular invariance assumption 1 holds, then for $d = 1, \dots, N$, $p_t(z \mid Y)/p(z \mid D = d, Y)$ exists and satisfies
$$\frac{p_t(z \mid Y)}{p(z \mid d, Y)} = \frac{p_t(r \mid \phi, Y)}{p(r \mid \phi, d, Y)} \triangleq w(r \mid \phi, d, y). \qquad (3)$$
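The essence of the argument is the following factor-and-cancel computation (a sketch; the formal proof is in Appendix A of the supplementary material). Since $g$ is bijective, the Jacobian of the change of coordinates cancels in the density ratio, and the angular factors cancel by Eq. (2):
$$\frac{p_t(z \mid Y)}{p(z \mid d, Y)} = \frac{p_t(r, \phi \mid Y)}{p(r, \phi \mid d, Y)} = \frac{p_t(\phi \mid Y)\, p_t(r \mid \phi, Y)}{p(\phi \mid d, Y)\, p(r \mid \phi, d, Y)} \overset{(2)}{=} \frac{p_t(r \mid \phi, Y)}{p(r \mid \phi, d, Y)}.$$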
Theorem 1 says that under the angular invariance assumption, we may reduce the degrees of freedom of comparing the target and source c.p.d.f.s from $n$ to $1$. However, the aporia of DG is that no target instances can be observed during training. Thus an additional assumption is essential to overcome this zero-sample dilemma. Following the maximum entropy principle [Jaynes, 1957], we adopt the following distributional assumptions on the conditional target and source norms.
Assumption 2 (maximum entropy norm distribution). Conditioned on $Y = y$ and $\phi$: (I) the target norm in the space $\mathcal{Z}$ follows a continuous uniform distribution¹ $\mathrm{Uni}[\alpha_{y,\phi}, \beta_{y,\phi}]$ with $\delta_{y,\phi} = \beta_{y,\phi} - \alpha_{y,\phi} > 0$, i.e., $p_t(r \mid y, \phi; \alpha_{y,\phi}, \beta_{y,\phi}) = 1/\delta_{y,\phi}$; (II) the $d$-th source-domain norm in the space $\mathcal{Z}$ follows an exponential distribution² $\mathrm{Exp}[1/\mu_{d,y,\phi}]$, $\mu_{d,y,\phi} > 0$, i.e., $p(r \mid d, y; \mu_{d,y,\phi}) = (1/\mu_{d,y,\phi}) \exp(-r/\mu_{d,y,\phi})$.
With the angular invariance and the maximum entropy assumptions, we can compare $p_t(z \mid y)$ and $p(z \mid d, y)$ analytically.
Corollary 1. When Assumption 1 and Assumption 2 hold,
$$w(r \mid \phi, d, y) = \frac{\mu_{d,y,\phi}\, \exp(r/\mu_{d,y,\phi})}{\delta_{y,\phi}} \approx \frac{\mu_{d,y,\phi} + r}{\delta_{y,\phi}}. \qquad (4)$$
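As an illustration of how Eq. (4) would be evaluated in code, the following sketch computes the relative density ratio given hypothetical estimates of $\mu_{d,y,\phi}$ and $\delta_{y,\phi}$; the names and values are placeholders, since the estimation of these quantities is not covered at this point.

```python
# Sketch of Eq. (4): w(r | phi, d, y) = (mu / delta) * exp(r / mu), evaluated
# elementwise. mu (mean norm of the exponential source model) and delta (width of the
# uniform target model) are per-domain, per-class quantities; the values below are
# hypothetical placeholders.
import torch

def density_ratio(r: torch.Tensor, mu: torch.Tensor, delta: torch.Tensor) -> torch.Tensor:
    """Relative density ratio of Eq. (4); increases monotonically with the norm r."""
    return (mu / delta) * torch.exp(r / mu)

norms = torch.tensor([8.0, 10.0, 12.0])   # feature norms r of three samples
mu = torch.tensor(9.0)                     # assumed mean norm for their (domain, class)
delta = torch.tensor(6.0)                  # assumed width of the target norm range
print(density_ratio(norms, mu, delta))     # larger-norm samples receive larger weights
```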
Recalling that DG aims to learn classifiers, we next consider the behavior when $Y$ varies, i.e., $p_t(y \mid z)$ and $p(y \mid z, D)$.
3.3 The AIDGN Method
Before formally introducing the proposed AIDGN method, we discuss the motivation for adopting the von Mises-Fisher (vMF) mixture model. Specifically, we inspect $p(z \mid Y)$ and $p(\phi \mid Y)$, where $\phi = (\phi_1, \dots, \phi_{n-1})$ are the angular coordinates after a polar reparameterization of $z$. By the law of total probability, the source c.p.d.f. $p_s(z \mid Y)$ decomposes as
$$p_s(z \mid Y) = \sum_{d=1}^{N} p(z \mid d, Y)\, p(d \mid Y), \qquad p_s(r, \phi \mid Y) = \sum_{d=1}^{N} p(r, \phi \mid d, Y)\, p(d \mid Y). \qquad (5)$$
When the angular invariance and norm shift assumption 1 holds, the factors $p(z \mid d, Y)$ and $p(d \mid Y)$ might vary w.r.t. the domain index $d$. Therefore, modeling $p_s(z \mid Y)$ requires modeling $p(z \mid d, Y)$ for each $d = 1, \dots, N$. In sharp contrast, the angular invariance guarantees that modeling the source c.p.d.f. $p_s(\phi \mid Y)$ is as easy as modeling any single $p(\phi \mid Y, d)$, $d = 1, \dots, N$. By Eq. (2), $p_s(\phi \mid Y) = \sum_{d=1}^{N} p(\phi \mid Y, d)\, p(d \mid Y) = p(\phi \mid Y, d) \sum_{d=1}^{N} p(d \mid Y) = p(\phi \mid Y, d)$. Therefore, the much simpler choice is to assume a model related to $p_s(\phi)$. Notice that the angular coordinates of the latent representation $z$ are invariant to the $L_2$ normalization $G(z)$; in polar coordinates $G$ acts as $(r, \phi) \mapsto (1, \phi)$, where $g$ is the polar reparameterization and $G(z)$ is
$$G(z) = \frac{z}{\sqrt{z_1^2 + z_2^2 + \dots + z_n^2}}. \qquad (6)$$
The formulation of the proposed AIDGN begins with the vMF mixture assumption on the $L_2$-normalized $G(Z)$.
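To indicate how a vMF mixture on the normalized features $G(Z)$ is typically realized in a deep network, the following is a hedged sketch of a vMF-style classifier head: class mean directions and features are $L_2$-normalized and compared by cosine similarity scaled by a concentration $\kappa$ (the vMF log-density up to an additive constant). The shared $\kappa$, the prototype parameterization, and the optional per-sample weight (e.g., the ratio $w$ of Eq. (4)) are illustrative choices only, not the exact AIDGN loss, which is derived from the target posterior later in the paper.

```python
# Hedged sketch of a vMF-mixture classifier head on L2-normalized features G(z):
# logits are kappa * cos(angle between G(z) and a class mean direction), i.e., the vMF
# log-density up to an additive constant. The optional per-sample weight is
# illustrative wiring, not the exact AIDGN objective.
from typing import Optional
import torch
import torch.nn as nn
import torch.nn.functional as F

class VMFHead(nn.Module):
    def __init__(self, feature_dim: int, num_classes: int, kappa: float = 16.0):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feature_dim))
        self.kappa = kappa                              # shared vMF concentration

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        g = F.normalize(z, dim=1)                       # G(z): project onto the unit sphere
        mu = F.normalize(self.prototypes, dim=1)        # unit-norm class mean directions
        return self.kappa * g @ mu.t()                  # scaled cosine similarities as logits

def weighted_vmf_loss(logits: torch.Tensor, y: torch.Tensor,
                      w: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Cross-entropy over vMF logits, optionally re-weighted per sample."""
    per_sample = F.cross_entropy(logits, y, reduction="none")
    if w is not None:
        per_sample = per_sample * w
    return per_sample.mean()
```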
¹ The uniform distribution is the maximum (differential) entropy distribution for a continuous random variable with a fixed range.
² The exponential distribution is the maximum (differential) entropy distribution with positive support and a fixed expectation.