limited performance gain of those invariant learning methods
over ERM under a fair evaluation protocol, demonstrating the
difficulty of balancing alignment and generalization.
In this paper, we take a step back from pursuing domain
alignment. We model the relative difference between the target
domain and each source domain instead. Specifically, we put
forward a novel paradigm of domain shift assumption: the
angular invariance and norm shift assumption. The proposed
assumption states that, under the polar reparameterization [Blumenson, 1960], the relative difference between the DNN pushforward measures is captured by the norm parameters and
invariant to the angular parameters. The insight of angular
invariance and norm shift is inspired by the well-established finding that the internal layers of DNNs capture high-level semantic concepts (e.g., eye, tail) [Zeiler and Fergus, 2014], which are
connected to category-related discriminative features. The
angular parameters capture the correlations between the high-
level semantic concepts, while the norm parameter captures their magnitude. In the
practice of DG, the DNN feature mapping pre-trained on Im-
ageNet is fine-tuned on the source domains. Therefore, the semantic concepts memorized by the internal layers are biased toward the source domains, leading to higher levels of neuron activation on source data. Hence we expect the norm distributions of latent representations to differ between a source domain and a target domain. Meanwhile, the correlations between high-level concepts within a fixed category are relatively stable. Thus we expect invariant angular distributions across different domains.
We conduct t-SNE feature visualization on the PACS dataset for an ERM-trained model to motivate and substantiate our assumption. Fig. 1(a) shows that the norm distribution of the target domain (orange) differs significantly from those of the source domains, while the distributions over angular coordinates are homogeneous. Fig. 1(b) shows that the learned class clusters are well separated by the angular parameters.
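As a minimal illustration of the polar reparameterization underlying this assumption, the following sketch (ours, not the paper's released code) decomposes latent features z into a norm coordinate r = ||z|| and an angular coordinate u = z / ||z||, and compares norm statistics across two feature batches standing in for a source and a target domain; the synthetic features and the scaling factor are assumptions for illustration only.

```python
import numpy as np

def polar_decompose(z, eps=1e-12):
    """Polar reparameterization: z = r * u with r = ||z|| and ||u|| = 1."""
    r = np.linalg.norm(z, axis=1)          # norm coordinates, shape (n,)
    u = z / (r[:, None] + eps)             # angular coordinates, shape (n, d)
    return r, u

rng = np.random.default_rng(0)
z_src = 1.5 * rng.normal(size=(1000, 512))  # stand-in for source-domain features
z_tgt = rng.normal(size=(1000, 512))        # stand-in for target-domain features

r_src, u_src = polar_decompose(z_src)
r_tgt, u_tgt = polar_decompose(z_tgt)

# Under the assumption, the norm marginals shift across domains ...
print("mean norm (source vs. target):", r_src.mean(), r_tgt.mean())
# ... while the angular (directional) statistics remain comparable.
print("mean cross-domain cosine:", (u_src @ u_tgt.T).mean())
```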
Apart from the novel angular invariance and norm shift as-
sumption, our methodological contribution is manifested by a
novel deep DG algorithm called Angular Invariance Domain Generalization Network (AIDGN). The design principle
of the AIDGN method is a minimal modification of ERM learning under mild distributional assumptions, such as adopting maximum-entropy distribution families. Concretely, (1) We show that the angular invariance
enables us to compute the marginals over the norm coordinate
to compare probability density functions of the target distribu-
tion and each source distribution in the latent space. Moreover,
we compute the relative density ratio analytically based on
the maximum entropy principle
[Jaynes, 1957]. (2) Within
a von Mises-Fisher (vMF) mixture model [Gopal and Yang, 2014], we connect the target posterior with the density of each
mixture component, re-weighted by the relative density ratio
mentioned above and the label densities. (3) We derive a practical AIDGN loss from the target posterior. The derivation adopts the maximum entropy principle for label densities and solves a constrained optimization problem.
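To make steps (1)-(3) concrete, here is a minimal PyTorch-style sketch under our own simplifying assumptions: Gaussian marginals over the norm coordinate (the maximum-entropy family for a fixed mean and variance), an unnormalized vMF log-density for the angular coordinate, and a per-sample reweighting of the cross-entropy by the norm density ratio. The function names, the Gaussian stand-in, and the way the norm statistics are supplied are illustrative, not the paper's exact derivation.

```python
import math
import torch
import torch.nn.functional as F

def vmf_logits(features, class_dirs, kappa=16.0):
    """Unnormalized vMF log-density per class: kappa * <u, mu_k>."""
    u = F.normalize(features, dim=1)      # angular coordinates u = z / ||z||
    mu = F.normalize(class_dirs, dim=1)   # vMF mean directions, one per class
    return kappa * (u @ mu.t())

def norm_density_ratio(r, src_mean, src_std, tgt_mean, tgt_std):
    """Target-to-source density ratio of the norm coordinate, assuming
    Gaussian (maximum-entropy) marginals as an illustrative stand-in."""
    log_p_tgt = -0.5 * ((r - tgt_mean) / tgt_std) ** 2 - math.log(tgt_std)
    log_p_src = -0.5 * ((r - src_mean) / src_std) ** 2 - math.log(src_std)
    return torch.exp(log_p_tgt - log_p_src)

def aidgn_style_loss(features, labels, class_dirs, src_stats, tgt_stats):
    """Cross-entropy on vMF logits, reweighted by the norm density ratio."""
    r = features.norm(dim=1)
    w = norm_density_ratio(r, *src_stats, *tgt_stats).detach()
    ce = F.cross_entropy(vmf_logits(features, class_dirs), labels,
                         reduction="none")
    return (w * ce).mean()
```

In this reading, samples whose norms look more target-like receive larger weight, which is one way the norm shift can be folded back into an otherwise ERM-like objective.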
We conduct extensive experiments on multiple DG bench-
marks to validate the effectiveness of the proposed method
and demonstrate that it achieves superior performance over
the existing baselines. Moreover, we show that AIDGN effec-
tively balances the intra-class compactness and the inter-class
separation, and thus reduces the uncertainty of predictions.
2 Related Work
A pervasive theme in the DG literature is domain-
invariant representation learning, which is based on the idea of
aligning feature distributions among different source domains,
with the hope that the learned invariance can be generalized
to target domains. For instance, [Li et al., 2018b] achieved
distribution alignment in the latent space of an autoencoder by
using adversarial learning and the maximum mean discrepancy
criterion. [Li et al., 2018c]
matched conditional feature dis-
tributions across domains, enabling alignment of multimodal
distributions for all class labels.
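For reference, the maximum mean discrepancy criterion used by such alignment methods admits a compact kernel form; below is a minimal (biased) PyTorch estimator of the squared MMD with an RBF kernel, where the bandwidth sigma is an assumed hyperparameter.

```python
import torch

def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimator of squared MMD between batches x (n, d) and y (m, d)."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```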
[Liu et al., 2021] exploited
both the conditional and label shifts, and proposed a Bayesian
variational inference framework with posterior alignment to re-
duce both shifts simultaneously. However, existing works overemphasize joint distribution alignment, which might hurt class-discriminative information. Different from them, we propose a novel angular invariance together with an accompanying norm shift assumption, and develop a learning framework based on the proposed notion of invariance.
Meta-learning was introduced into the DG community
by [Li et al., 2018a] and has drawn increasing attention. The
main idea is to split the source domains into meta-train domains and a meta-test domain to simulate domain shift, and to regularize a model trained on the meta-train domains so that it performs well on the meta-test domain. Data augmentation has also been ex-
ploited for DG, augmenting the source data to increase the diversity of the training data distribution. For instance, [Wang et al., 2020b] employed the mixup technique [Zhang et al., 2018] across multiple domains and trained the model on the augmented heterogeneous mixup distribution, which implicitly enhanced invariance to domain shifts.
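Mixup itself reduces to a convex combination of inputs and labels; the sketch below shows a cross-domain variant in PyTorch. Drawing the mixing coefficient from a Beta distribution follows [Zhang et al., 2018], while pairing batches from two different source domains is our illustrative reading of [Wang et al., 2020b].

```python
import torch

def cross_domain_mixup(x_a, y_a, x_b, y_b, alpha=0.2):
    """Mix batches (x_a, y_a) and (x_b, y_b) drawn from two source domains.

    y_a, y_b are one-hot label tensors; lam ~ Beta(alpha, alpha).
    """
    lam = torch.distributions.Beta(alpha, alpha).sample()
    x_mix = lam * x_a + (1 - lam) * x_b   # convex combination of inputs
    y_mix = lam * y_a + (1 - lam) * y_b   # matching combination of labels
    return x_mix, y_mix
```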
Different from the above DG methods, which focus on the training phase, test-time adaptation is a class of methods focusing on the test phase, i.e., adjusting the model using online unlabeled data and correcting its predictions during test time. [Wang et al., 2020a] proposed fully test-time adaptation,
which modulates the BN parameters by minimizing the pre-
diction entropy using stochastic gradient descent.
[Iwasawa and Matsuo, 2021]
proposed a test-time classifier adjustment
module for DG, which updates pseudo-prototypes for each
class using online unlabeled data augmented by the base clas-
sifier trained on the source domains. We empirically show that AIDGN effectively separates the decision boundaries of all categories and reduces the uncertainty of predictions, so that existing test-time adaptation methods based on entropy minimization are unnecessary.
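For context, the entropy-minimization objective behind fully test-time adaptation [Wang et al., 2020a] is compact; the sketch below is our compressed PyTorch rendering, restricting the adapted parameters to BatchNorm affine weights as in that work (the helper names and the single-step driver are ours).

```python
import torch
import torch.nn as nn

def prediction_entropy(logits):
    """Mean Shannon entropy of the softmax predictions over a batch."""
    probs = logits.softmax(dim=1)
    return -(probs * probs.log().clamp(min=-20.0)).sum(dim=1).mean()

def bn_affine_params(model):
    """Collect only the BatchNorm affine parameters for adaptation."""
    return [p for m in model.modules()
            if isinstance(m, nn.BatchNorm2d)
            for p in (m.weight, m.bias) if p is not None]

def adapt_step(model, x, optimizer):
    """One test-time step: minimize prediction entropy on unlabeled x."""
    loss = prediction_entropy(model(x))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

An optimizer built over bn_affine_params(model), e.g., SGD, then drives the adaptation one unlabeled batch at a time.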
We also show that our proposed AIDGN theoretically justifies and generalizes the recently proposed MagFace loss for face recognition [Meng et al., 2021].
3 Methodology
In this section, we first formulate the DG problem. Then we explain the proposed angular invariance and norm shift assumption. Finally, we introduce our angular invariance domain