Learning Less Generalizable Patterns with an Asymmetrically Trained Double Classifier for Better Test-Time Adaptation

Thomas Duboudin¹, Emmanuel Dellandréa¹, Corentin Abgrall², Gilles Hénaff², Liming Chen¹

¹ Univ Lyon, Ecole Centrale de Lyon, CNRS, INSA Lyon, Univ Claude Bernard Lyon 1, Univ Louis Lumière Lyon 2, LIRIS, UMR5205, 69134 Ecully, France
firstname.name@ec-lyon.fr

² Thales LAS France, 78990 Élancourt, France
firstname.name@fr.thalesgroup.com
Abstract
Deep neural networks often fail to generalize outside of their training distribution, in particular when only a single data domain is available during training. While test-time adaptation has yielded encouraging results in this setting, we argue that, to reach further improvements, these approaches should be combined with training procedure modifications aiming to learn a more diverse set of patterns. Indeed, test-time adaptation methods usually have to rely on a limited representation because of the shortcut learning phenomenon: only a subset of the available predictive patterns is learned with standard training. In this paper, we first show that the combined use of existing training-time strategies and test-time batch normalization, a simple adaptation method, does not always improve upon test-time adaptation alone on the PACS benchmark. Furthermore, experiments on Office-Home show that very few training-time methods improve upon standard training, with or without test-time batch normalization. We therefore propose a novel approach that uses a pair of classifiers trained asymmetrically: a shortcut patterns avoidance loss mitigates the shortcut learning behavior by reducing the generalization ability of the secondary classifier and encouraging it to learn sample-specific patterns, while the primary classifier is trained normally and thereby learns both the natural patterns and the more complex, less generalizable ones. Our experiments show that our method improves upon the state-of-the-art results on both benchmarks, and that test-time batch normalization benefits from it the most.
1. Introduction
The performance of deep neural networks falls sharply when they are confronted, at test time, with data coming from a different distribution, or domain, than the training one. A change in lighting, sensor, weather conditions, or geographical location can result in a dramatic performance drop [14, 2, 5]. Such environmental changes are commonly encountered when an embedded network is deployed in the wild, and they exist in such diversity that it is impossible to gather enough data to cover all possible domain shifts. This lack of cross-domain robustness prevents the widespread deployment of deep networks in safety-critical applications.
Domain generalization algorithms have been investigated to mitigate the test-time performance drop by modifying the training procedure. Contrary to the domain adaptation research field, in which unlabeled samples of the target distribution are available at training time [7, 34], no information about the target domain is assumed to be known in domain generalization. Most of these methods assume access to data coming from several identified domains, and try to create a domain-invariant representation by finding common predictive patterns [24, 25, 4, 23, 20, 17]. However, such an assumption is quite strong, and in many real-life applications one does not have access to several data domains but only to a single one. As a result, a number of methods study single-source domain generalization [38, 31, 44, 43, 27]. A majority of methods were however found to perform only marginally better than the standard training procedure when the evaluation is done rigorously on several benchmarks [9, 42]. Another recent paradigm, called test-time adaptation, proposes to use a normally trained network and adapt it with a quick procedure at test time, using
only a batch of unlabeled target samples. This paradigm has yielded promising results in the domain generalization setting [40, 39] because it alleviates the main challenges of domain generalization: the lack of information about the target domain, and the requirement to be simultaneously robust, in advance, to every possible shift.
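Test-time batch normalization, the adaptation method used throughout this paper, is easy to state concretely: at inference, the batch normalization layers normalize with the statistics of the incoming test batch instead of the running averages accumulated during training. Below is a minimal PyTorch sketch of this recalibration; it is our own illustration, not code from [40, 39], and the function names are ours.

    import torch
    import torch.nn as nn

    def enable_test_time_bn(model: nn.Module) -> None:
        # Freeze the whole network first (dropout off, weights untouched) ...
        model.eval()
        # ... then switch only the BatchNorm layers back to train mode, so
        # their forward pass normalizes with the mean/variance of the current
        # batch rather than the running statistics stored during training.
        for m in model.modules():
            if isinstance(m, nn.modules.batchnorm._BatchNorm):
                m.train()
                m.momentum = 0.0  # keep the stored running stats unchanged

    @torch.no_grad()
    def predict(model: nn.Module, test_batch: torch.Tensor) -> torch.Tensor:
        enable_test_time_bn(model)
        return model(test_batch).argmax(dim=1)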
Test-time adaptation methods however suffer from a drawback that limits their adaptation capability and that can only be corrected at training time. Indeed, with a standard training procedure, only a subset of predictive patterns is learned, corresponding to the most obvious and efficient ones, while the less predictive patterns are disregarded entirely [32, 13, 28, 12, 30, 2, 8]. This apparent flaw, named shortcut learning, originates from the gradient descent optimization [28] and prevents a test-time method from using all the available patterns. Combining a training-time pattern-diversity-seeking approach with a test-time adaptation method may thus lead to improved results. In this paper, we show that the combined use of test-time batch normalization, a simple test-time adaptation method, with state-of-the-art single-source domain generalization methods (which are often designed to discover normally unused patterns) does not systematically improve results on the PACS benchmark [22] in the single-source setting. Similar experiments on Office-Home [35] yield a similar result, with only a few methods performing better than the standard training procedure. We thus
propose a new method, namely L2GP, which encourages a network to learn new predictive patterns rather than exploit and refine already-learned ones, and we demonstrate its effectiveness on both the PACS and Office-Home benchmarks. To find such patterns, we propose to look for predictive patterns that are less generalizable than the naturally learned ones, through a secondary classifier endowed with a shortcut avoidance loss, thereby leading to the learning of semantically different patterns. These less generalizable patterns match the ones normally ignored because of the simplicity bias of deep networks, which promotes the learning of a representation with a high generalization capability [18, 6]. Our method adds two classifiers to a feature extractor and trains them asymmetrically, using a data-dependent regularization, i.e., the shortcut avoidance loss, that slightly encourages memorization rather than generalization by learning batch-specific patterns, i.e., patterns that lower the loss on the running batch but have a limited effect on the other batches of data.
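The overall structure (Figure 1) can be sketched in a few lines of PyTorch. The sketch below is our own: the entropy-based term is only a hypothetical stand-in for the shortcut avoidance loss, whose actual form is given with Algorithm 1, and names such as BiHeadedNet and lam are ours.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BiHeadedNet(nn.Module):
        """A shared feature extractor followed by two classifier heads."""
        def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
            super().__init__()
            self.backbone = backbone                     # e.g. a ResNet with its fc layer removed
            self.primary = nn.Linear(feat_dim, num_classes)    # trained normally
            self.secondary = nn.Linear(feat_dim, num_classes)  # receives the extra loss

        def forward(self, x: torch.Tensor):
            z = self.backbone(x)
            return self.primary(z), self.secondary(z)

    def confidence_regularizer(logits: torch.Tensor) -> torch.Tensor:
        # Hypothetical stand-in for the shortcut avoidance loss: pushing the
        # secondary head toward over-confident fits of the running batch is one
        # crude way to favor memorization over generalization.
        log_p = F.log_softmax(logits, dim=1)
        return -(log_p.exp() * log_p).sum(dim=1).mean()   # prediction entropy

    def training_step(model, optimizer, x, y, lam: float = 0.1):
        logits_p, logits_s = model(x)
        loss = (
            F.cross_entropy(logits_p, y)              # primary head: standard training
            + F.cross_entropy(logits_s, y)            # secondary head: same task...
            + lam * confidence_regularizer(logits_s)  # ...plus the avoidance term
        )
        optimizer.zero_grad()
        loss.backward()   # both heads back-propagate into the shared extractor
        optimizer.step()
        return loss.item()

Because both heads share the feature extractor, whatever batch-specific patterns the secondary head is driven toward are also added to the shared representation, where the normally trained primary head can exploit them.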
To summarize, our contribution is threefold:

• To the best of our knowledge, we are the first to investigate the effect of training-time single-source methods on a test-time adaptation strategy. We show that they usually do not increase performance and can even have an adverse effect.

• We apply, for the first time, several state-of-the-art single-source domain generalization algorithms to the more challenging and rarely used Office-Home benchmark, and show that very few yield a robust cross-domain representation.

• We propose an original algorithm to learn a larger-than-usual subset of predictive features and show that, combined with test-time batch normalization, it improves over the existing state of the art.

Figure 1. Schema of our bi-headed architecture. The naming convention is the same as the one used in Algorithm 1.
2. Related Works
2.1. Single-Source Domain Generalization
Most domain generalization algorithms require several identified domains in order to enforce some level of distributional invariance. Because this is an unrealistic hypothesis in some situations (such as healthcare or defense-related tasks), methods were developed to deal with domain shift when only a single domain is available during training. Some of them rely on a domain-shift invariance hypothesis. A commonly used one is the texture-shift hypothesis: many domain shifts are primarily texture shifts, so style-transfer-based data augmentation will improve generalization, whether applied explicitly, by training a model on stylized images [38, 19], or implicitly, in the internal representation of the network [43, 27]. Such methods are limited to situations where the encountered shift is indeed of the hypothesized nature. Others aim to learn a larger set of predictive patterns, to make the network more robust should one or several training-time predictive patterns be missing at test time.
Volpi et al. [36] and Zhang et al. [44] propose to incrementally add to the training dataset adversarial images crafted to maximize the classification error of the network. These images no longer contain the original obvious predictive patterns, which forces the learning of new patterns.
These strategies are inspired by adversarial attacks.
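To make the idea concrete, a single fast-gradient-sign (FGSM) step already produces such images. This is only a simplified illustration of this family of strategies, not the exact augmentation procedure of [36] or [44]; the function name and the eps value are our own choices.

    import torch
    import torch.nn.functional as F

    def fgsm_augment(model, images, labels, eps: float = 4 / 255):
        """Perturb a batch so as to increase the classification loss,
        degrading the most obvious predictive patterns. `model` is any
        classifier mapping images to logits."""
        x = images.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), labels)
        loss.backward()  # in a training loop, zero the model grads afterwards
        with torch.no_grad():
            x_adv = (x + eps * x.grad.sign()).clamp(0.0, 1.0)
        # In the strategies above, x_adv would be appended to the training set.
        return x_adv.detach()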