Attention Diversification for Domain Generalization
Rang Meng1,⋆, Xianfeng Li1,⋆, Weijie Chen2,1,†, Shicai Yang1,†, Jie Song2,
Xinchao Wang3, Lei Zhang4, Mingli Song2, Di Xie1, and Shiliang Pu1
1Hikvision Research Institute, Hangzhou, China
2Zhejiang University, Hangzhou, China
3National University of Singapore, Singapore
4Chongqing University, Chongqing, China
{mengrang, lixianfeng6, chenweijie5, yangshicai, xiedi,
pushiliang.hri}@hikvision.com, {sjie, songml}@zju.edu.cn,
xinchao@nus.edu.sg, leizhang@cqu.edu.cn
Abstract. Convolutional neural networks (CNNs) have demonstrated gratifying results at learning discriminative features. However, when applied to unseen domains, state-of-the-art models are usually prone to errors due to domain shift. After investigating this issue from the perspective of shortcut learning, we find the devil lies in the fact that models trained on different domains are merely biased toward different domain-specific features yet overlook diverse task-related features. Under this guidance, a novel Attention Diversification framework is proposed, in which Intra-Model and Inter-Model Attention Diversification Regularization collaborate to reassign appropriate attention to diverse task-related features. Briefly, Intra-Model Attention Diversification Regularization is applied to the high-level feature maps to achieve in-channel discrimination and cross-channel diversification by forcing different channels to pay their most salient attention to different spatial locations. Besides, Inter-Model Attention Diversification Regularization is proposed to further provide task-related attention diversification and domain-related attention suppression, following a paradigm of “simulate, divide and assemble”: simulate domain shift by exploiting multiple domain-specific models, divide attention maps into task-related and domain-related groups, and assemble them within each group to execute regularization. Extensive experiments and analyses are conducted on various benchmarks to demonstrate that our method achieves state-of-the-art performance over other competing methods. Code is available at https://github.com/hikvision-research/DomainGeneralization.
Keywords: Domain Generalization, Attention Diversification
1 Introduction
A domain is defined as the feature space and the marginal probability distribution of a specific dataset [2, 3].
⋆ Equal contribution. † Corresponding authors.
Fig. 1. The visualization of domain attention bias on the PACS dataset. Domain-specific models trained on different domains (A, C, S) pay attention to different regions when they are tested on an unseen domain (P).
Domain shift refers to the discrepancy between source and target domains [2, 3, 57], which induces models trained on source domains to perform poorly on an unseen target domain. Domain adaptation (DA) aims to remedy this issue of domain shift for various tasks in cases where target data is available [7, 8, 29, 32, 36, 39, 49, 62, 72, 73]. However, the domain shift is usually unknown in real-world scenarios, since the target data is not available for training. This issue inspires the research area of domain generalization (DG) [1, 22, 27, 28, 30, 34, 41, 43, 45, 47, 51, 52, 54, 74, 75, 78–80], which aims to make models trained on seen domains achieve accurate predictions on unseen domains, i.e., the conditional distribution P(Y|X) remains robust under a shifted marginal distribution P(X).
Canonical DG focuses on learning a domain-invariant feature distribution P(F(X)) across domains so that the conditional distribution P(Y|F(X)) remains robust. In fact, the domain issue can be revisited from the perspective of shortcut learning [15], which indicates that models attempt to find the simplest solution to a given task. Models trained on specific domains merely pay attention to salient domain-related features while overlooking other diverse task-related information. When the domain shifts, the discrimination of these biased features no longer holds on the unseen domain, leading to a shift of the conditional distribution. This problematic phenomenon is dubbed “domain attention bias”, as shown in Fig. 1.
In this paper, we propose the Attention Diversification framework, in which the attention mechanism serves as the bridge to achieve invariance of the conditional distribution. In our framework, the proposed Intra-Model Attention Diversification Regularization (Intra-ADR) and Inter-Model Attention Diversification Regularization (Inter-ADR) collaborate to reassign appropriate spatial attention to diverse task-related features, from coarse to fine. The motivations behind the two components are detailed as follows:
Intra-Model Attention Diversification Regularization. According to the principle of maximum entropy [18], when estimating a probability distribution, we should select the distribution that leaves us the largest uncertainty under our constraints, so that no additional assumptions are brought into the computation. That is, when testing on unseen domains, each task-related feature should be treated as equally useful (i.e., maximum entropy). This drives us to propose Intra-ADR, which coarsely recalls, as much as possible, the features overlooked due to domain attention bias. This is done by forcing different channels to pay attention to different spatial locations, so that all spatial locations are activated. To this end, in-channel discrimination and cross-channel diversification are facilitated.
Although Intra-ADR is applied to the high-level feature maps, not all spatial regions are consistent with the semantics of the categories. As stated in [15], the background regions mainly involve domain-related features, and some parts of the foreground regions are also affected by domain-specific styles [21, 23]. Since Intra-ADR cannot distinguish task-related from domain-related features at this finer level, excessive attention is incidentally imposed upon domain-related features, leading to a shift of the conditional distribution. Thus, an attention diversification paradigm at a finer level is necessary.
Inter-Model Attention Diversification Regularization. To handle the aforementioned issue, the features that Intra-ADR coarsely recalls ought to be further refined by Inter-ADR. Thus, diverse attention to task-related features is encouraged, while excessive attention to domain-related ones is suppressed. Inter-ADR follows a paradigm of “simulate, divide and assemble”. Specifically, 1) “simulate”: we train a domain-specific model for each seen domain, and then run these models on samples from the other training domains to simulate domain shift, generating attention maps and predictions under (simulated) unseen domains; 2) “divide”: we divide the attention maps from the domain-specific models and the domain-aggregated model into task-related and domain-related groups, according to whether each model’s prediction is consistent with the corresponding ground truth; 3) “assemble”: the attention maps from different models are assembled within each group to form the task-related and domain-related inter-model attention maps, respectively. Finally, the attention maps of the domain-aggregated model are regularized with the task-related and domain-related inter-model attention maps, to diversify task-related attention regions while suppressing domain-related attention regions.
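To make the “divide and assemble” bookkeeping concrete, the sketch below shows one plausible realization in PyTorch. It is an illustrative approximation rather than the released implementation: how the spatial attention maps are extracted, the per-group averaging, and the pull/push distances (MSE with a fixed margin) are all assumptions made for the example.

import torch
import torch.nn.functional as F

def inter_adr_loss(agg_attn, model_attns, preds, labels, margin=0.1):
    # agg_attn:    (N, H, W) spatial attention of the domain-aggregated model.
    # model_attns: list of (N, H, W) attention maps from domain-specific models.
    # preds:       list of (N,) hard predictions, aligned with model_attns.
    # labels:      (N,) ground-truth class indices.
    attns = torch.stack(model_attns)                               # (S, N, H, W)
    correct = torch.stack([(p == labels).float() for p in preds])  # (S, N)
    mask = correct[..., None, None]                                # (S, N, 1, 1)

    # "divide" + "assemble": per-sample average of attention maps from models
    # that predict the sample correctly (task-related group) versus models that
    # fail under the simulated domain shift (domain-related group).
    task_attn = (attns * mask).sum(0) / mask.sum(0).clamp(min=1e-6)
    domain_attn = (attns * (1 - mask)).sum(0) / (1 - mask).sum(0).clamp(min=1e-6)

    # Regularize the aggregated model: pull its attention toward the
    # task-related consensus and keep it at least `margin` away from the
    # domain-related consensus.
    pull = F.mse_loss(agg_attn, task_attn)
    push = F.relu(margin - F.mse_loss(agg_attn, domain_attn))
    return pull + push

Here the “simulate” step is implicit: model_attns and preds are assumed to come from domain-specific models evaluated on samples drawn from domains they were not trained on.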
Extensive experiments and analyses are conducted on multiple domain generalization datasets, where our method achieves state-of-the-art results. It is worth emphasizing that our method can bring further performance improvements when used in conjunction with other DG methods.
2 Related Works
Domain Generalization. The analysis in [2] proves that features tend to be general and transferable to unseen domains if they are invariant across different domains. Following this line of research, a series of domain alignment methods has been proposed, which reduce the feature discrepancy among multiple source domains by aligning domain-invariant features, enabling models to generalize well to unseen target domains. Specifically, they use explicit feature alignment by minimizing the maximum mean discrepancy (MMD) [58] or using Instance Normalization (IN) layers [43]. Alternatively, [22, 47] adopt domain adversarial learning for domain alignment, training a discriminator to distinguish the domains while training the feature extractor to fool the domain discriminator and thereby learn domain-invariant features. Besides, the ability to generalize to unseen domains increases as the training data covers more
diverse domains. Several domain diversification attempts have been made in previous works: swapping the shape or style information of two images [25], mixing instance-level features of training samples across domains [78], altering the location and scene of objects [46], and simulating real environments to generate more training data [56]. In contrast, we investigate the issue of DG from the perspective of shortcut learning and the maximum entropy principle. Besides, we introduce visual attention into our proposed method to boost DG, which has seldom been studied in prior works.
Visual Attention. Visual attention has been widely used in deep learning and has achieved remarkable advances [59, 69]. It has been exploited in computer vision tasks such as image recognition [9, 10, 33, 48, 61, 71] and object detection, among others [6, 16, 35, 66, 67]. CAM [77] provides attention visualizations of feature maps for model interpretability analysis. In essence, visual attention can be interpreted as an allocation mechanism for the model's learning resources: it assigns high weights to what the model considers valuable and, conversely, low weights to what the model considers negligible [70]. Motivated by this mechanism, many computer vision tasks have achieved breakthroughs. For example, many fine-grained image classification methods learn multiple attentions to capture sufficient subtle inter-category differences [14, 53, 68, 76]. Recently, self-attention [13, 20, 64] has emerged to model long-range dependencies. In the field of transfer learning, Attentional Heterogeneous Transfer (AHT) [40] designs a heterogeneous transfer learning approach to transfer knowledge from an optimized subset of source-domain samples to a target domain. Transferable Attention for Domain Adaptation (TADA) [63] uses transferable global and local attention with multiple region-level domain discriminators to pick out transferable images and the transferable regions within them.
Our work finds that CNNs allocate sufficient attention to domain-related features but, conversely, insufficient attention to task-related features. Under this consideration, we adopt spatial attention as a bridge to learn diverse transferable features and mitigate domain shift.
3 Method
Our proposed Attention Diversification framework is composed of Intra-ADR and Inter-ADR, as shown in Fig. 2. Our framework aims to counteract shortcut learning, which ignores numerous task-related features. Intra-ADR and Inter-ADR collaborate to diversify attention regions over task-related features.
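For reference, a plausible form of the overall training objective combines the classification loss with the two regularizers; the two trade-off weights below are assumed hyper-parameters introduced for illustration:

L_total = L_cls + λ_intra · L_Intra-ADR + λ_inter · L_Inter-ADR,

where L_cls is the standard cross-entropy classification loss and λ_intra, λ_inter balance the strength of Intra-ADR and Inter-ADR, respectively.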
Notations. Given S training domains {D_d}_{d=1}^{S}, where D_d = {(x_i^d, y_i^d)}_{i=1}^{N_d} contains N_d labeled samples covering Z categories. Let M denote the CNN model used for image classification. Suppose X_j^b ∈ R^{C_b×H_b×W_b} denotes the feature maps output from the b-th block of the model M_j, where C_b, H_b and W_b denote the channel number, height and width of X_j^b, and b ∈ {1, ..., B}. We denote the domain-specific models and the domain-aggregated model as {M_j}_{j=0}^{S}, where M_1, ..., M_S represent the former, each trained on the corresponding single