Attention Diversification for Domain Generalization
Rang Meng1,⋆, Xianfeng Li1,⋆, Weijie Chen2,1,†, Shicai Yang1,†, Jie Song2,
Xinchao Wang3, Lei Zhang4, Mingli Song2, Di Xie1, and Shiliang Pu1
1Hikvision Research Institute, Hangzhou, China
2Zhejiang University, Hangzhou, China
3National University of Singapore, Singapore
4Chongqing University, Chongqing, China
{mengrang, lixianfeng6, chenweijie5, yangshicai, xiedi,
pushiliang.hri}@hikvision.com, {sjie, songml}@zju.edu.cn,
xinchao@nus.edu.sg, leizhang@cqu.edu.cn
Abstract. Convolutional neural networks (CNNs) have demonstrated gratifying results at learning discriminative features. However, when applied to unseen domains, state-of-the-art models are usually prone to errors due to domain shift. After investigating this issue from the perspective of shortcut learning, we find the devil lies in the fact that models trained on different domains are merely biased toward different domain-specific features yet overlook diverse task-related features. Under this guidance, a novel Attention Diversification framework is proposed, in which Intra-Model and Inter-Model Attention Diversification Regularization collaborate to reassign appropriate attention to diverse task-related features. Briefly, Intra-Model Attention Diversification Regularization is applied to the high-level feature maps to achieve in-channel discrimination and cross-channel diversification by forcing different channels to pay their most salient attention to different spatial locations. Besides, Inter-Model Attention Diversification Regularization is proposed to further provide task-related attention diversification and domain-related attention suppression, following a paradigm of “simulate, divide and assemble”: simulate domain shift by exploiting multiple domain-specific models, divide attention maps into task-related and domain-related groups, and assemble them within each group to execute regularization. Extensive experiments and analyses are conducted on various benchmarks to demonstrate that our method achieves state-of-the-art performance over other competing methods. Code is available at https://github.com/hikvision-research/DomainGeneralization.
Keywords: Domain Generalization, Attention Diversification
1 Introduction
A domain is defined as the feature space and the marginal probability distribution of a specific dataset [2, 3].
⋆ Equal contribution. † Corresponding authors.
Fig. 1. The visualization of domain attention bias on the PACS dataset. Domain-specific models trained on different domains (A, C, S) pay attention to different regions when they are tested on an unseen domain (P).
Domain shift refers to the discrepancy between source and target domains [2, 3, 57], which induces models trained on source domains to perform poorly on an unseen target domain. Domain adaptation (DA) aims to remedy this issue of domain shift for various tasks in cases where target data is available [7, 8, 29, 32, 36, 39, 49, 62, 72, 73]. However, the domain shift is usually unknown in real-world scenarios, since the target data is not available for training. This issue inspires the research area of domain generalization (DG) [1, 22, 27, 28, 30, 34, 41, 43, 45, 47, 51, 52, 54, 74, 75, 78–80], which aims to make models trained on seen domains achieve accurate predictions on unseen domains, i.e., the conditional distribution P(Y|X) remains robust under a shifted marginal distribution P(X).
Canonical DG focuses on learning a domain-invariant feature distribution P(F(X)) across domains so that the conditional distribution P(Y|F(X)) remains robust. In fact, the domain issue can be revisited from the perspective of shortcut learning [15], which indicates that models attempt to find the simplest solution to a given task. Models trained on specific domains merely pay attention to salient domain-related features while overlooking other diverse task-related information. When the domain shifts, the discrimination of these biased features no longer holds on the unseen domain, leading to a shift of the conditional distribution. This problematic phenomenon is dubbed “domain attention bias”, as shown in Fig. 1.
In this paper, we propose the Attention Diversification framework, in which the attention mechanism serves as the bridge to achieve invariance of the conditional distribution. In our framework, the proposed Intra-Model Attention Diversification Regularization (Intra-ADR) and Inter-Model Attention Diversification Regularization (Inter-ADR) collaborate to reassign appropriate spatial attention to diverse task-related features, from coarse to fine. The motivations behind the two components are detailed as follows:
Intra-Model Attention Diversification Regularization. According to the principle of maximum entropy [18], when estimating a probability distribution, we should select the distribution that leaves us the largest uncertainty under our constraints, so that no additional assumptions are brought into the computation. That is, when testing on unseen domains, each task-related feature should be treated as equally useful (i.e., maximum entropy). This drives us to propose Intra-ADR, which coarsely recalls, as much as possible, the features overlooked due to domain attention bias. This is done by forcing different channels to pay attention to different spatial locations, so that all spatial locations are activated. To this end, in-channel discrimination and cross-channel diversification are facilitated.
Although Intra-ADR is applied to the high-level feature maps, not all spatial regions are consistent with the semantics of the categories. As stated in [15], the background regions mainly involve domain-related features, and some parts of the foreground regions are also affected by domain-specific styles [21, 23]. Since Intra-ADR cannot distinguish task-related from domain-related features at this finer level, excessive attention is incidentally imposed upon domain-related features, leading to a shift of the conditional distribution. Thus, an attention diversification paradigm at a finer level is necessary.
Inter-Model Attention Diversification Regularization. To handle the aforementioned issue, the features that Intra-ADR coarsely recalls ought to be further refined by Inter-ADR. Thus, diverse attention to task-related features is encouraged, while excessive attention to domain-related ones is suppressed. Inter-ADR follows a paradigm of “simulate, divide and assemble”. Specifically, 1) “simulate”: we train a domain-specific model for each seen domain, and then run these models on samples from the other training domains to simulate domain shift, generating attention maps and predictions under (simulated) unseen domains; 2) “divide”: we divide the attention maps from the domain-specific models and the domain-aggregated model into task-related and domain-related groups, according to whether each model’s prediction is consistent with the corresponding ground truth; 3) “assemble”: the attention maps from different models are assembled within each group to form the task-related and domain-related inter-model attention maps, respectively. Finally, the attention maps of the domain-aggregated model are regularized with the task-related and domain-related inter-model attention maps, to diversify task-related attention regions while suppressing domain-related attention regions.
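To make the “divide and assemble” bookkeeping concrete, the sketch below shows one plausible realization in PyTorch. It is an illustrative approximation rather than the released implementation: how the spatial attention maps are extracted, the per-group averaging, and the pull/push distances (MSE with a fixed margin) are all assumptions made for the example.

import torch
import torch.nn.functional as F

def inter_adr_loss(agg_attn, model_attns, preds, labels, margin=0.1):
    # agg_attn:    (N, H, W) spatial attention of the domain-aggregated model.
    # model_attns: list of (N, H, W) attention maps from domain-specific models.
    # preds:       list of (N,) hard predictions, aligned with model_attns.
    # labels:      (N,) ground-truth class indices.
    attns = torch.stack(model_attns)                               # (S, N, H, W)
    correct = torch.stack([(p == labels).float() for p in preds])  # (S, N)
    mask = correct[..., None, None]                                # (S, N, 1, 1)

    # "divide" + "assemble": per-sample average of attention maps from models
    # that predict the sample correctly (task-related group) versus models that
    # fail under the simulated domain shift (domain-related group).
    task_attn = (attns * mask).sum(0) / mask.sum(0).clamp(min=1e-6)
    domain_attn = (attns * (1 - mask)).sum(0) / (1 - mask).sum(0).clamp(min=1e-6)

    # Regularize the aggregated model: pull its attention toward the
    # task-related consensus and keep it at least `margin` away from the
    # domain-related consensus.
    pull = F.mse_loss(agg_attn, task_attn)
    push = F.relu(margin - F.mse_loss(agg_attn, domain_attn))
    return pull + push

Here the “simulate” step is implicit: model_attns and preds are assumed to come from domain-specific models evaluated on samples drawn from domains they were not trained on.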
Extensive experiments and analyses are conducted on multiple domain generalization datasets, where our method achieves state-of-the-art results. It is worth emphasizing that our method can bring further performance improvements when used in conjunction with other DG methods.
2 Related Works
Domain Generalization. The analysis in [2] proves that features tend to be general and transferable to unseen domains if they are invariant across different domains. Following this line of research, a series of domain alignment methods has been proposed, which reduce the feature discrepancy among multiple source domains by aligning domain-invariant features, enabling models to generalize well to unseen target domains. Specifically, they use explicit feature alignment by minimizing the maximum mean discrepancy (MMD) [58] or using Instance Normalization (IN) layers [43]. Alternatively, [22, 47] adopt domain adversarial learning for domain alignment, training a discriminator to distinguish the domains while training the feature extractor to fool the domain discriminator and thereby learn domain-invariant features. Besides, the ability to generalize to unseen domains increases as the training data covers more
diverse domains. Several domain diversification attempts have been made in previous works: swapping the shape or style information of two images [25], mixing instance-level features of training samples across domains [78], altering the location and scene of objects [46], and simulating real environments to generate more training data [56]. In contrast, we investigate the issue of DG from the perspective of shortcut learning and the maximum entropy principle. Besides, we introduce visual attention into our proposed method to boost DG, which has seldom been studied in prior works.
Visual Attention. Visual attention has been widely used in deep learning and has achieved remarkable advances [59, 69]. It has been exploited in computer vision tasks such as image recognition [9, 10, 33, 48, 61, 71] and object detection, among others [6, 16, 35, 66, 67]. CAM [77] provides attention visualizations of feature maps for model interpretability analysis. In essence, visual attention can be interpreted as an allocation mechanism for the model's learning resources: it assigns high weights to what the model considers valuable and, conversely, low weights to what the model considers negligible [70]. Motivated by this mechanism, many computer vision tasks have achieved breakthroughs. For example, many fine-grained image classification methods learn multiple attentions to capture sufficient subtle inter-category differences [14, 53, 68, 76]. Recently, self-attention [13, 20, 64] has emerged to model long-range dependencies. In the field of transfer learning, Attentional Heterogeneous Transfer (AHT) [40] designs a heterogeneous transfer learning approach to transfer knowledge from an optimized subset of source-domain samples to a target domain. Transferable Attention for Domain Adaptation (TADA) [63] uses transferable global and local attention with multiple region-level domain discriminators to pick out transferable images and the transferable regions within them.
Our work finds that CNNs allocate sufficient attention to domain-related features but, conversely, insufficient attention to task-related features. Under this consideration, we adopt spatial attention as a bridge to learn diverse transferable features and mitigate domain shift.
3 Method
Our proposed Attention Diversification framework is composed of Intra-ADR and Inter-ADR, as shown in Fig. 2. Our framework aims to counteract shortcut learning, which ignores numerous task-related features. Intra-ADR and Inter-ADR collaborate to diversify attention regions over task-related features.
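For reference, a plausible form of the overall training objective combines the classification loss with the two regularizers; the two trade-off weights below are assumed hyper-parameters introduced for illustration:

L_total = L_cls + λ_intra · L_Intra-ADR + λ_inter · L_Inter-ADR,

where L_cls is the standard cross-entropy classification loss and λ_intra, λ_inter balance the strength of Intra-ADR and Inter-ADR, respectively.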
Notations. Given S training domains {D_d}_{d=1}^{S}, where D_d = {(x_i^d, y_i^d)}_{i=1}^{N_d} contains N_d labeled samples covering Z categories. Let M denote the CNN model used for image classification. Suppose X_j^b ∈ R^{C_b×H_b×W_b} denotes the feature maps output from the b-th block of the model M_j, where C_b, H_b and W_b denote the channel number, height and width of X_j^b, and b ∈ {1, ..., B}. We denote the domain-specific models and the domain-aggregated model as {M_j}_{j=0}^{S}, where M_1, ..., M_S represent the former, each trained on the corresponding single