schematic diagram of WUDA is shown in Figure 1. For this newly defined task, a framework is needed that tackles transfer learning and weakly supervised segmentation at the same time. Solving this task would reduce the requirement for fine source domain labels in future UDA work.
In summary, this paper makes the following contributions:
• We define a novel task: unsupervised domain adapta-
tion based on weak source domain labels (WUDA). For
this task, we propose two intuitive frameworks: Weakly
Supervised Semantic Segmentation + Unsupervised Do-
main Adaptation (WSSS-UDA) and Target Domain Ob-
ject Detection + Weakly Supervised Semantic Segmen-
tation (TDOD-WSSS).
• We benchmark typical weakly supervised semantic segmentation, unsupervised domain adaptation, and object detection techniques under our two proposed frameworks, and find that framework WSSS-UDA reaches 83% of the performance of a UDA method trained with fine source domain labels.
• We construct a series of datasets with different domain shifts. To the best of our knowledge, we are the first to use representation shift to measure domain shift in urban landscape datasets. The constructed datasets will be made openly available for research on WUDA/UDA under multiple domain shifts.
• To further analyze the impact of different degrees of do-
main shift on our proposed frameworks, we conduct ex-
tended experiments using our constructed datasets and
find that framework TDOD-WSSS is more sensitive to
changes in domain shift than framework WSSS-UDA.
Related Work
WUDA involves weakly supervised semantic segmentation, unsupervised domain adaptation, object detection, and domain shift measurement techniques. In this section, we review these related previous works.
Weakly Supervised Semantic Segmentation
In computer vision tasks, pixel-wise mask annotation takes far more time than weak annotation (Lin et al. 2014), and the need for time savings motivates weakly supervised semantic segmentation. Labels for weakly supervised segmentation can be bounding boxes, points, scribbles, and image-level tags. Methods using bounding boxes as supervision (Dai, He, and Sun 2015; Khoreva et al. 2017; Li, Arnab, and Torr 2018; Song et al. 2019; Kulharia et al. 2020) usually employ GrabCut (Rother, Kolmogorov, and Blake 2004) or segment-proposal techniques to obtain more accurate semantic labels, and can achieve results close to those of fully supervised methods (95% of their performance, or even higher). Point-supervised
and scribble-supervised methods (Bearman et al. 2016; Qian et al. 2019; Lin et al. 2016; Vernaza and Chandraker 2017; Tang et al. 2018a,b) exploit the location and category information in the annotations and achieve excellent segmentation results. Tag-supervised methods (Jiang et al. 2019; Wang et al. 2020b; Lee et al. 2021b; Li et al. 2021b) often use the class activation mapping (CAM) algorithm (Zhou et al. 2016) to obtain localization maps of the main objects in the images.
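The CAM algorithm computes a localization map as a weighted sum of the last convolutional feature maps, using the classifier weights of the target class. A minimal NumPy sketch (array shapes, ReLU, and the final normalization are illustrative assumptions, not any cited paper's exact implementation):

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Compute a CAM localization map in the spirit of Zhou et al. (2016).

    features:   (C, H, W) feature maps from the last convolutional layer
    fc_weights: (num_classes, C) weights of the final linear classifier
    class_idx:  index of the class to localize
    """
    # Weighted sum of the feature maps, using the class's classifier weights.
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # (H, W)
    cam = np.maximum(cam, 0)          # keep only positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()         # normalize to [0, 1]
    return cam

# Toy example: 4 feature channels, an 8x8 spatial map, 3 classes.
rng = np.random.default_rng(0)
feats = rng.random((4, 8, 8))
weights = rng.random((3, 4))
cam = class_activation_map(feats, weights, class_idx=1)
print(cam.shape)  # (8, 8)
```

Thresholding such a map yields the coarse object regions that tag-supervised methods refine into segmentation masks.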
Unsupervised Domain Adaptation for Semantic
Segmentation
Unsupervised Domain Adaptation (UDA) addresses the poor model generalization caused by inconsistent data distributions between the source and target domains. Self-training (ST) and adversarial training (AT) are the key UDA schemes. Self-training methods (Zou et al. 2018, 2019; Lian et al. 2019; Li et al. 2020; Lv et al. 2020; Melas-Kyriazi and Manrai 2021; Tranheden et al. 2021; Guo et al. 2021; Araslanov and Roth 2021) typically set a confidence threshold to filter high-confidence predictions on the target domain into pseudo-labels, which then supervise target-domain training. Adversarial training methods (Tsai et al. 2018; Luo et al. 2019b,a; Du et al. 2019; Vu et al. 2019; Tsai et al. 2019; Yang et al. 2020b; Wang et al. 2020a; Li et al. 2021a) usually add a domain discriminator to the model; the adversarial game between the segmenter and the discriminator drives the segmentation outputs of the source and target domains toward consistency. Other works (Zhang et al. 2019; Pan et al. 2020; Wang et al. 2020c; Yu et al. 2021; Wang, Peng, and Zhang 2021; Mei et al. 2020) combine self-training and adversarial training to achieve strong segmentation results on the target domain.
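To make the self-training scheme concrete, the pseudo-label filtering step can be sketched as follows (the threshold value and ignore index are illustrative choices, not taken from any cited method):

```python
import numpy as np

def make_pseudo_labels(probs, threshold=0.9, ignore_index=255):
    """Turn target-domain softmax outputs into filtered pseudo-labels.

    probs: (num_classes, H, W) per-pixel class probabilities.
    Pixels whose top-class confidence falls below `threshold` are set to
    `ignore_index` so they are excluded from the training loss.
    """
    conf = probs.max(axis=0)       # per-pixel top-class confidence
    labels = probs.argmax(axis=0)  # per-pixel predicted class
    labels[conf < threshold] = ignore_index
    return labels

# Toy 2-class, 2x2 example: only the confident top-left pixel survives.
probs = np.array([[[0.95, 0.60],
                   [0.20, 0.50]],
                  [[0.05, 0.40],
                   [0.80, 0.50]]])
print(make_pseudo_labels(probs))  # [[  0 255]
                                  #  [255 255]]
```

The surviving pseudo-labels then replace ground-truth annotations in the target-domain segmentation loss.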
Object Detection
Autonomous driving technology has greatly promoted the development of object detection. Many pioneering works are widely used across object detection tasks, such as two-stage methods (Girshick et al. 2014; Girshick 2015; Ren et al. 2015) that first extract object proposals and then classify them. The YOLO series of algorithms (Redmon et al. 2016; Redmon and Farhadi 2017, 2018; Bochkovskiy, Wang, and Liao 2020; Jocher et al. 2020) performs object extraction and classification simultaneously in a single network. The currently popular detector YOLOv5 (Jocher et al. 2020) achieves 72.7% mean average precision (mAP) on the COCO 2017 validation set. Object detection techniques also help extract bounding boxes for weakly supervised segmentation methods (Lan et al. 2021; Lee et al. 2021a).
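Detection metrics such as mAP are built on the intersection-over-union (IoU) between predicted and ground-truth boxes. A minimal sketch (the `(x1, y1, x2, y2)` corner format is an assumption for illustration):

```python
def box_iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.14285714285714285 (= 1/7)
```

mAP averages precision over recall levels, classes, and (for COCO) a range of IoU thresholds applied to this quantity.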
Domain Shift Assessment
Domain shift arises from differences between the source and target domain data. Various factors (e.g., image content, view angle, and image texture) contribute to domain shift; for Convolutional Neural Networks (CNNs), texture is the most critical one. Many studies (Geirhos et al. 2018; Nam et al. 2019) suggest that CNNs and the human eye focus on different aspects when processing images: the human eye is sensitive to the content information of an image (e.g., shapes), while CNNs are more sensitive to its style (e.g., texture). In practice, most methods that compute image texture differences are based on the output features of intermediate layers of a neural network. For example,