WUDA: Unsupervised Domain Adaptation Based on Weak Source Domain Labels
Shengjie Liu1, Chuang Zhu*1, Wenqi Tang1
1Beijing University of Posts and Telecommunications, Beijing, China
{shengjie.Liu, czhu, tangwenqi}@bupt.edu.cn
Abstract
Unsupervised domain adaptation (UDA) for semantic segmentation addresses the cross-domain problem using fine source domain labels. However, acquiring semantic labels has always been difficult, and many scenarios only have weak labels (e.g., bounding boxes). For scenarios where weak supervision and cross-domain problems coexist, this paper defines a new task: unsupervised domain adaptation based on weak source domain labels (WUDA). To explore solutions for this task, we propose two intuitive frameworks: 1) perform weakly supervised semantic segmentation in the source domain, then apply unsupervised domain adaptation; 2) train an object detection model on source domain data, detect objects in the target domain, then perform weakly supervised semantic segmentation. We observe that the two frameworks behave differently as the datasets change. We therefore construct dataset pairs with a wide range of domain shifts and conduct extensive experiments to analyze the impact of different domain shifts on the two frameworks. In addition, to measure domain shift, we apply the representation shift metric to urban landscape image segmentation for the first time. The source code and constructed datasets are available at https://github.com/bupt-ai-cz/WUDA.
Introduction
As one of the most popular computer vision technologies, semantic segmentation has matured and is widely used in scenarios such as autonomous driving, remote sensing image recognition, and medical image processing. However, like most deep learning models, semantic segmentation models deployed in practice face a serious problem: domain shift. Since the data distribution of the target domain may differ greatly from that of the source domain, a model that performs well in the source domain may suffer catastrophic performance degradation in the target domain. For such cross-domain problems, many studies on domain adaptation have improved the generalization ability of models. Among them, unsupervised domain adaptation (UDA) for semantic segmentation can fine-tune a trained model via self-training, adversarial training, or image style transfer, without any additional
annotations of the target domain.
*Corresponding author.
[Figure 1 image: a source-domain image with bounding-box annotations alongside an unannotated target-domain image; class legend: car, truck, person, vegetation, pole, traffic light, building, ...]
Figure 1: Schematic diagram of WUDA. The source domain has only bounding boxes and the target domain has no annotations. WUDA achieves semantic segmentation of the target domain images under these conditions.
Self-training methods (Zou et al. 2018, 2019) and adversarial training methods (Tsai et al. 2018; Luo et al. 2019b) adapt the model to the distribution of the target domain images, while image style transfer methods (Yang and Soatto 2020; Yang et al. 2020a) bring the target domain data closer to the distribution of the source domain.
In deep learning tasks, massive numbers of samples are required for training to improve model robustness, so annotating large-scale datasets is another difficulty in training deep models. In semantic segmentation, obtaining pixel-wise mask annotations takes a long time per image (e.g., about 1.5 hours per image in the Cityscapes dataset (Cordts et al. 2016)), and annotating an entire dataset requires huge manpower. As a result, many computer vision datasets have only weak annotations (e.g., bounding boxes, points, etc.).
When the cross-domain problem and weak labels coexist (only source domain bounding boxes and target domain images are available), domain shift and weak supervision both degrade pixel-level semantic segmentation, making it challenging to segment the target domain accurately. We define this task as Unsupervised Domain Adaptation Based on Weak Source Domain Labels (WUDA); a schematic diagram is shown in Figure 1. This newly defined task calls for a framework that tackles transfer learning and weakly supervised segmentation at the same time. Realizing it can reduce the requirements for source domain labels in future UDA tasks.
arXiv:2210.02088v1 [cs.CV] 5 Oct 2022
In summary, this paper makes the following contributions:
• We define a novel task: unsupervised domain adaptation based on weak source domain labels (WUDA). For this task, we propose two intuitive frameworks: Weakly Supervised Semantic Segmentation + Unsupervised Domain Adaptation (WSSS-UDA) and Target Domain Object Detection + Weakly Supervised Semantic Segmentation (TDOD-WSSS).
• We benchmark typical weakly supervised semantic segmentation, unsupervised domain adaptation, and object detection techniques under the two proposed frameworks, and find that WSSS-UDA can reach 83% of the performance of a UDA method trained with fine source domain labels.
• We construct a series of datasets with different domain shifts. To the best of our knowledge, we are the first to use representation shift to measure domain shift in urban landscape datasets. The constructed datasets will be released for research on WUDA/UDA under multiple domain shifts.
• To further analyze the impact of different degrees of domain shift on the proposed frameworks, we conduct extended experiments on our constructed datasets and find that TDOD-WSSS is more sensitive to changes in domain shift than WSSS-UDA.
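The two frameworks above are compositions of off-the-shelf stages. A minimal sketch of that composition is below; the stage functions (wsss, uda, train_detector) are illustrative placeholders for the benchmarked components, not the paper's actual implementations.

```python
# Hedged sketch of the two WUDA frameworks as stage pipelines. The stage
# functions passed in (wsss, uda, train_detector) are hypothetical stand-ins
# for the components benchmarked in the paper.

def wsss_uda(source_images, source_boxes, target_images, wsss, uda):
    """Framework 1 (WSSS-UDA): weakly supervised segmentation on the
    source domain, then unsupervised domain adaptation to the target."""
    source_masks = wsss(source_images, source_boxes)  # pseudo masks from boxes
    return uda(source_images, source_masks, target_images)

def tdod_wsss(source_images, source_boxes, target_images, train_detector, wsss):
    """Framework 2 (TDOD-WSSS): train a detector on the source domain,
    detect boxes on the target domain, then run WSSS there."""
    detector = train_detector(source_images, source_boxes)
    target_boxes = [detector(img) for img in target_images]
    return wsss(target_images, target_boxes)
```

The key design difference: WSSS-UDA crosses the domain gap at the segmentation stage, while TDOD-WSSS crosses it at the detection stage, which is why the two react differently to changes in domain shift.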
Related Work
WUDA will involve weakly supervised semantic segmenta-
tion, unsupervised domain adaptation, object detection, and
the measure of domain shift techniques. In this section, we
will review these related previous works.
Weakly Supervised Semantic Segmentation
In computer vision tasks, pixel-wise mask annotation takes far more time than weak annotation (Lin et al. 2014), and this need for time savings motivates weakly supervised semantic segmentation. Labels for weakly supervised segmentation can be bounding boxes, points, scribbles, or image-level tags. Methods using bounding boxes as supervision (Dai, He, and Sun 2015; Khoreva et al. 2017; Li, Arnab, and Torr 2018; Song et al. 2019; Kulharia et al. 2020) usually employ GrabCut (Rother, Kolmogorov, and Blake 2004) or segment-proposal techniques to obtain more accurate semantic labels, and can achieve results close to (95% of, or even higher than) those of fully supervised methods. Point-supervised and scribble-supervised methods (Bearman et al. 2016; Qian et al. 2019; Lin et al. 2016; Vernaza and Chandraker 2017; Tang et al. 2018a,b) exploit the location and category information in the annotations and achieve excellent segmentation results. Tag-supervised methods (Jiang et al. 2019; Wang et al. 2020b; Lee et al. 2021b; Li et al. 2021b) often use the class activation mapping (CAM) (Zhou et al. 2016) algorithm to obtain localization maps of the main objects in the images.
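The CAM algorithm mentioned above weights the last convolutional layer's feature maps by the classifier weights of a global-average-pooling head. A minimal NumPy sketch (shapes and names are illustrative, not from the cited implementations):

```python
import numpy as np

def class_activation_map(features, weights, class_idx):
    """Sketch of CAM: features is (C, H, W) from the last conv layer,
    weights is (num_classes, C) from a global-average-pooling classifier.
    Returns an (H, W) localization map normalized to [0, 1]."""
    # Weighted sum over channels: each feature map is scaled by how much
    # its channel contributes to the score of class_idx.
    cam = np.tensordot(weights[class_idx], features, axes=([0], [0]))
    cam = np.maximum(cam, 0.0)   # keep only positively contributing regions
    if cam.max() > 0:
        cam /= cam.max()         # normalize for thresholding/visualization
    return cam
```

Thresholding such maps yields coarse object regions that tag-supervised methods then refine into pseudo masks.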
Unsupervised Domain Adaptation for Semantic Segmentation
Unsupervised Domain Adaptation (UDA) is committed to solving the poor model generalization caused by inconsistent data distributions in the source and target domains. Self-training (ST) and adversarial training (AT) are the key schemes of UDA. Self-training schemes (Zou et al. 2018, 2019; Lian et al. 2019; Li et al. 2020; Lv et al. 2020; Melas-Kyriazi and Manrai 2021; Tranheden et al. 2021; Guo et al. 2021; Araslanov and Roth 2021) typically set a threshold to filter high-confidence pseudo-labels on the target domain, then use the pseudo-labels to supervise target domain training. Adversarial training methods (Tsai et al. 2018; Luo et al. 2019b,a; Du et al. 2019; Vu et al. 2019; Tsai et al. 2019; Yang et al. 2020b; Wang et al. 2020a; Li et al. 2021a) usually add a domain discriminator to the model; the adversarial game between the segmenter and the discriminator pushes the segmentation results of the source and target domains toward consistency. Other works (Zhang et al. 2019; Pan et al. 2020; Wang et al. 2020c; Yu et al. 2021; Wang, Peng, and Zhang 2021; Mei et al. 2020) combine self-training and adversarial training to achieve good segmentation results on the target domain.
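The confidence-threshold filtering at the heart of these self-training schemes can be sketched as follows (the threshold value and ignore_index convention are illustrative assumptions, not taken from any specific cited method):

```python
import numpy as np

def make_pseudo_labels(probs, threshold=0.9, ignore_index=255):
    """Sketch of self-training pseudo-label filtering.
    probs: softmax output of shape (num_classes, H, W) for one target image.
    Pixels whose max class probability is below the threshold are marked
    with ignore_index so they do not contribute to the training loss."""
    confidence = probs.max(axis=0)   # per-pixel max class probability
    labels = probs.argmax(axis=0)    # per-pixel predicted class
    labels[confidence < threshold] = ignore_index
    return labels
```

Only the confident pixels then act as supervision when the model is retrained on the target domain.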
Object Detection
Autonomous driving technology has greatly promoted the development of object detection. Many pioneering works are widely applicable across object detection tasks, such as two-stage methods (Girshick et al. 2014; Girshick 2015; Ren et al. 2015) that first extract object proposals and then classify them. The YOLO series of algorithms (Redmon et al. 2016; Redmon and Farhadi 2017, 2018; Bochkovskiy, Wang, and Liao 2020; Jocher et al. 2020) achieves object extraction and classification in a single network. The currently popular detector YOLOv5 (Jocher et al. 2020) reaches 72.7% mean average precision (mAP) on the COCO 2017 validation set. Object detection techniques also help to extract bounding boxes in weakly supervised segmentation methods (Lan et al. 2021; Lee et al. 2021a).
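Detection metrics such as mAP, and the matching of predicted boxes to the box-level supervision used in WUDA, both rest on intersection over union (IoU). A minimal sketch, assuming (x1, y1, x2, y2) box coordinates:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```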
Domain Shift Assessment
Domain shift stems from differences between the source and target domain data. Various factors (e.g., image content, viewing angle, image texture) contribute to domain shift, but for convolutional neural networks (CNNs), texture is the most critical one. Many studies (Geirhos et al. 2018; Nam et al. 2019) suggest that CNNs and human eyes focus on different things when processing images: human eyes are sensitive to the content of an image (e.g., shapes), while CNNs are more sensitive to its style (e.g., texture). In practice, when computing image texture differences, most methods are based on features output by intermediate layers of a neural network. For example,
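One common way to compare texture statistics from intermediate-layer features is via Gram matrices, as in neural style transfer. The sketch below is an illustrative stand-in for this family of measures, not the representation shift metric used in this paper:

```python
import numpy as np

def gram_matrix(features):
    """features: (C, H, W) activations from an intermediate layer.
    Returns the (C, C) matrix of channel co-activation statistics,
    a common proxy for image texture/style."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def texture_distance(feat_a, feat_b):
    """Frobenius distance between Gram matrices of two feature tensors."""
    return float(np.linalg.norm(gram_matrix(feat_a) - gram_matrix(feat_b)))
```

Averaged over dataset samples, such feature-statistics distances give a scalar measure of how far apart two domains are in texture space.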