Synthetic Data Supervised Salient Object Detection
Zhenyu Wu
State Key Laboratory of Virtual
Reality Technology and Systems,
Beihang University
Beijing, China
Lin Wang
School of Transportation Science and
Engineering, Beihang University
Beijing, China
Wei Wang
School of Computer Science and
Technology, Harbin Institute of
Technology
Shenzhen, China
Tengfei Shi
State Key Laboratory of Virtual
Reality Technology and Systems,
Beihang University
Beijing, China
Chenglizhao Chen∗
College of Computer Science and
Technology, China University of
Petroleum (East China)
Qingdao, China
Aimin Hao
State Key Laboratory of Virtual
Reality Technology and Systems,
Beihang University, Beijing, China
Peng Cheng Laboratory, Shenzhen, China
Shuo Li
Department of Medical Imaging,
Western University
London, Canada
ABSTRACT
Although deep salient object detection (SOD) has achieved remarkable progress, deep SOD models are extremely data-hungry, requiring large-scale pixel-wise annotations to deliver such promising results. In this paper, we propose a novel yet effective method for SOD, coined SODGAN, which can generate infinite high-quality image-mask pairs requiring only a few labeled examples, and these synthesized pairs can replace the human-labeled DUTS-TR dataset to train any off-the-shelf SOD model. Its contribution is three-fold. 1) Our proposed diffusion embedding network can address the manifold mismatch and is tractable for latent code generation, better matching the ImageNet latent space. 2) For the first time, our proposed few-shot saliency mask generator can synthesize infinite accurate image-synchronized saliency masks with a few labeled examples. 3) Our proposed quality-aware discriminator can select high-quality synthesized image-mask pairs from the noisy synthetic data pool, improving the quality of synthetic data. For the first time, our SODGAN tackles SOD with synthetic data directly generated from a generative model, which opens up a new research paradigm for SOD. Extensive experimental results show that the saliency model trained on synthetic data can achieve 98.4% of the F-measure of the saliency model trained on DUTS-TR. Moreover, our approach achieves new SOTA performance among semi/weakly-supervised methods, and even outperforms several fully-supervised SOTA methods. Code is available at https://github.com/wuzhenyubuaa/SODGAN
∗Corresponding Author: Chenglizhao Chen, cclz123@163.com
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
MM ’22, October 10–14, 2022, Lisboa, Portugal.
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9203-7/22/10. . . $15.00
https://doi.org/10.1145/3503161.3547930
CCS CONCEPTS
• Computing methodologies → Artificial intelligence; Machine learning.
KEYWORDS
Salient object detection, Synthetic data, Semi-supervised learning
ACM Reference Format:
Zhenyu Wu, Lin Wang, Wei Wang, Tengfei Shi, Chenglizhao Chen, Aimin
Hao, and Shuo Li. 2022. Synthetic Data Supervised Salient Object Detection.
In Proceedings of the 30th ACM International Conference on Multimedia (MM
’22), October 10–14, 2022, Lisboa, Portugal. ACM, New York, NY, USA, 9 pages.
https://doi.org/10.1145/3503161.3547930
1 INTRODUCTION
Salient object detection (SOD) aims to segment the objects in an image that attract human attention. As a fundamental tool, it can be leveraged in various applications including scene understanding [60], semantic segmentation [59], and image editing [5, 15].
Recently, SOD has achieved significant progress [12, 19, 33, 36, 42, 53, 57] due to the development of deep models. However, deep networks are extremely data-hungry, typically requiring pixel-level human-annotated datasets to achieve high performance (see Fig. 1.a). Labeling large-scale datasets with pixel-level annotations for SOD is very time-consuming; e.g., in the SOC dataset [10], generally more than five people were asked to annotate the same image to guarantee label consistency, and another ten viewers were asked to cross-check the quality of the annotations.
To alleviate the dependency on pixel-wise annotation, many weakly-supervised SOD methods [17, 37, 49] have been devised. Typically, image-level labels (see Fig. 1.c) are utilized in [17, 37] for saliency localization, and the models are then iteratively finetuned with the predicted saliency maps. Additionally, scribble annotations (see Fig. 1.b) have been proposed recently in [52] to reduce the uncertainty of image-level labels. Although these methods are free of pixel-level annotations, they suffer from various disadvantages, including low
prediction accuracy, complex training strategy, dedicated network architecture, and extra data information (e.g., edge) to obtain high-quality saliency maps.
Figure 1: The saliency model trained on synthetic data outperforms SOTA weakly-supervised methods, and is even competitive with fully-supervised models. Panels: (a) PFSN [22], (b) SCWS [48], (c) MWS [49], (d) Ours.
In this paper, we propose a new paradigm, SODGAN (see Fig. 1.d), for SOD, which can generate infinite high-quality image-mask pairs from a few labeled examples to replace the human-labeled DUTS-TR [37] dataset. Concretely, our SODGAN has three stages: Stage 1. Learning a few-shot saliency mask generator to synthesize image-synchronous masks, while utilizing an existing generative adversarial network (BigGAN [3]) to generate realistic images. Stage 2. Selecting high-quality image-mask pairs from the synthetic data pool. Stage 3. Training a saliency network on these filtered image-mask pairs. However, there are three main challenges with this approach: 1) Lacking pixel-wise labeled data as the training dataset to learn a segmentor, because BigGAN was trained on ImageNet, which was designed for classification tasks and provides no pixel-level labels. 2) Discovering a meaningful direction in the GAN latent space that disentangles foreground salient objects from backgrounds is nontrivial, and often requires domain knowledge and laborious engineering. 3) Low-quality image-mask pairs exist in the synthesized datasets.
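To make the three-stage pipeline concrete, the following is a minimal PyTorch-style sketch. All names (the generator interface, mask head, quality scorer, and the threshold tau) are hypothetical placeholders introduced for illustration, not the paper's actual API; the sketch only shows the generate-filter-train loop described above.

```python
# Hypothetical sketch of the SODGAN three-stage pipeline (not the authors' code).
# Assumes a pretrained class-conditional generator that also returns intermediate
# features, a few-shot mask head, and a scalar per-pair quality scorer.
import torch

@torch.no_grad()
def build_synthetic_dataset(generator, mask_head, quality_scorer,
                            num_samples=10000, tau=0.8, z_dim=128, num_classes=1000):
    """Stage 1 + Stage 2: sample image-mask pairs, keep only high-quality ones."""
    kept_images, kept_masks = [], []
    while len(kept_images) < num_samples:
        z = torch.randn(16, z_dim)                 # latent codes
        y = torch.randint(0, num_classes, (16,))   # ImageNet class labels
        images, feats = generator(z, y)            # images + intermediate features
        masks = mask_head(feats)                   # Stage 1: synchronized saliency masks
        scores = quality_scorer(images, masks)     # Stage 2: per-pair quality score
        keep = scores > tau                        # discard noisy pairs
        kept_images.extend(images[keep])
        kept_masks.extend(masks[keep])
    return kept_images[:num_samples], kept_masks[:num_samples]

def train_saliency_model(sod_model, synthetic_pairs, epochs=30):
    """Stage 3: train any off-the-shelf SOD model on the filtered synthetic pairs."""
    images, masks = synthetic_pairs
    optimizer = torch.optim.Adam(sod_model.parameters(), lr=1e-4)
    bce = torch.nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for img, msk in zip(images, masks):
            pred = sod_model(img.unsqueeze(0))
            loss = bce(pred, msk.unsqueeze(0))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return sod_model
```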
To tackle these three challenges, first, we present a diffusion embedding network (DEN) (see Sec. 3.2) to utilize the existing well-annotated dataset (i.e., DUTS-TR), which can infer an image's latent code such that it matches the ImageNet latent code space; thus, the existing labeled DUTS-TR dataset can provide pixel-wise labels for ImageNet. Second, in contrast to the existing works [13, 26, 31] focusing on the latent space, we propose a few-shot saliency mask generator that automatically discovers meaningful directions in the GAN feature space (see Sec. 3.3), which can synthesize infinite high-quality image-synchronized saliency masks with a few labeled examples. Third, we propose a quality-aware discriminator (see Sec. 3.4) to select high-quality synthesized image-mask pairs from the noisy synthetic data pool, improving the quality of synthetic data.
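As an illustration of how a mask generator can operate on generator features (rather than on the latent code), the sketch below trains a lightweight per-pixel head on upsampled intermediate feature maps of the image generator, using only a handful of labeled pairs. This is an assumption-laden reading of Sec. 3.3, not the paper's actual architecture; the channel counts, fusion strategy, and training details are illustrative only.

```python
# Hypothetical few-shot mask head over generator features (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FewShotMaskHead(nn.Module):
    """Lightweight per-pixel classifier over concatenated, upsampled GAN features."""
    def __init__(self, feature_channels=(1536, 768, 384), out_size=256):
        super().__init__()
        self.out_size = out_size
        total = sum(feature_channels)
        self.head = nn.Sequential(
            nn.Conv2d(total, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, kernel_size=1),   # one saliency logit per pixel
        )

    def forward(self, feature_maps):
        # Upsample every intermediate feature map to a common resolution and fuse.
        ups = [F.interpolate(f, size=(self.out_size, self.out_size),
                             mode="bilinear", align_corners=False)
               for f in feature_maps]
        return self.head(torch.cat(ups, dim=1))

def fit_mask_head(mask_head, few_shot_features, few_shot_masks, steps=500):
    """Fit the head with only a few (features, mask) pairs, e.g., tens of examples."""
    opt = torch.optim.Adam(mask_head.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        for feats, mask in zip(few_shot_features, few_shot_masks):
            loss = bce(mask_head(feats), mask)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return mask_head
```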
Our SODGAN has several desirable properties. a) Fewer labels. Our approach eliminates large-scale pixel-level supervision, requiring only a few labeled examples, which reduces annotation costs. b) High performance. We demonstrate that the saliency model trained on synthetic data directly generated from GANs achieves on average 98.4% of the F-measure of the saliency model trained on the DUTS-TR dataset (see the formula following the contribution list). Moreover, our SODGAN achieves new SOTA performance among semi/weakly-supervised methods, and even outperforms some fully-supervised methods. c) Generality. The synthetic data can be used to train any off-the-shelf SOD model without the need for special architectures, showing strong generalization capabilities on real test datasets. We summarize the key contributions as follows:
• For the first time, our SODGAN tackles SOD with synthetic data directly generated from a generative model, which opens up a new research paradigm for semi-supervised SOD and significantly reduces annotation costs.
• Our proposed DEN can address manifold mismatch and is tractable for latent code generation, better matching the ImageNet latent space.
• Our lightweight few-shot saliency mask generator can synthesize infinite accurate image-synchronous saliency masks with a few labeled examples.
• Our proposed quality-aware discriminator can select high-quality synthesized image-mask pairs from the noisy synthetic data pool, improving the quality of synthetic data.
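For reference, the F-measure cited above follows the usual convention in the SOD literature: the weighted harmonic mean of precision and recall with β² = 0.3, and the 98.4% figure is the ratio between the score of the model trained on synthetic data and that of the model trained on DUTS-TR. The formulas below restate this standard definition; they are not introduced by the paper itself.

```latex
% Standard F-measure used in SOD evaluation (beta^2 = 0.3 by convention).
F_{\beta} \;=\; \frac{(1+\beta^{2})\,\mathrm{Precision}\cdot\mathrm{Recall}}
                     {\beta^{2}\,\mathrm{Precision}+\mathrm{Recall}},
\qquad \beta^{2}=0.3,
\qquad
\text{relative F-measure} \;=\; \frac{F_{\beta}^{\text{synthetic}}}{F_{\beta}^{\text{DUTS-TR}}}\approx 98.4\%.
```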
2 RELATED WORK
Semi/Weakly-supervised SOD Approaches. With recent advances in semi/weakly-supervised learning, a few existing works exploit the potential of training saliency detectors on image-level [17, 37, 49], region-level [48, 51, 52], and limited pixel-level [41, 44, 50, 58] labeled data to relax the dependency on manually annotated pixel-level saliency masks. For image-level supervision, these approaches [17, 37, 49] follow the same technical route, i.e., producing initial saliency maps with image-level labels and then further refining them via iterative training. Recently, scribble annotation was proposed in [48, 52], but it requires large-scale scribble annotations (10,553 images) and extra data information (e.g., edges) to recover integral object structure. Differences. Distinct from all these works, our approach provides a new paradigm for semi-supervised SOD. In particular, we introduce SODGAN, a generative model, which can generate infinite high-quality image-mask pairs requiring minimal manual intervention. These generated pairs can then be used to train any existing SOD approach.
Latent Interpretability of GANs. Previous works have shown that GAN latent spaces are endowed with human-interpretable semantic arithmetic. A line of recent works [6, 13, 26, 31, 32, 47] employs explicit human-provided supervision to identify interpretable directions in the latent space. For instance, [13, 31] use classifiers pretrained on the CelebA [21] dataset to produce pseudo labels for the generated images and their latent codes. Another active line of study on GANs [1, 2, 4, 23, 34, 35, 55] targets the object segmentation task. [1] and [4] are based on the idea of decomposing the generative process in a layer-wise fashion. Other works [2, 23, 35] exploit the idea that an object's location or appearance can be perturbed without affecting image realism. Differences. In contrast to existing works manipulating the latent space, our approach is able to discover interpretable directions in the GAN feature space, which allows complete control over the diversity of object categories and can automatically find the expected directions.