
Figure 1: Comparison of (a) PFSN [22], (b) SCWS [48], (c) MWS [49], and (d) Ours. The saliency model trained on synthetic data outperforms SOTA weakly-supervised methods, and is even competitive with fully-supervised models.
prediction accuracy, complex training strategy, dedicated network
architecture, and extra data information (e.g., edge) to obtain high-
quality saliency maps.
In this paper, we propose a new paradigm, SODGAN (see Fig. 1d), for SOD, which can generate infinite high-quality image-mask pairs from only a few labeled data to replace the human-labeled DUTS-TR [37] dataset. Concretely, our SODGAN has three stages (sketched below): Stage 1. Learning a few-shot saliency mask generator to synthesize image-synchronous masks, while utilizing an existing generative adversarial network (BigGAN [3]) to generate realistic images. Stage 2. Selecting high-quality image-mask pairs from the synthetic data pool. Stage 3. Training a saliency network on these filtered image-mask pairs. However, there are three main challenges with this approach: 1) Lacking pixel-wise labeled data as the training set to learn a segmentor, because BigGAN was trained on ImageNet, which was designed for classification tasks and provides no pixel-level labels. 2) Discovering a meaningful direction in the GAN latent space to disentangle foreground salient objects from backgrounds is nontrivial, and often requires domain knowledge and laborious engineering. 3) Low-quality image-mask pairs exist in the synthesized datasets.
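To make the three stages concrete, the following is a minimal sketch of the pipeline, assuming a pretrained BigGAN generator and hypothetical MaskGenerator, QualityDiscriminator, and SaliencyNet components (the names, interfaces, and hyperparameters are illustrative, not the actual implementation):

```python
import torch

# Assumed components (hypothetical interfaces):
#   biggan(z, y, return_features=True) -> synthetic image + intermediate features
#   mask_generator(feats)              -> saliency mask predicted from features
#   quality_disc(img, mask)            -> scalar quality score for the pair
#   saliency_net                       -> any off-the-shelf SOD network

def synthesize_dataset(biggan, mask_generator, quality_disc,
                       num_samples=10000, keep_ratio=0.5, device="cuda"):
    """Stage 1 + Stage 2: generate image-mask pairs, then keep the best ones."""
    pairs = []
    for _ in range(num_samples):
        z = torch.randn(1, 128, device=device)            # latent code
        y = torch.randint(0, 1000, (1,), device=device)   # ImageNet class
        with torch.no_grad():
            img, feats = biggan(z, y, return_features=True)  # Stage 1: image
            mask = mask_generator(feats)                      # Stage 1: mask
            score = quality_disc(img, mask)                   # Stage 2: score
        pairs.append((img.cpu(), mask.cpu(), score.item()))
    pairs.sort(key=lambda p: p[2], reverse=True)              # Stage 2: filter
    return [(img, mask) for img, mask, _ in pairs[:int(keep_ratio * len(pairs))]]

def train_sod(saliency_net, synthetic_pairs, optimizer, criterion):
    """Stage 3: train any off-the-shelf SOD model on the filtered synthetic pairs."""
    for img, mask in synthetic_pairs:
        pred = saliency_net(img)
        loss = criterion(pred, mask)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```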
To tackle these three challenges, first, we present a diffusion embedding network (DEN) (see Sec. 3.2) to utilize the existing well-annotated dataset (i.e., DUTS-TR): it infers an image's latent code that matches the ImageNet latent code space, so the existing labeled DUTS-TR dataset can provide pixel-wise labels for ImageNet.
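The actual DEN is defined in Sec. 3.2; purely as an illustration of the embedding idea (a generic GAN-inversion-style sketch, not the paper's diffusion formulation), an encoder can be trained to map a real labeled image into the generator's latent space via reconstruction through a frozen BigGAN:

```python
import torch
import torch.nn as nn

class LatentEncoder(nn.Module):
    """Illustrative encoder: image -> BigGAN latent code (hypothetical, not the paper's DEN)."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, img):
        return self.backbone(img)

def embedding_step(encoder, frozen_biggan, img, class_label, optimizer):
    """One training step: the inferred latent code should reconstruct the real
    image through the frozen generator (only the encoder is updated)."""
    z = encoder(img)
    recon = frozen_biggan(z, class_label)
    loss = nn.functional.l1_loss(recon, img)   # simple reconstruction objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```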
Second, in contrast to existing works [13, 26, 31] that focus on the latent space, we propose a few-shot saliency mask generator that automatically discovers meaningful directions in the GAN feature space (see Sec. 3.3), which can synthesize infinite high-quality image-synchronized saliency masks from only a few labeled data.
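As a rough sketch of what such a mask generator can look like (our illustration; the actual architecture is given in Sec. 3.3), a lightweight head can map upsampled intermediate BigGAN features to a per-pixel saliency mask and be trained on only a handful of labeled examples:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FewShotMaskHead(nn.Module):
    """Lightweight head over generator features -> saliency mask (illustrative only)."""
    def __init__(self, feat_channels, hidden=64, out_size=128):
        super().__init__()
        self.out_size = out_size
        self.head = nn.Sequential(
            nn.Conv2d(sum(feat_channels), hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 1, 1),      # 1-channel saliency logit
        )

    def forward(self, feature_maps):
        # Resize all intermediate generator feature maps to a common resolution
        # and concatenate them along the channel dimension.
        feats = [F.interpolate(f, size=(self.out_size, self.out_size),
                               mode="bilinear", align_corners=False)
                 for f in feature_maps]
        return torch.sigmoid(self.head(torch.cat(feats, dim=1)))

def train_few_shot(head, labeled_pairs, epochs=200, lr=1e-3):
    """Few-shot training on a handful of (generator features, human mask) pairs."""
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    for _ in range(epochs):
        for feats, gt_mask in labeled_pairs:
            pred = head(feats)
            loss = F.binary_cross_entropy(pred, gt_mask)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```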
Third, we propose a quality-aware discriminator (see Sec. 3.4) that selects high-quality synthesized image-mask pairs from the noisy synthetic data pool, improving the quality of the synthetic data.
Our SODGAN has several desirable properties. a) Fewer labels. Our approach eliminates large-scale pixel-level supervision, requiring only a few labeled data, which reduces annotation costs. b) High performance. We demonstrate that the saliency model trained on synthetic data directly generated from GANs achieves, on average, 98.4% of the F-measure of the saliency model trained on the DUTS-TR dataset. Moreover, our SODGAN achieves new SOTA performance among semi/weakly-supervised methods, and even outperforms some fully supervised methods. c) Generality. The synthetic data can be used to train any off-the-shelf SOD model without the need for special architectures, showing strong generalization capabilities on real test datasets. We summarize the key contributions as follows:
• For the first time, our SODGAN tackles SOD with synthetic data directly generated from a generative model, which opens up a new research paradigm for semi-supervised SOD and significantly reduces annotation costs.
• Our proposed DEN addresses manifold mismatch and is tractable for latent code generation, better matching the ImageNet latent space.
• Our lightweight few-shot saliency mask generator can synthesize infinite accurate image-synchronous saliency masks from only a few labeled data.
• Our proposed quality-aware discriminator can select high-quality synthesized image-mask pairs from the noisy synthetic data pool, improving the quality of the synthetic data.
2 RELATED WORK
Semi/Weakly-supervised SOD Approaches. With recent advances in semi/weakly-supervised learning, a few existing works exploit the potential of training saliency detectors on image-level [17, 37, 49], region-level [48, 51, 52], and limited pixel-level [41, 44, 50, 58] labeled data to relax the dependence on manually annotated pixel-level saliency masks. For image-level supervision, these approaches [17, 37, 49] follow the same technical route, i.e., producing initial saliency maps with image-level labels and then further refining them via iterative training. Recently, scribble annotation was proposed in [48, 52], but it requires large-scale scribble annotations (10,553 images) and extra data information (e.g., edges) to recover integral object structure. Differences. Distinct from all these works, our approach provides a new paradigm for semi-supervised SOD. In particular, we introduce SODGAN, a generative model that can generate infinite high-quality image-mask pairs with minimal manual intervention. These generated pairs can then be used to train any existing SOD approach.
Latent Interpretability of GANs. Previous works have shown that GAN latent spaces are endowed with human-interpretable semantic arithmetic. A line of recent works [6, 13, 26, 31, 32, 47] employs explicit human-provided supervision to identify interpretable directions in the latent space. For instance, [13, 31] use classifiers pretrained on the CelebA [21] dataset to produce pseudo labels for the generated images and their latent codes. Another active line of study on GANs [1, 2, 4, 23, 34, 35, 55] targets the object segmentation task. [1] and [4] are based on the idea of decomposing the generative process in a layer-wise fashion. Other works [2, 23, 35] exploit the idea that an object's location or appearance can be perturbed without affecting image realism. Differences. In contrast to existing works that manipulate the latent space, our approach discovers interpretable directions in the GAN feature space, which allows complete control over the diversity of object categories and can automatically find the expected directions.