Few-Shot Segmentation via Rich Prototype Generation and Recurrent Prediction Enhancement
Hongsheng Wang, Xiaoqi Zhao, Youwei Pang and Jinqing Qi*
Dalian University of Technology, Dalian, China
{wanghongsheng,zxq,lartpang}@mail.dlut.edu.cn, jinqing@dlut.edu.cn
*Corresponding author
Abstract. Prototype learning and decoder construction are the keys to few-shot segmentation. However, existing methods use only a single prototype generation mode, which cannot cope with the intractable problem of objects of various scales. Moreover, the one-way forward propagation adopted by previous methods may dilute the information from registered features during the decoding process. In this research, we propose a rich prototype generation module (RPGM) and a recurrent prediction enhancement module (RPEM) to reinforce the prototype learning paradigm and build a unified memory-augmented decoder for few-shot segmentation, respectively. Specifically, the RPGM combines superpixel and K-means clustering to generate rich prototype features with complementary scale relationships, adapting to the scale gap between support and query images. The RPEM utilizes the recurrent mechanism to design a round-way propagation decoder. In this way, registered features can provide object-aware information continuously. Experiments show that our method consistently outperforms other competitors on two popular benchmarks, PASCAL-5i and COCO-20i.
Keywords: Few-shot segmentation · Rich prototype · Recurrent prediction.
1 Introduction
In recent years, with the use of deep neural networks and large-scale datasets, significant progress has been made in fully-supervised semantic segmentation [4, 6, 12, 14, 32]. However, acquiring a large number of labeled samples is labor-intensive and expensive. To address this challenge, the few-shot segmentation task [20] has been proposed. It aims to segment a new object class, unseen by the network during the training phase, with only one or a few annotated examples. Most methods adopt the general structure shown in Fig. 1. Prototype learning and decoder construction play an important role in few-shot segmentation. The prototype represents only object-related features and does not contain
any background information. Some efforts [7, 17, 25, 26, 30] investigate different prototype feature generation mechanisms to provide an effective reference for query images. Both CANet [30] and PFENet [22] generate a single prototype by the masked average pooling operation to represent all features in the foreground of the support image. SCL [27] uses a self-guided mechanism to produce an auxiliary feature prototype. ASGNet [13] adaptively splits support features into several feature prototypes and selects the most relevant prototype to match the query image. However, the aforementioned methods all adopt a single approach to construct prototype features and ignore the complex scale differences between support images and query images, which may introduce scale-level interference into the subsequent similarity measure. The decoder completes feature aggregation and transforms the aggregated features into the task-required output. Nevertheless, many methods [13, 16, 22, 27, 29] focus on designing a feature enrichment module or directly applying a multi-scale structure (e.g., ASPP [5]) to aggregate the query features through one-way forward propagation and obtain the final prediction. This limitation not only leaves the semantic information of the probability maps generated from mid-level features insufficient, but also prevents truly useful features from being adequately utilized due to information dilution.

Fig. 1. A popular few-shot segmentation architecture: the support image, support mask, and query image pass through a shared encoder; a single type of prototype from the support branch is used for matching and feature enhancement with the query features, which are then decoded into the prediction.
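To make the single-prototype baseline of Fig. 1 concrete, here is a minimal PyTorch sketch of masked average pooling followed by dense cosine matching, in the spirit of CANet/PFENet-style pipelines. The tensor shapes, function names, and the small epsilon are illustrative assumptions, not the original implementations.

```python
import torch
import torch.nn.functional as F

def masked_average_pooling(support_feat, support_mask):
    # support_feat: (B, C, H, W) encoder features of the support image.
    # support_mask: (B, 1, h, w) binary foreground mask as a float tensor.
    mask = F.interpolate(support_mask, size=support_feat.shape[-2:],
                         mode="bilinear", align_corners=False)
    # Average the features over foreground pixels only.
    prototype = (support_feat * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)
    return prototype  # (B, C): a single prototype per support image

def prototype_matching(query_feat, prototype):
    # Dense cosine similarity between each query location and the prototype.
    return F.cosine_similarity(query_feat, prototype[:, :, None, None], dim=1)  # (B, H, W)

# Toy usage with random tensors standing in for real features and masks.
feat_s, feat_q = torch.randn(1, 256, 60, 60), torch.randn(1, 256, 60, 60)
mask_s = torch.ones(1, 1, 473, 473)
similarity_map = prototype_matching(feat_q, masked_average_pooling(feat_s, mask_s))
```

Because the whole foreground is compressed into one vector, such a prototype cannot reflect scale differences between support and query objects, which is the limitation the RPGM targets.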
In response to these challenges, we propose a rich prototype generation module (RPGM) and a recurrent prediction enhancement module (RPEM) to improve few-shot segmentation performance. The RPGM combines two clustering strategies, superpixel and K-means, to generate rich prototype features that completely represent the support feature information. Superpixel clustering generates Ns ∈ {1, . . . , N} prototypes depending on the size of the image, while K-means clustering generates a fixed number of Nk = N prototypes regardless of the image size. The RPEM is a round-way feedback propagation module built on the original forward propagation decoder and motivated by the recurrent mechanism. Specifically, it is composed of a multi-scale iterative enhancement (MSIE) module and a query self-contrast enhancement (QSCE) module. The former produces multi-scale information for the registered features of each stage, while the latter performs a self-contrast operation on query prototype features and then corrects the registered features. In this way, object-aware information can be constantly obtained from the registered features. In addition, since the RPEM is parameter-free, it can also serve as a flexible post-processing technique applied only during the inference phase.
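As a rough illustration of the fixed-size branch of the RPGM (the K-means side that always yields Nk = N prototypes), the sketch below clusters foreground support features with a plain K-means loop. The function name, random initialization, and iteration count are assumptions for illustration; the superpixel branch, whose prototype count depends on image size, is omitted.

```python
import torch

def kmeans_prototypes(support_feat, support_mask, num_prototypes=5, iters=10):
    # support_feat: (C, H, W) features of one support image.
    # support_mask: (H, W) binary foreground mask at feature resolution.
    # Assumes the foreground contains at least num_prototypes feature vectors.
    fg = support_feat.permute(1, 2, 0)[support_mask.bool()]   # (N_fg, C) foreground vectors
    idx = torch.randperm(fg.shape[0])[:num_prototypes]        # random initial centers
    centers = fg[idx].clone()                                  # (K, C)
    for _ in range(iters):
        assign = torch.cdist(fg, centers).argmin(dim=1)        # nearest-center assignment
        for k in range(num_prototypes):
            members = fg[assign == k]
            if members.numel() > 0:                            # keep old center if cluster is empty
                centers[k] = members.mean(dim=0)
    return centers                                             # (K, C) prototype vectors
```

Pairing these fixed-size prototypes with size-adaptive superpixel prototypes is what gives the RPGM its complementary scale coverage.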
Our main contributions can be summarized as follows:
– For few-shot segmentation, we design two simple yet effective improvement strategies from the perspectives of prototype learning and decoder construction.
– We put forward a rich prototype generation module, which generates complementary prototype features at two scales through two clustering algorithms with different characteristics.
– A more efficient semantic decoder is powered by the proposed novel recurrent prediction enhancement module, in which multi-scale and discriminative information is adequately propagated to each decoder block.
– Extensive experiments on two benchmark datasets demonstrate that the proposed model outperforms other existing competitors under the same metrics.
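To convey the round-way propagation idea behind the RPEM in code, here is a toy sketch in which the previous prediction is fed back to re-condition the registered features over a few recurrent steps. The class name, layer choices, and step count are hypothetical; note also that the real RPEM is parameter-free, whereas this toy uses small convolutions purely for readability.

```python
import torch
import torch.nn as nn

class RecurrentRefiner(nn.Module):
    """Toy round-way decoding loop: feed the previous prediction back so each
    pass re-injects object-aware information into the registered features."""

    def __init__(self, channels=256, num_classes=2, num_steps=2):
        super().__init__()
        self.num_steps = num_steps
        self.fuse = nn.Sequential(
            nn.Conv2d(channels + num_classes, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(channels, num_classes, 1)

    def forward(self, registered_feat):
        b, _, h, w = registered_feat.shape
        pred = registered_feat.new_zeros(b, self.head.out_channels, h, w)
        for _ in range(self.num_steps):
            x = self.fuse(torch.cat([registered_feat, pred], dim=1))  # fuse features with feedback
            pred = self.head(x)                                       # refined prediction for the next pass
        return pred
```

In the paper's module, the feedback additionally carries multi-scale cues (MSIE) and a self-contrast correction of query prototypes (QSCE), and it can be attached only at inference time as a post-processing step.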
2 Related Work
Semantic Segmentation is a fundamental computer vision task that aims to accurately predict the label of each pixel. Currently, the encoder-decoder architecture [1, 6] is widely used: the encoder extracts high-level semantic features at low resolution, while the decoder progressively recovers the resolution of the feature maps to obtain the segmentation mask. Besides, many semantic segmentation methods adopt pyramid pooling structures [12, 31, 33] to capture semantic context from multiple perspectives. Although these methods achieve good performance, they rely on pixel-level annotations of all classes in the training phase and cannot generalize to new classes with only a few labels.
Few-shot Learning aims to leverage limited prior knowledge to predict new classes. Current solutions are mainly based on meta-learning [3, 9, 19] and
metric learning [18, 23, 28]. Meta-learning aims to obtain a model that can be
quickly adapted to new tasks using previous experience, while metric learning
models the similarity among objects to generate discriminative representations
for new categories.
Few-shot Segmentation aims to segment query images containing new categories by utilizing information from a small amount of labeled data. PL [7] is the first to introduce prototype learning into few-shot segmentation and obtains segmentation results by comparing support prototypes with query features. Prototype alignment regularization is used in PANet [25], which encourages mutual guidance between support and query images. PGNet [29] introduces a graph attention unit to explore the local similarity between support and query features. PPNet [16] moves away from the limitations of the overall prototype