Few-Shot Segmentation via Rich Prototype Generation and Recurrent Prediction Enhancement
Hongsheng Wang, Xiaoqi Zhao, Youwei Pang and Jinqing Qi*
Dalian University of Technology, Dalian, China
{wanghongsheng,zxq,lartpang}@mail.dlut.edu.cn, jinqing@dlut.edu.cn
*Corresponding author
Abstract. Prototype learning and decoder construction are the keys to few-shot segmentation. However, existing methods use only a single prototype generation mode, which cannot cope with the intractable problem of objects of various scales. Moreover, the one-way forward propagation adopted by previous methods may dilute the information from registered features during the decoding process. In this research, we propose a rich prototype generation module (RPGM) and a recurrent prediction enhancement module (RPEM) to reinforce the prototype learning paradigm and build a unified memory-augmented decoder for few-shot segmentation, respectively. Specifically, the RPGM combines superpixel and K-means clustering to generate rich prototype features with complementary scale relationships, adapting to the scale gap between support and query images. The RPEM utilizes the recurrent mechanism to design a round-way propagation decoder. In this way, registered features can provide object-aware information continuously. Experiments show that our method consistently outperforms other competitors on two popular benchmarks, PASCAL-5i and COCO-20i.
Keywords: Few-shot segmentation · Rich prototype · Recurrent prediction.
1 Introduction
In recent years, with the use of deep neural networks and large-scale datasets, significant progress has been made in fully-supervised semantic segmentation [4, 6, 12, 14, 32]. However, acquiring a large number of labeled samples is labor-intensive and expensive. To address this challenge, the few-shot segmentation task [20] has been proposed. It aims to segment a new object class, unseen by the network during the training phase, with only one or a few annotated examples. Most methods adopt the general structure shown in Fig. 1. Prototype learning and decoder construction play an important role in few-shot segmentation. The prototype represents only object-related features and does not contain
any background information. Some efforts [7, 17, 25, 26, 30] investigate different prototype feature generation mechanisms to provide an effective reference for query images. Both CANet [30] and PFENet [22] generate a single prototype by the masked average pooling operation to represent all features in the foreground of the support image. SCL [27] uses a self-guided mechanism to produce an auxiliary feature prototype. ASGNet [13] adaptively splits support features into several feature prototypes and selects the most relevant prototype to match the query image. However, the aforementioned methods all adopt a single approach to construct prototype features and ignore the complex scale differences between support images and query images, which may introduce scale-level interference into the subsequent similarity measure. The decoder completes feature aggregation and transforms the aggregated features into the task-required output. Nevertheless, many methods [13, 16, 22, 27, 29] focus on designing a feature enrichment module or directly applying a multi-scale structure (e.g., ASPP [5]) to aggregate the query features through one-way forward propagation and obtain the final prediction. This limitation not only leaves the semantic information of the probability maps generated from mid-level features insufficient, but also prevents truly useful features from being adequately utilized due to information dilution.

Fig. 1. A popular few-shot segmentation architecture: the support image, support mask, and query image pass through a shared encoder; a single type of prototype from the support branch is used for matching and feature enhancement with the query features, which are then decoded into the prediction.
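To make the single-prototype baseline of Fig. 1 concrete, here is a minimal PyTorch sketch of masked average pooling followed by dense cosine matching, in the spirit of CANet/PFENet-style pipelines. The tensor shapes, function names, and the small epsilon are illustrative assumptions, not the original implementations.

```python
import torch
import torch.nn.functional as F

def masked_average_pooling(support_feat, support_mask):
    # support_feat: (B, C, H, W) encoder features of the support image.
    # support_mask: (B, 1, h, w) binary foreground mask as a float tensor.
    mask = F.interpolate(support_mask, size=support_feat.shape[-2:],
                         mode="bilinear", align_corners=False)
    # Average the features over foreground pixels only.
    prototype = (support_feat * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)
    return prototype  # (B, C): a single prototype per support image

def prototype_matching(query_feat, prototype):
    # Dense cosine similarity between each query location and the prototype.
    return F.cosine_similarity(query_feat, prototype[:, :, None, None], dim=1)  # (B, H, W)

# Toy usage with random tensors standing in for real features and masks.
feat_s, feat_q = torch.randn(1, 256, 60, 60), torch.randn(1, 256, 60, 60)
mask_s = torch.ones(1, 1, 473, 473)
similarity_map = prototype_matching(feat_q, masked_average_pooling(feat_s, mask_s))
```

Because the whole foreground is compressed into one vector, such a prototype cannot reflect scale differences between support and query objects, which is the limitation the RPGM targets.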
In response to these challenges, we propose a rich prototype generation module (RPGM) and a recurrent prediction enhancement module (RPEM) to improve few-shot segmentation performance. The RPGM combines two clustering strategies, superpixel and K-means, to generate rich prototype features that completely represent the support feature information. Superpixel clustering generates Ns ∈ {1, . . . , N} prototypes depending on the size of the image, while K-means clustering generates a fixed number of Nk = N prototypes regardless of the image size. The RPEM is a round-way feedback propagation module built on the original forward propagation decoder and motivated by the recurrent mechanism. Specifically, it is composed of a multi-scale iterative enhancement (MSIE) module and a query self-contrast enhancement (QSCE) module. The former produces multi-scale information for the registered features of each stage, while the latter performs a self-contrast operation on query prototype features and then corrects the registered features. In this way, object-aware information can be constantly obtained from the registered features. In addition, since the RPEM is parameter-free, it can also serve as a flexible post-processing technique applied only during the inference phase.
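As a rough illustration of the fixed-size branch of the RPGM (the K-means side that always yields Nk = N prototypes), the sketch below clusters foreground support features with a plain K-means loop. The function name, random initialization, and iteration count are assumptions for illustration; the superpixel branch, whose prototype count depends on image size, is omitted.

```python
import torch

def kmeans_prototypes(support_feat, support_mask, num_prototypes=5, iters=10):
    # support_feat: (C, H, W) features of one support image.
    # support_mask: (H, W) binary foreground mask at feature resolution.
    # Assumes the foreground contains at least num_prototypes feature vectors.
    fg = support_feat.permute(1, 2, 0)[support_mask.bool()]   # (N_fg, C) foreground vectors
    idx = torch.randperm(fg.shape[0])[:num_prototypes]        # random initial centers
    centers = fg[idx].clone()                                  # (K, C)
    for _ in range(iters):
        assign = torch.cdist(fg, centers).argmin(dim=1)        # nearest-center assignment
        for k in range(num_prototypes):
            members = fg[assign == k]
            if members.numel() > 0:                            # keep old center if cluster is empty
                centers[k] = members.mean(dim=0)
    return centers                                             # (K, C) prototype vectors
```

Pairing these fixed-size prototypes with size-adaptive superpixel prototypes is what gives the RPGM its complementary scale coverage.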
Our main contributions can be summarized as follows:
– For few-shot segmentation, we design two simple yet effective improvement strategies from the perspectives of prototype learning and decoder construction.
– We put forward a rich prototype generation module, which generates complementary prototype features at two scales through two clustering algorithms with different characteristics.
– A more efficient semantic decoder is powered by the proposed novel recurrent prediction enhancement module, in which multi-scale and discriminative information is adequately propagated to each decoder block.
– Extensive experiments on two benchmark datasets demonstrate that the proposed model outperforms other existing competitors under the same metrics.
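To convey the round-way propagation idea behind the RPEM in code, here is a toy sketch in which the previous prediction is fed back to re-condition the registered features over a few recurrent steps. The class name, layer choices, and step count are hypothetical; note also that the real RPEM is parameter-free, whereas this toy uses small convolutions purely for readability.

```python
import torch
import torch.nn as nn

class RecurrentRefiner(nn.Module):
    """Toy round-way decoding loop: feed the previous prediction back so each
    pass re-injects object-aware information into the registered features."""

    def __init__(self, channels=256, num_classes=2, num_steps=2):
        super().__init__()
        self.num_steps = num_steps
        self.fuse = nn.Sequential(
            nn.Conv2d(channels + num_classes, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(channels, num_classes, 1)

    def forward(self, registered_feat):
        b, _, h, w = registered_feat.shape
        pred = registered_feat.new_zeros(b, self.head.out_channels, h, w)
        for _ in range(self.num_steps):
            x = self.fuse(torch.cat([registered_feat, pred], dim=1))  # fuse features with feedback
            pred = self.head(x)                                       # refined prediction for the next pass
        return pred
```

In the paper's module, the feedback additionally carries multi-scale cues (MSIE) and a self-contrast correction of query prototypes (QSCE), and it can be attached only at inference time as a post-processing step.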
2 Related Work
Semantic Segmentation is a fundamental computer vision task that aims to accurately predict the label of each pixel. Currently, the encoder-decoder architecture [1, 6] is widely used: the encoder extracts high-level semantic features at low resolution, while the decoder progressively recovers the resolution of the feature maps to obtain the segmentation mask. Besides, many semantic segmentation methods adopt pyramid pooling structures [12, 31, 33] to capture semantic context from multiple perspectives. Although these methods achieve good performance, they rely on pixel-level annotations of all classes in the training phase and cannot generalize to new classes with only a few labels.
Few-shot Learning aims to leverage limited prior knowledge to predict new classes. Current solutions are mainly based on meta-learning [3, 9, 19] and
metric learning [18, 23, 28]. Meta-learning aims to obtain a model that can be
quickly adapted to new tasks using previous experience, while metric learning
models the similarity among objects to generate discriminative representations
for new categories.
Few-shot Segmentation aims to segment query images containing new categories by utilizing information from a small amount of labeled data. PL [7] is the first to introduce prototype learning into few-shot segmentation and obtains segmentation results by comparing support prototypes with query features. Prototype alignment regularization is used in PANet [25], which encourages mutual guidance between support and query images. PGNet [29] introduces a graph attention unit to explore the local similarity between support and query features. PPNet [16] moves away from the limitations of the overall prototype