2 H. Wang et al.
any background information. Some efforts [7, 17, 25, 26, 30] investigate different
prototype feature generation mechanisms to provide an effective reference for
query images. Both CANet [30] and PFENet [22] generate a single prototype by
the masked average pooling operation to represent all features in the foreground
of the support image. SCL [27] uses a self-guided mechanism to produce an
auxiliary feature prototype. ASGNet [13] is proposed to split support features
adaptively into several feature prototypes and select the most relevant proto-
type to match the query image. However, the aforementioned methods all adopt
a single approach to construct prototype features and ignore complex scale differ-
ences between support images and query images, which may introduce scale-level
interference for the subsequent similarity measure. The decoder is responsible for
aggregating the matched features and transforming them into the form required
by the segmentation task. Nevertheless,
many methods [13, 16, 22, 27, 29] focus on designing the feature enrichment
module or applying a multi-scale structure (e.g. ASPP [5]) directly to aggregate
the query features through a one-way forward propagation and obtain the final
prediction results. This one-way design not only leaves the probability maps
generated from mid-level features semantically insufficient, but also causes truly
useful features to be under-utilized due to information dilution.
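Masked average pooling, as used by CANet and PFENet above, can be sketched as follows. This is a minimal illustration, not the papers' implementation; the function name is ours, and we assume the support mask has already been resized to the feature-map resolution.

```python
import numpy as np

def masked_average_pooling(support_feat, support_mask):
    """Collapse the foreground of a support feature map into one prototype.

    support_feat: (C, H, W) backbone feature map of the support image.
    support_mask: (H, W) binary foreground mask at feature resolution.
    Returns a (C,) prototype vector: the mean feature over foreground pixels.
    """
    mask = support_mask.astype(support_feat.dtype)[None]  # (1, H, W), broadcasts over C
    area = mask.sum() + 1e-5                              # guard against an empty mask
    return (support_feat * mask).sum(axis=(1, 2)) / area
```

A single vector like this discards all spatial variation inside the foreground, which is precisely the limitation the multi-prototype methods discussed here try to address.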
In response to these challenges, we propose a rich prototype generation mod-
ule (RPGM) and a recurrent prediction enhancement module (RPEM) to im-
prove the performance for few-shot segmentation. The RPGM combines two
clustering strategies, superpixel and K-means, to generate rich prototype features
that provide a complete representation of the support feature information.
Superpixel clustering can generate N_s ∈ {1, . . . , N} prototypes depending on the
size of the image, while K-means clustering generates a fixed N_k = N prototypes
regardless of the image size. The RPEM is a round-way feedback propagation
module based on the original forward propagation decoder and is motivated by
the recurrent mechanism. Specifically, it is composed of a multi-scale iterative
enhancement (MSIE) module and a query self-contrast enhancement (QSCE)
module. The former produces multi-scale information for the registered features
of each stage, while the latter performs the self-contrast operation on query pro-
totype features and then corrects those registered features. In this way, object-
aware information can be continually extracted from the registered features. In
addition, since it is parameter-free, the proposed RPEM can also serve as a
flexible post-processing technique applied only during the inference phase.
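The K-means branch of the RPGM, which always yields a fixed number of prototypes from the foreground feature vectors, can be sketched as below. This is a generic K-means illustration under our own naming, not the paper's exact procedure; the superpixel branch, whose prototype count adapts to image size, is omitted.

```python
import numpy as np

def kmeans_prototypes(fg_feats, n_k=5, iters=10, seed=0):
    """Cluster foreground feature vectors into a fixed number of prototypes.

    fg_feats: (M, C) foreground feature vectors from the support image.
    Returns (n_k, C) centroids, one prototype per cluster, regardless of M.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen feature vectors (copy via fancy indexing).
    centers = fg_feats[rng.choice(len(fg_feats), n_k, replace=False)]
    for _ in range(iters):
        # Assign each feature vector to its nearest centroid (squared L2 distance).
        dists = ((fg_feats[:, None] - centers[None]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # Move each centroid to the mean of its assigned vectors.
        for k in range(n_k):
            pts = fg_feats[labels == k]
            if len(pts):
                centers[k] = pts.mean(0)
    return centers
```

Because N_k is fixed while the superpixel count N_s varies with image size, combining the two yields prototype sets at two complementary granularities, as described above.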
Our main contributions can be summarized as follows:
– For few-shot segmentation, we design two simple yet effective improvement
strategies from the perspectives of prototype learning and decoder construc-
tion.
– We put forward a rich prototype generation module, which generates comple-
mentary prototype features at two scales through two clustering algorithms
with different characteristics.