
Query Semantic Reconstruction for Background in Few-Shot Segmentation
Haoyan Guanᵃ, Michael Spratlingᵃ
ᵃDepartment of Informatics, King’s College London, London, WC2B 4BG, UK
haoyan.guan@kcl.ac.uk (H. Guan); michael.spratling@kcl.ac.uk (M. Spratling)
ARTICLE INFO
Keywords:
few-shot learning
semantic segmentation
ABSTRACT
Few-shot segmentation (FSS) aims to segment unseen classes using a few annotated samples.
Typically, a prototype representing the foreground class is extracted from annotated support image(s)
and is matched to features representing each pixel in the query image. However, models learnt in this
way are insufficiently discriminative, and often produce false positives: misclassifying background
pixels as foreground. Some FSS methods try to address this issue by using the background in the
support image(s) to help identify the background in the query image. However, the backgrounds
of these images are often quite distinct, and hence, the support image background information is
uninformative. This article proposes a method, QSR, that extracts the background from the query
image itself, and as a result is better able to discriminate between foreground and background features
in the query image. This is achieved by modifying the training process to associate prototypes with
class labels including known classes from the training data and latent classes representing unknown
background objects. This class information is then used to extract a background prototype from
the query image. To successfully associate prototypes with class labels, and to extract a background
prototype capable of predicting a mask for the background regions of the image, the machinery
for extracting and using foreground prototypes is induced to become more discriminative between
different classes. Experiments show that QSR achieves state-of-the-art results for both 1-shot and
5-shot FSS on the PASCAL-5𝑖 and COCO-20𝑖 datasets. As QSR operates only during training, these
results are obtained with no extra computational complexity during testing.
1. Introduction
The ability to segment objects is a long-standing goal of computer vision, and recent methods have
achieved extraordinary results (He, Zhang, Ren and Sun, 2016; He, Deng, Zhou, Wang and Qiao,
2019; Long, Shelhamer and Darrell, 2015). These results depend on a large number of pixel-level
annotations, which are time-consuming and costly to produce. When only a few exemplars from a
novel class are available, these methods overfit and perform poorly. To deal with this situation,
few-shot segmentation (FSS) methods aim to predict a segmentation mask for a novel category using
only a few images and their corresponding segmentation ground-truths.
Most current FSS algorithms (Zhang, Lin, Liu, Yao and Shen, 2019b; Siam, Oreshkin and Jagersand,
2019; Zhang, Lin, Liu, Guo, Wu and Yao, 2019a; Lu, He, Zhu, Zhang, Song and Xiang, 2021; Liu,
Ding, Jiao, Ji and Ye, 2021; Li, Jampani, Sevilla-Lara, Sun, Kim and Kim, 2021; Wu, Shi, Lin and
Cai, 2021; Zhang, Xiao and Qin, 2021) follow a similar sequence of steps. Features are extracted
from support and query images by a shared convolutional neural network (CNN) which is pre-trained
on ImageNet (Russakovsky, Deng, Su, Krause, Satheesh, Ma, Huang, Karpathy, Khosla, Bernstein
et al., 2015; Yang, Liu, Li, Jiao and Ye, 2020; Siam, Doraiswamy, Oreshkin, Yao and Jagersand,
2020; Zhang et al., 2019b). Then the support image ground-truth segmentation mask is used to
identify the foreground information in the support features. Generally, the object class is represented
by a single foreground prototype feature vector (Wang, Liew, Zou, Zhou and Feng, 2019; Yang et al.,
2020; Tian, Zhao, Shu, Yang, Li and Jia, 2020; Zhang et al., 2021; Li et al., 2021). Finally, a decoder
calculates the similarity between the foreground prototype and every pixel in the query feature-set to
predict the locations occupied by the foreground object in the query image. This standard approach
ignores the importance of background features, which can be mined for negative samples in order to
reduce false positives, and hence, make the model more discriminative.
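To make this standard pipeline concrete, the sketch below illustrates its two core operations: masked average pooling of support features into a foreground prototype, and dense cosine-similarity matching of that prototype against the query features. This is a minimal PyTorch sketch of the common scheme, not the implementation of any particular cited method; the function names, tensor shapes, and the final thresholding step are illustrative assumptions (actual methods use a learned decoder).

```python
import torch
import torch.nn.functional as F

def foreground_prototype(support_feat, support_mask):
    """Masked average pooling: average support features over foreground pixels.

    support_feat: (B, C, H, W) features from the shared, ImageNet pre-trained CNN.
    support_mask: (B, 1, H, W) binary ground-truth mask, resized to the feature map.
    Returns a (B, C) prototype vector representing the foreground class.
    """
    masked = support_feat * support_mask                   # zero out background locations
    area = support_mask.sum(dim=(2, 3)).clamp(min=1e-6)    # number of foreground pixels
    return masked.sum(dim=(2, 3)) / area                   # (B, C)

def similarity_map(query_feat, prototype):
    """Cosine similarity between the prototype and every query feature location.

    query_feat: (B, C, H, W); prototype: (B, C).
    Returns a (B, H, W) map that a decoder would refine into a segmentation mask.
    """
    proto = prototype[:, :, None, None].expand_as(query_feat)
    return F.cosine_similarity(query_feat, proto, dim=1)

# Usage with hypothetical shapes: one support/query pair, 256-d features.
support_feat = torch.randn(1, 256, 50, 50)
support_mask = torch.randint(0, 2, (1, 1, 50, 50)).float()
query_feat = torch.randn(1, 256, 50, 50)
sim = similarity_map(query_feat, foreground_prototype(support_feat, support_mask))
pred_mask = (sim > 0.5).float()   # crude threshold standing in for a learned decoder
```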
Some FSS methods (Yang et al., 2020; Boudiaf, Kervadec, Masud, Piantanida, Ayed and Dolz, 2021;
Wang et al., 2019) extract background information from support images by using the support masks
to identify the support image background. RPMMs (Yang et al., 2020) uses the Expectation-
Maximization (EM) algorithm to mine more background information in the support images. MLC
(Yang, Zhuo, Qi, Shi and Gao, 2021) extracts a global background prototype by averaging together
the backgrounds extracted from the whole training data in an offline process, then updates this global
background prototype with the support background during training. However, objects of the same
category may appear against very different backgrounds in different images. The background
information extracted from, or aligned with, the support image(s) is therefore unlikely to be useful
for segmenting the query image. Existing FSS methods ignore the fact that the background
information of an image is most relevant for segmenting that specific image.
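For contrast, the following sketch shows how such support-based approaches obtain a background prototype: the same masked average pooling as in the previous sketch, but with the support mask inverted. Again, this is an illustrative assumption about the general scheme (reusing the hypothetical foreground_prototype function above), not the exact procedure of RPMMs or MLC.

```python
def support_background_prototype(support_feat, support_mask):
    """Background prototype from the *support* image, as in support-based methods.

    Pools support features over pixels outside the annotated object by inverting
    the mask. As argued in the text, such a prototype often mismatches the query
    background, because the two images rarely share the same scenery.
    """
    return foreground_prototype(support_feat, 1.0 - support_mask)
```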
In this paper, we are motivated by the issue illustrated
in Fig. 1 and design a method that can extract background
information from the query image itself to make existing