
BoxTeacher: Exploring High-Quality Pseudo Labels for Weakly Supervised
Instance Segmentation
Tianheng Cheng1,*, Xinggang Wang1,†, Shaoyu Chen1,*, Qian Zhang2, Wenyu Liu1
1School of EIC, Huazhong University of Science & Technology
2Horizon Robotics
https://github.com/hustvl/BoxTeacher
Abstract
Labeling objects with pixel-wise segmentation requires a
huge amount of human labor compared to bounding boxes.
Most existing methods for weakly supervised instance seg-
mentation focus on designing heuristic losses with priors
from bounding boxes. However, we find that box-supervised
methods can produce some fine segmentation masks, and we
wonder whether the detectors could learn from these fine
masks while ignoring low-quality masks. To answer this
question, we present BoxTeacher, an efficient and end-to-
end training framework for high-performance weakly su-
pervised instance segmentation, which leverages a sophis-
ticated teacher to generate high-quality masks as pseudo
labels. Considering that massive noisy masks hurt training,
we present a mask-aware confidence score to estimate
the quality of pseudo masks, and propose the noise-aware
pixel loss and noise-reduced affinity loss to adaptively
optimize the student with pseudo masks. Extensive
experiments demonstrate the effectiveness of the proposed
BoxTeacher. Without bells and whistles, BoxTeacher
achieves 35.0 mask AP and 36.5 mask AP with
ResNet-50 and ResNet-101, respectively, on the challenging
COCO dataset, which outperforms the previous state-of-
the-art methods by a significant margin and bridges the gap
between box-supervised and mask-supervised methods.
1. Introduction
Instance segmentation, which aims to recognize and segment
objects in images, is a fairly challenging task in
computer vision. Fortunately, the rapid development of
object detection methods [7,40,50] has spurred a number
of successful methods [5,6,23,49,54,55] for effective and
efficient instance segmentation. With fine-grained human
annotations, recent instance segmentation methods can achieve
impressive results on the challenging COCO dataset [34].
Nevertheless, labeling instance-level segmentation is much more
complicated and time-consuming, e.g., labeling an object with
polygon-based masks requires 10.3× more time than labeling it
with a 4-point bounding box [11].
*This work was done when Tianheng Cheng and Shaoyu Chen were
interns at Horizon Robotics. Xinggang Wang is the corresponding
author: xgwang@hust.edu.cn
[Figure 1: (a) example masks from BoxInst (30.7 AP) shown next to the ground truth; (b) bar chart of mask AP (y-axis, 28 to 35) for BoxInst, Self-Training, and BoxTeacher under the 1× and 3× schedules; values extracted from the plot: 30.7, 31.0, 32.6, 31.8, 31.3, 34.2.]
Figure 1. (a) Segmentation Masks from BoxInst. BoxInst
(ResNet-50 [24]) can produce some fine segmentation masks with
weak supervision from bounding boxes and images. (b) Self-Training
with Pseudo Masks on COCO val. We explore
self-training to train a CondInst [49] with the pseudo labels
generated by BoxInst. However, the improvements are limited.
Recently, a few works [25,31-33,51,53] explore weakly
supervised instance segmentation with box annotations or
low-level colors. These weakly supervised methods can
effectively train instance segmentation methods [23,49,55]
without pixel-wise or polygon-based annotations and obtain
fine segmentation masks. As shown in Fig. 1(a), BoxInst [51]
can output a few high-quality segmentation masks
and segment well on the object boundary, e.g., for the person,
where it even performs better than the ground-truth mask in details,
though other objects may be badly segmented. Naturally,
we wonder if the generated masks of box-supervised methods,
especially the high-quality masks, could qualify
as pseudo segmentation labels to further improve the
performance of weakly supervised instance segmentation.
arXiv:2210.05174v2 [cs.CV] 17 Mar 2023
To answer this question, we first employ naive
self-training to evaluate the performance of using box-supervised
pseudo masks. Given the generated instance
masks from BoxInst, we propose a simple yet effective box-based
pseudo mask assignment to assign pseudo masks to
ground-truth boxes. We then train CondInst [49]
with the pseudo masks; CondInst has the same architecture
as BoxInst and consists of a detector [50] and a dynamic
mask head. Fig. 1(b) shows that self-training brings
minor improvements and fails to unleash the power of high-quality
pseudo masks, which can be attributed to two obstacles,
i.e., (1) the naive self-training fails to filter low-quality
masks, and (2) the noisy pseudo masks hurt the training
with the fully-supervised pixel-wise loss. Besides, the multi-stage
self-training is inefficient.
To address these problems, we present BoxTeacher, an
end-to-end training framework, which takes advantage of
high-quality pseudo masks produced by box supervision.
BoxTeacher is composed of a sophisticated Teacher and
a perturbed Student, in which the teacher generates high-
quality pseudo instance masks along with the mask-aware
confidence scores to estimate the quality of masks. Then the
proposed box-based pseudo mask assignment will assign
the pseudo masks to the ground-truth boxes. The student is
normally optimized with the ground-truth boxes and pseudo
masks through box-based loss and noise-aware pseudo
mask loss, and then progressively updates the teacher via
Exponential Moving Average (EMA). In contrast to the
naive multi-stage self-training, BoxTeacher is simpler
and more efficient. The proposed mask-aware confidence score
effectively reduces the impact of low-quality masks. More
importantly, pseudo labeling improves the student, which
in turn pushes the teacher to generate higher-quality
masks, hence pushing the limits of box supervision.
BoxTeacher can serve as a general training paradigm
and is agnostic to the methods for instance segmentation.
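To make the teacher update concrete, the following is a minimal sketch of an EMA step over a dictionary of parameter arrays. It is an illustration under stated assumptions rather than the released implementation: the function name ema_update, the dictionary layout, and the momentum values are hypothetical and are not specified in this section.

```python
import numpy as np


def ema_update(teacher, student, momentum=0.999):
    """Update teacher parameters in place as an exponential moving average.

    teacher, student: dicts mapping parameter names to NumPy arrays.
    momentum: assumed smoothing factor (hypothetical default, not from the paper).
    """
    for name, s_param in student.items():
        # teacher <- m * teacher + (1 - m) * student
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * s_param
    return teacher


# Toy usage: the teacher moves a small step towards the student.
student = {"w": np.array([1.0, 2.0])}
teacher = {"w": np.array([0.0, 0.0])}
ema_update(teacher, student, momentum=0.9)
```

With a momentum close to 1, the teacher changes slowly and averages the student over many iterations, which is the usual motivation for generating pseudo labels from the teacher rather than from the student itself.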
To benchmark the proposed BoxTeacher, we adopt
CondInst [49] as the basic segmentation method. On the
challenging COCO dataset [34], BoxTeacher
achieves 35.0 and 36.5 mask AP based on ResNet-50 [24]
and ResNet-101, respectively, which remarkably outperforms
its counterparts. We provide extensive experiments
on PASCAL VOC and Cityscapes to demonstrate its effectiveness
and generalization ability. Furthermore, BoxTeacher
with Swin Transformer [37] obtains 40.6 mask AP
as a weakly supervised approach for instance segmentation.
Overall, our contributions can be summarized as follows:
• We solve the box-supervised instance segmentation
problem from a new perspective, i.e., self-training with
pseudo masks, and illustrate its effectiveness.
• We present BoxTeacher, a simple yet effective framework,
which leverages pseudo masks with the mask-aware
confidence score and noise-aware pseudo mask
loss. Besides, we propose a pseudo mask assignment
to assign pseudo masks to ground-truth boxes.
• We improve weakly supervised instance segmentation
by large margins and bridge the gap between box-supervised
and mask-supervised methods, e.g., BoxTeacher
achieves 36.5 mask AP on COCO compared
to 39.1 AP obtained by the mask-supervised CondInst.
2. Related Work
Instance Segmentation. Methods for instance segmentation
can be roughly divided into two groups, i.e., single-stage
methods and two-stage methods. Single-stage methods
[5,49,58,62] tend to adopt single-stage object detectors
[35,50] to localize and recognize objects, and then
generate segmentation masks through object embeddings
or dynamic convolution [9]. Wang et al. present the box-free
SOLO [54] and SOLOv2 [55], which are independent of object
detectors. SparseInst [13] and YOLACT [5], aiming for
real-time inference, achieve a good trade-off between speed
and accuracy. Two-stage methods [14,23,27,29] adopt
bounding boxes from object detectors and RoIAlign [23] to
extract the RoI (region-of-interest) features for object
segmentation, e.g., Mask R-CNN [23]. Several methods [14,
27,29] based on Mask R-CNN are proposed to refine the
segmentation masks for high-quality instance segmentation.
Recently, many approaches [7,10,12,17,20,63] based on
transformers [18,52] or the Hungarian algorithm [46] have
made great progress in instance segmentation.
Weakly Supervised Instance Segmentation. Considering
the huge cost of labeling instance segmentation, weakly
supervised instance segmentation using image-level labels
or bounding boxes has attracted much attention. Several methods
[1,2,64,66] exploit image-level labels to generate
pseudo masks from activation maps. Khoreva et al. [28]
propose to generate pseudo masks with GrabCut [42] from
given bounding boxes. BoxCaseg [53] leverages a saliency
model to generate pseudo object masks for training Mask R-CNN
along with the multiple instance learning (MIL) loss.
Recently, many box-supervised methods [25,31,33,51]
combine the MIL loss or pairwise relation loss from low-level
features to obtain impressive results with box annotations.
In comparison with BoxInst [51], BoxTeacher inherits
the box supervision [51] but concentrates more on the
novel training paradigm and on exploiting noisy pseudo masks
for high-performance box-supervised instance segmentation.
Different from DiscoBox [31], which is
based on the mean teacher [48], BoxTeacher aims at a simple
yet effective training framework that obtains high-quality
pseudo masks and learns from noisy masks.
Semi-supervised Learning. Pseudo labeling [3,21,41] and
consistency regularization [4,30,43,44,59] have greatly
advanced semi-supervised learning, which enables
training on large-scale unlabeled datasets. Recently, semi-supervised
learning has been widely used in object detection
[36,45,60] and semantic segmentation [8,56,61]
and has demonstrated its effectiveness. Motivated by the high-quality
masks from box supervision, we adopt the successful
pseudo labeling and consistency regularization to develop
a new training framework for weakly supervised instance
segmentation. Compared to [22], which has a similar
motivation but aims for semi-supervised object detection
with labeled images and extra point annotations, BoxTeacher
addresses box-supervised instance segmentation
with box-only annotations. Compared to [26,47], which
adopt multi-stage training and combine weakly supervised
and semi-supervised learning, BoxTeacher is a one-stage
framework without pre-trained labelers.
3. Naive Self-Training with Pseudo Masks
Revisiting Box-supervised Methods. Note that box-only
annotations are sufficient to train an object detector, which
can accurately localize and recognize objects. Box-supervised
methods [31,33,51] based on object detectors
mainly exploit two carefully designed losses to supervise mask
predictions, i.e., the multiple instance learning (MIL) loss and
the pairwise relation loss. Concretely, according to the
bounding boxes, the MIL loss can determine the positive
and negative bags of pixels of the predicted masks. The pairwise
relation loss concentrates on the local relations of pixels
from low-level colors or features, in which neighboring
pixels that have similar colors are regarded as a positive
pair and should output similar probabilities. The MIL loss
and pairwise relation loss enable the box-supervised methods
to produce complete segmentation masks, and even
some high-quality masks with fine details.
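As an illustration of how a bounding box induces positive and negative bags, the sketch below treats every image row and column as a bag: a row or column that crosses the box is positive (at least one pixel should be foreground), and any other row or column is negative (no pixel should be foreground). This is a simplified sketch under our own assumptions, not the exact loss of the cited methods: the function name mil_projection_loss is hypothetical, the box is given as half-open integer pixel ranges, and binary cross-entropy is used on the bag scores for brevity.

```python
import numpy as np


def mil_projection_loss(prob, box, eps=1e-6):
    """Simplified MIL loss on row/column bags (illustrative assumption).

    prob: (H, W) array of predicted foreground probabilities in [0, 1].
    box:  (x0, y0, x1, y1) integer pixel box, half-open on the right/bottom.
    """
    h, w = prob.shape
    x0, y0, x1, y1 = box

    # Bag labels: 1 for rows/columns crossing the box, 0 otherwise.
    row_target = np.zeros(h)
    row_target[y0:y1] = 1.0
    col_target = np.zeros(w)
    col_target[x0:x1] = 1.0

    # Bag score: the most confident pixel in each row/column.
    row_score = np.clip(prob.max(axis=1), eps, 1.0 - eps)
    col_score = np.clip(prob.max(axis=0), eps, 1.0 - eps)

    def bce(p, t):
        # Mean binary cross-entropy between bag scores and bag labels.
        return -np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

    return bce(row_score, row_target) + bce(col_score, col_target)


# A mask that fills the box exactly incurs a much lower loss than an empty one.
filled = np.zeros((6, 6))
filled[1:4, 2:5] = 1.0
loss_filled = mil_projection_loss(filled, (2, 1, 5, 4))
loss_empty = mil_projection_loss(np.zeros((6, 6)), (2, 1, 5, 4))
```

Note that this term only constrains the extent of the mask to the box; it cannot tell which pixels inside the box belong to the object, which is why it is paired with the pairwise relation loss.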
Naive Self-Training. Considering that the box-supervised
methods can produce some high-quality masks without
mask annotations, we adopt self-training to utilize the high-quality
masks as pseudo labels to train an instance segmentation
method with full supervision. Specifically, we
adopt the successful BoxInst [51] to generate pseudo instance
masks on the given dataset $\mathcal{X} = \{X, B^g\}$, which
only contains the box annotations. For each input image
$X$, let $\{B^p, C^p, M^p\}$ denote the predicted bounding boxes,
confidence scores, and predicted instance masks, respectively.
We propose a simple yet effective Box-based Pseudo
Mask Assignment algorithm in Alg. 1 to assign the predicted
instance masks to the box annotations via the confidence
scores and intersection-over-union (IoU) between
ground-truth boxes $B^g$ and predicted boxes $B^p$. The hyper-parameters
$\tau_{iou}$ and $\tau_c$ are set to 0.5 and 0.05, respectively.
The assigned instance masks will be rectified by removing
the parts beyond the bounding boxes. Then, we adopt the
dataset $\hat{\mathcal{X}} = \{X, B^g, M^g\}$ with pseudo instance masks to
train an approach, e.g., CondInst [49].
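The rectification step, i.e., removing the mask parts beyond the bounding box, can be sketched as follows. The function name rectify_mask and the half-open integer box format are assumptions made for this illustration.

```python
import numpy as np


def rectify_mask(mask, box):
    """Keep only the mask pixels that fall inside the box.

    mask: (H, W) array (binary or soft mask).
    box:  (x0, y0, x1, y1) integer pixel box, half-open on the right/bottom
          (assumed format for this sketch).
    """
    x0, y0, x1, y1 = box
    out = np.zeros_like(mask)
    # Copy the region inside the box; everything outside stays zero.
    out[y0:y1, x0:x1] = mask[y0:y1, x0:x1]
    return out


# Toy usage: a full mask is cropped to the 2x2 box region.
rectified = rectify_mask(np.ones((4, 4)), (1, 1, 3, 3))
```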
Naive Self-Training is Limited. Fig. 1(b) and Tab. 7 provide
the experimental results of naive self-training with
pseudo masks. Compared to the pseudo labeler, self-training
brings only minor improvements and can even fail to surpass
it. We attribute the limited performance
to two issues, i.e., the naive self-training fails to
exclude low-quality masks and the fully-supervised loss is
sensitive to the noisy pseudo masks.
Algorithm 1: Box-based Pseudo Mask Assignment
Input: predicted boxes $B^p \in \mathbb{R}^{N \times 4}$, predicted masks $M^p \in \mathbb{R}^{N \times H \times W}$, confidence scores $C^p \in \mathbb{R}^{N}$, ground-truth boxes $B^g \in \mathbb{R}^{K \times 4}$.
Parameter: IoU threshold $\tau_{iou}$, confidence threshold $\tau_c$.
Output: assigned pseudo masks $M^g \in \mathbb{R}^{K \times H \times W}$.
Initialize output masks $M^g$ with empty (0), assignment index $A \in \mathbb{R}^{K}$ with $-1$;
Filter the predictions by the confidence threshold $\tau_c$;
Sort the confidence scores $C^p$ in descending order with output indices $S \in \mathbb{N}^{N}$;
foreach prediction $i$ in $S$ do
    Initialize $u \leftarrow -1$, $v \leftarrow -1$;
    for $j = 1$ to $K$ do
        $\mathrm{iou}_{ij}$ = ComputeIoU($B^p_i$, $B^g_j$);
        if $A_j > 0$ then
            continue;
        end
        if $\mathrm{iou}_{ij} \ge \tau_{iou}$ and $\mathrm{iou}_{ij} \ge u$ then
            $u \leftarrow \mathrm{iou}_{ij}$, $v \leftarrow j$;
        end
    end
    if $v > 0$ then
        Assign mask $M^p_i$ to mask $M^g_v$;
        $A_v \leftarrow i$;
    end
end
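For reference, a possible NumPy rendering of Alg. 1 is sketched below. It is a minimal sketch rather than the released code: the names box_iou and assign_pseudo_masks are hypothetical, boxes are assumed to be in (x0, y0, x1, y1) format, indices are 0-based (so an unassigned slot is marked with -1 and tested with >= 0), and predictions whose score equals the threshold are kept, which the listing does not specify.

```python
import numpy as np


def box_iou(a, b):
    """IoU of two boxes in (x0, y0, x1, y1) format."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix1 - ix0, 0.0) * max(iy1 - iy0, 0.0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_pseudo_masks(pred_boxes, pred_masks, scores, gt_boxes,
                        tau_iou=0.5, tau_c=0.05):
    """Greedily assign predicted masks to ground-truth boxes.

    pred_boxes: (N, 4), pred_masks: (N, H, W), scores: (N,), gt_boxes: (K, 4).
    Returns (gt_masks, assigned): (K, H, W) masks and (K,) prediction indices,
    where -1 marks a ground-truth box without a pseudo mask.
    """
    k = len(gt_boxes)
    h, w = pred_masks.shape[1:]
    gt_masks = np.zeros((k, h, w), dtype=pred_masks.dtype)
    assigned = np.full(k, -1, dtype=int)

    # Drop low-confidence predictions, then visit the rest by descending score.
    keep = np.flatnonzero(scores >= tau_c)
    order = keep[np.argsort(-scores[keep])]

    for i in order:
        best_iou, best_j = -1.0, -1
        for j in range(k):
            if assigned[j] >= 0:  # this ground-truth box already has a mask
                continue
            iou = box_iou(pred_boxes[i], gt_boxes[j])
            if iou >= tau_iou and iou >= best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0:
            gt_masks[best_j] = pred_masks[i]
            assigned[best_j] = i
    return gt_masks, assigned
```

Because predictions are visited in descending order of confidence and each ground-truth box accepts at most one mask, a more confident prediction always wins a contested box, and a ground-truth box with no sufficiently overlapping prediction keeps an empty mask.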
4. BoxTeacher
In this section, we present BoxTeacher, an end-to-end
training framework, which aims to unleash the power of
pseudo masks. In contrast to multi-stage self-training, Box-
Teacher, consisting of a teacher and a student, simultane-
ously facilitates the training of the student and pseudo la-
beling of the teacher. The mutual optimization is beneficial
to both the teacher and the student, thus leading to higher
performance for box-supervised instance segmentation.