TriangleNet: Edge Prior Augmented Network
for Semantic Segmentation through
Cross-Task Consistency
Dan Zhang1, Rui Zheng*1, Luosang Gadeng2 and Pei Yang3
1School of Information and Engineering, Minzu University of China, Beijing 100081, China
2Department of Information Science and Technology, Tibet University, Lhasa 850012, China
3Department of Computer Technology and Application, Qinghai University, Xining 810016, China
zhangdan@muc.edu.cn, rzhengbj@163.com, lsgd@utibet.edu.cn, yangpeinmgdx@sina.com
Abstract
This paper addresses the task of semantic segmentation in com-
puter vision, aiming to achieve precise pixel-wise classification. We
investigate the joint training of models for semantic edge detection
and semantic segmentation, which has shown promise. However, im-
plicit cross-task consistency learning in multi-task networks is limited.
To address this, we propose a novel "decoupled cross-task consistency
loss" that explicitly enhances cross-task consistency. Our semantic
segmentation network, TriangleNet, achieves a substantial 2.88% im-
provement over the Baseline in mean Intersection over Union (mIoU)
on the Cityscapes test set. Notably, TriangleNet operates at 77.4%
mIoU/46.2 FPS on Cityscapes, showcasing real-time inference capabil-
ities at full resolution. With multi-scale inference, performance is fur-
ther enhanced to 77.8%. Furthermore, TriangleNet consistently out-
performs the Baseline on the FloodNet dataset, demonstrating its ro-
bust generalization capabilities. The proposed method underscores the
significance of multi-task learning and explicit cross-task consistency
enhancement for advancing semantic segmentation and highlights the
potential of multitasking in real-time semantic segmentation.
Keywords: Semantic Segmentation; Real-Time Semantic Segmenta-
tion; Multi-Task Learning; Cross-Task Consistency
*Corresponding author
arXiv:2210.05152v5 [cs.CV] 30 Aug 2023
1 Introduction
Deep learning has been applied to image semantic segmentation for many
years, producing many excellent works such as FCN [1], U-Net [2],
FastFCN [3], Gated-SCNN [4], the DeepLab series [5,6,7], and Mask R-CNN [8],
while leaving some problems unsolved.
The main challenge is the fine-grained localization of pixel labels [9].
The prevailing structure of semantic segmentation networks mostly follows
the encoder-decoder structure adopted by FCN [1]. First, downsampling is
used to expand the receptive field to extract high-level semantics, and then
upsampling is used to recover low-level details. The edge details lost by con-
ventional downsampling operations in semantic segmentation networks are
difficult to recover during upsampling. A compensatory solution is to intro-
duce additional knowledge among which edge priors are intuitive and easily
accessible. In order to inject edge priors into semantic segmentation net-
works, one way is to train a semantic edge detection model and a semantic
segmentation model jointly. General practice is a two-stream framework that
trains a semantic edge detection branch and a semantic segmentation branch
in a hard parameter-sharing manner [10]. The predictions of the semantic
edge detection branch on edge points may differ from those of the semantic
segmentation branch, which implies the existence of cross-task inconsistency.
Conventionally, a fusion module is introduced to cope with this conflict, as
in [11,12], fusing features from the semantic edge detection branch to
improve the semantic segmentation branch. However, these fusion modules are
sometimes less effective than expected. As the ablation experiments of [11]
point out, the improvement in mean class-wise intersection-over-union (mIoU)
on the Cityscapes validation set mainly depends on the duality loss (+1.44%)
rather than on semantic edge fusion (+0.22%) or the pyramid context module
(+0.62%). A considerable amount of segmentation
errors along object boundaries still exist, which means the mutual consistency
between the semantic segmentation branch and the semantic edge detection
branch should be further studied to improve the quality of segmentation
results.
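For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how mean class-wise intersection-over-union is commonly computed from a confusion matrix. It is a generic illustration in plain Python, not the evaluation code used in the paper or in [11]; the function names `confusion_matrix` and `mean_iou` are hypothetical, and ignore labels (used in benchmarks such as Cityscapes) are left out for brevity.

```python
from typing import List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels by (ground-truth class, predicted class)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def mean_iou(gt: Sequence[int], pred: Sequence[int], num_classes: int) -> float:
    """Average IoU = TP / (TP + FP + FN) over classes that occur at all."""
    cm = confusion_matrix(gt, pred, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                 # class-c pixels predicted as other
        fp = sum(row[c] for row in cm) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:                        # skip classes absent from both maps
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Flattened toy label maps: one class-0 pixel is mislabeled as class 1.
gt_labels = [0, 0, 1, 1]
pred_labels = [0, 1, 1, 1]
print(mean_iou(gt_labels, pred_labels, num_classes=2))
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so a single boundary-style error already lowers both class scores.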
We have observed that many semantic segmentation works can be loosely
viewed as semantic edge detection tasks, since applying an edge detector to
a semantic segmentation output yields a semantic edge map. This relationship
is modeled in Figure 2. Logically, to preserve consistency among tasks, the
semantic edges inferred from an input image should be the same regardless of
the inference path. That is, predicting semantic edges by first predicting a
semantic segmentation map from the input image should give similar results
to predicting semantic edges directly from the input image. This observation
aligns with the concept of inference-path invariance, the guiding idea of
the work by Zamir et al. [13], which holds that predictions should remain
consistent regardless of the specific inference path. The input image
domain, the semantic segmentation domain, and the semantic edge domain
form an Elementary Consistency Unit as proposed by Zamir et al. [13],
illustrated in Figure 2.
By imposing a cross-task consistency loss on the endpoint outputs of the
two paths, the consistency between semantic segmentation and semantic edge
detection can be learned explicitly. Based on this analysis, we propose a
new framework that simultaneously trains a semantic segmentation branch and
a semantic edge detection branch; the overall process is shown in Figure 3.
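The two inference paths can be made concrete with a toy sketch. The code below derives an edge map from a segmentation label map (path one) and compares it with a directly predicted edge map (path two) using a mean absolute difference. Everything here is an assumption for illustration: the 4-neighbour edge detector, the L1 comparison, and the names `edges_from_labels` and `l1_consistency` are hypothetical and are not the paper's "decoupled cross-task consistency loss", whose exact form is not given in this section.

```python
from typing import List

Grid = List[List[int]]


def edges_from_labels(seg: Grid) -> Grid:
    """Mark a pixel as edge (1) if any 4-neighbour carries a different label."""
    h, w = len(seg), len(seg[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and seg[ny][nx] != seg[y][x]:
                    edges[y][x] = 1
                    break
    return edges


def l1_consistency(edge_a: Grid, edge_b: Grid) -> float:
    """Mean absolute difference between two edge maps of equal shape."""
    total = sum(abs(p - q)
                for row_a, row_b in zip(edge_a, edge_b)
                for p, q in zip(row_a, row_b))
    count = sum(len(row) for row in edge_a)
    return total / count


# Path 1: input -> segmentation -> edge detector.
seg_pred = [[0, 0, 1, 1],
            [0, 0, 1, 1]]
edge_via_seg = edges_from_labels(seg_pred)

# Path 2: input -> edge branch (here a hand-written stand-in prediction).
edge_direct = [[0, 1, 1, 0],
               [0, 0, 1, 0]]

# A non-zero value signals cross-task inconsistency between the two paths.
print(l1_consistency(edge_via_seg, edge_direct))
```

In a trainable setting the edge detector would need to be differentiable and the maps would be soft probabilities rather than hard labels; the sketch only shows where the two paths meet.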
The highlights of this paper are as follows.
1. Figure 1 illustrates the balance between speed and accuracy achieved
by our framework on the Cityscapes dataset, which makes it one of the
few models capable of real-time inference at full resolution. Our model
reaches 77.4% mIoU while maintaining 46.2 FPS on Cityscapes.
2. We introduce a novel "decoupled cross-task consistency loss" to
explicitly enhance cross-task consistency between semantic edge
detection and semantic segmentation, resulting in a 1.83% improvement in
mIoU on the Cityscapes test set. The decoupled loss effectively enforces
consistency across tasks, facilitating the learning of shared representa-
tions and leading to improved overall performance.
3. Our model demonstrates exceptional efficacy in categories character-
ized by distinct edges and boundaries, as evidenced by some categories
achieving significant IoU improvements, with "train" nearly reaching
an 18% increase in IoU on the Cityscapes test set. These results further
reinforce the importance of incorporating edge information through our
approach, highlighting its impact on enhancing segmentation perfor-
mance.
4. The decoupled architecture we have designed allows for joint training
of multiple tasks without the need for fusion modules during inference,
thereby avoiding the introduction of extra inference overhead. This ef-
ficient and practical approach enables us to leverage the advantages of
multitasking for real-time semantic segmentation without compromis-
ing on performance.
Figure 1: Run-time/accuracy trade-off comparison on the Cityscapes test
set. Our models (in red) achieve an excellent run-time vs. accuracy
trade-off among all previous real-time methods. The red line at FPS = 30
divides real-time from non-real-time performance in the graph. An asterisk
after a model name indicates that its inference speed was measured with the
same deep learning framework, PaddlePaddle, and on the same hardware
platform, an A100 40G device.
2 Related Work
2.1 Semantic Segmentation
Strengths, weaknesses, and major challenges of semantic segmentation are
extensively discussed in the literature [14,15,16,9]. There are currently two
approaches to semantic segmentation: improving the object’s inner consis-
tency or refining details along objects’ boundaries.
The inner inconsistency of the object is attributed to the limited receptive
field, by which the longer-range relationships of pixels in an image cannot
be fully modeled. Consequently, dilated convolution [17] or high-resolution
network [18] is introduced to enlarge the receptive field. Furthermore, many
attempts have been made to capture contextual information, such as recurrent
networks [19,20], pyramid pooling module [21], graph convolution networks
[22], CRF related networks [5,6,23], non-local operator [24], attention
mechanism [25,26], etc.
Figure 2: The multi-task learning framework of semantic segmentation and
semantic edge detection coincides with the Elementary Consistency Unit
theory, where the prediction $\chi \to \psi_1$ is enforced to be consistent
with $\chi \to \psi_2$ using a function that relates $\psi_1$ to $\psi_2$.
$S$, $E$, and $C$ respectively denote the outputs processed through
$\Gamma_{\chi\psi_1}$, $\Gamma_{\chi\psi_2}$, and $\Gamma_{\psi_1\psi_2}$.
[Diagram labels: Input, Semantic Segmentation, Semantic Edge Detection,
Edge Detector.]
The ambiguity along edges is caused by down-sampling operations in the
FCNs that result in blurred predictions. It is difficult to recover spatial
information lost during down-sampling through simple up-sampling. Thus,
previous papers have made efforts to add priors to guide the upsampling
process, many of which focus on the use of edge priors. The general practice is
a two-stream framework that trains an edge detection branch and a semantic
segmentation branch jointly, which will be elaborated later.
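A toy sketch can illustrate why detail lost in down-sampling is hard to recover by simple up-sampling. The code below subsamples a label map with stride 2 and restores it with nearest-neighbour up-sampling, then lists the pixels that changed. This is a deliberate simplification: real networks down-sample feature maps with pooling or strided convolution rather than subsampling labels, and the helper names `downsample`, `upsample_nearest`, and `mismatches` are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]


def downsample(seg: Grid, stride: int = 2) -> Grid:
    """Keep every `stride`-th pixel along both axes."""
    return [row[::stride] for row in seg[::stride]]


def upsample_nearest(seg: Grid, factor: int = 2) -> Grid:
    """Repeat every pixel `factor` times along both axes."""
    out: Grid = []
    for row in seg:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def mismatches(a: Grid, b: Grid) -> List[Tuple[int, int]]:
    """Return (row, column) positions where two equally sized maps differ."""
    return [(y, x)
            for y, (row_a, row_b) in enumerate(zip(a, b))
            for x, (p, q) in enumerate(zip(row_a, row_b))
            if p != q]


# A 4x6 label map with a vertical boundary between columns 2 and 3.
original = [[0, 0, 0, 1, 1, 1] for _ in range(4)]
restored = upsample_nearest(downsample(original))

# Every changed pixel lies directly on the class boundary.
print(mismatches(original, restored))
```

In this example the interior of both regions survives the round trip and only the boundary column is mislabeled, which matches the motivation for injecting edge priors.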
2.2 Multi-Task Learning
Driven by deep learning, many dense prediction tasks such as semantic seg-
mentation, instance segmentation, etc. have achieved significant performance
improvements. Typically, tasks are learned in isolation, i.e. each task is
trained with a separate neural network. Recently, multi-task learning (MTL)