TriangleNet: Edge Prior Augmented Network for Semantic Segmentation through Cross-Task Consistency

Dan Zhang1, Rui Zheng∗1, Luosang Gadeng2 and Pei Yang3

1School of Information and Engineering, Minzu University of China, Beijing 100081, China

2Department of Information Science and Technology, Tibet University, Lhasa 850012, China

3Department of Computer Technology and Application, Qinghai University, Xining 810016, China

zhangdan@muc.edu.cn, rzhengbj@163.com, lsgd@utibet.edu.cn, yangpeinmgdx@sina.com

Abstract

This paper addresses the task of semantic segmentation in computer vision, aiming to achieve precise pixel-wise classification. We investigate the joint training of models for semantic edge detection and semantic segmentation, which has shown promise. However, implicit cross-task consistency learning in multi-task networks is limited. To address this, we propose a novel "decoupled cross-task consistency loss" that explicitly enhances cross-task consistency. Our semantic segmentation network, TriangleNet, achieves a substantial 2.88% improvement over the Baseline in mean Intersection over Union (mIoU) on the Cityscapes test set. Notably, TriangleNet operates at 77.4% mIoU/46.2 FPS on Cityscapes, showcasing real-time inference capabilities at full resolution. With multi-scale inference, performance is further enhanced to 77.8%. Furthermore, TriangleNet consistently outperforms the Baseline on the FloodNet dataset, demonstrating its robust generalization capabilities. The proposed method underscores the significance of multi-task learning and explicit cross-task consistency enhancement for advancing semantic segmentation and highlights the potential of multitasking in real-time semantic segmentation.

Keywords: Semantic Segmentation; Real-Time Semantic Segmentation; Multi-Task Learning; Cross-Task Consistency

∗Corresponding author

arXiv:2210.05152v5 [cs.CV] 30 Aug 2023

1 Introduction

Deep learning has been applied to image semantic segmentation for many years, producing a large body of excellent work such as FCN [1], U-Net [2], FastFCN [3], Gated-SCNN [4], the DeepLab series [5,6,7], and Mask R-CNN [8], while still leaving problems unsolved. The main challenge is the fine-grained localization of pixel labels [9]. Most semantic segmentation networks follow the encoder-decoder structure adopted by FCN [1]: downsampling first expands the receptive field to extract high-level semantics, and upsampling then recovers low-level details. However, the edge details lost by conventional downsampling operations are difficult to recover during upsampling. A compensatory solution is to introduce additional knowledge, among which edge priors are intuitive and easily accessible. One way to inject edge priors into a semantic segmentation network is to train a semantic edge detection model and a semantic segmentation model jointly. The general practice is a two-stream framework that trains a semantic edge detection branch and a semantic segmentation branch in a hard parameter-sharing manner [10]. The predictions of the semantic edge detection branch at edge points may differ from those of the semantic segmentation branch, which implies the existence of cross-task inconsistency. Conventionally, a fusion module is introduced to cope with this conflict, as in [11,12], fusing features from the semantic edge detection branch to improve the semantic segmentation branch. However, these fusion modules are sometimes less effective than expected. As the ablation experiments of [11] point out, the improvement in the mean of class-wise intersection-over-union (mIoU) on the Cityscapes validation set depends mainly on the duality loss (+1.44%) rather than on semantic edge fusion (+0.22%) or the pyramid context module (+0.62%). A considerable number of segmentation errors still exist along object boundaries, which means that the mutual consistency between the semantic segmentation branch and the semantic edge detection branch should be studied further to improve the quality of segmentation results.
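Since mIoU is the metric used throughout, a minimal NumPy sketch of the mean of class-wise intersection-over-union on a single pair of label maps may help fix ideas. The function name `mean_iou` and the toy arrays are illustrative assumptions; benchmark protocols such as the Cityscapes one accumulate intersections and unions over the whole dataset and handle ignored labels, which this sketch does not.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean of class-wise IoU over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps; leave it out of the mean
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x3 label maps with three classes (illustrative values only).
target = np.array([[0, 0, 1],
                   [1, 2, 2]])
pred = np.array([[0, 1, 1],
                 [1, 2, 0]])
miou = mean_iou(pred, target, num_classes=3)  # IoUs are 1/3, 2/3, 1/2
```

Here the three class-wise IoUs average to 0.5, so a gain such as +1.44% corresponds to a shift of 0.0144 on this scale.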

We observe that many semantic segmentation works can be loosely viewed as semantic edge detection tasks, since applying an edge detector to semantic segmentation outputs yields semantic edge results. This relationship is modeled in Figure 2. Logically, to preserve consistency among tasks, the semantic edges inferred from an input image should be the same regardless of the inference path. That is, predicting semantic edges by first predicting a semantic segmentation map from the input image should give predictions similar to those obtained by predicting semantic edges directly from the input image. This observation aligns with the concept of inference-path invariance, the guiding principle of the work by Zamir et al. [13], which emphasizes that predictions should remain consistent regardless of the specific inference path. The input image domain, the semantic segmentation domain, and the semantic edge domain form an Elementary Consistency Unit as proposed by Zamir et al. [13], illustrated in Figure 2.
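The indirect path (image to segmentation to edges) can be sketched with a simple hand-written edge detector applied to a label map. This is only an assumed, illustrative detector: the function name `semantic_edges` and the 4-neighbour rule are not taken from the paper, whose actual edge detector is not specified in this section.

```python
import numpy as np

def semantic_edges(labels, num_classes):
    """Return a (num_classes, H, W) boolean map of per-class boundary pixels.

    A pixel is a boundary pixel of class c if it carries label c and at least
    one of its 4-connected neighbours carries a different label.
    """
    h, w = labels.shape
    # Replicate the border so the image frame itself is not reported as an edge.
    padded = np.pad(labels, 1, mode="edge")
    differs = np.zeros((h, w), dtype=bool)
    for dy, dx in ((0, 1), (2, 1), (1, 0), (1, 2)):  # up, down, left, right
        differs |= padded[dy:dy + h, dx:dx + w] != labels
    return np.stack([(labels == c) & differs for c in range(num_classes)])

# Toy segmentation map: class 0 on the left half, class 1 on the right half.
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
edges = semantic_edges(labels, num_classes=2)
```

In this toy case the class-0 edge channel fires in column 1 and the class-1 channel in column 2, that is, on either side of the boundary.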

By imposing a cross-task consistency loss on the endpoint outputs of the two paths, the consistency between semantic segmentation and semantic edge detection can be learned explicitly. Based on this analysis, we propose a new framework that simultaneously trains a semantic segmentation branch and a semantic edge detection branch; the overall process is shown in Figure 3.
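To make the idea of a loss on the endpoints of the two paths concrete, the following is a generic sketch and not the decoupled cross-task consistency loss proposed in this paper, whose definition is not given in this section. It assumes soft class probabilities of shape (C, H, W), derives edge strength from them with finite differences, and compares the result with a directly predicted edge map using a mean absolute difference. The names `soft_edges` and `consistency_loss` are hypothetical.

```python
import numpy as np

def soft_edges(probs):
    """Per-class edge strength from class probabilities of shape (C, H, W):
    the larger absolute difference to the right or lower neighbour."""
    dx = np.abs(np.diff(probs, axis=2, append=probs[:, :, -1:]))
    dy = np.abs(np.diff(probs, axis=1, append=probs[:, -1:, :]))
    return np.maximum(dx, dy)

def consistency_loss(seg_probs, edge_probs):
    """Mean absolute difference between edges derived from the segmentation
    path and edges predicted directly (a generic L1 consistency term)."""
    return float(np.mean(np.abs(soft_edges(seg_probs) - edge_probs)))

# Toy 2-class, 1x4 segmentation output with a single boundary in the middle.
seg = np.array([[[1.0, 1.0, 0.0, 0.0]],
                [[0.0, 0.0, 1.0, 1.0]]])
loss_consistent = consistency_loss(seg, soft_edges(seg))   # paths agree
loss_missing = consistency_loss(seg, np.zeros_like(seg))   # edge branch predicts nothing
```

When the direct edge prediction matches the edges implied by the segmentation, the term vanishes; when the edge branch misses the boundary, the term is positive and would penalize the disagreement during training.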

The highlights of this paper are as follows.

1. Figure 1 illustrates the superior balance between speed and accuracy achieved by our framework on the Cityscapes dataset, distinguishing it as one of the few models capable of real-time inference at full resolution. Notably, our model operates at an impressive 77.4% mIoU while maintaining a fast frame rate of 46.2 FPS on Cityscapes.

2. We introduce a novel approach, "decoupled cross-task consistency loss," to explicitly enhance cross-task consistency between semantic edge detection and semantic segmentation, resulting in 1.83% improvement in mIoU on the Cityscapes test set. The decoupled loss effectively enforces consistency across tasks, facilitating the learning of shared representations and leading to improved overall performance.

3. Our model demonstrates exceptional efficacy in categories characterized by distinct edges and boundaries, as evidenced by some categories achieving significant IoU improvements, with "train" nearly reaching an 18% increase in IoU on the Cityscapes test set. These results further reinforce the importance of incorporating edge information through our approach, highlighting its impact on enhancing segmentation performance.

4. The decoupled architecture we have designed allows for joint training of multiple tasks without the need for fusion modules during inference, thereby avoiding the introduction of extra inference overhead. This efficient and practical approach enables us to leverage the advantages of multitasking for real-time semantic segmentation without compromising on performance.

Figure 1: Run-time/accuracy trade-off comparison on the Cityscapes test set. Our models (in red) achieve an excellent run-time vs. accuracy trade-off among all previous real-time methods. The red line at FPS=30 divides real-time from non-real-time performance in the graph. An asterisk after a model name indicates that the inference speed of that model was obtained using the same deep learning framework, PaddlePaddle, and the same hardware platform, an A100 40G device.

2 Related Work

2.1 Semantic Segmentation

Strengths, weaknesses, and major challenges of semantic segmentation are extensively discussed in the literature [14,15,16,9]. There are currently two approaches to semantic segmentation: improving the object’s inner consistency or refining details along objects’ boundaries.

[Figure 2 diagram: Input, Semantic Segmentation, Semantic Edge Detection, Edge Detector]

Figure 2: The multi-task learning framework of semantic segmentation and semantic edge detection coincides with the Elementary Consistency Unit theory, where the prediction $\chi \to \psi_1$ is enforced to be consistent with $\chi \to \psi_2$ using a function that relates $\psi_1$ to $\psi_2$. $S$, $E$, and $C$ respectively denote the outputs processed through $\Gamma_{\chi\psi_1}$, $\Gamma_{\chi\psi_2}$, and $\Gamma_{\psi_1\psi_2}$.

The inner inconsistency of the object is attributed to the limited receptive field, by which the longer-range relationships of pixels in an image cannot be fully modeled. Consequently, dilated convolution [17] or high-resolution network [18] is introduced to enlarge the receptive field. Furthermore, many attempts have been made to capture contextual information, such as recurrent networks [19,20], pyramid pooling module [21], graph convolution networks [22], CRF related networks [5,6,23], non-local operator [24], attention mechanism [25,26], etc.
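As a brief illustration of why dilated convolution enlarges the receptive field, the receptive field of a stack of stride-1 convolutions along one axis grows by (k - 1) * d per layer, where k is the kernel size and d the dilation. The helper `receptive_field` and the layer configurations below are illustrative assumptions, not configurations taken from [17].

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of stacked stride-1
    convolutions. `layers` is an iterable of (kernel_size, dilation) pairs."""
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf

# Three plain 3x3 convolutions versus three 3x3 convolutions with
# dilations 1, 2 and 4 (same number of weights per layer).
plain = receptive_field([(3, 1), (3, 1), (3, 1)])
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])
```

With the same parameter count, the dilated stack covers 15 input pixels per axis instead of 7, without any downsampling.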

The ambiguity along edges is caused by down-sampling operations in the FCNs that result in blurred predictions. It is difficult to recover spatial information lost during down-sampling through simple up-sampling. Thus, previous papers have made efforts to add priors to guide the upsampling process, many of which focus on the use of edge priors. The general practice is a two-stream framework that trains an edge detection branch and a semantic segmentation branch jointly, which will be elaborated later.

2.2 Multi-Task Learning

Driven by deep learning, many dense prediction tasks such as semantic segmentation, instance segmentation, etc. have achieved significant performance improvements. Typically, tasks are learned in isolation, i.e. each task is trained with a separate neural network. Recently, multi-task learning (MTL)
