ASAP Accurate semantic segmentation for real time performance Jae Hyun Park

ASAP: Accurate semantic segmentation for real time performance

Jae Hyun Park

AI tech team

Lotte Data Communication Company

Seoul, South Korea

jaehyun-park@lotte.net

Su Bin Lee

AI tech team

Lotte Data Communication Company

Seoul, South Korea

leesubin@lotte.net

Eon Kim

AI tech team

Lotte Data Communication Company

Seoul, South Korea

eon.kim@lotte.net

Byeong Jun Moon

AI tech team

Lotte Data Communication Company

Seoul, South Korea

bj_moon@lotte.net

Da Been Yu

AI tech team

Lotte Data Communication Company

Seoul, South Korea

db.yu@lotte.net

Yeon Seung Yu

AI tech team

Lotte Data Communication Company

Seoul, South Korea

yys4000@lotte.net

Jung Hwan Kim

AI tech team

Lotte Data Communication Company

Seoul, South Korea

jhwan_kim@lotte.net

Abstract—Feature fusion modules from the encoder and self-attention modules have been adopted in semantic segmentation. However, these modules are computationally costly, which limits their use in real-time environments. In addition, segmentation performance is limited in autonomous driving environments, which contain a lot of contextual information perpendicular to the road surface, such as people, buildings, and general objects. In this paper, we propose an efficient feature fusion method, Feature Fusion with Different Norms (FFDN), that utilizes the rich global context of multi-level scales, together with a vertical pooling module placed before self-attention that preserves most contextual information while reducing the complexity of global context encoding in the vertical direction. In this way, we can handle the properties of representations in global space while reducing the additional computational cost. In addition, we analyze low performance in challenging cases, including small and vertically featured objects. We achieve a mean intersection-over-union (mIoU) of 73.1 and 191 frames per second (FPS), which are comparable to state-of-the-art results on the Cityscapes test dataset.

Index Terms—semantic segmentation, deep learning
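
As a reminder of what the reported metric measures, the following is a minimal sketch of mIoU on a toy pair of label maps. It is not the official Cityscapes evaluation (which accumulates counts over the whole dataset and handles ignored labels); the function name `mean_iou` and the toy data are illustrative assumptions.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over the classes that occur in pred or target.

    pred and target are flat sequences of integer class labels of equal length.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel is in both the predicted and the true region of class p.
            intersection[p] += 1
            union[p] += 1
        else:
            # Pixel belongs to the union of class p (predicted) and class t (true).
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


# Toy example: flattened 2x3 label maps with 3 classes (hypothetical data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(score)  # per-class IoUs are 1/3, 2/3 and 1/2
```

Averaging per-class IoU rather than per-pixel accuracy keeps small classes, such as poles or pedestrians, from being swamped by large ones such as road.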

I. INTRODUCTION

Semantic segmentation is a per-pixel classification task that assigns a class label to every pixel of an image. It has been widely researched in fields including biomedical imaging and human-machine interaction [1], [2].
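
Per-pixel classification can be illustrated with a small sketch: given a score for every class at every pixel, the predicted label map takes the highest-scoring class at each position. The function name `segment` and the toy scores are hypothetical and only serve the illustration.

```python
def segment(scores):
    """Turn per-class score maps into a label map.

    scores[c][y][x] is the score of class c at pixel (y, x).
    Returns labels[y][x], the index of the highest-scoring class.
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Toy 2x2 image with 3 classes (hypothetical scores).
scores = [
    [[0.9, 0.2], [0.1, 0.3]],  # class 0
    [[0.1, 0.7], [0.2, 0.3]],  # class 1
    [[0.0, 0.1], [0.7, 0.4]],  # class 2
]
labels = segment(scores)
print(labels)
```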

In particular, segmentation used in autonomous driving, for tasks such as depth estimation and free-space estimation, operates in real time and therefore requires both fast inference and high accuracy. To improve inference speed, aligned feature maps at adjacent levels have been used to balance accuracy and inference speed in the segmentation task [3], and a ladder-style lightweight decoder has been designed for upsampling low-spatial-resolution features [4].

To achieve high accuracy, segmentation models require global contextual information and the ability to handle multi-level semantics. Some studies include a self-attention module, which helps the model concentrate on contextual features [5]. Other studies propose feature fusion modules, which combine multi-level features [6], [7]. However, these modules, which rely on convolution-based operations to fuse multi-level features, incur high computational complexity and memory usage.
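
To make the cost concern concrete: full self-attention compares every spatial position with every other, so the number of pairs grows with the square of the number of positions. The sketch below assumes a plain average pooling along the height axis before attention. The pooling type, the window size `k`, and both function names are illustrative assumptions; the vertical pooling module proposed in this paper may differ in its details.

```python
def vertical_avg_pool(feature, k):
    """Average-pool feature[y][x] with a (k, 1) window and stride k along the height."""
    height, width = len(feature), len(feature[0])
    if height % k != 0:
        raise ValueError("height must be divisible by k in this sketch")
    return [
        [sum(feature[y + dy][x] for dy in range(k)) / k for x in range(width)]
        for y in range(0, height, k)
    ]


def attention_pairs(height, width):
    """Number of query-key pairs in full self-attention over height * width positions."""
    n = height * width
    return n * n


# Toy 4x2 feature map (hypothetical values).
feature = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
pooled = vertical_avg_pool(feature, k=2)

before = attention_pairs(len(feature), len(feature[0]))
after = attention_pairs(len(pooled), len(pooled[0]))
print(pooled, before, after)
```

Halving the height halves the number of positions and therefore cuts the pairwise cost by a factor of four, while the width, and hence the horizontal layout of the scene, is left untouched.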

To reduce the amount of computation without sacrificing accuracy, we exploit normalization techniques in the feature fusion step of semantic segmentation. In U-GAT-IT [1], spatial and semantic content is taken into account by using adaptive normalizations [12, 13] to reflect image content such as style and geometry information. Inspired by these approaches, we propose an efficient Feature Fusion with Different Norms (FFDN), in which layer normalization and instance normalization are used to aggregate features from different layers, as shown in Fig. 2. These normalization methods allow the segmentation model to obtain exact object locations from spatial information and the detailed parts of objects from content information, at low computational complexity. FFDN takes as input the multi-level features obtained from a simple modification of FPN (*FPN) and combines them to capture global properties of the representations. The *FPN is shown in Fig. 1.
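
The difference between the two normalizations named above can be seen in a small sketch for a single feature tensor `feature[c][y][x]`. Instance normalization computes statistics per channel, whereas layer normalization (in the sense used by U-GAT-IT) shares statistics across all channels and positions of one sample. This only illustrates the two norms; how FFDN combines them is not shown in this excerpt, and all function names are assumptions.

```python
import math


def _normalize(values, eps=1e-5):
    """Shift and scale a flat list to zero mean and (approximately) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def instance_norm(feature):
    """Normalize each channel over its own spatial positions."""
    out = []
    for channel in feature:
        width = len(channel[0])
        flat = _normalize([v for row in channel for v in row])
        out.append([flat[i:i + width] for i in range(0, len(flat), width)])
    return out


def layer_norm(feature):
    """Normalize all channels and positions of one sample with shared statistics."""
    height, width = len(feature[0]), len(feature[0][0])
    flat = _normalize([v for channel in feature for row in channel for v in row])
    rows = [flat[i:i + width] for i in range(0, len(flat), width)]
    return [rows[i:i + height] for i in range(0, len(rows), height)]


# Toy tensor with 2 channels of size 2x2 (hypothetical values).
feature = [
    [[1.0, 2.0], [3.0, 4.0]],      # channel 0: small activations
    [[10.0, 20.0], [30.0, 40.0]],  # channel 1: large activations
]
inst = instance_norm(feature)
layer = layer_norm(feature)
print(inst[0], layer[0])
```

After instance normalization both channels are centered individually, so their relative magnitude is discarded; after layer normalization channel 0 stays entirely below zero because it is small relative to the whole sample, so cross-channel differences are retained.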

One of the challenging problems in semantic segmentation is handling objects that extend in a specific direction, such as vertical or diagonal ones (e.g., people, poles). Since a general convolution or pooling operation uses uniform kernels with the same height

arXiv:2210.01323v1 [cs.CV] 4 Oct 2022
