
tion, in-field robot navigation, etc. As more and more neural models are deployed into the real world, interest in developing edge-efficient architectures for dense predictions has been growing continually.
However, designing fast and efficient dense prediction models for edge devices is challenging. First of all, pixel-level predictions such as semantic segmentation and depth estimation are fundamentally slower than other popular vision tasks such as image classification or object detection. This is because, after encoding the input images into low-spatial-resolution features, these networks must upsample them back to produce high-resolution output masks. In fact, dense estimation can be several times or even an order of magnitude slower than its counterparts, depending on the specific model, hardware, and target resolution. Thus, real-time dense prediction models are not only non-trivial to design but can also easily become a latency bottleneck in systems that consume their outputs. These problems are intensified for edge applications on platforms like the Coral TPU [13], where computational resources are limited yet low latency is still required, e.g., to inform users or drive subsequent tasks in real time.
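To make this cost asymmetry concrete, the following PyTorch sketch (ours, purely illustrative; the tensor shapes are assumptions, not any particular architecture from this paper) contrasts a classification head, which collapses spatial resolution into a single vector, with a dense prediction head, which must upsample encoder features back to full resolution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feats = torch.randn(1, 256, 32, 64)   # encoder output at 1/16 of a 512x1024 input

# Classification: one pooled vector per image -> cheap output.
cls_head = nn.Linear(256, 1000)
logits = cls_head(feats.mean(dim=(2, 3)))          # shape: (1, 1000)

# Dense prediction: per-pixel logits at full resolution -> expensive output.
seg_head = nn.Conv2d(256, 19, kernel_size=1)       # e.g., 19 Cityscapes classes
masks = F.interpolate(seg_head(feats), size=(512, 1024),
                      mode="bilinear", align_corners=False)
print(logits.shape, masks.shape)                   # (1, 1000) vs. (1, 19, 512, 1024)
```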
Second, developing models for these edge environments is costly and hard to scale in practice. On one hand, the architectural design process requires a significant amount of time, human labor, and expertise, with development ranging from a few months to a couple of years. On the other hand, edge applications may require deployment on various platforms, including cell phones, robots, drones, and more. Unfortunately, optimal designs discovered for one hardware platform may not generalize to another. Together, these issues pose challenges to the development of fast and efficient models for on-edge dense predictions.
To tackle these problems, our first key insight is that Multi-Task Learning of Dense Predictions (MTL-DP or MT-DP) and hardware-aware Neural Architecture Search (h-NAS) can work in synergy, not only benefiting each other but also significantly improving accuracy and computational efficiency. To the best of our knowledge, our framework, named EDNAS¹, is the first to successfully exploit this synergistic relationship between NAS and MTL for dense predictions. Indeed, on one hand, state-of-the-art methods for multi-task dense predictions [4, 22, 36, 40, 53, 58, 66], in which related tasks are learned jointly, mostly focus on learning how to share a fixed set of model components effectively among tasks but do not consider whether such a set is itself optimal for MTL to begin with. Moreover, these works typically study large models targeting powerful graphics accelerators such as the V100 GPU for inference and are not readily suitable for edge applications. On the other hand, NAS methods aim to automatically learn an optimal set of neural components and their connections. However, the current literature often focuses on either simpler tasks such as classification [7, 33, 62] or single-task training setups [19, 34]. In contrast, we learn MTL-DP and NAS jointly and leverage their strengths to tackle the aforementioned issues simultaneously, resulting in a novel and improved approach to efficient dense predictions for the edge.

¹Short for “Edge-Efficient Dense Predictions via Multi-Task NAS”
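As a rough illustration of how hardware awareness can enter such a search, the sketch below folds multi-task accuracy and measured on-device latency into a single reward, in the style of MnasNet-like trade-offs. The function name, metric names, weights, and 30 ms target are our hypothetical choices for illustration, not the actual EDNAS objective:

```python
# Illustrative hardware-aware, multi-task search reward (not the exact
# EDNAS objective): candidates that are accurate on all tasks but slow
# on the target edge device are penalized.
def mtl_nas_reward(task_scores, latency_ms, target_ms=30.0, w=-0.07):
    # task_scores: per-task metrics, e.g. {"seg_miou": 0.71, "depth_acc": 0.80}
    # latency_ms:  latency measured on the target device (e.g., Coral TPU)
    avg_score = sum(task_scores.values()) / len(task_scores)
    return avg_score * (latency_ms / target_ms) ** w  # soft latency penalty

reward = mtl_nas_reward({"seg_miou": 0.71, "depth_acc": 0.80}, latency_ms=42.0)
```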
Our second key insight is that the standard depth estimation training used in MTL-DP can produce significant fluctuation in evaluation accuracy. Indeed, our analysis reveals a potential for undesirably large variance in both absolute and relative depth errors. We hypothesize that this is caused by the standard depth training practice of relying solely on the L1 loss function. This can significantly and negatively affect the accuracy of MT-DP evaluation, as arbitrary “improvement” (or “degradation”) can manifest purely because of random fluctuation in the relative error. It is important to raise awareness of and appropriately address this issue, as segmentation and depth are arguably two of the most commonly jointly learned and used tasks in edge applications. To this end, we propose JAReD, an easy-to-adopt augmented loss that jointly and directly optimizes for both relative and absolute depth errors. The proposed loss is highly effective at simultaneously reducing noisy fluctuations and boosting overall prediction accuracy.
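A minimal PyTorch sketch in the spirit of this idea (the exact JAReD formulation is given later in the paper; the `alpha` weight and `eps` stabilizer here are our illustrative assumptions) augments the standard absolute-error term with a relative-error term:

```python
import torch

def augmented_depth_loss(pred, target, alpha=1.0, eps=1e-6):
    # Standard practice optimizes only the absolute (L1) error; adding a
    # relative-error term directly optimizes both evaluation metrics.
    abs_err = (pred - target).abs()
    rel_err = abs_err / (target.abs() + eps)  # eps avoids division by zero
    return abs_err.mean() + alpha * rel_err.mean()

pred, target = torch.rand(8, 1, 64, 64) * 10, torch.rand(8, 1, 64, 64) * 10
loss = augmented_depth_loss(pred, target)
```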
We conduct extensive evaluations on Cityscapes [14] and NYUv2 [50] to demonstrate the effectiveness and robustness of EDNAS and the JAReD loss. Experimental results indicate that our methods can yield significant gains of up to +8.5% and +10.9% DP accuracy, respectively, considerably higher than the previous state of the art, with only 1/10th of the parameter and FLOP counts (Fig. 1).
2. Background and Related Work
In general, dense prediction models are often designed manually, in isolation, or without the constraints of limited edge computation [10, 27, 34, 35]. Specifically, works on multi-task learning for dense predictions (MTL-DP) [4, 5, 20, 22, 53, 58] often take a fixed base architecture such as DeepLab [9] and focus on learning to effectively share components, e.g., via cross-task communication modules [5, 20], adaptive tree-like branching [4, 22, 58], layer skipping [53], etc. (Fig. 2).
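For orientation, a generic hard-parameter-sharing baseline of the kind these methods build on might look as follows (our illustrative sketch; the channel width and head designs are assumptions, and the cited methods learn how to branch and share rather than fixing it by hand):

```python
import torch.nn as nn

class SharedBackboneMTL(nn.Module):
    # Hard parameter sharing: one backbone shared across tasks, plus a
    # lightweight task-specific head per output (segmentation, depth).
    def __init__(self, backbone, feat_ch=256, num_classes=19):
        super().__init__()
        self.backbone = backbone                            # shared
        self.seg_head = nn.Conv2d(feat_ch, num_classes, 1)  # task-specific
        self.depth_head = nn.Conv2d(feat_ch, 1, 1)          # task-specific

    def forward(self, x):
        feats = self.backbone(x)
        return self.seg_head(feats), self.depth_head(feats)
```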
On the other hand, neural architecture search (NAS) studies have, until recently, focused mostly on either image classification problems [1, 7, 29, 33, 39, 62] or learning tasks in isolation [19, 34, 54, 67]. Few have explored architecture search for the joint training of dense prediction tasks. However, as mentioned earlier, edge efficiency can potentially benefit from combining MTL-DP and NAS. To the best of our knowledge, our study is the first to report successful joint optimization of these two learning paradigms for dense predictions. Next, we give an overview of the most relevant efforts in the two domains of MTL and NAS. For more details, please refer to