Toward Edge-Efficient Dense Predictions with Synergistic Multi-Task Neural Architecture Search

Thanh Vu1* Yanqi Zhou2 Chunfeng Wen3† Yueqi Li3 Jan-Michael Frahm1
1UNC at Chapel Hill 2Google Research 3X, The Moonshot Factory
{tvu,jmf}@cs.unc.edu {yanqiz}@google.com {fannywen,yueqili}@google.com
*Work done during an internship at X. †Work done while at X.
[Figure 1 graphic: left, a diagram of Multi-Task Learning and Hardware-Aware NAS jointly supporting Dense Predictions on Edge (accuracy, speed, scalability, negative-transfer reduction, proxyless target task); right, a plot of Relative Accuracy Gain (%) versus GFLOPs.]
Figure 1: An overview of our proposed methods. First, the EDNAS framework leverages the synergy and joint learning of multi-task dense
prediction (MT-DP) and hardware-aware NAS, letting each paradigm complement the other and boosting on-device performance. On the left is an
illustration of the synergistic relationship of these components. Second, the JAReD loss reduces depth estimation noise and further improves
accuracy. On the right is the performance of our proposed techniques on CityScapes compared to state-of-the-art MT-DP approaches.
Abstract
In this work, we propose a novel and scalable solution to
address the challenges of developing efficient dense predic-
tions on edge platforms. Our first key insight is that Multi-
Task Learning (MTL) and hardware-aware Neural Archi-
tecture Search (NAS) can work in synergy to greatly benefit
on-device Dense Predictions (DP). Empirical results reveal
that the joint learning of the two paradigms is surprisingly
effective at improving DP accuracy, achieving superior per-
formance over both the transfer learning of single-task NAS
and prior state-of-the-art approaches in MTL, all with just
1/10th of the computation. To the best of our knowledge, our
framework, named EDNAS, is the first to successfully lever-
age the synergistic relationship of NAS and MTL for DP.
Our second key insight is that the standard depth training
for multi-task DP can cause significant instability and noise
to MTL evaluation. Instead, we propose JAReD, an im-
proved, easy-to-adopt Joint Absolute-Relative Depth loss,
that reduces up to 88% of the undesired noise while simul-
taneously boosting accuracy. We conduct extensive evalua-
tions on standard datasets, benchmark against strong base-
lines and state-of-the-art approaches, as well as provide an
analysis of the discovered optimal architectures.
1. Introduction
Recent years have witnessed a strong integration of com-
puter vision in many downstream edge applications such as
autonomous driving [2, 11, 38, 44, 52, 65, 68], mobile vi-
sion [16, 24, 25, 60, 61, 63], robotics [27, 35, 42], and even
computational agriculture [12, 28, 37], fueled by rapid in-
novations of deep neural networks. In many of these appli-
cations, pixel-level dense prediction tasks such as semantic
segmentation or depth estimation can play a critical role.
For example, self-driving agents are using semantic and
depth information to detect lanes, avoid obstacles, and lo-
cate their own positions. In precision agriculture, the output
of these tasks can be used for crop analysis, yield predic-
tion, in-field robot navigation, etc. As more and more neu-
ral models are being deployed into the real world, there has
been a continuously growing interest in developing edge-
efficient architectures for dense predictions over the years.
However, designing fast and efficient dense prediction
models for edge devices is challenging. First of all, pixel-
level predictions such as semantic segmentation and depth
estimation are fundamentally slower than some other popu-
lar vision tasks, including image classification or object de-
tection. This is because after encoding the input images into
low-spatial resolution features, these networks need to up-
sample them back to produce high-resolution output masks.
In fact, dense estimation models can be several times or even an
order of magnitude slower than their counterparts, depend-
ing on the specific model, hardware, and target resolution.
Thus, real-time dense prediction models are not only non-
trivial to design, they can easily become a latency bottle-
neck in systems that utilize their outputs. Such problems
are intensified for edge applications on platforms like the
Coral TPU [13] due to the limited computational resources,
despite the need for low latency, e.g., to inform the users or
process subsequent tasks in real time.
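To make this cost concrete, the highly simplified PyTorch-style sketch below (hypothetical layer sizes, not an architecture studied in this work) contrasts a classification head, which collapses backbone features into a single vector, with a dense prediction head, which must carry per-pixel logits back up to the full input resolution:

import torch.nn as nn
import torch.nn.functional as F

class ClassificationHead(nn.Module):
    """Collapses the feature map to one vector: cheap, resolution-independent output."""
    def __init__(self, in_ch=256, num_classes=1000):
        super().__init__()
        self.fc = nn.Linear(in_ch, num_classes)

    def forward(self, feats):                      # feats: (B, C, H/32, W/32)
        pooled = feats.mean(dim=(2, 3))            # global average pool -> (B, C)
        return self.fc(pooled)                     # (B, num_classes)

class DensePredictionHead(nn.Module):
    """Predicts a label per pixel: output cost scales with the full image resolution."""
    def __init__(self, in_ch=256, num_classes=19):
        super().__init__()
        self.classifier = nn.Conv2d(in_ch, num_classes, kernel_size=1)

    def forward(self, feats, out_size):            # feats: (B, C, H/32, W/32)
        logits = self.classifier(feats)            # per-pixel logits at low resolution
        # Upsample back to the input resolution, e.g. 1024x2048 for CityScapes.
        return F.interpolate(logits, size=out_size,
                             mode="bilinear", align_corners=False)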
Second, developing models for these edge environments
is costly and hard to scale in practice. On one hand, the
architectural design process requires a significant amount
of time, human labor, and expertise, with the development
process ranging from a few months to a couple of years. On
the other hand, edge applications may require deployment
on various platforms, including cell phones, robots, drones,
and more. Unfortunately, optimal designs discovered for
one hardware platform may not generalize to another. All of these
together pose challenges to the development of fast and ef-
ficient models for on-edge dense predictions.
To tackle these problems, our first key insight is that
Multi-Task Learning of Dense Predictions (MTL-DP or
MT-DP) and hardware-aware Neural Architecture Search
(h-NAS) can work in synergy to not only benefit each
other but also significantly improve accuracy and reduce computa-
tion. To the best of our knowledge, our framework, named
EDNAS (short for “Edge-Efficient Dense Predictions via Multi-Task NAS”), is the first to successfully exploit such a syner-
gistic relationship of NAS and MTL for dense predictions.
Indeed, on one hand, state-of-the-art methods for multi-task
dense predictions [4, 22, 36, 40, 53, 58, 66], in which related
tasks are learned jointly, mostly focus on learning
how to share a fixed set of model components effectively
among tasks but do not consider if such a set itself is op-
timal for MTL to begin with. Moreover, these works typi-
cally study large models targeting powerful graphics accel-
erators such as the V100 GPU for inference and are not read-
ily suitable for edge applications. On the other hand, NAS
methods aim to automatically learn an optimal set of neu-
ral components and their connections. However, the current
literature often focuses on either simpler tasks such as clas-
sification [7, 33, 62] or single-task training setup [19, 34].
In contrast, we jointly learn MTL-DP and NAS and lever-
age their strengths to tackle the aforementioned issues si-
multaneously, resulting in a novel and improved approach
to efficient dense predictions for edge.
Our second key insight is that the standard depth esti-
mation training used in MTL-DP can produce significant
fluctuation in the evaluation accuracy. Indeed, our analysis
reveals a potential for undesirably large variance in both ab-
solute and relative depth. We hypothesize that this is caused
by the standard depth training practice that relies solely on
the L1 loss function. This can significantly and negatively af-
fect the accuracy of MT-DP evaluation as arbitrary “im-
provement” (or “degradation”) can manifest purely because
of random fluctuation in the relative error. It is important
that we raise awareness of and appropriately address this is-
sue, as segmentation and depth estimation are arguably two
of the most commonly jointly learned and used tasks in edge
applications. To this end, we propose JAReD, an easy-to-
adopt augmented loss that jointly and directly optimizes for
both relative and absolute depth errors. The proposed loss
is highly effective at simultaneously reducing noisy fluctu-
ations and boosting overall prediction accuracy.
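The precise JAReD formulation is given later in the paper; as a minimal sketch of the idea, the loss below simply adds an L1 term on absolute depth to an L1 term on depth error normalized by the ground truth. The weighting and epsilon are illustrative assumptions, not the settings used in this work:

import torch

def joint_abs_rel_depth_loss(pred, target, valid_mask, rel_weight=1.0, eps=1e-6):
    """Sketch of a joint absolute-relative depth loss (not the exact JAReD definition)."""
    pred = pred[valid_mask]
    target = target[valid_mask]
    abs_err = torch.abs(pred - target)            # standard absolute (L1) depth error
    rel_err = abs_err / (target + eps)            # error relative to ground-truth depth
    return abs_err.mean() + rel_weight * rel_err.mean()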
We conduct extensive evaluations on CityScapes [14]
and NYUv2 [50] to demonstrate the effectiveness and ro-
bustness of EDNAS and JAReD loss. Experimental results
indicate that our methods can yield significant gains, up to
+8.5% and +10.9% DP accuracy respectively, considerably
higher than the previous state of the art, with only 1/10th of
the parameter and FLOP counts (Fig. 1).
2. Background and Related Works
In general, dense prediction models are often designed
manually, in isolation, or not necessarily constrained by
limited edge computation [10, 27, 34, 35]. Specifically,
works on multi-task learning for dense predictions (MTL-
DP) [4, 5, 20, 22, 53, 58] often take a fixed base archi-
tecture such as DeepLab [9] and focus on learning to ef-
fectively share components, e.g., by cross-task commu-
nication modules [5, 20], adaptive tree-like branching [4,
22, 58], layer skipping [53], etc. (Fig. 2). On the other
hand, neural architecture search (NAS) studies up until re-
cently have focused mostly on either image classification
problems[1, 7, 29, 33, 39, 62] or learning tasks in isola-
tion [19, 34, 54, 67]. Few have explored architecture search
for joint training of dense prediction tasks. However, as
mentioned earlier, edge efficiency can potentially benefit
from both MTL-DP and NAS. To the best of our knowledge, our
study is the first to report successful joint optimization of
these two learning paradigms for dense predictions. Next,
we give an overview of the most relevant efforts in the two
domains of MTL and NAS. For more details, please refer to
these comprehensive surveys: MTL [8, 15], MTL for dense
predictions [59], NAS [46], and hardware-aware NAS [3].
(a) Hard parameter sharing [36, 66] (b) Learning to branch [22, 4, 58] (c) Learning to skip layers [53] (d) Searching for layers (ours)
Figure 2: Conceptual comparison with existing approaches. While current MT-DP methods focus on how to better share a fixed set of
layers, we instead learn better sets of layers to share. Components in red are learnable while others are fixed.
Neural Architecture Search (NAS). In the past few years,
neural architecture search (NAS) has emerged as a so-
lution to automate parts of the network design process.
NAS methods have shown remarkable progress and outper-
formed many handcrafted models [34, 54, 55, 56]. In our
case, we are interested in hardware-aware NAS [6, 63, 67]
which can discover efficient architectures suitable for one
or multiple targeted edge platforms. This is typically done
by casting hardware-aware NAS as a multi-objective opti-
mization problem [6, 54, 63] and adding hardware cost, e.g.
latency, memory, and energy, alongside prediction accuracy,
to guide the search. However, current studies often focus on
image classification [1, 7, 29, 33, 39, 62] or learning tasks in
isolation [54, 67]. In contrast, performing multiple dense pre-
diction tasks simultaneously can have significant benefits
for both inference speed and accuracy since tasks can lever-
age each other’s training signals as inductive biases to im-
prove their own learning and the model’s generalization [8].
Thus, we are interested in combining hardware-aware NAS
with multi-task learning of dense prediction tasks to achieve
both better accuracy and better inference speed on edge de-
vices. To this end, there have been only a limited number of
studies [4, 22, 53, 58] that started to explore similar prob-
lems, which we will discuss next.
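As a concrete instance of the multi-objective formulation above, a widely used soft-constraint reward (popularized by MnasNet-style search) scales task accuracy by a latency penalty. The target latency and exponent below are illustrative values only, not the settings used in this work:

def hardware_aware_reward(accuracy, latency_ms, target_ms=30.0, w=-0.07):
    """Soft-constraint multi-objective reward: accuracy * (latency / target) ** w.

    With w < 0, candidates slower than the target latency are penalized.
    target_ms and w are hypothetical values chosen for illustration.
    """
    return accuracy * (latency_ms / target_ms) ** w

# Two candidates with similar accuracy but different on-device latency:
fast = hardware_aware_reward(accuracy=0.78, latency_ms=25.0)   # under target -> small bonus
slow = hardware_aware_reward(accuracy=0.80, latency_ms=60.0)   # over target  -> penalized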
MTL for Dense Predictions. The goal of Multi-Task
Learning (MTL) [8, 15] is to jointly learn multiple tasks
together to leverage cross-task information to improve per-
task prediction quality. In the context of edge applications,
we are also interested in the property of MTL that lets
tasks share computation and output multiple task predic-
tions in one pass, thereby improving the overall inference
speed. This is particularly useful for dense predictions be-
cause they tend to be more computationally expensive than
their counterparts such as classification [24, 26, 48, 55, 56]
or detection [57, 64]. A popular formulation of MTL
that accomplishes this goal is called hard parameter shar-
ing (HPS) [36, 66]. Compared to soft parameter sharing
(SPS) [20], whose multi-task model size scales linearly with
the number of tasks due to separate per-task sub-networks,
HPS models are more edge-friendly due to their compact
architectural structure. Specifically, HPS architectures are
typically composed of a shared trunk that extracts joint fea-
tures for all tasks and multiple per-task heads or branches
that take the extracted features as input and produce task-
specific predictions. The most standard setup is to have all
task heads branch off at the same point [36]. This is also
our setup of choice for the scope of this work. In addi-
tion, recent studies have begun to explore strategies to learn
adaptive sharing architectures from data [4, 22, 40, 53, 58].
Attention [40] and layer skipping [53] have been used to
efficiently learn a single shared model while modifying
its behavior to output the desired task-specific predic-
tion, given a task. Other studies [4, 22, 58] opt to augment
the HPS architectures by learning the branching of tasks. In
other words, the learned models may have multiple splitting
points, where some tasks can branch off earlier while some
others share more layers. A common theme of these ap-
proaches is that given a fixed starting architecture, the focus
is on learning which components of such a network should be
shared. Our work shifts the focus to the base network and
instead asks what components should be included in such an
architecture to best benefit multi-task dense predictions.
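To make the hard parameter sharing layout described above concrete, the following is a minimal sketch of a shared trunk with all task heads branching off at the same point. The modules themselves are placeholders rather than the searched architectures of this work, where the trunk is precisely what the architecture search discovers:

import torch.nn as nn

class HardParameterSharingModel(nn.Module):
    """Shared trunk plus per-task heads, all branching at the same point (HPS)."""
    def __init__(self, trunk, task_heads):
        super().__init__()
        self.trunk = trunk                       # shared feature extractor, run once
        self.heads = nn.ModuleDict(task_heads)   # e.g. {"segmentation": ..., "depth": ...}

    def forward(self, x):
        feats = self.trunk(x)                    # computed once, shared by every task
        return {name: head(feats) for name, head in self.heads.items()}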
3. Methodology
3.1. EDNAS: Joint MTL-DP and h-NAS
Synergistic Joint Learning. Our key idea is that we can
leverage multi-task inference to significantly reduce com-
putation across several dense prediction tasks, while utiliz-
ing hardware-aware NAS to simultaneously improve edge
latency, design scalability, and multi-task learning. Com-
bining these two paradigms, MT-DP and NAS, is beneficial
not only to edge inference but also to each other. Fig. 1
illustrates these relationships. First, regarding edge appli-
cations, multi-task models [59] that output several predic-
tions at once are attractive since they share computation
across tasks to avoid multiple inference runs, improving the
overall latency roughly linearly in the number of tasks by design. However, this multi-
task setup also leads to performance degradation, known as
negative transfer. While most current works attribute this
problem to improper sharing of neural components, we hy-