Streaming Video Analytics On The Edge With Asynchronous Cloud Support

Anurag Ghosh∗
Carnegie Mellon University
Pittsburgh, PA, USA
anuraggh@andrew.cmu.edu
Srinivasan Iyengar
Microsoft Research
Bangalore, India
sriyengar@microsoft.com
Stephen Lee
University of Pittsburgh
Pittsburgh, PA, USA
stephen.lee@pitt.edu
Anuj Rathore∗
Clutterbot
Bangalore, India
anuj@clutterbot.com
Venkat N Padmanabhan
Microsoft Research
Bangalore, India
padmanab@microsoft.com
ABSTRACT
Emerging Internet of Things (IoT) and mobile computing applications are expected to support latency-sensitive deep neural network (DNN) workloads. To realize this vision, the Internet is evolving towards an edge-computing architecture, where computing infrastructure is located closer to the end device to help achieve low latency. However, edge computing may have limited resources compared to cloud environments and thus cannot run large DNN models that often have high accuracy.
In this work, we develop REACT, a framework that leverages cloud resources to execute large DNN models with higher accuracy, in order to improve the accuracy of models running on edge devices. To do so, we propose a novel edge-cloud fusion algorithm that fuses edge and cloud predictions, achieving low latency and high accuracy. We extensively evaluate our approach and show that it can significantly improve accuracy compared to baseline approaches. We focus specifically on object detection in videos (applicable in many video analytics scenarios) and show that the fused edge-cloud predictions can outperform the accuracy of edge-only and cloud-only scenarios by as much as 50%. We also show that REACT can achieve good performance across tradeoff points by choosing from a wide range of system parameters to satisfy use-case-specific constraints, such as limited network bandwidth or GPU cycles.
ACM Reference Format:
Anurag Ghosh, Srinivasan Iyengar, Stephen Lee, Anuj Rathore, and Venkat N Padmanabhan. 2022. Streaming Video Analytics On The Edge With Asynchronous Cloud Support. In Proceedings of ACM Conference (Conference'17). ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION
Many emerging smart video analytics applications, such as traffic state detection, health monitoring, surveillance, and assistive technology, require fast processing and real-time response to work effectively. Such applications in built environment monitoring rely on deep learning-based object detection models as a core part of their processing pipeline. These models are compute-intensive and tend to have large memory requirements.
∗Work done while at Microsoft Research.
Features                     Our Approach   Glimpse [10]   Marlin [2]   Edge-Ast. [24]
detection at edge                 ✓                             ✓
detection at cloud                ✓              ✓                            ✓
n/w variability resilience        ✓                             ✓

Table 1: A comparison of our approach with existing video analytics techniques.
Prior works have looked at offloading object detection to the cloud [10, 24]. By transferring data, the inference is either entirely or partially offloaded to make use of the compute available in the cloud. However, sending vast quantities of data to the cloud often increases latency, making it unsuitable for near real-time analysis. For intelligent drones [17] or smartphone-based driver assistance [5] to be practical, object detection is needed at low latency, without missing any objects. Thus, we believe that improving foundational real-time vision tasks in a manner that is informed by systems considerations would have a beneficial impact on all these applications.
Edge computing has emerged as an approach to address the latency issue with cloud infrastructure. Small form-factor hardware that is low-cost and consumes less power is often suited to such scenarios, but it often falls short of the heavy computing needs of deep learning models. As such, there has been significant focus on special-purpose devices — e.g., Nvidia Jetson, Google Coral — optimized to run specific DNN workloads. While edge accelerators provide improved performance over a general-purpose edge computing platform, they are still limited in their support† compared to cloud-based GPUs. Further, due to system constraints, these approaches run smaller, quantized models at the edge, with lower accuracy, compared to the larger models, with significantly higher accuracy, run on the cloud [15].
In this paper, we seek to answer the following research question: Can we have the best of both worlds, i.e., the low latency of the edge models and the high accuracy of the cloud models? In contrast to cloud-only and edge-only approaches, our key idea is to employ edge-based and cloud-based models in tandem, with the cloud resources accessible over a wide-area network that may have high latency. By having redundant computation of object detections, we can use cloud-based inferences asynchronously to course-correct edge-based inferences, thereby improving accuracy without sacrificing latency. Table 1 distinguishes our work from prior work involving cloud-only and edge-only approaches.

†Google Coral only supports integer (INT8) operations. Support for some specialized DNN layers/operations is not available in Jetson devices for FLOAT16 and INT8 operations.

Figure 1: Illustrates the efficacy of asynchronous cloud responses in improving edge performance. Note that objects left undetected at the edge are detected in the cloud; thus, cloud responses can be cascaded to improve system performance.
We exploit this arbitrage between the cloud and the edge, as the performance disparity will remain for years ahead. Past works [15, 36] in the computer vision community have proposed model ensemble approaches. However, they combine detections from different models of comparable performance and do so on the same frame, without latency considerations. REACT's novel fusion algorithm, in contrast, combines higher-accuracy cloud-based detections on recent frames with the current inference from the less-accurate edge detector, while removing irrelevant stale results from the cloud.
Figure 1 illustrates how redundant computation helps improve overall accuracy for object detection. The models detect people in a flood-affected riverbank area, captured by an intelligent drone at two different points in time. As shown, a cloud-based detection model achieves higher accuracy but comes with significant latency, wherein the results for a frame sent at 𝑡 = 0 are obtained at 𝑡 = 𝑘. On the other hand, the edge-based detection model has lower accuracy, as several humans are not detected. Note that at 𝑡 = 𝑛, even though the scene has changed, some people are still common across the current and previous frames. However, the edge model still does not detect these people. Moreover, edge results may include false positives. Thus, we use cloud-based models to improve the overall accuracy by considering detections from the accurate cloud model at time 𝑘 < 𝑛 and merging them with the frame at 𝑡 = 𝑛 on the edge. We note that this merge operation is not trivial. We need to consider cases where the two detectors do not agree with each other. Moreover, combining results will not work if the edge receives a cloud response after all the objects of interest within the frame have changed. Thus, approaches must work in highly dynamic environments, where objects of interest change frequently.
In this paper, we describe REACT — our system that builds on these intuitions to exploit the cloud's accuracy together with the low latency of the edge. Below are our contributions.
REACT System Design: We design an edge-cloud video pipeline system capable of exploiting the performance gap in object detection models between the cloud and the edge. Our approach is designed to scale to multiple edge devices and is resilient to network variability. Finally, we develop APIs that edge-based systems can use to leverage cloud-based models and improve overall accuracy.
Edge-Cloud Fusion Algorithm: We develop a novel fusion algorithm that combines predictions from edge and cloud object detection models to achieve higher accuracy than edge-only and cloud-only scenarios. To the best of our knowledge, we are the first to leverage redundant computations to improve the accuracy of on-edge object detection.
Real-world Evaluation: We evaluate REACT on two challenging real-world datasets — data collected from car dashcams [8] and drones [35]. These datasets span different cities and exhibit high variation in scene characteristics and dynamics. Our results show that REACT can significantly improve accuracy, by up to 50% over baseline methods. Further, REACT can trade off edge and cloud computation while maintaining the same level of accuracy. For instance, by reducing the edge detection frequency to a fourth (from every 5th frame to every 20th frame) and increasing the cloud detection frequency (from every 100th frame to every 30th frame), REACT can achieve similar accuracy.
Scalability and Resilience Analysis: We analyze the scalability of our approach and show that REACT can support 60+ concurrent edge devices on a single machine with a server-class GPU. We also show that REACT is resilient to network variability; that is, it can function under varying network conditions and leverages cloud models when feasible. We evaluate REACT over different network types (WiFi and LTE) with varying latency using a network emulator. Our results show that even with varying response latency from the cloud, REACT performs better than the edge-only scenario.
2 BACKGROUND
In this section, we provide background on video-based applications and the challenges in cloud- and edge-based video analytics.

Video analytics systems collect rich visual information that offers insights into the environment. These systems can be broadly categorized as: (i) devices that send all video to the cloud for processing, and (ii) devices that have limited processing capabilities, constrained by their small form factor, cost, or energy. In the latter case, video processing can be split between the device and the cloud. That is, the device can perform some or possibly all of the processing before it sends the video to the cloud. Deep learning inference for object detection forms the core of such systems.
Since deep learning is compute-intensive, existing systems typically send data to the cloud for processing. However, cloud analysis may incur significant delays and may be unsuitable for live applications. Edge computing has emerged as an alternative to complement the cloud, where data processing is done close to the devices to avoid these delays. A variety of edge computing architectures exist, depending on where the edge servers are located relative to the end devices [31]. Our work assumes the edge device has low latency and limited computing capabilities; examples include hubs in smart homes, routers, mobile phones, and IoT devices such as intelligent drones and wearable VR headsets. We assume that some form of resource-constrained AI workload can be run on these edge devices. Modern devices like the Raspberry Pi or Jetson are capable of running lightweight models [30] with a smaller memory footprint. Pairing them with specialized accelerators (such as Google Coral or Intel Movidius) speeds up the inference time of small models without affecting accuracy, for a class of models. Unfortunately, larger deep learning models (having higher accuracy than smaller models) are still not within the latency and memory budget of these devices. Larger models require cloud GPU resources, but this comes at the cost of network delays, which is unacceptable for live and streaming applications. In summary, edge processing provides a latency advantage, but there remains a significant accuracy gap between real-time prediction on an edge device and offline prediction in a resource-rich setting [20]. Our goal in REACT is to leverage cloud processing in tandem with edge processing to bridge the accuracy gap while preserving the latency advantage of edge processing.
3 REACT DESIGN
For real-time edge inference, we propose a system that uses an edge-cloud architecture, retaining the low latency of edge devices while achieving higher accuracy than an edge-only approach. In this section, we discuss how we leverage cloud models to influence and improve edge results.
Basic Approach: It is known that video frames are spatiotemporally correlated. Typically, it is sufficient to invoke edge object detection once every few frames. As illustrated in Figure 2(a), edge detection runs on every 5th frame. As shown in the figure, a comparatively lightweight operation, object tracking, can be employed to interpolate the intermediate frames. Additionally, to improve the accuracy of inference, select frames are asynchronously transmitted to the cloud for inference. Depending on the network conditions (RTT, bandwidth, etc.) and the cloud server configuration (GPU type, memory, etc.), cloud detections become available to the edge device only after a few frames. The newer cloud detections, covering objects that were previously undetected, can be brought forward to the current frame using another instance of an object tracker running on the past buffered images. Video frames retain spatial and temporal context depending on the scene and camera dynamics. Our key insight is that these asynchronous detections from the cloud can help improve overall system performance, as the scene usually does not change abruptly. See Figure 2(b) for a visual result of the approach.
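
To make the control flow concrete, the following is a minimal sketch of how such a per-frame loop could be organized. It illustrates the approach rather than REACT's implementation: video_frames, run_edge_detector, update_tracker, submit_to_cloud, try_get_cloud_result, and fuse_detections are hypothetical placeholders, and the frequencies are the illustrative values from above.

```python
# Sketch of the per-frame edge loop (illustration only, not REACT's code).
K_EDGE = 5     # edge detection every 5th frame (as in Figure 2(a))
M_CLOUD = 100  # cloud detection frequency (illustrative value)

frame_buffer = []   # recent frames, used to roll cloud results forward
detections = []

for i, frame in enumerate(video_frames()):
    frame_buffer.append((i, frame))
    frame_buffer = frame_buffer[-M_CLOUD:]          # bound the buffer

    if i % K_EDGE == 0:
        detections = run_edge_detector(frame)       # fresh, less accurate
    else:
        detections = update_tracker(detections, frame)  # cheap interpolation

    if i % M_CLOUD == 0:
        submit_to_cloud(i, frame)   # asynchronous; reply arrives frames later

    reply = try_get_cloud_result()  # None unless a cloud reply is pending
    if reply is not None:
        sent_idx, cloud_dets = reply
        # Track the (stale) cloud detections forward over buffered frames
        # so they align with the current frame before fusion.
        for idx, past_frame in frame_buffer:
            if idx > sent_idx:
                cloud_dets = update_tracker(cloud_dets, past_frame)
        detections = fuse_detections(detections, cloud_dets)
```

The essential property is that the loop never blocks on the cloud: detection and tracking proceed at frame rate, and a cloud reply is folded in whenever it happens to arrive.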
Challenges: Nevertheless, designing a system that utilizes the above approach requires addressing several challenges. First, combining the detections from two sources, i.e., local edge detections and the delayed cloud detections, is not straightforward. Each of the two sources produces a separate list of objects, each represented by a ⟨class_label, bounding_box, confidence_score⟩ tuple. A fusion algorithm must consider several cases – such as class label mismatches and misaligned bounding boxes – to consolidate the edge and cloud detections into a single list. Second, some or all of the cloud objects may be "stale", i.e., outside the current edge frame. The longer it takes to perform fusion, the greater the risk of such staleness, especially if the scene changes rapidly. Thus, to minimize this risk, once the old cloud annotations are received, they must be quickly processed at the edge to help with the current frame.
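
REACT's actual consolidation rules are implemented in the Cloud-Edge Fusion Unit introduced later. Purely to illustrate the cases a fusion algorithm must handle, the sketch below matches the two lists with a simple IoU-based association; the Detection record, the 0.5 thresholds, and the tie-breaking policy are assumptions for this example, not the paper's algorithm. The function name matches the fuse_detections placeholder in the earlier loop sketch.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str     # class_label, e.g. "person"
    box: tuple     # bounding_box (x, y, w, h), with (x, y) the box center
    score: float   # confidence_score in [0, 1]

def iou(a, b):
    """Intersection-over-union of two center-format (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0]-a[2]/2, a[1]-a[3]/2, a[0]+a[2]/2, a[1]+a[3]/2
    bx1, by1, bx2, by2 = b[0]-b[2]/2, b[1]-b[3]/2, b[0]+b[2]/2, b[1]+b[3]/2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def fuse_detections(edge_dets, cloud_dets, iou_thresh=0.5):
    """Toy consolidation into a single list, enumerating the cases above."""
    fused = list(edge_dets)
    for c in cloud_dets:
        best = max(fused, key=lambda e: iou(e.box, c.box), default=None)
        if best is None or iou(best.box, c.box) < iou_thresh:
            fused.append(c)                  # object the edge missed entirely
        elif best.label != c.label:
            if c.score > best.score:         # class label mismatch:
                best.label = c.label         # trust the higher-confidence source
                best.score = c.score
        else:
            best.score = max(best.score, c.score)  # agreement: keep best score
    return fused
```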
Another challenge when running detection models on live videos at the edge is minimizing resource utilization while maintaining detection accuracy. Previous studies of edge-only detection systems have shown that running a deep neural network (DNN) on every frame of a video can drain system resources (e.g., battery) quickly [2]. In our case, with a distributed edge-cloud architecture, several resource constraints need to be considered simultaneously. For example, cloud detections are more accurate, as one can run computationally expensive models with access to server-class GPU resources; however, bandwidth constraints or a limited cloud budget might restrict their use to once every few frames. Moreover, if the scene change is insignificant, it would be prudent not to invoke object detection at either the edge or the cloud. On the contrary, for more dynamic scenes, increasing the frequency of edge detection might cause excessive heat generation in the modest GPUs used on edge devices, leading to throttling.
Next, we present our system, REACT, which overcomes the above challenges. REACT consists of three components: (i) the REACT Edge Manager, (ii) the Cloud-Edge Fusion Unit, and (iii) the REACT Model Server. Below, we describe them in more detail.
3.1 REACT Edge Manager
The REACT Edge Manager (REM) consists of different modules that, put together, enable fast and accurate object detection at the edge.
Change detector: Previous studies have shown that running object detection on every frame of a video can drain system resources (e.g., battery) quickly [2]. REM provides two parameters, i.e., the detection frequency at the edge (𝑘) and at the cloud (𝑚), to modulate the number of frames between object detections. Intuitively, if there is little object displacement across frames, running detection models frequently wastes resources. REM therefore employs a change detector that computes the optical flow on successive frames. This represents the relative motion of the scene, consisting of objects and the camera, similar to [2, 10, 18]. Object detection is thus invoked on every 𝑘-th frame at the edge and every 𝑚-th frame in the cloud only if this motion is greater than a pre-decided threshold.
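
A minimal sketch of such a motion gate is shown below, assuming dense Farneback optical flow from OpenCV; the actual flow variant and threshold used by REM are design parameters, and the values here are illustrative.

```python
import cv2
import numpy as np

class ChangeDetector:
    """Motion gate sketch, assuming dense Farneback optical flow.
    The flow method and threshold are assumptions, not REM's exact choice."""

    def __init__(self, motion_threshold=1.0):
        self.motion_threshold = motion_threshold
        self.prev_gray = None

    def scene_changed(self, frame_bgr):
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        if self.prev_gray is None:
            self.prev_gray = gray
            return True                       # no history yet; allow detection
        flow = cv2.calcOpticalFlowFarneback(
            self.prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        self.prev_gray = gray
        # Mean flow magnitude approximates relative scene/camera motion.
        mag = np.linalg.norm(flow, axis=2).mean()
        return mag > self.motion_threshold
```

Under this sketch, the 𝑘-th and 𝑚-th frame detections are simply skipped whenever scene_changed() returns False.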
Edge Object Detector: On every 𝑘-th frame, REM triggers the edge object detector module, which outputs a list of ⟨𝑙, 𝑝, 𝑐⟩ tuples. Here, 𝑙 and 𝑐 are the class label (e.g., car, person) and the confidence score (between 0 and 1) associated with a detected object, respectively. 𝑝 = (𝑥, 𝑦, 𝑤, ℎ) represents the bounding box of each detected object, where (𝑥, 𝑦) is the center coordinate of the object, and 𝑤 and ℎ are the width and height of the bounding box. To avoid multiple bounding boxes for the same object, we use non-maximum suppression (NMS), which removes locally repeated detections.
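
As an illustration, a minimal class-aware greedy NMS over such ⟨𝑙, 𝑝, 𝑐⟩ tuples might look like the following; it reuses the Detection record and iou() helper from the fusion sketch above, and the IoU threshold is an illustrative value.

```python
def nms(detections, iou_thresh=0.5):
    """Greedy class-aware non-maximum suppression over Detection records
    (center-format boxes; reuses iou() from the fusion sketch)."""
    kept = []
    # Visit detections in order of decreasing confidence.
    for d in sorted(detections, key=lambda d: d.score, reverse=True):
        duplicate = any(d.label == q.label and iou(d.box, q.box) >= iou_thresh
                        for q in kept)
        if not duplicate:
            kept.append(d)
    return kept
```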
Main Object tracker: REM employs a CPU-based object tracker, a computationally cheaper technique, between the frames for which object detections are available. For example, a CSRT [25] tracker can process images at >40 fps (on an Nvidia Jetson Xavier). However, as the displacement of objects between frames increases, the tracker's accuracy degrades. The tracker module accounts for this degradation by multiplying every tracked object's confidence score by a decay rate 𝛿 ∈ [0, 1]. As the confidence scores shrink with every passing frame under this multiplier, the module sweeps over the list of objects and discards those with low confidence scores (i.e., 𝑐 < 0.5).
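
A sketch of this decay-and-discard logic is shown below, wrapped around per-object CSRT trackers. The decay rate is illustrative, the constructor lookup hedges across OpenCV versions (CSRT requires the opencv-contrib build), and tracker initialization from fresh detections is assumed to happen elsewhere.

```python
import cv2

DELTA = 0.9        # illustrative decay rate 𝛿 ∈ [0, 1]
MIN_SCORE = 0.5    # tracked objects below this confidence are discarded

def make_csrt():
    # The CSRT constructor moved between OpenCV versions.
    if hasattr(cv2, "TrackerCSRT_create"):
        return cv2.TrackerCSRT_create()
    return cv2.legacy.TrackerCSRT_create()

def track_step(frame, tracked):
    """Advance per-object CSRT trackers by one frame, decaying confidence.
    `tracked` is a list of (tracker, Detection) pairs, where Detection is
    the record from the fusion sketch; lost or faded objects are dropped."""
    survivors = []
    for tracker, det in tracked:
        ok, box = tracker.update(frame)    # box is (x, y, w, h), top-left
        det.score *= DELTA                 # confidence decays every frame
        if ok and det.score >= MIN_SCORE:
            x, y, w, h = box
            det.box = (x + w / 2, y + h / 2, w, h)   # back to center format
            survivors.append((tracker, det))
    return survivors
```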
Cloud communicator: REM includes a communication module responsible for sending every 𝑚-th frame (the cloud detection frequency) to the cloud and receiving the associated output annotations. Similar to the edge detections, the cloud annotations consist of a list of ⟨𝑙, 𝑝, 𝑐⟩ tuples.
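
A minimal sketch of such a non-blocking communicator, using a background thread per request and a thread-safe queue for replies, is shown below; send_frame_to_cloud stands in for the actual transport (e.g., a request to the REACT Model Server) and is a hypothetical placeholder.

```python
import queue
import threading

class CloudCommunicator:
    """Sketch of a non-blocking cloud link (transport details assumed;
    send_frame_to_cloud is a hypothetical RPC returning ⟨l, p, c⟩ tuples)."""

    def __init__(self):
        self.results = queue.Queue()

    def submit(self, frame_idx, frame):
        # Fire-and-forget: the edge loop never blocks on the cloud RTT.
        threading.Thread(
            target=self._request, args=(frame_idx, frame), daemon=True
        ).start()

    def _request(self, frame_idx, frame):
        detections = send_frame_to_cloud(frame)   # hypothetical RPC
        self.results.put((frame_idx, detections))

    def poll(self):
        """Return (frame_idx, detections) if a reply arrived, else None."""
        try:
            return self.results.get_nowait()
        except queue.Empty:
            return None
```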