Enabling ISP-less Low-Power Computer Vision
Gourav Datta, Zeyu Liu, Zihan Yin, Linyu Sun, Akhilesh R. Jaiswal, Peter A. Beerel
University of Southern California, Los Angeles, USA
{gdatta, liuzeyu, zihanyin, linyusun, akhilesh, pabeerel}@usc.edu
Abstract
Current computer vision (CV) systems use an image sig-
nal processing (ISP) unit to convert the high resolution raw
images captured by image sensors to visually pleasing RGB
images. Typically, CV models are trained on these RGB im-
ages and have yielded state-of-the-art (SOTA) performance
on a wide range of complex vision tasks, such as object de-
tection. In addition, in order to deploy these models on
resource-constrained low-power devices, recent works have
proposed in-sensor and in-pixel computing approaches that
try to partly/fully bypass the ISP and yield significant band-
width reduction between the image sensor and the CV pro-
cessing unit by downsampling the activation maps in the
initial convolutional neural network (CNN) layers. How-
ever, direct inference on the raw images degrades the test
accuracy due to the difference in covariance of the raw im-
ages captured by the image sensors compared to the ISP-
processed images used for training. Moreover, it is difficult
to train deep CV models on raw images, because most (if
not all) large-scale open-source datasets consist of RGB im-
ages. To mitigate this concern, we propose to invert the ISP
pipeline, which can convert the RGB images of any dataset
to its raw counterparts, and enable model training on raw
images. We release the raw version of the COCO dataset,
a large-scale benchmark for generic high-level vision tasks.
For ISP-less CV systems, training on these raw images results in a 7.1% increase in test accuracy on the visual wake words (VWW) dataset compared to relying on training with
traditional ISP-processed RGB datasets. To further improve
the accuracy of ISP-less CV models and to increase the en-
ergy and bandwidth benefits obtained by in-sensor/in-pixel
computing, we propose an energy-efficient form of ana-
log in-pixel demosaicing that may be coupled with in-pixel
CNN computations. When evaluated on raw images cap-
tured by real sensors from the PASCALRAW dataset, our ap-
proach results in an 8.1% increase in mAP. Lastly, we demonstrate a further 20.5% increase in mAP by using a novel application of few-shot learning with thirty shots for each of the 3 classes in the novel PASCALRAW dataset.
1. Introduction
Modern high-resolution cameras generate huge amounts
of visual data arranged in the form of raw Bayer color fil-
ter arrays (CFA), also known as a mosaic pattern, as shown
in Fig. 1, that need to be processed for downstream CV
tasks [43, 1]. An ISP unit, consisting of several pipelined
processing stages, is typically used before the CV process-
ing to convert the raw mosaiced images to RGB counter-
parts [20, 42, 26, 29]. The ISP step that converts these
single-channel CFA images to three-channel RGB images
is called demosaicing. Historically, ISP has been proven to
be extremely effective for computational photography ap-
plications, where the goal is to generate images that are
aesthetically pleasing to the human eye [29, 8]. How-
ever, is it important for high-level CV applications, such
as face detection by smart security cameras, where the sen-
sor data is unlikely to be viewed by any human? Exist-
ing works [42, 20, 26] show that most ISP steps can be
discarded with a small drop in the test accuracy for large-
scale image recognition tasks. The removal of the ISP
can potentially enable existing in-sensor [31, 10, 2] and in-
pixel [5, 27, 12, 13, 14] computing paradigms to process
CV computations, such as CNNs partly in the sensor, and
reduce the bandwidth and energy incurred in the data trans-
fer between the sensor and the CV system. Moreover, most
low-power cameras with a few megapixels of resolution do not
have an on-board ISP [3], thereby requiring the ISP to be
implemented off-chip, increasing the energy consumption
of the total CV system.
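To make the Bayer CFA structure concrete, the following sketch shows how a three-channel RGB image collapses into the single-channel mosaic that a sensor actually reads out. The RGGB layout used here is one common convention and is an assumption for illustration; the pattern on a given sensor may differ.

```python
import numpy as np

def bayer_mosaic(rgb: np.ndarray) -> np.ndarray:
    """Collapse an H x W x 3 RGB image into a single-channel
    Bayer mosaic using an RGGB pattern: R at even rows/cols,
    B at odd rows/cols, G on the two diagonals."""
    h, w, _ = rgb.shape
    cfa = np.zeros((h, w), dtype=rgb.dtype)
    cfa[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R sites
    cfa[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G sites (even rows)
    cfa[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G sites (odd rows)
    cfa[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B sites
    return cfa

img = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
cfa = bayer_mosaic(img)
print(cfa.shape)  # (4, 4): one measured value per pixel instead of three
```

Demosaicing is the inverse problem: recovering the two missing color values at every pixel from this one-channel pattern.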
Although the ISP removal can facilitate model deploy-
ments in resource-constrained edge devices, one key chal-
lenge is that most large-scale datasets, that are used to train
CV models, are ISP-processed. Since there is a large co-
variance shift between the raw and RGB images (please see
Fig. 1 where we show the histogram of the pixel inten-
sity distributions of RGB and raw images), models trained
on ISP-processed RGB images and inferred on raw im-
ages, thereby removing the ISP, exhibit a significant drop
in the accuracy. One recent work has leveraged train-
able flow-based invertible neural networks [44] to convert
raw to RGB images and vice-versa using open-source ISP
datasets. These networks have recently yielded SOTA test
arXiv:2210.05451v1 [cs.CV] 11 Oct 2022
Figure 1. Difference in frequency distributions of pixel intensities
between mosaiced raw, demosaiced, and ISP-processed images.
performance in photographic tasks, which we propose to
modify to invert the ISP pipeline, and build the raw ver-
sion of any large-scale ISP processed database for high-
level vision applications, such as object detection. This
raw dataset can then be used to train CV models that can
be efficiently deployed on low-power edge devices without
any of the ISP steps, including demosaicing. To further im-
prove the performance of these ISP-less models, we pro-
pose a novel hardware-software co-design approach, where
a form of demosaicing is applied on the raw mosaiced im-
ages inside the pixel array using analog summation during
the pixel read-out operation, i.e., without a dedicated ISP
unit. Our models trained on this demosaiced version of the
visual wake words (VWW) dataset lead to an 8.2% increase in the
test accuracy compared to standard training on RGB images
and inference on raw images (to simulate the ISP removal
and the in-pixel/in-sensor implementation). Even compared
to standard RGB training and inference, our models yield
0.7% (1.6%) higher accuracy (mAP) on the VWW (COCO)
dataset. Lastly, we propose a novel application of few-shot
learning to improve the accuracy on real raw images captured directly by a camera (which have a limited number of annotations), with our generated raw images constituting the
base dataset.
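A software-level approximation of such summation-based demosaicing can be sketched as follows. This is an illustrative sketch, not the authors' analog read-out circuit: here each 2×2 RGGB neighborhood is collapsed into one RGB output pixel by summing (averaging) the same-color sites, which mimics the bandwidth reduction of in-pixel analog summation by halving the spatial resolution in each dimension.

```python
import numpy as np

def neighborhood_demosaic(cfa: np.ndarray) -> np.ndarray:
    """Approximate summation-based demosaicing of an RGGB Bayer
    mosaic: each 2x2 neighborhood yields one RGB output pixel,
    so the spatial resolution (and read-out bandwidth per channel)
    is halved in each dimension."""
    r = cfa[0::2, 0::2].astype(np.float32)            # one R site per block
    g = 0.5 * (cfa[0::2, 1::2].astype(np.float32) +
               cfa[1::2, 0::2].astype(np.float32))    # average of two G sites
    b = cfa[1::2, 1::2].astype(np.float32)            # one B site per block
    return np.stack([r, g, b], axis=-1)

cfa = np.random.randint(0, 256, (8, 8)).astype(np.uint8)
rgb = neighborhood_demosaic(cfa)
print(rgb.shape)  # (4, 4, 3)
```

In hardware, the analogous summation would happen during read-out on the analog pixel values, before any ADC or off-chip transfer.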
The key contributions of our paper can be summarized
as follows.
• Inspired by the energy and bandwidth benefits ob-
tained by in-sensor computing approaches and the re-
moval of most ISP steps in a CV pipeline, we present
and release a large-scale raw image database that can
be used to train accurate CV models for low-power
ISP-less edge deployments. This dataset is generated
by reversing the entire ISP pipeline using the recently
proposed flow-based invertible neural networks and
custom mosaicing. We demonstrate the utility of this
dataset to train ISP-less CV models with raw images.
• To improve the accuracy obtained with raw images, we
propose a low-overhead form of in-pixel demosaicing
that can be implemented directly on the pixel array
alongside other CV computations enabled by recent
paradigms of in-pixel/in-sensor computing approaches
and that also reduces the data bandwidth.
• We present a thorough evaluation of our approach with
both simulated (our released dataset) and real (cap-
tured by a real camera) raw images, for a diverse range
of use-cases with different memory/compute budgets.
• To improve the accuracy on real raw images, we propose a novel application of few-shot learning, with the
simulated raw images having a large number of la-
belled classes constituting the base dataset.
2. Related Works
2.1. ISP Reversal & Removal
Since most ISP steps are irreversible and depend on the
camera manufacturer’s proprietary color profile [6], it is
difficult to invert the ISP pipeline. To mitigate this chal-
lenge, a few recent works [25, 32, 46] proposed learning-
based methods, but they result in large losses and the re-
covered RAW images may be significantly different from
the originals captured by the camera. To reduce this loss,
a more recent work [44] used a stack of k invertible and bijective functions f = f_1 ∘ f_2 ∘ … ∘ f_k to invert the ISP pipeline. For a raw input x, the RGB output y and the inverted raw input x are computed as y = f_1 ∘ f_2 ∘ … ∘ f_k(x) and x = f_k^{-1} ∘ f_{k-1}^{-1} ∘ … ∘ f_1^{-1}(y).
The bijective function f_i is implemented through affine coupling layers [44]. In each affine coupling layer, given a D-dimensional input m and d < D, the output n is

n_{1:d} = m_{1:d} + r(m_{d+1:D})    (1)
n_{d+1:D} = m_{d+1:D} ⊙ exp(s(n_{1:d})) + t(n_{1:d})    (2)

where s and t represent scale and translation functions from R^d to R^{D−d} that are realized by neural networks, ⊙ represents the Hadamard product, and r represents an arbitrary function from R^{D−d} to R^d. The inverse step is

m_{d+1:D} = (n_{d+1:D} − t(n_{1:d})) ⊙ exp(−s(n_{1:d}))    (3)
m_{1:d} = n_{1:d} − r(m_{d+1:D})    (4)

The authors then utilize the invertible 1×1 convolution proposed in [23] as the learnable permutation function to revert the channel order for the subsequent affine coupling layer.
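A minimal numerical sketch of one affine coupling step, with toy linear functions standing in for the neural networks s, t, and r (illustrative stand-ins only, not the networks used in [44]), verifies that the inverse recovers the input exactly:

```python
import numpy as np

# Toy stand-ins for the learned functions s, t (R^d -> R^{D-d})
# and r (R^{D-d} -> R^d); any fixed functions keep the layer bijective.
s = lambda x: 0.1 * x
t = lambda x: 0.5 * x
r = lambda x: 2.0 * x

def coupling_forward(m, d):
    n1 = m[:d] + r(m[d:])                 # additive update of the first half
    n2 = m[d:] * np.exp(s(n1)) + t(n1)    # affine update of the second half
    return np.concatenate([n1, n2])

def coupling_inverse(n, d):
    m2 = (n[d:] - t(n[:d])) * np.exp(-s(n[:d]))  # undo the affine update
    m1 = n[:d] - r(m2)                           # undo the additive update
    return np.concatenate([m1, m2])

m = np.array([0.3, -1.2, 0.7, 2.0])
n = coupling_forward(m, d=2)
m_rec = coupling_inverse(n, d=2)
print(np.allclose(m, m_rec))  # True: the layer is exactly invertible
```

Note that the second half is transformed using the already-updated first half, which is what lets the inverse be computed in closed form without solving any equations.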
Recent works have also investigated the role of the
ISP in image classification and the impact of its removal/trimming on accuracy in exchange for energy and bandwidth benefits. For example, [20] demonstrated that removal of the whole ISP during edge inference results in an 8.6% loss in
accuracy with MobileNets [36] on ImageNet [15], which
can mostly be recovered by using just the tone-mapping
stage. Another work [42] attempted to integrate the ISP
and CV processing using tone mapping and feature-aware
downscaling blocks that reduce both the number of bits