IDD-3D: Indian Driving Dataset for 3D Unstructured Road Scenes
Shubham Dokania1, A. H. Abdul Hafez2, Anbumani Subramanian1,
Manmohan Chandraker3, C.V. Jawahar1
1IIIT Hyderabad, 2Hasan Kalyoncu University, 3UC San Diego
shubham.dokania@research.iiit.ac.in, abdul.hafez@hku.edu.tr,
anbumani@iiit.ac.in, mkchandraker@eng.ucsd.edu, jawahar@iiit.ac.in
Abstract
Autonomous driving and assistance systems rely on annotated data from traffic and road scenarios to model and learn the various object relations in complex real-world scenarios. Preparation and training of deployable deep learning architectures require the models to be suited to different traffic scenarios and to adapt to different situations. Existing datasets, while large-scale, lack such diversity and are geographically biased towards mainly developed cities. An unstructured and complex driving layout found in several developing countries such as India poses a challenge to these models due to the sheer degree of variation in object types, densities, and locations. To facilitate better research toward accommodating such scenarios, we build a new dataset, IDD-3D, which consists of multi-modal data from multiple cameras and LiDAR sensors, with 12k annotated driving LiDAR frames across various traffic scenarios. We discuss the need for this dataset through statistical comparisons with existing datasets and highlight benchmarks on standard 3D object detection and tracking tasks in complex layouts. Code and data are available1.
1. Introduction
Intelligent vehicles and autonomous driving systems have come a long way and continue to grow more sophisticated over time, owing to the rapid progress in deep learning and computer vision. However, the core enabler of all these advances is the availability of high-quality annotated data. Recently, many works have focused on data selection and quality improvement [34, 8, 47], on building high-quality and large-scale datasets, and on approaches built using these resources, which improve the state of autonomous driving [48, 16].
Existing datasets are usually collected in well-structured environments with proper traffic regulations and relatively
1https://github.com/shubham1810/idd3d_kit.git
Figure 1. Some examples from the dataset showing different traffic
scenarios, LiDAR data with annotations, and a sample of LiDAR
point clouds projected on camera data.
evenly distributed traffic. In such situations, crowd behavior demonstrates low diversity and moderate densities. In South-East Asian countries, such as India, traffic densities and inter-object behaviors are much more complex. Such complexities have been studied in the past [39, 5, 4], but extensive data coverage and multi-modal systems are still unavailable for such scenes. Models trained on existing datasets hence may not transfer directly to cases where the distribution of object categories and types varies greatly.
In this paper, we propose a dataset of complex unstructured driving scenarios with multi-modal data, highlighting the capabilities of 3D sensors such as LiDAR for better scene perception in unstructured and sporadically chaotic traffic conditions. The proposed dataset exhibits a significantly different distribution of object types and categories compared to existing datasets collected in European or similar settings [24, 13, 38], owing to the different nature of traffic scenes on Indian roads. Furthermore, the categories and annotations available in the proposed dataset differ greatly from those of existing datasets. Specifically, they cover objects that usually appear in still-developing
arXiv:2210.12878v1 [cs.CV] 23 Oct 2022
Figure 2. Samples from the dataset highlighting different (a) RGB images and (b) LiDAR Bird-Eye-View (BEV) along with bounding box
annotations. The samples visualized above are taken from different sequences of the dataset.
cities, for example auto-rickshaws, hand carts, concrete mixer machines, and animals on roads.
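Figure 2 visualizes LiDAR frames as Bird-Eye-View (BEV) images with bounding boxes overlaid. As a minimal sketch of how such a BEV is produced, the snippet below rasterizes a point cloud into a top-down occupancy grid; the metric window and resolution are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def points_to_bev(points, x_range=(0.0, 50.0), y_range=(-25.0, 25.0), res=0.5):
    """Rasterize an (N, 3) LiDAR point cloud into a top-down occupancy grid.

    x_range/y_range/res are hypothetical values for illustration only;
    the actual BEV settings used for Figure 2 may differ.
    """
    x, y = points[:, 0], points[:, 1]
    # Keep only points inside the chosen metric window.
    mask = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y = x[mask], y[mask]
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    grid = np.zeros((h, w), dtype=np.uint8)
    # Convert metric coordinates to integer cell indices.
    rows = ((x - x_range[0]) / res).astype(int)
    cols = ((y - y_range[0]) / res).astype(int)
    grid[rows, cols] = 1
    return grid
```

Richer BEV encodings (per-cell height, intensity, or density channels) follow the same rasterization step.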
We provide data collected in Indian road scenes, from high-quality LiDAR sensors and six cameras that cover the surroundings of the ego-vehicle, to enable sensor-fusion-based applications. We provide annotations for 15.5k frames in the dataset, spanning 10 primary categories (and 7 additional miscellaneous categories), which we use for model training and evaluation. Along with the annotations, we also provide extra unlabelled raw data from the sensors to facilitate further research, especially into self- and unsupervised learning over such traffic scenes. A unique feature of the proposed dataset, stemming from the unstructured environment, is the availability of highly complex trajectories. We show samples from the dataset that emphasize such cases, and present experiments on object detection and tracking, which are possible due to the availability of instance-specific labels for each object bounding box per sequence.
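Sensor-fusion applications of the kind described above, such as the LiDAR-to-camera overlay in Figure 1, rest on a standard projection step: transform LiDAR points into the camera frame with the extrinsic calibration, then apply the intrinsic matrix. The sketch below illustrates this under assumed calibration names (`T_cam_lidar`, `K`); the dataset's development kit may expose calibration differently.

```python
import numpy as np

def project_lidar_to_image(points, T_cam_lidar, K):
    """Project (N, 3) LiDAR points into pixel coordinates.

    T_cam_lidar: 4x4 extrinsic matrix (LiDAR frame -> camera frame).
    K: 3x3 camera intrinsic matrix.
    Both names are assumptions for this sketch, not the dataset's API.
    """
    # Homogeneous coordinates for the rigid-body transform.
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    # Keep only points in front of the camera (positive depth).
    in_front = cam[:, 2] > 0
    cam = cam[in_front]
    # Perspective projection and divide by depth.
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    return uv, in_front
```

In practice, a further check that the resulting pixels lie within the image bounds is needed before colorizing or overlaying points.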
Our main contributions can be summarised as follows: (i) we propose the IDD-3D dataset for driving in unstructured traffic scenarios on Indian roads with 3D information, (ii) we provide high-quality annotations for 3D object bounding boxes with 9DoF data, along with instance IDs to enable tracking, (iii) we analyse highly unstructured and diverse environments to accentuate the usefulness of the proposed dataset, and (iv) we provide 3D object detection and tracking benchmarks across popular methods from the literature.
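The annotation scheme in contribution (ii) can be illustrated with a minimal record type: each box carries 9 degrees of freedom (3D center, 3D size, 3D rotation) plus a category and a per-sequence instance ID, and grouping boxes by instance ID yields the object trajectories used for tracking. The class and helper below are a hypothetical sketch of this structure, not the dataset's actual schema.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Box3D:
    # 9 degrees of freedom: center (x, y, z), size (l, w, h), rotation (roll, pitch, yaw).
    center: tuple
    size: tuple
    rotation: tuple
    category: str
    instance_id: str  # stable per object across a sequence, enabling tracking
    frame: int

def group_tracks(boxes):
    """Group per-frame boxes into per-instance trajectories, sorted by frame index."""
    tracks = defaultdict(list)
    for box in boxes:
        tracks[box.instance_id].append(box)
    for track in tracks.values():
        track.sort(key=lambda b: b.frame)
    return dict(tracks)
```

With ground-truth tracks recovered this way, standard multi-object tracking metrics can be evaluated against a tracker's output.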
2. Related Work
Data plays a huge role in machine learning systems and, in this context, in autonomous vehicles and scene perception. There have been several efforts over the years to improve the state of available datasets and to increase the volume of high-quality, well-annotated datasets.
2D Driving: Among the early datasets for visual perception and driving understanding are CamVid [2] and Cityscapes [9, 10], which provide annotations for semantic segmentation and enable research into deeper scene understanding at the pixel level. The KITTI dataset [14, 15] provides 2D object annotations for detection and tracking along with segmentation data. However, fusing additional modalities such as 3D LiDAR data enhances performance on scene understanding benchmarks, as these provide a higher level of detail of a scene when combined with the available 2D data. This multi-modal, sensor-fusion-based direction has been the motivation for the proposed dataset, to alleviate the discrepancies in existing datasets for scene