Resolving Class Imbalance Problem for LiDAR-based Object Detector
by Balanced Gradients and Contextual Ground Truth Sampling
Daeun Lee1, Jongwon Park2, and Jinkyu Kim1
1Department of Computer Science and Engineering, Korea University
2Autonomous Driving Center, Hyundai Motor Company R&D Division
Correspondence: jinkyukim@korea.ac.kr
Abstract
An autonomous driving system requires a 3D object detector that
reliably perceives all road agents present in order to navigate an
environment safely. However, real-world driving datasets often
suffer from data imbalance, which makes it difficult to train a
model that works well across all classes and results in
undesirably imbalanced, sub-optimal performance. In this work, we
propose a method to address this data imbalance problem. Our
method consists of two main components: (i) a LiDAR-based 3D
object detector with multiple per-class detection heads, where
the losses from the heads are balanced by dynamic weight average;
and (ii) contextual ground truth (GT) sampling, which improves
conventional GT sampling by leveraging semantic information when
augmenting a point cloud with sampled GT objects. Our experiments
on the KITTI and nuScenes datasets confirm the effectiveness of
the proposed method in dealing with the data imbalance problem,
producing better detection accuracy than existing approaches.
1. Introduction
LiDAR-based detectors have been widely adopted in autonomous
driving systems for 3D scene perception and understanding. Such a
system must detect all other road agents (or objects) to navigate
an environment safely. Thus, a reliable LiDAR-based detector must
handle different road agents equally well, e.g., cars, cyclists,
barriers, or construction vehicles.
However, real-world driving datasets (e.g., KITTI and nuScenes)
suffer from class imbalance: their class distributions are unequal
or severely skewed. As shown in Figure 1, in the nuScenes dataset,
cars (42.63%) account for a far larger share of instances than
other classes such as bicycles (1.03%), motorcycles (1.11%), or
construction vehicles (1.39%). Similarly, in the KITTI dataset,
cars (82.99%) make up the majority of instances, while pedestrians
(12.76%) and cyclists (4.24%) are underrepresented. Such data
imbalance makes it difficult to train a 3D object detector that
works reliably across all classes, resulting in undesirably
imbalanced detection quality.
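To make the skew concrete, the short Python sketch below computes the
ratio between the most and least frequent class from the percentages
quoted above. The dictionary and function names are illustrative
assumptions, and the nuScenes entry lists only the four classes whose
shares are stated in the text, not the full class set.

```python
# Class shares (in percent) as quoted in the text; names are illustrative.
NUSCENES_SHARE = {
    "car": 42.63,
    "construction_vehicle": 1.39,
    "motorcycle": 1.11,
    "bicycle": 1.03,
}
KITTI_SHARE = {"car": 82.99, "pedestrian": 12.76, "cyclist": 4.24}


def imbalance_ratio(shares):
    """Ratio between the most and the least frequent class in `shares`."""
    return max(shares.values()) / min(shares.values())


for name, shares in (("nuScenes", NUSCENES_SHARE), ("KITTI", KITTI_SHARE)):
    ratio = imbalance_ratio(shares)
    print(f"{name}: most/least frequent class ratio = {ratio:.1f}")
```

With these figures, cars outnumber bicycles by roughly 41 to 1 in
nuScenes and outnumber cyclists by roughly 20 to 1 in KITTI.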
Multi-task learning techniques have been applied to address this
data imbalance problem by viewing multi-class joint detection as
multi-task learning. In this work, we explore applying such
techniques to the LiDAR-based 3D object detection task.
Specifically, we focus on two key questions: (i) how to construct
a multi-task network architecture and (ii) how to balance feature
sharing across different tasks. For (i), we use multiple
per-class detection heads instead of a single head. Each
detection head is encouraged to learn class-specific features,
while the shared backbone is trained to extract universal
features. For (ii), we explore applying existing multi-task loss
balancing techniques to improve the overall performance of the
different detection heads. Specifically, we apply Dynamic Weight
Average (DWA, [16]), which learns to average the task weighting
over time by considering the rate of change of the loss for each
head, thereby tuning the gradients for different object
categories. We empirically observe that combining the
multi-headed architecture with gradient balancing significantly
improves detection accuracy.
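As a rough illustration of the loss balancing described above, the
following Python sketch computes per-head weights in the commonly
cited DWA form: each head's weight is a softmax over the ratio of its
two most recent losses, rescaled so that the weights sum to the number
of heads. The function names, the temperature value of 2.0, and the
example loss values are assumptions made for illustration; this
section does not state the settings actually used.

```python
import math


def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Return one loss weight per detection head.

    prev_losses / prev_prev_losses hold each head's average loss at the
    two most recent steps (t-1 and t-2). A head whose loss is dropping
    slowly (ratio close to or above 1) receives a larger weight.
    """
    num_heads = len(prev_losses)
    # Relative descending rate of each head's loss.
    ratios = [l1 / l2 for l1, l2 in zip(prev_losses, prev_prev_losses)]
    # Softmax over the ratios, rescaled so the weights sum to num_heads.
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    return [num_heads * e / total for e in exps]


def balanced_loss(head_losses, weights):
    """Weighted sum of the per-head losses that would be backpropagated."""
    return sum(w * l for w, l in zip(weights, head_losses))


# Hypothetical losses for three heads (e.g., car, pedestrian, cyclist).
weights = dwa_weights(prev_losses=[0.40, 0.90, 0.95],
                      prev_prev_losses=[0.80, 1.00, 1.00])
total = balanced_loss([0.38, 0.88, 0.94], weights)
```

In this toy example the first head has halved its loss, so it receives
the smallest weight, while the two heads that are improving slowly are
weighted up. Uniform weighting corresponds to every weight being 1.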
[Figure 1: two bar charts of class (x-axis) versus amount of bounding
boxes in percent (y-axis). Left panel, nuScenes dataset: Car,
Pedestrian, Barrier, Truck, Traffic Cone, Trailer, Bus, Construction
Vehicle, Motorcycle, Bicycle. Right panel, KITTI dataset: Car,
Pedestrian, Cyclist.]
Figure 1. Class distributions for two 3D object detection datasets:
nuScenes [1] (left) and KITTI [9] (right).
arXiv:2210.03331v1 [cs.CV] 7 Oct 2022
Data augmentation is a complementary remedy: it can smooth the
class distribution by letting the model see rare classes more
often during training. Conventionally, ground truth (GT) sampling
has been widely used. GT sampling collects the points inside each
labeled bounding box into a database, and some of these objects
are randomly introduced into the current training frame via
concatenation. However, it does not consider where to place these
objects. In fact, we observe that GT points are often introduced
at random positions where such objects are rarely observed in the
real world. Thus, we propose contextual GT sampling, which
leverages semantic scene information to place GT points in more
natural positions, e.g., a sidewalk for pedestrians. Our
experiments show that contextual GT sampling provides an extra
performance gain, especially for minor classes.
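The idea of placing sampled objects only where they plausibly occur
can be sketched as follows. This is a minimal toy version under strong
assumptions, not the procedure used in this paper: the semantic scene
information is assumed to be a 2D grid of surface labels, the
class-to-surface table ALLOWED_SURFACES is hypothetical, object points
are assumed to be stored centered at the origin, and the collision
checks against existing boxes that GT sampling normally performs are
omitted.

```python
import random

# Hypothetical table: ground-surface labels on which each class may be placed.
ALLOWED_SURFACES = {
    "pedestrian": {"sidewalk"},
    "car": {"road"},
    "cyclist": {"road", "sidewalk"},
}


def surface_at(semantic_grid, x, y, cell_size=1.0):
    """Label of the grid cell containing (x, y), or None if outside the grid."""
    row, col = int(y // cell_size), int(x // cell_size)
    if 0 <= row < len(semantic_grid) and 0 <= col < len(semantic_grid[0]):
        return semantic_grid[row][col]
    return None


def contextual_gt_sample(scene_points, semantic_grid, database, cls, rng,
                         max_tries=50):
    """Paste one sampled GT object of class `cls` at a plausible position.

    Returns (augmented_points, (x, y)) on success, or (scene_points, None)
    if no suitable position is found within `max_tries` random draws.
    """
    obj_points = rng.choice(database[cls])  # points centered at the origin
    height, width = len(semantic_grid), len(semantic_grid[0])
    for _ in range(max_tries):
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        # Conventional GT sampling would accept any (x, y); here the draw is
        # kept only if the surface label suits the object class.
        if surface_at(semantic_grid, x, y) in ALLOWED_SURFACES[cls]:
            shifted = [(px + x, py + y, pz) for px, py, pz in obj_points]
            return scene_points + shifted, (x, y)
    return scene_points, None


# Toy scene: a 2 x 3 grid with a sidewalk strip along the right edge.
grid = [["road", "road", "sidewalk"],
        ["road", "road", "sidewalk"]]
database = {"pedestrian": [[(0.0, 0.0, 0.0), (0.1, 0.0, 1.0)]]}
augmented, position = contextual_gt_sample(
    [(0.5, 0.5, 0.0)], grid, database, "pedestrian", random.Random(0))
```

Rejection sampling is used here only because it is the simplest way to
express the constraint; sampling directly from the cells with an
allowed label would avoid wasted draws.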
Our approach is closest to that of Zhu et al. [36] (CBGS) in that
they also use multiple detection heads and a data augmentation
technique, i.e., GT sampling [28]. However, our work differs as
follows: (i) We explore multi-task learning techniques, combining
multiple detection heads with loss balancing, to improve overall
detection performance across all categories. CBGS uses a
multi-headed architecture with uniform scaling, which minimizes a
uniformly weighted sum of losses and does not dynamically modify
the weights as ours does. (ii) We propose contextual GT sampling,
which addresses issues with conventional GT sampling and results
in better detection accuracy.
We summarize our contributions as follows:
• Inspired by multi-task learning, we propose a multi-headed
  LiDAR-based 3D object detector where the losses for each head
  are balanced by dynamic weight average (DWA).
• Combined with the multi-headed architecture, we propose
  contextual ground truth sampling, which improves conventional
  ground truth (GT) sampling by leveraging semantic scene
  information to introduce GT objects in more realistic positions.
• We conduct various experiments to demonstrate the effectiveness
  of our proposed approach on widely used public datasets: KITTI
  and nuScenes. Our experiments show that multi-task learning
  techniques combined with our contextual GT sampling
  significantly improve the overall detection performance,
  especially for minor classes.
2. Related Work
2.1. 3D Object Detection
A landmark work in LiDAR-based 3D object detection is VoxelNet
[35], an end-to-end trainable model that first voxelizes a point
cloud and encodes each equally spaced voxel as a descriptive
volumetric representation. Given these features, conventional 2D
convolutions are used to generate and regress region proposals.
Yan et al. [28] used sparse 3D convolutions to accelerate the
heavy computations of earlier LiDAR-based works. PointPillars
[15] is another landmark work that speeds up the encoding of the
3D volumetric representation by dividing the 3D space into
pillars (instead of voxels). More sophisticated architectures
have also been used to achieve better detection results.
PointRCNN [22] used a two-stage architecture to refine the
initial 3D bounding box proposals. Part-A2 [23] focuses on
leveraging intra-object parts for better results. PV-RCNN [20]
and PV-RCNN++ [21] simultaneously process coarse-grained voxels
and the raw point cloud. Recently, CenterPoint [31] applied a
key-point detector that predicts the geometric center of objects.
Voxel RCNN [5] used coarse voxel granularity to reduce the
computation cost while retaining the overall detection
performance. In this work, we focus on mitigating the data
imbalance problem in LiDAR-based object detection. Thus, we do
not claim a novel 3D object detector; rather, we rely on the
existing landmark works PointPillars [15], PV-RCNN [20], and
Voxel RCNN [5] to demonstrate the effectiveness of our proposed
approach. In principle, our approach is applicable to other
detectors as well.
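The difference between voxels and pillars mentioned above can be
illustrated with a small Python sketch that only groups points by grid
cell. The function name and the single isotropic cell size are
assumptions for illustration; actual detectors typically use separate
cell sizes per axis and apply a learned encoder to each group, which is
not shown here.

```python
from collections import defaultdict


def group_points(points, cell_size, use_pillars=False):
    """Group (x, y, z) points by the grid cell that contains them.

    With use_pillars=False each cell is a cubic voxel indexed by
    (ix, iy, iz). With use_pillars=True the z axis is not discretized,
    so each cell is a vertical column indexed by (ix, iy).
    """
    cells = defaultdict(list)
    for x, y, z in points:
        ix, iy, iz = (int(c // cell_size) for c in (x, y, z))
        key = (ix, iy) if use_pillars else (ix, iy, iz)
        cells[key].append((x, y, z))
    return dict(cells)


# Two points stacked vertically in the same column, plus one point apart.
points = [(0.2, 0.3, 0.1), (0.4, 0.1, 1.7), (1.6, 0.2, 0.3)]
voxels = group_points(points, cell_size=1.0)
pillars = group_points(points, cell_size=1.0, use_pillars=True)
```

Here the two vertically stacked points fall into different voxels but
share one pillar, which is why a pillar grid yields fewer cells to
encode and can be processed as a 2D map.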
LiDAR Point Augmentation. Data augmentation has been widely
applied to LiDAR-based 3D object detection for various purposes:
(i) Improving point cloud quality by upsampling a low-density
point cloud [30,32] or by completing the point cloud in occluded
regions [2,27,29,33]. (ii) Improving the robustness of object
detection through global and local augmentations. Choi et al. [4]
randomly augmented sub-partitions of GT objects (e.g., dropping
points in a certain sub-partition). Zheng et al. [34] divided
each ground truth object into six inward-facing pyramids and
augmented them with random dropout, swap, and sparsifying
operations. (iii) Improving generalization by augmenting
clear-weather point clouds with adverse conditions via physical
modeling, such as fog [13] or snowfall [12]. (iv) Augmenting
LiDAR-based features with other modalities, such as images
[25,26]. (v) Smoothing the class distribution by sampling ground
truth objects from an offline database and introducing them into
the current scene (GT sampling, [28]). In this work, similar to
(v), we focus on balancing the frequency of each class to address
the data imbalance problem (i.e., improving the detection
accuracy of rare objects while maintaining that of common
objects).