[Figure: bar charts of the class distribution (amount of bounding boxes per class) for nuScenes (Car, Pedestrian, Barrier, Truck, Traffic Cone, Trailer, Bus, Construction Vehicle, Motorcycle, Bicycle) and KITTI (Car, Pedestrian, Cyclist).]
Figure 1. Class distributions for two 3D object detection datasets:
nuScenes [1] (left) and KITTI [9] (right).
present ground truth points in a more natural position, e.g.,
on a sidewalk for pedestrians. Our experiments show that our
contextual GT sampling provides an extra performance gain,
especially for minor classes.
Our approach is closest to Zhu et al. [36] (CBGS)
in that they also use multiple detection heads and data aug-
mentation techniques, i.e., GT sampling [28]. However, our
work differs from theirs as follows: (i) we explore multi-
task learning techniques, including multiple detection heads
with loss balancing, to improve detection performance
across all categories. CBGS focused on a multi-headed ar-
chitecture with uniform scaling, which minimizes a uni-
formly weighted sum of the head losses and does not dy-
namically adjust the weights as we do. (ii) We propose
contextual GT sampling, which addresses the issues of con-
ventional GT sampling and yields better detection accu-
racy.
We summarize our contributions as follows:
• Inspired by multi-task learning, we propose a multi-
headed LiDAR-based 3D object detector where losses
for each head are balanced by dynamic weight average
(DWA).
• Combined with the multi-headed architecture, we pro-
pose contextual ground truth sampling, which improves
conventional ground truth (GT) sampling by leverag-
ing semantic scene information to place GT objects at
more realistic positions.
• We conduct various experiments to demonstrate the
effectiveness of our proposed approach with widely-
used public datasets: KITTI and nuScenes. Our exper-
iments show that multi-task learning techniques com-
bined with our contextual GT sampling significantly
improve the overall detection performance, especially
for minor classes.
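The dynamic weight average used for loss balancing can be sketched as follows. This is a minimal illustration of the common DWA formulation (per-head losses tracked per epoch, a temperature-softened softmax over descent rates), not our exact training code; all names are illustrative.

```python
import math

def dwa_weights(loss_history, temperature=2.0):
    """Dynamic weight average: weight each head's loss by its recent
    descent rate, softened by a temperature, so slowly improving heads
    receive larger weights."""
    num_heads = len(loss_history)
    # Need at least two past epochs to estimate a descent rate;
    # fall back to uniform weights otherwise.
    if any(len(h) < 2 for h in loss_history):
        return [1.0] * num_heads
    # Descent rate r_k = L_k(t-1) / L_k(t-2); a value near 1 means
    # that head is making slow progress.
    rates = [h[-1] / h[-2] for h in loss_history]
    exps = [math.exp(r / temperature) for r in rates]
    total = sum(exps)
    # Scale so the weights sum to the number of heads.
    return [num_heads * e / total for e in exps]

# loss_history[k] holds per-epoch average losses for detection head k.
history = [[1.0, 0.5], [1.0, 0.9], [1.0, 0.7]]
weights = dwa_weights(history)
```

With this history, head 1 (whose loss barely decreased) receives the largest weight, nudging training toward the lagging head.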
2. Related Work
2.1. 3D Object Detection
A landmark work in LiDAR-based 3D object detec-
tion is VoxelNet [35], an end-to-end trainable model that
first voxelizes a point cloud and encodes each equally
spaced voxel as a descriptive volumetric representation. Given
these features, conventional 2D convolutions are used to
generate and regress region proposals. Yan et al. [28]
used sparse 3D convolutions to accelerate the heavy compu-
tations of earlier LiDAR-based works. PointPillars [15]
is another landmark work that speeds up the encoding of
the 3D volumetric representation by dividing the 3D space
into pillars (instead of voxels). More sophisticated archi-
tectures have also been used to achieve better detection results.
PointRCNN [22] used a two-stage architecture to refine
the initial 3D bounding box proposals. Part-A2 [23] fo-
cuses on leveraging intra-object parts for better results. PV-
RCNN [20] and PV-RCNN++ [21] simultaneously process
coarse-grained voxels and the raw point cloud. Recently,
CenterPoint [31] applied a key-point detector that predicts
the geometric center of objects. Similarly, Voxel RCNN [5]
used coarse voxel granularity to reduce the computation
cost, retaining the overall detection performance. In this
work, we focus on addressing the data imbalance problem
in LiDAR-based object detection. Thus, we do not claim a
novel 3D object detector; rather, we rely on the existing
landmark works PointPillars [15], PV-RCNN [20], and Voxel
RCNN [5] to demonstrate the effectiveness of our proposed
approach. Note that, in principle, our approach is applicable
to other detectors as well.
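As a rough illustration of the pillar idea, the sketch below groups points into an x-y grid while ignoring the z axis. The detection range and pillar size mimic common KITTI configurations but are assumptions here, and the real PointPillars encoder additionally builds learned per-pillar features.

```python
import numpy as np

def pillarize(points, x_range=(0.0, 69.12), y_range=(-39.68, 39.68),
              pillar_size=0.16):
    """Group LiDAR points (N x 4: x, y, z, intensity) into vertical
    pillars, i.e., x-y grid cells of unbounded height."""
    # Keep only points inside the detection range.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]
    # Integer grid coordinates of each point's pillar.
    ix = ((pts[:, 0] - x_range[0]) / pillar_size).astype(np.int64)
    iy = ((pts[:, 1] - y_range[0]) / pillar_size).astype(np.int64)
    # Map each occupied pillar to the indices of its points.
    pillars = {}
    for idx, key in enumerate(zip(ix.tolist(), iy.tolist())):
        pillars.setdefault(key, []).append(idx)
    return pillars

pts = np.array([[1.00, 0.00, -1.5, 0.3],
                [1.05, 0.05,  0.2, 0.1],   # same pillar as the first point
                [10.0, 5.00,  0.0, 0.9]])
pillars = pillarize(pts)
```

Only occupied pillars are stored, which is what makes the pillar encoding fast on sparse LiDAR sweeps.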
LiDAR Point Cloud Augmentation. Data augmentation has been
widely applied to LiDAR-based 3D object detection for var-
ious reasons: (i) improving point cloud quality by upsam-
pling a low-density point cloud [30,32] or by point cloud
completion for occluded regions [2,27,29,33]. (ii) Improv-
ing the robustness of object detection by global and local
augmentations. Choi et al. [4] randomly augmented sub-
partitions of GT objects (e.g., dropping points in a certain
sub-partition). Zheng et al. [34] divided each ground
truth object into six (inward facing) pyramids, then aug-
mented them with random dropout, swap, and sparsifying
operations. (iii) Improving generalization power by aug-
menting clear weather point clouds with adverse conditions
via physical modeling, such as fog [13] or snowfall [12].
(iv) Augmenting LiDAR-based features with other modal-
ities, such as images [25,26]. (v) Smoothing class dis-
tribution by sampling ground truth objects from the (of-
fline) database and introducing them to the current scene
(GT sampling, [28]). In this work, similar to (v), we focus
on smoothing the density of each class to address the data
imbalance problem (i.e., improving the detection accuracy
of rare objects while maintaining that of common objects).
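The class-balancing idea behind conventional GT sampling (v) can be sketched as follows. The box representation, per-class target counts, and the helper `gt_sample` are illustrative; the collision checks of the real pipeline and the contextual placement proposed in this work are omitted.

```python
import random

def gt_sample(scene_boxes, gt_database, targets):
    """Conventional GT sampling, sketched: for each class, draw
    ground-truth objects from an offline database and paste them into
    the current scene until a per-class target count is reached."""
    augmented = list(scene_boxes)
    for cls, target in targets.items():
        present = sum(1 for b in augmented if b["class"] == cls)
        needed = max(0, target - present)
        pool = [b for b in gt_database if b["class"] == cls]
        if pool and needed:
            # Sample without replacement from the offline database.
            augmented.extend(random.sample(pool, min(needed, len(pool))))
    return augmented

scene = [{"class": "Car"}, {"class": "Car"}]
db = [{"class": "Cyclist"}, {"class": "Cyclist"}, {"class": "Pedestrian"}]
out = gt_sample(scene, db, {"Car": 2, "Cyclist": 2, "Pedestrian": 1})
```

The scene already meets its Car quota, so only rare-class objects are pasted in, flattening the class distribution seen by the detector.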