LidarAugment: Searching for Scalable 3D LiDAR Data Augmentations
Zhaoqi Leng1, Guowang Li1, Chenxi Liu1, Ekin Dogus Cubuk2, Pei Sun1, Tong He1,
Dragomir Anguelov1 and Mingxing Tan1
Abstract: Data augmentations are important in training
high-performance 3D object detectors for point clouds. Despite
recent efforts on designing new data augmentations, perhaps
surprisingly, most state-of-the-art 3D detectors only use a
few simple data augmentations. In particular, different from
2D image data augmentations, 3D data augmentations need
to account for different representations of input data and
must be customized for different models, which introduces
significant overhead. In this paper, we resort to a search-based
approach, and propose LidarAugment, a practical and effective
data augmentation strategy for 3D object detection. Unlike
previous approaches where all augmentation policies are tuned
in an exponentially large search space, we propose to factorize
and align the search space of each data augmentation, which
cuts down the 20+ hyperparameters to 2, and significantly
reduces the search complexity. We show LidarAugment can
be customized for different model architectures with different
input representations by a simple 2D grid search, and consistently
improves both convolution-based UPillars/StarNet/RSN
and transformer-based SWFormer. Furthermore, LidarAugment
mitigates overfitting and allows us to scale up 3D detectors
to much larger capacity. In particular, when combined with the latest
3D detectors, our LidarAugment achieves a new state-of-the-art
74.8 mAPH L2 on the Waymo Open Dataset.
I. INTRODUCTION
Data augmentations are widely used in training deep
neural networks. In particular, for autonomous driving, many
data augmentations are developed to improve data efficiency
and model generalization. However, most recent 3D object
detectors only use a few basic data augmentation operations
such as rotation, flip and ground-truth sampling [1], [2], [3],
[4], [5], [6], [7]. This is in surprising contrast to 2D image
recognition and detection, where much more sophisticated
2D data augmentations are commonly used in modern image-
based models [8], [9], [10], [11], [12], [13]. In this paper,
we aim to answer: is it practical to adopt more advanced 3D
data augmentations to improve modern 3D object detectors,
especially for high-capacity models?
The main challenge of adopting advanced 3D data augmentations
is that 3D augmentations are often sensitive to
input representations and model capacity. For example, range
image based models and point cloud based models require
different types of data augmentation due to different input
representations. High capacity 3D detectors are typically
prone to overfitting and require stronger overall data augmentation
compared to lightweight models with fewer parameters.
Therefore, tailoring each 3D augmentation for different models is
necessary. However, the search space scales exponentially
with respect to the number of hyperparameters, which leads
to significant search cost.
1Waymo Research, 2Google Brain, lengzhaoqi@waymo.com
[Figure 1: bar chart of 3D mAPH L2 for UPillar and UPillar-L, comparing Baseline and LidarAugment (bar values: 57.8, 60.0, 63.7, 71.0).]
Fig. 1: Model scaling with LidarAugment on Waymo Open Dataset. Baseline augmentations are from the prior art of [14]. When scaling up UPillars to UPillars-L, our LidarAugment improves both models, and the gains are more significant for the larger model, thanks to its customizable regularization. More results in Table IV.
Recent studies [15], [16] attempt to
address these challenges by using efficient search algorithms.
Those approaches typically construct a fixed search space,
and run a complex search algorithm (such as population-
based search [17]) to find a data augmentation strategy for
a model. However, our studies reveal that the search spaces
used in prior works are suboptimal. Despite having complex
search algorithms, without a systematic way to define a good
search space, we cannot unleash the potential of a model.
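To make the exponential search cost discussed above concrete, consider an illustrative back-of-the-envelope comparison (the 20+ hyperparameter count and the reduction to two shared hyperparameters come from the paper; the choice of 5 candidate values per hyperparameter is an assumption):

```latex
% Exhaustive grid over 20 independent augmentation hyperparameters with 5 values each,
% versus a grid over the two shared LidarAugment hyperparameters:
5^{20} \approx 9.5 \times 10^{13} \ \text{configurations}
\qquad \text{vs.} \qquad
5^{2} = 25 .
```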
In this paper, we propose LidarAugment, a simplified
search-based approach for 3D data augmentations. Unlike
previous methods that rely on complex search algorithms to
explore an exponentially large search space, our approach
aims to define a simplified search space that contains a
variety of data augmentations but has minimal (i.e. two)
hyperparameters, such that users can easily customize a
diverse set of 3D data augmentations for different models.
Specifically, we construct the LidarAugment search space
by first factorizing a large search space based on operations
and exploring each sub search space with a per-operation
search. Then, we normalize and align the sub search space
for each data augmentation to form the LidarAugment search
space. The final LidarAugment search space contains only
two shared hyperparameters: m ∈ [0, ∞) controls the normalized
magnitude and p ∈ [0, 1] controls the probability of
applying each data augmentation policy. Our LidarAugment
search space significantly simplifies prior work [15]
by cutting down the number of hyperparameters to two, a
15× reduction in the number of hyperparameters.
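To make the two shared hyperparameters concrete, the sketch below (a minimal illustration rather than the authors' released code; the operation names and per-operation maximum magnitudes are hypothetical placeholders) applies each augmentation independently with probability p and scales its strength by the normalized magnitude m:

```python
import random

# Hypothetical per-operation maximum magnitudes. In LidarAugment, each operation's
# magnitude range is normalized and aligned so that one shared m rescales all of them;
# the concrete values below are placeholders, not the paper's settings.
MAX_MAGNITUDE = {
    "global_rotate": 3.14159 / 4,  # max |rotation angle| in radians (assumed)
    "global_scale": 0.05,          # max relative scale jitter (assumed)
    "frustum_drop": 0.2,           # max fraction of points dropped (assumed)
}


def lidar_augment(points, boxes, ops, m, p):
    """Apply every augmentation op independently with probability p,
    with its strength scaled by the shared normalized magnitude m >= 0.

    `ops` maps an operation name to a callable op(points, boxes, magnitude)
    that returns the augmented (points, boxes).
    """
    for name, op in ops.items():
        if random.random() < p:
            magnitude = m * MAX_MAGNITUDE[name]
            points, boxes = op(points, boxes, magnitude)
    return points, boxes
```

Customizing LidarAugment for a new detector then reduces to the simple 2D grid search over (m, p) mentioned in the abstract, instead of tuning 20+ per-operation hyperparameters.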
Despite only having two hyperparameters, our LidarAugment
search space contains a variety of existing 3D data
augmentations, such as dropping/pasting 3D bounding boxes,
rotating/scaling/dropping points, and copy-pasting objects and
backgrounds. In addition, LidarAugment supports coherent
augmentation across both point and range view representations,
which generalizes to multi-view 3D detectors.
[Figure 2(a) panels: (1) Original, (2) Global Rotate, (3) Global Scale, (4) Global Translate, (5) Global Flip, (6) Global Drop, (7) Frustum Drop, (8) Frustum Noise, (9) Drop Box, (10) Paste Box, (11) Swap Background.]
Fig. 2: Visualizing LidarAugment. (a) All data augmentation operations used in LidarAugment. For non-global operations, we highlight the augmented parts in red (boxes). (b) Occlusion introduced by data augmentation, e.g., pasting a car object, is handled by removing overlapping rays in the range view based on distance. We show point clouds and the corresponding range images with (bottom)/without (top) removing overlapping rays in the range view.
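As a concrete sketch of one of the global operations shown in Figure 2(a) (an illustrative implementation, not the paper's code; the array layouts, with points as an N×3 xyz array and boxes as M×7 rows of center, size, and heading, are assumptions), the function below rotates the point cloud and its box annotations around the z-axis by the same angle, so the labels stay consistent with the augmented scene:

```python
import numpy as np


def global_rotate(points, boxes, angle):
    """Rotate a LiDAR scene around the z (up) axis by `angle` radians.

    points: (N, 3) xyz coordinates; boxes: (M, 7) rows of
    [cx, cy, cz, length, width, height, heading]. Applying the same
    rotation to points, box centers, and headings keeps the labels
    coherent with the augmented point cloud.
    """
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    rotated_points = points @ rot.T
    rotated_boxes = boxes.copy()
    rotated_boxes[:, :3] = boxes[:, :3] @ rot.T
    rotated_boxes[:, 6] = boxes[:, 6] + angle  # heading rotates by the same angle
    return rotated_points, rotated_boxes
```

For a multi-view detector, the range image can then be re-rendered from the rotated points (for a pure z-rotation this is a consistent shift along the azimuth axis), so the point view and range view describe the same augmented scene.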
We perform extensive experiments on the Waymo Open
Dataset [18] and demonstrate that LidarAugment is effective
and generalizes well to different model architectures
(convolution-based and transformer-based), different input
views (3D point view and range image), and different
temporal scales (single- and multi-frame). Notably,
LidarAugment advances the state-of-the-art (SOTA) transformer-
based SWFormer by 1.4 mAPH on the test set. Furthermore,
LidarAugment provides customizable regularization, which
allows us to scale up 3D object detectors to much higher
capacity without overfitting. As summarized in Figure 1,
LidarAugment consistently improves UPillars models, and
the performance gains are particularly large for high-capacity
models. Our contributions can be summarized as:
1) New insight: we reveal that common 3D data augmentation
search spaces are suboptimal and should be
tailored for different models.
2) LidarAugment: we propose the LidarAugment search
space, which supports jointly optimizing 10 augmentation
policies with only two hyperparameters (a 15×
reduction compared to prior work), offering diverse
yet practical augmentations. In addition, we develop
a new method to coherently augment both point and
range-view input representations.
3) State-of-the-art performance: LidarAugment consistently
improves both convolution-based UPillars/StarNet/RSN
and attention-based SWFormer. With LidarAugment,
we achieve new state-of-the-art results
on the Waymo Open Dataset. In addition, LidarAugment
enables model scaling to achieve much better quality
for high-capacity 3D detectors.
II. RELATED WORKS
Data augmentation. Data augmentation is widely used
in training deep neural networks. In particular, for 3D object
detection from point clouds, several global and local data
augmentations, such as rotation, flip, pasting objects, and
frustum noise, are used to improve model performance [19],
[1], [20], [2], [4], [21], [15], [22], [23], [24]. However, as
3D data augmentations are sensitive to model architectures
and capacity, using them often requires extensive manual
tuning. Therefore, most existing 3D object
detectors [2], [6], [25], [26], [14] only adopt a few simple
augmentations, such as flipping and pixel shifting.
Several recent works attempt to use range images for
multi-view 3D detection, but very few augmentations are
developed for range images. [5] attempts to paste objects in
the range image without handling occlusions. Our Paste Box
augmentation supports coherently augmenting both range-
view and point-view input data while handling occluded
objects in a simple way (more details in Figure 2), which
enables more realistic augmented scenes and enriches the
data augmentations for multi-view 3D detectors.
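To illustrate the distance-based occlusion handling described above and in Figure 2(b), here is a simplified sketch (assumptions: xyz arrays in the sensor frame, a uniform azimuth/inclination-to-pixel mapping, and an assumed vertical field of view; this is not the paper's implementation). Pasted object points and original scene points are rasterized into a shared range image, and for each pixel only the closest return is kept:

```python
import numpy as np


def merge_with_occlusion(scene_points, pasted_points, height=64, width=2650):
    """Merge pasted object points into a scene, keeping only the closest
    return per range-image pixel (distance-based occlusion handling).

    Both inputs are (N, 3) xyz arrays in the sensor frame.
    Returns the merged point cloud and the resulting range image.
    """
    points = np.concatenate([scene_points, pasted_points], axis=0)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.linalg.norm(points, axis=1)

    azimuth = np.arctan2(y, x)                        # in [-pi, pi]
    inclination = np.arcsin(z / np.maximum(rng, 1e-6))
    col = ((azimuth + np.pi) / (2 * np.pi) * width).astype(int) % width
    # Assumed vertical field of view of [-0.3, 0.1] rad for the pixel mapping.
    row = np.clip(((0.1 - inclination) / 0.4 * height).astype(int), 0, height - 1)

    range_image = np.full((height, width), np.inf)
    keep_index = np.full((height, width), -1, dtype=int)
    for i in range(points.shape[0]):
        r, c = row[i], col[i]
        if rng[i] < range_image[r, c]:                # keep the closer return
            range_image[r, c] = rng[i]
            keep_index[r, c] = i

    kept = keep_index[keep_index >= 0]
    return points[kept], range_image
```

This mirrors Figure 2(b): without the distance test, pasted points and the background points they should occlude coexist along the same rays; keeping only the closer return per pixel removes the overlapping rays and yields a more realistic augmented scene in both views.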
Learning data augmentation policies. Designing good
data augmentation normally requires manual tuning and
domain expertise. Several search-based approaches have
been proposed for 2D images, such as AutoAugment [9],
RandAugment [12], and Fast AutoAugment [27]. Our
LidarAugment is inspired by RandAugment in the sense that
we both try to construct a simplified search space. However,
unlike 2D image augmentations, where one search space works
well for many models, we reveal that existing search spaces
for 3D detection tasks are suboptimal, which motivates us to
propose the first systematic method to define search spaces
for 3D detection tasks.
On the other hand, for 3D detection, PPBA [15] and
PointAugment [16] propose efficient learning-based data
augmentation frameworks for 3D point clouds. However,
both works require users to run a complex algorithm on
an exponentially large but not well-designed search space.
In contrast, our work provides a systematic framework to