predictions for points selected based on their uncertainty. We
also propose to utilize the predicted probabilities to create
an efficient version of the k nearest neighbors algorithm
(pKNN). Furthermore, we provide several baselines and
evaluate their results on the task of uncertainty-aware LiDAR
panoptic segmentation. In summary, our contributions are as
follows:
• The novel proposal-free EvLPSNet architecture for uncertainty-aware LiDAR panoptic segmentation.
• The uQR module for refining the prediction for the most uncertain points.
• The efficient pKNN algorithm utilizing the predicted class probabilities.
• Several baselines for comparison with EvLPSNet.
II. RELATED WORK
A. Segmentation of LiDAR Point Clouds
The release of the SemanticKITTI dataset [15] led to
the emergence of many works, initially for the semantic
segmentation of LiDAR point clouds. These can generally
be classified based on the point cloud representations they
employ, such as projected range images [16], [17], [18], 3D
voxels [19], point-based [10], and BEV polar coordinates
[9]. Most panoptic approaches utilize these representations
as well.
Panoptic segmentation approaches can be classified as
proposal-based and proposal-free. While both employ separate semantic and instance segmentation branches, the distinction lies in how the instance branch operates. Proposal-based methods typically employ bounding box regression to discover instances, such as Mask-RCNN [20] in the case of EfficientLPS [5].
On the other hand, proposal-free approaches perform clus-
tering on the semantic prediction to obtain instance ids for
objects belonging to separate instances. Panoptic-PolarNet
[11] utilizes a Panoptic Deeplab-based [4] instance head to
regress offsets and centers for different instances. DS-Net
[21] proposes a dynamic shifting module to move instance
points towards their respective center. Panoptic-PHNet [22]
utilizes two different encoders, BEV and voxel-based, to
encode point cloud features, followed by a KNN-transformer
module to model interaction among voxels belonging to thing
classes.
B. Uncertainty Estimation
Many works for estimating uncertainty in segmentation
tasks employ sampling-based methods, such as Bayesian
Neural Networks [14] or Monte Carlo dropout [13], [23].
However, such methods are time and memory-intensive,
requiring multiple passes or sampling operations. For LiDAR
point clouds, SalsaNext [17] is an uncertainty-aware semantic segmentation network utilizing BNNs. Even though the network output is quick to evaluate, the uncertainty is slow to obtain due to the sampling required by the BNN approach. Further,
no metric is presented to quantify the calibration of the
predicted uncertainty for this approach. We believe these
are severe limitations for safety-critical real-time applications
like autonomous driving. The need for single-pass sampling-
free uncertainty estimation motivates many works in the field.
Classical neural networks apply a softmax operation to the final logits to predict a per-class score or probability, which
is not a reliable estimate of the network’s confidence in the
prediction, as shown by [12]. Guo et al. [24] propose the
Temperature Scaling (TS) method to learn a logit scaling fac-
tor on the softmax operation to provide calibrated probability
predictions. Other methods, such as [25], learn to separate
different classes in a latent space and calculate the uncertainty based on the distance of the predicted feature to the nearest class feature.
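Temperature scaling as described above amounts to dividing the logits by a single learned scalar before the softmax. A minimal sketch follows; the logits and the temperature value are illustrative only (in practice, T is fit on a held-out validation set by minimizing the negative log-likelihood):

```python
import numpy as np

def temperature_scaled_softmax(logits, T):
    """Softmax over logits divided by a temperature T.
    T = 1 recovers the standard softmax; T > 1 softens
    overconfident predictions without changing the argmax."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])          # illustrative logits
p_uncal = temperature_scaled_softmax(logits, T=1.0)  # standard softmax
p_cal = temperature_scaled_softmax(logits, T=2.0)    # softened, better-calibrated probabilities
```

Note that scaling by T only redistributes probability mass; the predicted class is unchanged, which is why temperature scaling affects calibration but not accuracy.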
Sensoy et al. [12] proposed evidential deep learning to
provide reliable and fast uncertainty estimation with minimal
changes to a network. Petek et al. [26] utilize this method to
simultaneously predict semantic segmentation and bounding
box regression uncertainty. Sirohi et al. [2] introduce the uncertainty-aware panoptic segmentation task and provide a sampling-free network for unified panoptic segmentation and uncertainty estimation on images. In the present work, we build upon this approach, extend it to LiDAR point clouds, and provide a comprehensive quantitative analysis.
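The evidential formulation of Sensoy et al. [12] can be illustrated with a minimal sketch, assuming a ReLU evidence function (other non-negative activations are possible): the network outputs are treated as evidence for a Dirichlet distribution, whose strength yields class probabilities and a single-pass uncertainty estimate.

```python
import numpy as np

def evidential_prediction(logits):
    """Single-pass evidential uncertainty (Sensoy et al. style sketch):
    non-negative evidence -> Dirichlet parameters -> probabilities + uncertainty."""
    evidence = np.maximum(logits, 0.0)       # ReLU evidence (assumed activation)
    alpha = evidence + 1.0                   # Dirichlet concentration parameters
    S = alpha.sum(axis=-1, keepdims=True)    # Dirichlet strength
    probs = alpha / S                        # expected class probabilities
    K = logits.shape[-1]                     # number of classes
    u = K / S                                # vacuity: 1 with no evidence, -> 0 with strong evidence
    return probs, u.squeeze(-1)
```

With zero evidence the prediction degenerates to a uniform distribution with maximal uncertainty u = 1, which is the behavior that makes the estimate usable for flagging unreliable points; no sampling passes are required.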
III. TECHNICAL APPROACH
An overview of our network architecture is shown in
Fig. 2. It is based on the proposal-free Panoptic-PolarNet
network [11]. Our evidential semantic segmentation head
and Panoptic-Deeplab based [4] instance segmentation head
utilize the learned features to predict per-point semantic segmentation, semantic uncertainty, and instance centers and offsets. The predictions from both heads are fused to provide
panoptic segmentation results. Leveraging the segmentation
uncertainties, our proposed query and refine module helps to
improve the prediction for points within uncertain voxels.
Moreover, post-processing using our efficient probability-
based KNN improves the results further.
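The pKNN post-processing is specified later; purely to illustrate the underlying idea of refining labels with predicted class probabilities, one plausible sketch averages the probabilities of the k nearest points instead of majority-voting their hard labels (the exact formulation in this work may differ; the brute-force distance computation below is for clarity only):

```python
import numpy as np

def prob_knn(query_xyz, ref_xyz, ref_probs, k=5):
    """Hedged sketch of a probability-based KNN: average the predicted
    class probabilities of the k nearest reference points and take the
    argmax, rather than majority-voting hard labels."""
    # pairwise distances between query and reference points (brute force)
    d = np.linalg.norm(query_xyz[:, None, :] - ref_xyz[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]                    # k nearest references per query
    return ref_probs[idx].mean(axis=1).argmax(axis=-1)    # averaged probs -> refined label
```

Averaging soft probabilities lets low-confidence neighbors contribute proportionally less than confident ones, which a hard-label vote cannot capture.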
A. Network Architecture
We project the LiDAR points into a polar BEV grid
utilizing the encoder design proposed by PolarNet [9]. First,
the points (represented in 3D polar coordinates) are grouped
according to their location in a 2D polar BEV grid. The
grid has dimensions H × W = 480 × 360, where H corresponds to the range and W to the heading angle. Then,
for each grid cell, the corresponding points are encoded using
a simplified PointNet [27]. This is followed by a max pooling
operation to calculate the feature vector for every 2D grid cell
and to create a fixed-size grid representation of W × H × F, where F = 512 is the number of feature channels.
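The grouping-and-pooling step above can be sketched as follows, assuming a maximum range of r_max = 50 m and a toy feature dimension F = 8 in place of the 512 channels used in the network (the per-point features would come from the simplified PointNet; empty cells remain zero):

```python
import numpy as np

H, W, F = 480, 360, 8  # grid size from the paper; F = 8 is a toy value for this sketch

def polar_bev_features(points, feats, r_max=50.0):
    """Group points (x, y, z) into an H x W polar BEV grid and max-pool
    their per-point feature vectors per cell (hedged sketch)."""
    r = np.hypot(points[:, 0], points[:, 1])            # range of each point
    theta = np.arctan2(points[:, 1], points[:, 0])      # heading angle in [-pi, pi)
    ri = np.clip((r / r_max * H).astype(int), 0, H - 1) # range bin
    ti = ((theta + np.pi) / (2 * np.pi) * W).astype(int) % W  # angle bin
    grid = np.zeros((H, W, F))
    np.maximum.at(grid, (ri, ti), feats)                # unbuffered per-cell max pooling
    return grid
```

`np.maximum.at` performs the max pooling correctly even when several points fall into the same cell, which a plain fancy-index assignment would not.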
The subsequent encoder-decoder network follows the U-Net [28] architecture. Its first three decoder layers are shared by the semantic and instance segmentation branches, while the remaining layers are separate. The instance segmentation branch regresses the instance center heatmap and the instance offsets on the BEV grid.
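In the Panoptic-Deeplab-style scheme used here, each thing point's regressed offset points toward its instance center, and grouping assigns the point to the nearest predicted center. A minimal sketch of that grouping step (assuming centers have already been extracted from the heatmap, e.g. by non-maximum suppression) is:

```python
import numpy as np

def group_instances(centers, coords, offsets):
    """Assign each thing point to an instance id by shifting it with its
    predicted offset and taking the nearest predicted center (hedged sketch)."""
    shifted = coords + offsets                # regressed center location per point
    # distance from every shifted point to every predicted center
    d = np.linalg.norm(shifted[:, None, :] - centers[None, :, :], axis=-1)
    return d.argmin(axis=1)                   # instance id per point
```

Because the assignment depends only on the offset-shifted coordinates, points of the same object cluster at the same center even when their raw positions are far apart along the object's extent.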