Boosting Point Clouds Rendering via Radiance Mapping
Xiaoyang Huang1*, Yi Zhang1*, Bingbing Ni1, Teng Li2, Kai Chen3, Wenjun Zhang1
1Shanghai Jiao Tong University, Shanghai 200240, China,
2Anhui University, 3Shanghai AI Lab
{huangxiaoyang, yizhangphd, nibingbing}@sjtu.edu.cn
Abstract
Recent years have witnessed rapid development in NeRF-based image rendering due to its high quality. However, point clouds rendering is somewhat less explored. Compared to NeRF-based rendering, which suffers from dense spatial sampling, point clouds rendering is naturally less computation-intensive, which enables its deployment on mobile computing devices. In this work, we focus on boosting the image quality of point clouds rendering with a compact model design. We first analyze the adaption of the volume rendering formulation to point clouds. Based on the analysis, we simplify the NeRF representation to a spatial mapping function which only requires a single evaluation per pixel. Further, motivated by ray marching, we rectify the noisy raw point clouds to the estimated intersections between rays and surfaces as queried coordinates, which avoids spatial frequency collapse and neighbor point disturbance. Composed of rasterization, spatial mapping and refinement stages, our method achieves state-of-the-art performance on point clouds rendering, outperforming prior works by notable margins with a smaller model size. We obtain a PSNR of 31.74 on NeRF-Synthetic, 25.88 on ScanNet and 30.81 on DTU. Code and data are publicly available¹.
Introduction
The rising trend of AR/VR applications calls for better im-
age quality and higher computation efficiency in render-
ing technology. Recent works mainly focus on NeRF-based
(Mildenhall et al.) rendering due to its photo-realistic effect.
Nevertheless, NeRF-based rendering suffers from heavy
computation cost, since its representation assumes no ex-
plicit geometry is known, and requires burdensome spatial
sampling. This drawback severely hampers its application
in mobile computing devices, such as smart phones or AR
headsets. On the other hand, point clouds (Huang et al.),
which have explicit geometry, are easy to obtain as depth sensors become prevalent and MVS algorithms (Yao et al.; Wang et al.) grow more powerful. Developing high-performance rendering methods based on point clouds, which is so far insufficiently explored, therefore deserves more attention. In this work, we introduce a point clouds rendering method which achieves rendering performance comparable to NeRF.
*These authors contributed equally.
†Corresponding Author.
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
1https://github.com/seanywang0408/RadianceMapping
The main difference between NeRF-based rendering and
point clouds rendering is that the latter is designed upon the
noisy surface of objects. On the bright side, this is a beneficial geometric prior which can greatly reduce the number of queries in 3D space. On the downside, the prior is noisy and sparse, since the point clouds are generally reconstructed by MVS algorithms or collected by depth sensors, so additional approaches are needed to alleviate the artifacts brought by the noise and sparsity. Therefore, most current point clouds render-
ing methods require two steps. One is the spatial feature
mapping, and the other is image-level refinement. The spa-
tial feature mapping step is similar to the NeRF represen-
tation, which maps a 3D coordinate to its color, density or
latent feature. The refinement step is usually implemented
as a convolutional neural network. In this work, we mainly
focus on the spatial feature mapping step. Previous works
use point clouds voxelization (Dai et al.), learnable param-
eters (Rückert, Franke, and Stamminger; Kopanas et al.) or
linear combination of sphere basis (Rakhimov et al.) as map-
ping functions. However, these methods suffer either from
high computation cost, large storage requirements, or unsat-
isfactory rendering performance. To this end, we introduce
a much simpler but surprisingly effective mapping function.
Motivated by the volume rendering formulation in NeRF, we
analyze its adaptation to point clouds rendering scenarios. We conclude that in a point cloud scene, volumetric rendering can be simplified to modeling the view-dependent color of the first intersection between the estimated surface and the ray. In other words, we augment each 3D point (i.e., most probably a surface point) with a learnable feature indicating the first-hit color. Thereby the point clouds rendering task can be re-cast within the high-fidelity NeRF framework, without consuming redundant computation on internal ray samples. We name it radiance map-
ping. Moreover, based on radiance mapping, we rectify the
raw point cloud coordinates that are fed into the mapping
function using the z-buffer in rasterization to obtain a query
point which lies exactly on the camera ray. This approach
allows us to obtain a more accurate geometry and avoid spa-
tial frequency collapse. The radiance mapping function, consisting of a 5-layer MLP, is only 0.75M in size, which is much smaller than the spatial feature mapping functions in previous works, yet achieves notably better performance. Together with a 2D neural renderer that compensates for the sparsity and noise in the point clouds, as done in previous works, our complete model is approximately 8M in total.

Figure 1: (a) Spatial frequency collapse occurs when using neural descriptors or raw point cloud queries. The point is optimized to a green color, which is the mixing of yellow and blue. (b) Using raw point cloud queries additionally causes neighbor point disturbance. Points lying close together have a larger impact on each other's feature optimization. (c) Our coordinate rectification alleviates the above issues. The idea is illustrated in 1D rendering.
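To make the coordinate rectification illustrated in Figure 1 (c) concrete, the following is a minimal sketch in PyTorch (our own illustration; the function name, tensor layout, and the assumption that the z-buffer stores distance along the ray are ours, not the released implementation): the z-buffer from rasterization gives, per pixel, the depth of the nearest rasterized point, and the rectified query coordinate is taken on the camera ray at that depth.

```python
import torch

def rectify_coordinates(cam_origin, ray_dirs, zbuffer, mask):
    """Sketch of coordinate rectification from a rasterized z-buffer.

    cam_origin: (3,) camera center o in world coordinates.
    ray_dirs:   (H, W, 3) unit-norm ray direction d per pixel.
    zbuffer:    (H, W) depth of the nearest rasterized point per pixel,
                assumed here to be the distance along the ray (a simplification).
    mask:       (H, W) bool, True where a point was rasterized.
    Returns:    (H, W, 3) rectified query coordinates x = o + z * d,
                zero-filled where no point was rasterized.
    """
    rectified = cam_origin.view(1, 1, 3) + zbuffer.unsqueeze(-1) * ray_dirs
    return torch.where(mask.unsqueeze(-1), rectified, torch.zeros_like(rectified))
```

Because the query now lies exactly on the camera ray at the estimated surface depth, neighboring points that rasterize to the same pixel no longer pull the queried coordinate toward their own positions, which is the intuition behind avoiding neighbor point disturbance.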
Our method reaches rendering quality comparable to NeRF, but with much less computation cost, since it only needs a single model inference per pixel. Compared to prior point clouds rendering methods, we obtain notable improvements in image quality, with a smaller model size and simpler computation. We achieve a PSNR of 31.74 on NeRF-Synthetic (Mildenhall et al.), 25.88 on ScanNet (Dai et al.) and 30.81 on DTU (Aanæs et al.). As far as we know, this is the state-of-the-art result on this task.
Related Work
Implicit Rendering
NeRF-based Neural Radiance Fields (Mildenhall et al.)
advance the neural rendering quality to a higher level. NeRF
represents the scene using an MLP which predicts the color
and density of a point. It aggregates the points along the camera ray into the pixel color via volume rendering. Follow-
ing NeRF, there are various innovations which address the
different challenges in NeRF representation. PixelNeRF (Yu
et al.), IBRNet (Wang et al.) and DietNeRF (Jain, Tancik,
and Abbeel) render novel views from only one or a few input
images. NeRF-W (Martin-Brualla et al.) tackles the variable
illumination and transient occluders in the wild instead of a
static scene. Mip-NeRF (Barron et al.) and Mip-NeRF 360
(Barron et al.) improves the image quality by rendering anti-
aliased conical frustums. NSVF (Liu et al.), PlenOctrees (Yu
et al.) and TensoRF (Chen et al.) aim at accelerating the in-
ference speed of NeRF by building a more efficient struc-
ture after scene fitting. Point-NeRF (Xu et al.) also assumes a given point cloud, like ours, but it still follows the volume rendering formulation of NeRF and thus suffers from dense spatial sampling.
Implicit Surface Rendering This line of work aims at reconstructing implicit surfaces via neural rendering. DVR
(Niemeyer et al.) learns implicit 3D representation from im-
ages by analytically deriving depth gradients from implicit
differentiation. IDR (Yariv et al.) renders an implicit surface
by approximating the light reflected from the surface towards
the camera. UNISURF (Oechsle, Peng, and Geiger) com-
bines implicit surface models and radiance fields together to
enable surface reconstruction without object masks. NeuS
(Wang et al.) gives a theoretical proof that the classic volume rendering formulation introduces a bias in the estimated object surface, and presents a solution which yields an unbiased SDF representation. Yariv et al. model the volume density as a function of the SDF representation, leading to more accurate sampling along the camera ray.
Point Clouds Rendering
Inverse Rendering Early work (Zwicker et al.) proposes
a point cloud rendering method using an Elliptical Weighted
Average filter based on a Gaussian kernel. Yifan et al. enable backward propagation through surface splatting to optimize point cloud positions to match the object geometry observed in images. Insafutdinov and Dosovitskiy use a differentiable point cloud projection module to learn object shape and pose from two-view images without supervision. Lin, Kong,
and Lucey propose pseudo-rendering which upsamples the
target image to alleviate the collision effect in discretization.
Wiles et al. construct a point cloud from single-view im-
ages by using a depth regressor and spatial feature predictor,
and render the point cloud with α-composition followed by a
ResNet (He et al.) refinement network. The training is super-
vised by a photometric loss and a GAN loss (Wang et al.).
Zhou et al. and Godard, Mac Aodha, and Brostow adopt a
similar approach, but on a monocular depth estimation task
with street-view video sequences.
View Synthesis NPBG (Aliev et al.) proposes to render
novel views of a scene using point-based learnable neural
descriptors and a U-Net refinement network. It adopts multi-
scale rasterization to model image details at different levels.
Johnson, Alahi, and Fei-Fei; Dosovitskiy and Brox use a
perceptual loss to optimize the network. Dai et al. propose
to project the point clouds into a layered volume by voxelization. Then a 3D CNN (Maturana and Scherer; Yang et al.) is used to transform the volume into a set of multi-plane images and their blending weights, which form the final image by weighted summation. NPBG++ (Rakhimov et al.) reduces the running time compared to NPBG, using a feature extractor to lift the neural descriptor features and make them view-dependent.

Figure 2: The translucent and opaque surfaces rendered by NeRF and our method. In the Drums scene, both methods optimize the color of the first-intersected surface instead of modeling the correct translucency. The second row shows the membrane from another view. In the Materials scene, our method renders even more delicate specular effects on the smooth metal balls, while NeRF generates somewhat blurry artifacts. The visualization in the original NeRF paper (Mildenhall et al.) reveals the same artifact. We attribute this superiority to the explicit geometry provided by point clouds.

ADOP (Rückert, Franke, and Stamminger)
renders HDR and LDR images with a coarsely-initialized
point cloud and camera parameters. The point clouds, cam-
era poses and the 2D refinement network are jointly opti-
mized. Kopanas et al. perform scene optimization for each
view based on bi-directional Elliptical Weighted Average
splatting. Ost et al. promote point clouds to implicit light
fields to allow fast inference in view synthesis. READ (Li et al.) adopts an approach similar to NPBG++ to synthesize
photo-realistic street views for autonomous driving. We an-
alyze the most relevant works to ours in the next section.
Method
Spatial Mapping
We first analyze the spatial mapping functions in previous
point clouds rendering methods. Then we introduce our ra-
diance mapping, a simpler but more effective mapping.
Previous Mapping Functions Revisited NPBG (Aliev
et al.) attaches learnable parameters to each point as neural descriptors. The advantage of this approach is that each point feature is optimized independently and is not influenced by nearby point features. This is beneficial for scenes where the surface color changes drastically. However, it also has the drawback that the density of the point clouds restricts the representational capacity of the point features. When the point clouds are sparse, the same neural descriptor is rasterized to multiple pixels and optimized to fit their average, which harms rendering quality. We illustrate this issue in 1D rendering in Figure 1 (a). The point is optimized to a green color, which is the mixing of yellow and blue. Since the cause of this phenomenon is analogous to violating the Nyquist rate in signal processing, we call it spatial frequency collapse. On the other hand, when the point clouds become comparatively dense due to higher-quality reconstruction or depth sensing, the size of the point features grows proportionally, which consumes more memory for storage and training. Besides, some point features that are only visible in a few views might not be sufficiently optimized.
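A toy numerical illustration of this averaging effect (our own sketch, not taken from the paper): when a single learnable descriptor is rasterized to two pixels whose ground-truth colors differ, an L2 photometric loss drives it to the mean of the two targets rather than to either one.

```python
import torch

# One per-point descriptor (here simply a 3-channel color) shared by two
# pixels with different ground-truth colors.
descriptor = torch.zeros(3, requires_grad=True)
targets = torch.tensor([[1.0, 1.0, 0.0],   # first pixel's target color
                        [0.0, 0.0, 1.0]])  # second pixel's target color
optimizer = torch.optim.SGD([descriptor], lr=0.5)

for _ in range(200):
    optimizer.zero_grad()
    pred = descriptor.expand(2, 3)          # same descriptor rendered to both pixels
    loss = ((pred - targets) ** 2).mean()   # L2 photometric loss
    loss.backward()
    optimizer.step()

print(descriptor.detach())  # converges to ~[0.5, 0.5, 0.5], the mean of the two targets
```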
Dai et al. propose to use a 3D CNN to extract spatial features. Their method first voxelizes the point clouds into a layered volume, and then adopts a 3D CNN to extract spatial features. Due to the high computational complexity of the 3D CNN, this model is much heavier and not easy to deploy.
NPBG++ (Rakhimov et al.) develops a spatial mapping function motivated by the spherical harmonics basis. It first uses a shared 2D CNN to extract image features from multi-view images, and then aggregates the features of each view into the point clouds by a linear combination of learnable basis functions over the unit sphere. This approach takes the view direction as input, which can potentially generate better rendered images. However, it still suffers from proportionally increasing memory as the point clouds get denser, similar to NPBG. Besides, it requires an additional U-Net as an image feature extractor, which further increases the model size.
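As a rough sketch of this kind of view-dependent basis combination (our own illustration using degree-1 real spherical harmonics; NPBG++'s actual basis, feature dimensions, and function names may differ), per-point learnable coefficients are mixed by basis values evaluated at the view direction:

```python
import torch

def sh_basis_deg1(dirs):
    """Real spherical harmonics basis up to degree 1.
    dirs: (N, 3) unit view directions; returns (N, 4) basis values."""
    x, y, z = dirs.unbind(-1)
    return torch.stack([
        0.282095 * torch.ones_like(x),  # l = 0
        0.488603 * y,                   # l = 1, m = -1
        0.488603 * z,                   # l = 1, m = 0
        0.488603 * x,                   # l = 1, m = 1
    ], dim=-1)

def view_dependent_feature(coeffs, dirs):
    """Combine per-point learnable coefficients with the basis values.
    coeffs: (N, 4, C) learnable per-point coefficients; dirs: (N, 3).
    Returns (N, C) view-dependent features."""
    return torch.einsum('nb,nbc->nc', sh_basis_deg1(dirs), coeffs)
```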
Radiance Mapping Compared to the above spatial mapping functions, our method is much more lightweight. Our compact representation stores the view-dependent radiance of the object surface. The idea is motivated by the volumetric rendering formulation in NeRF representations (Mildenhall et al.), which takes the 3D coordinate $\mathbf{x} = (x, y, z)$ and view direction $\mathbf{d} = (\theta, \phi)$ as inputs and outputs the color $\mathbf{c}$ and density $\sigma$ using a multi-layer perceptron (MLP) $F_\Theta$, parameterized by $\Theta$:

$$\mathbf{c}, \sigma = F_\Theta(\mathbf{x}, \mathbf{d}) \tag{1}$$

Since NeRF representations assume no explicit geometry exists, each point lying on the camera ray $\mathbf{r} = \mathbf{o} + t\mathbf{d}$ is queried and aggregated to obtain the final pixel color $C(\mathbf{r})$:

$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt \tag{2}$$

$$T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right) \tag{3}$$
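For intuition, the sketch below contrasts the two formulations (our own illustration; the `Field` MLP, without positional encoding, and both rendering functions are hypothetical simplifications, not the released implementation, which feeds a latent feature into a 2D refinement network): NeRF approximates Eq. (2) and (3) with a quadrature over many samples per ray, whereas radiance mapping queries the network once per pixel, at the rectified surface coordinate.

```python
import torch
import torch.nn as nn

class Field(nn.Module):
    """Hypothetical MLP mapping (x, d) to (rgb, sigma), as in Eq. (1)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4))

    def forward(self, x, d):
        out = self.net(torch.cat([x, d], dim=-1))
        return torch.sigmoid(out[..., :3]), torch.relu(out[..., 3])

def nerf_pixel(field, o, d, t_near, t_far, n_samples=64):
    """Quadrature of Eq. (2)-(3): many field queries per pixel."""
    t = torch.linspace(t_near, t_far, n_samples)
    x = o + t[:, None] * d                       # samples along r(t) = o + t d
    rgb, sigma = field(x, d.expand_as(x))
    delta = t[1] - t[0]
    alpha = 1.0 - torch.exp(-sigma * delta)      # per-interval opacity
    T = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0)
    return (T[:, None] * alpha[:, None] * rgb).sum(dim=0)

def radiance_mapping_pixel(field, x_surf, d):
    """Radiance mapping: a single query at the rectified surface point."""
    rgb, _ = field(x_surf, d)
    return rgb
```

In this simplified view, the per-pixel cost drops from `n_samples` network evaluations to one, which is where the computational advantage over NeRF-style volume rendering comes from.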