Boosting Point Clouds Rendering via Radiance Mapping
Xiaoyang Huang1*, Yi Zhang1*, Bingbing Ni1, Teng Li2, Kai Chen3, Wenjun Zhang1
1Shanghai Jiao Tong University, Shanghai 200240, China,
2Anhui University, 3Shanghai AI Lab
{huangxiaoyang, yizhangphd, nibingbing}@sjtu.edu.cn
Abstract
Recent years have witnessed rapid development in NeRF-based image rendering due to its high quality. However, point clouds rendering is somewhat less explored. Compared to NeRF-based rendering, which suffers from dense spatial sampling, point clouds rendering is naturally less computation-intensive, which enables its deployment on mobile computing devices. In this work, we focus on boosting the image quality of point clouds rendering with a compact model design. We first analyze the adaption of the volume rendering formulation to point clouds. Based on the analysis, we simplify the NeRF representation to a spatial mapping function which only requires a single evaluation per pixel. Further, motivated by ray marching, we rectify the noisy raw point clouds to the estimated intersections between rays and surfaces as queried coordinates, which avoids spatial frequency collapse and neighbor point disturbance. Composed of rasterization, spatial mapping and refinement stages, our method achieves state-of-the-art performance on point clouds rendering, outperforming prior works by notable margins with a smaller model size. We obtain a PSNR of 31.74 on NeRF-Synthetic, 25.88 on ScanNet and 30.81 on DTU. Code and data are publicly available¹.
Introduction
The rising trend of AR/VR applications calls for better im-
age quality and higher computation efficiency in render-
ing technology. Recent works mainly focus on NeRF-based
(Mildenhall et al.) rendering due to its photo-realistic effect.
Nevertheless, NeRF-based rendering suffers from heavy
computation cost, since its representation assumes no ex-
plicit geometry is known, and requires burdensome spatial
sampling. This drawback severely hampers its application
in mobile computing devices, such as smart phones or AR
headsets. On the other hand, point clouds (Huang et al.),
which have explicit geometry, are easy to obtain as depth sensors become prevalent and MVS algorithms (Yao et al.; Wang et al.) grow more powerful. Developing high-performance rendering methods based on point clouds, which is so far insufficiently explored, therefore deserves more attention. In this work, we introduce a point clouds rendering method which achieves rendering performance comparable to NeRF.
*These authors contributed equally.
†Corresponding Author.
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
1https://github.com/seanywang0408/RadianceMapping
The main difference between NeRF-based rendering and
point clouds rendering is that the latter is designed upon the
noisy surface of objects. On the bright side, this is a beneficial geometric prior which can greatly reduce the number of queries in 3D space. On the downside, the prior is noisy and sparse, since the point clouds are generally reconstructed by MVS algorithms or collected by depth sensors, so additional approaches are needed to alleviate the artifacts brought by the noise and sparsity. Therefore, most current point clouds render-
ing methods require two steps. One is the spatial feature
mapping, and the other is image-level refinement. The spa-
tial feature mapping step is similar to the NeRF represen-
tation, which maps a 3D coordinate to its color, density or
latent feature. The refinement step is usually implemented
as a convolutional neural network. In this work, we mainly
focus on the spatial feature mapping step. Previous works
use point clouds voxelization (Dai et al.), learnable param-
eters (Rückert, Franke, and Stamminger; Kopanas et al.) or
linear combination of sphere basis (Rakhimov et al.) as map-
ping functions. However, these methods suffer either from
high computation cost, large storage requirements, or unsat-
isfactory rendering performance. To this end, we introduce
a much simpler but surprisingly effective mapping function.
Motivated by the volume rendering formulation in NeRF, we
analyze its adaptation to point clouds rendering scenarios. We conclude that in a point cloud scene, volumetric rendering can be simplified to modeling the view-dependent color of the first intersection between the estimated surface and the ray. In other words, we augment each 3D point (i.e., most probably a surface point) with a learnable feature indicating the first-hit color. Thereby the point clouds rendering task can be re-cast within the high-fidelity NeRF framework, without consuming redundant computation on internal ray samples. We name it radiance map-
ping. Moreover, based on radiance mapping, we rectify the
raw point cloud coordinates that are fed into the mapping
function using the z-buffer in rasterization to obtain a query
point which lies exactly on the camera ray. This approach
allows us to obtain a more accurate geometry and avoid spa-
tial frequency collapse. The radiance mapping function, consisting of a 5-layer MLP, is only 0.75M in size, which is much smaller than the spatial feature mapping functions in previous works, yet achieves notably better performance. Together with a 2D neural renderer that compensates for the sparsity and noise in the point clouds, as done in previous works, our complete model is approximately 8M in total.

Figure 1: (a) Spatial frequency collapse occurs when using neural descriptors or raw point cloud queries. The point is optimized to a green color, which is the mixing of yellow and blue. (b) Using raw point cloud queries additionally causes neighbor point disturbance. Points lying close together have a larger impact on each other's feature optimization. (c) Our coordinate rectification alleviates the above issues. The idea is illustrated in 1D rendering.
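To make the coordinate rectification illustrated in Figure 1 (c) concrete, the following is a minimal sketch in PyTorch (our own illustration; the function name, tensor layout, and the assumption that the z-buffer stores distance along the ray are ours, not the released implementation): the z-buffer from rasterization gives, per pixel, the depth of the nearest rasterized point, and the rectified query coordinate is taken on the camera ray at that depth.

```python
import torch

def rectify_coordinates(cam_origin, ray_dirs, zbuffer, mask):
    """Sketch of coordinate rectification from a rasterized z-buffer.

    cam_origin: (3,) camera center o in world coordinates.
    ray_dirs:   (H, W, 3) unit-norm ray direction d per pixel.
    zbuffer:    (H, W) depth of the nearest rasterized point per pixel,
                assumed here to be the distance along the ray (a simplification).
    mask:       (H, W) bool, True where a point was rasterized.
    Returns:    (H, W, 3) rectified query coordinates x = o + z * d,
                zero-filled where no point was rasterized.
    """
    rectified = cam_origin.view(1, 1, 3) + zbuffer.unsqueeze(-1) * ray_dirs
    return torch.where(mask.unsqueeze(-1), rectified, torch.zeros_like(rectified))
```

Because the query now lies exactly on the camera ray at the estimated surface depth, neighboring points that rasterize to the same pixel no longer pull the queried coordinate toward their own positions, which is the intuition behind avoiding neighbor point disturbance.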
Our method reaches rendering quality comparable to NeRF, but with much less computation cost, since it only needs a single model inference per pixel. Compared to prior point clouds rendering methods, we obtain notable improvements in image quality, with a smaller model size and simpler computation. We achieve a PSNR of 31.74 on NeRF-Synthetic (Mildenhall et al.), 25.88 on ScanNet (Dai et al.) and 30.81 on DTU (Aanæs et al.). As far as we know, this is the state-of-the-art result on this task.
Related Work
Implicit Rendering
NeRF-based Neural Radiance Fields (Mildenhall et al.)
advance the neural rendering quality to a higher level. NeRF
represents the scene using an MLP which predicts the color
and density of a point. It aggregates the points along the camera ray into the pixel color via volume rendering. Follow-
ing NeRF, there are various innovations which address the
different challenges in NeRF representation. PixelNeRF (Yu
et al.), IBRNet (Wang et al.) and DietNeRF (Jain, Tancik,
and Abbeel) render novel views from only one or a few input
images. NeRF-W (Martin-Brualla et al.) tackles the variable
illumination and transient occluders in the wild instead of a
static scene. Mip-NeRF (Barron et al.) and Mip-NeRF 360
(Barron et al.) improves the image quality by rendering anti-
aliased conical frustums. NSVF (Liu et al.), PlenOctrees (Yu
et al.) and TensoRF (Chen et al.) aim at accelerating the in-
ference speed of NeRF by building a more efficient struc-
ture after scene fitting. Point-NeRF (Xu et al.) also assumes a given point cloud, like ours, but it still follows the volume rendering formulation of NeRF and thus suffers from dense spatial sampling.
Implicit Surface Rendering This line of work aims at reconstructing implicit surfaces via neural rendering. DVR
(Niemeyer et al.) learns implicit 3D representation from im-
ages by analytically deriving depth gradients from implicit
differentiation. IDR (Yariv et al.) renders an implicit surface
by approximating the light reflected from the surface towards
the camera. UNISURF (Oechsle, Peng, and Geiger) com-
bines implicit surface models and radiance fields together to
enable surface reconstruction without object masks. NeuS
(Wang et al.) gives a theoretical proof that the classic volume rendering formulation introduces a bias in the estimated object surface, and presents a solution which yields an unbiased SDF representation. Yariv et al. model the volume density as a function of the SDF representation, leading to more accurate sampling along the camera ray.
Point Clouds Rendering
Inverse Rendering Early work (Zwicker et al.) proposes
a point cloud rendering method using an Elliptical Weighted
Average filter based on a Gaussian kernel. Yifan et al. enable backward propagation through surface splatting to optimize point cloud positions to match the object geometry observed in images. Insafutdinov and Dosovitskiy use a differentiable point cloud projection module to learn object shape and pose from two-view images without supervision. Lin, Kong,
and Lucey propose pseudo-rendering which upsamples the
target image to alleviate the collision effect in discretization.
Wiles et al. construct a point cloud from single-view im-
ages by using a depth regressor and spatial feature predictor,
and render the point cloud with α-composition followed by a
ResNet (He et al.) refinement network. The training is super-
vised by a photometric loss and a GAN loss (Wang et al.).
Zhou et al. and Godard, Mac Aodha, and Brostow adopt a
similar approach, but on a monocular depth estimation task
with street-view video sequences.
View Synthesis NPBG (Aliev et al.) proposes to render
novel views of a scene using point-based learnable neural
descriptors and a U-Net refinement network. It adopts multi-
scale rasterization to model image details at different levels.
Johnson, Alahi, and Fei-Fei; Dosovitskiy and Brox use a
perceptual loss to optimize the network. Dai et al. propose
to project the point clouds into a layered volume by voxelization. Then a 3D CNN (Maturana and Scherer; Yang et al.) is used to transform the volume into a set of multi-plane images and their blending weights, which form the final image by weighted summation. NPBG++ (Rakhimov et al.) reduces the running time compared to NPBG, using a feature extractor to lift the neural descriptor features and make them view-dependent.

Figure 2: The translucent and opaque surfaces rendered by NeRF and our method. In the Drums scene, both methods optimize the color of the first-intersected surface instead of modeling the correct translucency. The second row shows the membrane from another view. In the Materials scene, our method renders even more delicate specular effects on the smooth metal balls, while NeRF generates somewhat blurry artifacts. The visualization in the original NeRF paper (Mildenhall et al.) reveals the same artifact. We attribute this superiority to the explicit geometry provided by point clouds.

ADOP (Rückert, Franke, and Stamminger)
renders HDR and LDR images with a coarsely-initialized
point cloud and camera parameters. The point clouds, cam-
era poses and the 2D refinement network are jointly opti-
mized. Kopanas et al. perform scene optimization for each
view based on bi-directional Elliptical Weighted Average
splatting. Ost et al. promote point clouds to implicit light
fields to allow fast inference in view synthesis. READ (Li et al.) adopts an approach similar to NPBG++ to synthesize
photo-realistic street views for autonomous driving. We an-
alyze the most relevant works to ours in the next section.
Method
Spatial Mapping
We first analyze the spatial mapping functions in previous
point clouds rendering methods. Then we introduce our ra-
diance mapping, a simpler but more effective mapping.
Previous Mapping Functions Revisited NPBG (Aliev
et al.) attaches learnable parameters to each point as neural descriptors. The advantage of this approach is that each point feature is optimized independently and is not influenced by nearby point features. This is beneficial for scenes where the surface color changes drastically. However, it also has the drawback that the density of the point clouds restricts the representational capacity of the point features. When the point clouds are sparse, the same neural descriptor is rasterized to multiple pixels and optimized to fit their average, which harms rendering quality. We illustrate this issue in 1D rendering in Figure 1 (a). The point is optimized to a green color, which is the mixing of yellow and blue. Since the cause of this phenomenon is analogous to violating the Nyquist rate in signal processing, we call it spatial frequency collapse. On the other hand, when the point clouds become comparatively dense due to higher-quality reconstruction or depth sensing, the size of the point features grows proportionally, which consumes more memory for storage and training. Besides, some point features that are only visible in a few views might not be sufficiently optimized.
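A toy numerical illustration of this averaging effect (our own sketch, not taken from the paper): when a single learnable descriptor is rasterized to two pixels whose ground-truth colors differ, an L2 photometric loss drives it to the mean of the two targets rather than to either one.

```python
import torch

# One per-point descriptor (here simply a 3-channel color) shared by two
# pixels with different ground-truth colors.
descriptor = torch.zeros(3, requires_grad=True)
targets = torch.tensor([[1.0, 1.0, 0.0],   # first pixel's target color
                        [0.0, 0.0, 1.0]])  # second pixel's target color
optimizer = torch.optim.SGD([descriptor], lr=0.5)

for _ in range(200):
    optimizer.zero_grad()
    pred = descriptor.expand(2, 3)          # same descriptor rendered to both pixels
    loss = ((pred - targets) ** 2).mean()   # L2 photometric loss
    loss.backward()
    optimizer.step()

print(descriptor.detach())  # converges to ~[0.5, 0.5, 0.5], the mean of the two targets
```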
Dai et al. propose to use a 3D CNN to extract spatial features. Their method first voxelizes the point clouds into a layered volume, and then adopts a 3D CNN to extract spatial features. Due to the high computational complexity of the 3D CNN, this model is much heavier and not easy to deploy.
NPBG++ (Rakhimov et al.) develops a spatial mapping function motivated by the spherical harmonics basis. It first uses a shared 2D CNN to extract image features from multi-view images, and then aggregates the features of each view into the point clouds by a linear combination of learnable basis functions over the unit sphere. This approach takes the view direction as input, which can potentially generate better rendered images. However, it still suffers from proportionally increasing memory as the point clouds get denser, similar to NPBG. Besides, it requires an additional U-Net as an image feature extractor, which further increases the model size.
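As a rough sketch of this kind of view-dependent basis combination (our own illustration using degree-1 real spherical harmonics; NPBG++'s actual basis, feature dimensions, and function names may differ), per-point learnable coefficients are mixed by basis values evaluated at the view direction:

```python
import torch

def sh_basis_deg1(dirs):
    """Real spherical harmonics basis up to degree 1.
    dirs: (N, 3) unit view directions; returns (N, 4) basis values."""
    x, y, z = dirs.unbind(-1)
    return torch.stack([
        0.282095 * torch.ones_like(x),  # l = 0
        0.488603 * y,                   # l = 1, m = -1
        0.488603 * z,                   # l = 1, m = 0
        0.488603 * x,                   # l = 1, m = 1
    ], dim=-1)

def view_dependent_feature(coeffs, dirs):
    """Combine per-point learnable coefficients with the basis values.
    coeffs: (N, 4, C) learnable per-point coefficients; dirs: (N, 3).
    Returns (N, C) view-dependent features."""
    return torch.einsum('nb,nbc->nc', sh_basis_deg1(dirs), coeffs)
```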
Radiance Mapping Compared to the above spatial mapping functions, our method is much more lightweight. Our compact representation stores the view-dependent radiance of the object surface. The idea is motivated by the volumetric rendering formulation in NeRF representations (Mildenhall et al.), which takes the 3D coordinate $\mathbf{x} = (x, y, z)$ and view direction $\mathbf{d} = (\theta, \phi)$ as inputs and outputs the color $\mathbf{c}$ and density $\sigma$ using a multi-layer perceptron (MLP) $F_\Theta$, parameterized by $\Theta$:

$$\mathbf{c}, \sigma = F_\Theta(\mathbf{x}, \mathbf{d}) \tag{1}$$

Since NeRF representations assume no explicit geometry exists, each point lying on the camera ray $\mathbf{r} = \mathbf{o} + t\mathbf{d}$ is queried and aggregated to obtain the final pixel color $C(\mathbf{r})$:

$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt \tag{2}$$

$$T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right) \tag{3}$$
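For intuition, the sketch below contrasts the two formulations (our own illustration; the `Field` MLP, without positional encoding, and both rendering functions are hypothetical simplifications, not the released implementation, which feeds a latent feature into a 2D refinement network): NeRF approximates Eq. (2) and (3) with a quadrature over many samples per ray, whereas radiance mapping queries the network once per pixel, at the rectified surface coordinate.

```python
import torch
import torch.nn as nn

class Field(nn.Module):
    """Hypothetical MLP mapping (x, d) to (rgb, sigma), as in Eq. (1)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4))

    def forward(self, x, d):
        out = self.net(torch.cat([x, d], dim=-1))
        return torch.sigmoid(out[..., :3]), torch.relu(out[..., 3])

def nerf_pixel(field, o, d, t_near, t_far, n_samples=64):
    """Quadrature of Eq. (2)-(3): many field queries per pixel."""
    t = torch.linspace(t_near, t_far, n_samples)
    x = o + t[:, None] * d                       # samples along r(t) = o + t d
    rgb, sigma = field(x, d.expand_as(x))
    delta = t[1] - t[0]
    alpha = 1.0 - torch.exp(-sigma * delta)      # per-interval opacity
    T = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0)
    return (T[:, None] * alpha[:, None] * rgb).sum(dim=0)

def radiance_mapping_pixel(field, x_surf, d):
    """Radiance mapping: a single query at the rectified surface point."""
    rgb, _ = field(x_surf, d)
    return rgb
```

In this simplified view, the per-pixel cost drops from `n_samples` network evaluations to one, which is where the computational advantage over NeRF-style volume rendering comes from.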