Estimating Neural Reflectance Field from Radiance Field using Tree Structures
Xiu Li¹,² Xiao Li¹ Yan Lu¹
¹Microsoft Research Asia ²Tencent
Abstract

We present a new method for estimating the Neural Reflectance Field (NReF) of an object from a set of posed multi-view images under unknown lighting. NReF represents the 3D geometry and appearance of objects in a disentangled manner, and is hard to estimate from images alone. Our method solves this problem by exploiting the Neural Radiance Field (NeRF) as a proxy representation, from which we perform further decomposition. A high-quality NeRF decomposition relies on good geometry information extraction as well as good prior terms to properly resolve ambiguities between different components. To extract high-quality geometry information from radiance fields, we re-design a new ray-casting based method for surface point extraction. To efficiently compute and apply prior terms, we convert different prior terms into different types of filter operations on the surface extracted from the radiance field. We then employ two types of auxiliary data structures, namely the Gaussian KD-tree and the octree, to support fast querying of surface points and efficient computation of surface filters during training. Based on this, we design a multi-stage decomposition optimization pipeline for estimating the neural reflectance field from the neural radiance field. Extensive experiments show that our method outperforms other state-of-the-art methods on different data, and enables high-quality free-view relighting as well as material editing tasks.
1. Introduction
The problem of digitally reproducing, editing, and photo-realistically synthesizing an object's 3D shape and appearance is a fundamental research topic with many applications, ranging from virtual conferencing to augmented reality. Despite its usefulness, this topic is very challenging because of its inherently ill-posed nature and a highly non-linear optimization process, due to the complex interplay of shape, reflectance, and lighting [27] in the observations. Typical inverse-rendering approaches [5, 10, 13] rely on either dedicated capture devices, active lighting, or restrictive assumptions on target geometry and/or materials.

* This work was done during Xiu Li's internship at Microsoft Research Asia.
Recently, the pioneering work of NeRF [19] has shown great advances in 3D reconstruction from a set of posed multi-view images without additional setups. NeRF represents a radiance field for a given object using a neural network as an implicit function. A radiance field is suitable for view synthesis but cannot support further manipulation tasks due to its entanglement of reflectance and lighting. To fully solve the inverse rendering problem and support manipulation, a more suitable representation is the reflectance field [4, 5], which represents shape, reflectance, and lighting in a disentangled manner.
Given the surprisingly high reconstruction quality and simple capture setup of neural radiance fields (i.e., NeRF), a few recent works [4, 6, 24, 31] have attempted to extend neural representations to reflectance fields. Yet, some of those methods still need additional inputs such as lighting information; other methods without additional input requirements still struggle to fully resolve the high complexity of inverse-rendering optimization, producing noticeable artifacts and/or degenerate results. Thus, a set of questions naturally comes up: Can we really achieve high-quality estimation of reflectance fields with neural representations? And if possible, what is the key to an effective and robust estimation of neural reflectance fields using only posed multi-view images with unknown lighting?
In this paper, we provide positive answers to the two questions by proposing a new method that estimates a neural reflectance field for a given object from only a set of multi-view images under unknown lighting. Inspired by [31], we formulate this problem as a two-stage optimization: an initial NeRF training stage and a NeRF decomposition stage. The pre-trained NeRF gives a plausible initialization for object shape, but reflectance properties and lighting are still entangled. We then train a set of neural networks to represent implicit fields of reflectance, surface normal, and lighting visibility, respectively. Fig. 1 demonstrates our idea. To avoid confusion, we will use NReF for Neural Reflectance Field hereafter.
A key challenge of decomposing a neural radiance field into a neural reflectance field is to correctly extract geometry information as priors from the pre-trained NeRF.
Figure 1. Given a set of multi-view posed images of an object with unknown illumination only (left-top), we estimate a neural reflectance field (NReF, mid-bottom) which decomposes the neural radiance field (NeRF, left-bottom) of the object into fields of albedo, BRDF, surface normals, and light visibility. NReF enables photo-realistic 3D editing tasks such as material editing and object relighting (right).
Unlike radiance fields, which generate the final renderings with volumetric integration, a reflectance field computes its rendering results only on surface points of the corresponding object. Thus, a robust and accurate surface point extraction method is required for computing shading color and geometry visibility terms. However, the current surface point extraction method based on volumetric density integration, used by most NeRF-based methods ([19], [31]), often produces surface extraction results too noisy for a robust geometry initialization, as we will show later. To alleviate this problem, we revisit their method and amend the surface point extraction process by proposing an effective strategy based on ray-casting. To support fast point querying during training, we construct an octree on a point cloud densely sampled from NeRF.
The second key challenge of high-quality neural reflectance field optimization under unknown lighting is to resolve ambiguities due to its intrinsically ill-posed nature. Previous image-based reflectance decomposition methods ([2]) have shown that adding suitable smoothness and parsimony prior terms is crucial to resolve the shading/albedo ambiguity. Our key observation is that adding the different types of priors mentioned above during training can be unified as applying different types of filters on the geometry surface. However, applying such filters is non-trivial for a neural reflectance field, as the surface is only defined with implicit functions. To address this issue, we exploit the idea of the Gaussian KD-tree ([1]) to efficiently compute a discretely sampled approximation of all prior terms, and employ a commitment loss to propagate the priors back into the implicit fields. In this way, we are able to add suitable priors for decomposing reflectance and shading and significantly improve the quality of neural reflectance field estimation.
Based on the two auxiliary tree-based data structures, we design an optimization pipeline with careful consideration of surface extraction, prior terms, and importance sampling of lighting. Our pipeline enables the estimation of high-quality neural reflectance fields with only multi-view posed images under unknown lighting as input. We validate and demonstrate the strength of our method with extensive experiments on both synthetic and real data. We also apply our method to manipulation tasks such as relighting and material editing.
To summarize, our contributions are as follows:

• A novel approach for estimating the reflectance field of 3D objects using only multi-view posed images under uncontrolled, unknown lighting.

• A new method to extract surface points from pre-trained radiance fields with reduced noise.

• A dedicatedly designed optimization pipeline that decomposes a neural radiance field into a neural reflectance field to support manipulation tasks.
2. Related Works

Inverse Rendering. The task of inverse rendering is to decompose an observed image of a given object into geometry, appearance properties, and lighting conditions, such that the components follow the physical imaging process. Since the decomposition is intrinsically an ill-posed problem, most prior approaches address it by adding strong assumptions on object shape ([2, 8, 11, 15, 16, 28]), exploiting additional information about shape or lighting ([5, 9, 24]), or designing dedicated devices for controlled capturing ([13, 18]). Our method only uses multi-view images as input and places fewer restrictions on shapes/materials.
Neural 3D Representations. Recently, the neural representation of 3D scenes has attracted considerable attention in the literature ([3, 7, 19, 22, 23]). These methods exploit multi-layer perceptrons to represent implicit fields such as signed distance functions for surfaces or volumetric radiance fields, known as Neural Fields. Our method builds upon the neural radiance field (NeRF) for 3D representation. NeRF [19] and its variants have surpassed previous state-of-the-art methods on novel view synthesis tasks; however, NeRF cannot support various editing tasks because it models radiance fields as a "black-box". Our work takes one step further towards opening this "black-box" by providing a method to decompose NeRF into shape, reflectance, and lighting, enabling editing tasks.
Some prior arts also attempt to model reflectance fields with neural networks. NeRV [24] proposed a method that estimates reflectance fields from multi-view images with known lighting. Bi et al. [4] estimate reflectance fields from images captured with a collocated camera-light setup. Our method does not require lighting conditions as prior information. NeRD [6] and PhySG [30] directly solve for reflectance fields from multi-view posed images with unknown illumination. However, neither NeRD nor PhySG takes light visibility into account, so they are unable to simulate any lighting occlusion or shadowing effects. We address this issue by modeling the light visibility field in our decomposition. The most similar work to ours is NeRFactor [31], which also decomposes a reflectance field from a pre-trained NeRF. A key drawback of NeRFactor is its limited quality: it tends to output over-smoothed normals, less disentangled albedo/shading, and degenerate specular components. Our method greatly improves the quality of the neural reflectance field by improving the surface point extraction, correctly handling dynamic importance sampling, and adding additional priors. These improvements cannot be trivially implemented without our introduction of tree-based data structures and carefully designed training strategies.
Data structures for neural representations. The octree data structure has been used in several works to accelerate training and/or rendering of neural radiance fields ([17], [29]). The Gaussian KD-Tree [1] has been used to accelerate a broad class of non-linear filters, including the bilateral, non-local means, and other related filters. Both data structures play an important role in our method during NReF training: the octree gives us the ability to query extracted surface points on the fly for computing geometric visibility terms, and the Gaussian KD-tree enables us to apply different prior terms in a unified way by filtering high-dimensional features on object surfaces.
3. Method

Our goal is to estimate a neural reflectance field (NReF), given only n multi-view posed images {I_k | k = 1...n} with unknown lighting as observations. An NReF f(x) represents the shape, light, and reflectance properties of an object at any 3D location x on its opaque surface. We parameterize the NReF with a set of multi-layer perceptron (MLP) networks and solve the NReF estimation with a 'NeRF decomposition' approach. A NeRF MLP is first trained with the same set of inputs (Section 3.1), and the initial surface geometry is extracted from it with a novel ray-casting based approach, accelerated with an octree (Section 3.2). The decomposition itself relies on a set of priors to resolve ambiguities, which are non-trivial to employ with neural implicit field representations alone. We address this issue with a Gaussian KD-tree that converts priors into surface filtering operations (Section 3.3). Finally, we introduce our multi-stage NReF decomposition pipeline with implementation details (Section 3.4).
3.1. From Radiance to Reflectance
We begin by training a Neural Radiance Field (NeRF) following the same procedure as in [19]. In NeRF, the rendered color C(r) of the camera ray r(t) = o + td is generated by querying and blending the radiance L_o(ω_v, r(t)) according to the volume density value σ(r(t)) along r(t) via

$$C(\mathbf{r}) = \int_0^{\infty} \frac{\partial T(\mathbf{r}(t))}{\partial t}\, L_o(\omega_v, \mathbf{r}(t))\, dt \tag{1}$$

where

$$T(\mathbf{r}(t)) = 1 - \exp\left(-\int_0^t \sigma(\mathbf{r}(s))\, ds\right) \tag{2}$$
Here, ω_v = d/‖d‖ is the normalized view direction, and T(r(t)) is the transmittance function. NeRF works well for view synthesis since it has already learned a reasonable shape via the volume density σ(t); however, it is not suitable for other manipulations of shading effects because reflectance and lighting are still entangled. To enable control over those factors, we formulate a decomposition problem for estimating the NReF as follows.
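For concreteness, Eq. 1 and Eq. 2 are evaluated in practice with the standard NeRF quadrature over discrete samples along the ray. The following minimal NumPy sketch is our own illustration of that discretization (variable names are ours, not from the paper); the per-segment weights approximate the increments of T from Eq. 2:

```python
import numpy as np

def render_ray(sigmas, radiances, deltas):
    """Quadrature for Eq. 1/2 along one ray (standard NeRF discretization).

    sigmas:    (N,) density sigma(r(t_i)) at the sampled points
    radiances: (N, 3) outgoing radiance L_o(omega_v, r(t_i))
    deltas:    (N,) sample spacings t_{i+1} - t_i
    """
    # alpha_i: probability that the ray terminates within segment i
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # accumulated transparency before segment i: prod_{j<i} (1 - alpha_j)
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    # w_i approximates the increment of T (Eq. 2) over segment i
    weights = trans * alphas
    color = (weights[:, None] * radiances).sum(axis=0)
    return color, weights
```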
Reflectance field formulation. The relationship of radiance, shape, reflectance, and lighting at surface point x from direction ω_v is given by the rendering equation ([12]):

$$L_o(\omega_v, \mathbf{x}) = \int_{\Omega} f_r(\omega_v, \omega_i, \mathbf{x})\, L_i(\omega_i, \mathbf{x}) \max(f_n(\mathbf{x}) \cdot \omega_i, 0)\, d\omega_i \tag{3}$$

where f_r(·) is the Bidirectional Reflectance Distribution Function (BRDF), L_i(ω_i, x) is the incident light at direction ω_i, and f_n(·) is the surface normal. We further assume light sources are far-field and decompose the lighting L_i(·) into a directional environment map L(ω_i) and a light visibility term f_v(ω_i, x):

$$L_i(\omega_i, \mathbf{x}) = f_v(\omega_i, \mathbf{x})\, L(\omega_i) \tag{4}$$
A straightforward way to estimate the NReF is to simply insert Eq. 3 into Eq. 1 and minimize the rendering loss against the image observations. However, simultaneously estimating all components of the NReF from scratch is extremely hard and unstable due to its ill-posed nature, even under known illumination conditions [24]. Fortunately, NeRF has already decomposed geometry information to some extent, and we can extract an initial surface S from it. Given this, the rendering loss R(r) can then be greatly simplified: we first query the surface point x_s and then evaluate Eq. 3 on it:

$$R(\mathbf{r}) = R(\omega_v, \mathbf{x}_s) = \|I(\mathbf{r}) - L_o(\omega_v, \mathbf{x}_s)\|_2^2 \tag{5}$$

The numerical method for approximating the integral of Eq. 3 plays a crucial role during the optimization. Previous neural reflectance field estimation methods ([31], [24]) approximate the integral with a pre-defined equirectangular map of lighting directions. However, we argue that this simple strategy is far from optimal ([25]). In particular, this sampling strategy is not only biased but also gives significantly noisy results with an affordable number of samples during training, while naively increasing the number of samples leads to unacceptable memory and time cost. We address this issue by following the standard importance sampling strategy [25] from the physically-based rendering field. The importance sampling directions are calculated based on the material roughness properties.
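As an illustration of roughness-driven importance sampling: the paper does not give its exact sampler, but GGX half-vector sampling is one standard choice in physically-based rendering, and the sketch below is our own assumption of what such a sampler looks like, not the paper's implementation:

```python
import numpy as np

def sample_ggx(normal, view_dir, roughness, rng=None):
    """Draw one incident direction via GGX half-vector importance
    sampling, a common roughness-driven sampler (our assumption;
    the paper does not specify its exact distribution).

    normal, view_dir: unit 3-vectors, view_dir pointing away from surface
    """
    rng = rng or np.random.default_rng()
    a = roughness * roughness
    u1, u2 = rng.random(), rng.random()
    # GGX-distributed half-vector in the local (tangent) frame
    phi = 2.0 * np.pi * u1
    cos_t = np.sqrt((1.0 - u2) / (1.0 + (a * a - 1.0) * u2))
    sin_t = np.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    h_local = np.array([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t])
    # orthonormal basis around the shading normal
    up = np.array([0.0, 0.0, 1.0]) if abs(normal[2]) < 0.999 else np.array([1.0, 0.0, 0.0])
    t = np.cross(up, normal)
    t /= np.linalg.norm(t)
    b = np.cross(normal, t)
    h = h_local[0] * t + h_local[1] * b + h_local[2] * normal
    # reflect the view direction about h to obtain omega_i
    omega_i = 2.0 * np.dot(view_dir, h) * h - view_dir
    return omega_i  # caller should reject samples with omega_i . normal <= 0
```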
3.2. Extracting geometry priors from NeRF

Figure 2. The improvement of surface extraction. Left: the surface extraction result with Eq. 6 produces many erratic scattered points. Middle: a slightly improved version of Eq. 6 with normalized weights still produces scattered points. Right: our surface extraction method with Eq. 7 removes almost all outlier points.
Surface points and normals. The original NeRF suggests extracting the surface point along a single ray r(t) as its expected termination:

$$\mathbf{x}_s = \int_0^{\infty} \frac{\partial T(\mathbf{r}(t))}{\partial t}\, \mathbf{r}(t)\, dt \tag{6}$$

The surface normal at x_s can be computed as the negative normalized gradient of NeRF's density output σ(t) with respect to point positions via auto-differentiation [24] [31]. In practice, however, we observe that the surface and normals derived from Eq. 6 are usually noisy and erratic, as shown in Fig. 2 and Fig. 5. The reason is that the density field from NeRF tends to 'fake' glossy surfaces by creating two or more small layers, so naively blending them along ray directions creates 'fake' floating points. A detailed analysis of the failure cases of Eq. 6 is given in the supplementary material. To alleviate this, we employ an empirical but effective strategy: we simply find the point x_s on the ray that satisfies

$$T(\mathbf{r}(s)) = \frac{T(\mathbf{r}(t_n)) + T(\mathbf{r}(t_f))}{2} \tag{7}$$

where t_n and t_f are the ray-tracing bounds. Unlike Eq. 6, which spreads floating points along the ray, extracting points with Eq. 7 forces the points to be distributed on one of the surface layers. As T(·) is a monotonic function by definition, there is always a unique solution to Eq. 7, and we find it works well in practice, as shown in Fig. 2. Given the surface point x_s, we extract its normal direction by averaging the density gradients in a small region centered at x_s, weighted by density value, to further reduce the normal noise.
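Because T(·) never decreases along the ray, the point satisfying Eq. 7 can be located with a simple bisection. The following is a minimal sketch under our own assumptions (the `transmittance(t)` callable, e.g. accumulated from octree density lookups, is hypothetical; the paper does not state which root-finder it uses):

```python
def find_surface_point(origin, direction, transmittance, t_near, t_far,
                       iters=32):
    """Solve Eq. 7 by bisection: find s with
    T(r(s)) = (T(r(t_n)) + T(r(t_f))) / 2.

    `transmittance(t)` evaluates the monotone T(r(t)) of Eq. 2 along
    this ray; bisection applies because T never decreases with t.
    """
    target = 0.5 * (transmittance(t_near) + transmittance(t_far))
    lo, hi = t_near, t_far
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if transmittance(mid) < target:
            lo = mid  # the target value lies deeper along the ray
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return origin + s * direction
```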
Light visibility. The light visibility at surface point x_s can be calculated by integrating the T(·) defined in Eq. 2 over the normal-oriented hemisphere. Since we have built an octree to store the density field around the surface points, we again utilize it to compute this integral dynamically with an importance sampling strategy during the optimization. Note that the octree not only supports light visibility queries on the fly, but also enables efficient depth estimation during rendering.
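A rough sketch of this visibility computation under our reading (with T interpreted as the accumulated opacity of Eq. 2, so per-direction visibility is 1 - T). We use plain uniform hemisphere sampling for brevity where the paper uses importance sampling, and `opacity_along` is a hypothetical helper that marches the octree-stored density field:

```python
import numpy as np

def light_visibility(x_s, normal, opacity_along, n_samples=64, rng=None):
    """Monte Carlo estimate of the visibility f_v over the
    normal-oriented hemisphere at surface point x_s.

    `opacity_along(origin, direction)` is a hypothetical helper that
    accumulates T of Eq. 2 (an opacity in [0, 1]) along a ray by
    marching the density field stored in the octree.
    """
    rng = rng or np.random.default_rng()
    total = 0.0
    for _ in range(n_samples):
        d = rng.normal(size=3)
        d /= np.linalg.norm(d)     # uniform direction on the sphere
        if d @ normal < 0.0:       # flip into the upper hemisphere
            d = -d
        total += 1.0 - opacity_along(x_s, d)
    return total / n_samples
```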
NeRF commitment. During the decomposition process, the surface normal and light visibility of the NReF are refined by the rendering loss. Yet, the predicted normal and light visibility should not deviate too much from those of NeRF. Thus, we add a NeRF commitment loss to constrain the optimized normal and visibility to stay close to NeRF's on the extracted surface points:

$$C_{f_\theta}(\mathbf{x}_s) = \|f_\theta(\mathbf{x}_s) - F(\mathbf{x}_s)\|_2^2 \tag{8}$$

where f_θ(·) and F(·) denote the corresponding components (normal or visibility) of the NReF and NeRF, respectively, and θ indicates the dependence on the network parameters of the NReF.
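Eq. 8 is a plain L2 penalty between the NReF prediction and the frozen NeRF-derived quantity at the extracted surface points. A one-line PyTorch-style sketch (names are ours, not from the paper):

```python
import torch

def nerf_commitment_loss(nref_pred, nerf_ref):
    """Eq. 8: keep the NReF's predicted normal or visibility
    f_theta(x_s) close to the reference F(x_s) extracted from the
    frozen NeRF at the surface points.

    nref_pred: (B, C) NReF MLP output at surface points x_s
    nerf_ref:  (B, C) NeRF-derived reference values (no gradient)
    """
    return ((nref_pred - nerf_ref.detach()) ** 2).sum(dim=-1).mean()
```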
3.3. Enforcing Smoothness and Parsimony

For estimating reflectance under unknown illumination, additional priors are necessary. We employ two well-known priors from the intrinsic decomposition literature [2]: the predicted normal, visibility, and BRDF should be locally smooth, and the albedo color should be globally sparse.
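To make the 'prior as filter' view concrete: a local smoothness prior amounts to pulling each predicted quantity toward a Gaussian-weighted average over nearby surface samples, which is exactly the kind of filtering the Gaussian KD-tree [1] accelerates. Below is a brute-force sketch of such a filtered target, our own illustration only; the actual method uses the Gaussian KD-tree over higher-dimensional features rather than this quadratic loop:

```python
import numpy as np

def gaussian_filtered_target(points, values, sigma=0.01):
    """Brute-force Gaussian filtering of per-point values (e.g. albedo)
    over surface samples. The Gaussian KD-tree [1] approximates the
    same weighted average far more efficiently; the O(N^2) loop here
    is for illustration only.

    points: (N, 3) surface sample positions
    values: (N, C) predicted quantity at those points
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (N, N)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)   # normalize filter weights per row
    # a smoothness prior can then penalize || values - filtered target ||
    return w @ values
```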