
in the literature [3, 7, 19, 22, 23]. These methods exploit multi-layer perceptrons to represent implicit fields, such as signed distance functions for surfaces or volumetric radiance fields, known as neural fields. Our method builds upon the neural radiance field (NeRF) for 3D representation. NeRF [19] and its variants have surpassed previous state-of-the-art methods on novel view synthesis tasks; however, NeRF cannot support various editing tasks because it models radiance fields as a “black-box”. Our work takes one step further towards opening this “black-box” by providing a method to decompose NeRF into shape, reflectance, and lighting, enabling editing tasks.
Several prior works also attempt to model reflectance fields with neural networks. NeRV [24] proposed a method that estimates reflectance fields from multi-view images with known lighting. Bi et al. [4] estimate reflectance fields from images captured with a collocated camera-light setup. Our method does not require lighting conditions as prior information. NeRD [6] and PhySG [30] directly solve for reflectance fields from multi-view posed images with unknown illumination. However, neither NeRD nor PhySG takes light visibility into account, so they are unable to simulate any lighting occlusion or shadowing effects. We address this issue by modeling the light visibility field in our decomposition. The work most similar to ours is NeRFactor [31], which also decomposes a reflectance field from a pre-trained NeRF. A key drawback of NeRFactor is its limited quality: it tends to output over-smoothed normals, less disentangled albedo/shading, and degenerate specular components. Our method greatly improves the quality of the neural reflectance field by improving surface point extraction, correctly handling dynamic importance sampling, and adding additional priors. These improvements cannot be implemented trivially without our tree-based data structures and carefully designed training strategies.
Data structures for neural representations. The octree data structure has been used in several works to accelerate training and/or rendering of neural radiance fields [17, 29]. The Gaussian KD-tree [1] has been used to accelerate a broad class of non-linear filters, including the bilateral filter, non-local means, and other related filters. Both data structures play an important role in our method during NReF training: the octree gives us the ability to query extracted surface points on the fly for computing geometric visibility terms, and the Gaussian KD-tree enables us to apply different prior terms in a unified way by filtering high-dimensional features on object surfaces.
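To make the octree's role concrete, the geometric visibility query can be sketched as follows. This is a minimal illustration, not the paper's implementation: extracted surface points are hashed into fixed-depth leaf cells (a flat stand-in for a full hierarchical octree), and shadow rays are marched against the resulting occupancy; the voxel size, step count, and function names are our own illustrative choices.

```python
import numpy as np

# Minimal sketch of a geometric visibility query against extracted
# surface points. Leaf cells of a fixed-depth octree are stored in a
# hash set (a flat stand-in for a hierarchical octree); all constants
# and names here are illustrative assumptions.

VOXEL = 0.01  # assumed leaf cell size at the chosen octree depth

def build_leaf_hash(surface_points):
    """Return the set of leaf cells occupied by surface points."""
    keys = np.floor(surface_points / VOXEL).astype(np.int64)
    return {tuple(k) for k in keys}

def visibility(x, omega_i, occupied, t_max=2.0, n_steps=256, eps=0.02):
    """March a shadow ray from x towards the light direction omega_i;
    return 0.0 if an occupied surface cell is hit, else 1.0."""
    for t in np.linspace(eps, t_max, n_steps):  # eps skips cells near x
        p = x + t * omega_i
        if tuple(np.floor(p / VOXEL).astype(np.int64)) in occupied:
            return 0.0
    return 1.0
```

A full octree would additionally allow coarse empty cells to be skipped in large steps; the flat hash above keeps only the constant-time occupancy lookup.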
3. Method
Our goal is to estimate a neural reflectance field (NReF) given only $n$ multi-view posed images $\{I_k \mid k = 1 \dots n\}$ with unknown lighting as observations. An NReF $f(\mathbf{x})$ represents the shape, lighting, and reflectance properties of an object at any 3D location $\mathbf{x}$ on its opaque surface. We parameterize the NReF with a set of multi-layer perceptron (MLP) networks and solve the NReF estimation with a ‘NeRF decomposition’ approach. A NeRF MLP is first trained with the same set of inputs (section 3.1), and the initial surface geometry is extracted from it with a novel ray-casting-based approach accelerated with an octree (section 3.2). The decomposition itself relies on a set of priors to resolve ambiguities, and these priors are non-trivial to employ with neural implicit field representations alone. We address this issue with a Gaussian KD-tree that converts the priors into surface filtering operations (section 3.3). Finally, we introduce our multi-stage NReF decomposition pipeline together with implementation details (section 3.4).
3.1. From Radiance to Reflectance
We begin by training a Neural Radiance Field (NeRF) following the same procedure as in [19]. In NeRF, the rendered color $C(\mathbf{r})$ of a camera ray $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$ is generated by querying and blending the radiance $L_o(\omega_v, \mathbf{r}(t))$ according to the volume density $\sigma(\mathbf{r}(t))$ along $\mathbf{r}(t)$ via
$$C(\mathbf{r}) = \int_{0}^{\infty} \frac{\partial T(\mathbf{r}(t))}{\partial t} \, L_o(\omega_v, \mathbf{r}(t)) \, dt \quad (1)$$

where

$$T(\mathbf{r}(t)) = 1 - \exp\left( -\int_{0}^{t} \sigma(\mathbf{r}(s)) \, ds \right) \quad (2)$$
Here, $\omega_v = -\mathbf{d} / \|\mathbf{d}\|$ is the normalized view direction, and $T(\mathbf{r}(t))$ is the transmittance function. NeRF works well for view synthesis since it already learns a reasonable shape via the volume density $\sigma(\mathbf{r}(t))$; however, it is not suitable for manipulating shading effects because reflectance and lighting remain entangled. To enable control over those factors, we formulate a decomposition problem for estimating the NReF as follows.
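In practice, the integral in Eq. (1) is approximated with the standard NeRF quadrature over discrete samples along the ray. The following NumPy sketch shows that discretization; sampling the depths, densities, and radiances from the MLP is assumed to happen elsewhere.

```python
import numpy as np

def composite(sigma, radiance, t):
    """Discretize Eq. (1): C(r) ≈ Σ_i w_i L_o,i, where
    w_i = T_i (1 - exp(-σ_i δ_i)) is ∂T/∂t integrated over bin i
    and T_i is the accumulated transmittance up to sample i.

    sigma:    (N,)   densities σ(r(t_i))
    radiance: (N, 3) radiances L_o(ω_v, r(t_i))
    t:        (N,)   sample depths along the ray
    """
    delta = np.diff(t, append=t[-1] + 1e10)   # bin widths δ_i
    alpha = 1.0 - np.exp(-sigma * delta)      # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    return (weights[:, None] * radiance).sum(axis=0)
```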
Reflectance field formulation. The relationship of radiance, shape, reflectance, and lighting at a surface point $\mathbf{x}$ observed from direction $\omega_v$ is given by the rendering equation [12]:

$$L_o(\omega_v, \mathbf{x}) = \int f_r(\omega_v, \omega_i, \mathbf{x}) \, L_i(\omega_i, \mathbf{x}) \max(f_n(\mathbf{x}) \cdot \omega_i, 0) \, d\omega_i \quad (3)$$
where $f_r(\cdot)$ is the Bidirectional Reflectance Distribution Function (BRDF), $L_i(\omega_i, \mathbf{x})$ is the incident light at direction $\omega_i$, and $f_n(\cdot)$ is the surface normal. We further assume light sources are far-field and decompose the lighting $L_i(\cdot)$ into a directional environment map $L(\omega_i)$ and a light visibility term $f_v(\omega_i, \mathbf{x})$:

$$L_i(\omega_i, \mathbf{x}) = f_v(\omega_i, \mathbf{x}) \, L(\omega_i) \quad (4)$$
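For intuition, Eqs. (3) and (4) can be evaluated by a direct sum over sampled incident directions, e.g. one per environment map texel. In the sketch below, the BRDF, visibility, normal, and environment map are assumed to be available as callables; the names and the texel-sum discretization are illustrative, not our actual renderer.

```python
import numpy as np

def shade(x, omega_v, f_r, f_v, f_n, env_map, dirs, d_omega):
    """Approximate Eq. (3) under far-field lighting (Eq. (4)):
    L_o ≈ Σ_i f_r(ω_v, ω_i, x) f_v(ω_i, x) L(ω_i) max(n·ω_i, 0) Δω_i

    dirs:    (M, 3) unit incident directions ω_i
    d_omega: (M,)   solid angle Δω_i covered by each direction
    """
    n = f_n(x)
    out = np.zeros(3)
    for w_i, dw in zip(dirs, d_omega):
        cos = max(float(n @ w_i), 0.0)   # clamped cosine foreshortening
        if cos > 0.0:
            out += f_r(omega_v, w_i, x) * f_v(w_i, x) * env_map(w_i) * cos * dw
    return out
```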