Estimating Neural Reflectance Field from Radiance Field using Tree Structures
Xiu Li¹,² Xiao Li¹ Yan Lu¹
¹Microsoft Research Asia ²Tencent
Abstract

We present a new method for estimating the Neural Reflectance Field (NReF) of an object from a set of posed multi-view images under unknown lighting. NReF represents the 3D geometry and appearance of objects in a disentangled manner, and is hard to estimate from images alone. Our method solves this problem by exploiting the Neural Radiance Field (NeRF) as a proxy representation, from which we perform further decomposition. A high-quality NeRF decomposition relies on good geometry information extraction as well as good prior terms to properly resolve ambiguities between different components. To extract high-quality geometry information from radiance fields, we re-design a new ray-casting based method for surface point extraction. To efficiently compute and apply prior terms, we convert different prior terms into different types of filter operations on the surface extracted from the radiance field. We then employ two types of auxiliary data structures, namely the Gaussian KD-tree and the octree, to support fast querying of surface points and efficient computation of surface filters during training. Based on this, we design a multi-stage decomposition optimization pipeline for estimating the neural reflectance field from the neural radiance field. Extensive experiments show that our method outperforms other state-of-the-art methods on different data, and enables high-quality free-view relighting as well as material editing tasks.
1. Introduction
The problem of digitally reproducing, editing, and photo-realistically synthesizing an object's 3D shape and appearance is a fundamental research topic with many applications, ranging from virtual conferencing to augmented reality. Despite its usefulness, this topic is very challenging because of its inherently ill-posed nature and a highly non-linear optimization process, due to the complex interplay of shape, reflectance, and lighting [27] in the observations. Typical inverse-rendering approaches [5, 10, 13] rely on either dedicated capture devices, active lighting, or restrictive assumptions on target geometry and/or materials.

* This work was done during Xiu Li's internship at Microsoft Research Asia.
Recently, the pioneering work of NeRF [19] has shown great advances in 3D reconstruction from a set of posed multi-view images without additional setups. NeRF represents a radiance field for a given object using a neural network as an implicit function. A radiance field is suitable for view synthesis but cannot support further manipulation tasks due to its entanglement of reflectance and lighting. To fully solve the inverse rendering problem and support manipulation, a more suitable representation is the reflectance field [4, 5], which represents shape, reflectance, and lighting in a disentangled manner.
Given the surprisingly high reconstruction quality and simple capture setup of neural radiance fields (i.e., NeRF), a few recent works [4, 6, 24, 31] have attempted to extend neural representations to reflectance fields. Yet, some of those methods still need additional inputs such as lighting information; other methods without additional input requirements still struggle to fully resolve the high complexity of inverse-rendering optimization, producing noticeable artifacts and/or degenerate results. Thus, a set of questions naturally comes up: Can we really achieve high-quality estimation of reflectance fields with neural representations? And if possible, what is the key to an effective and robust estimation of neural reflectance fields using only posed multi-view images with unknown lighting?
In this paper, we provide positive answers to the two questions by proposing a new method that estimates a neural reflectance field for a given object from only a set of multi-view images under unknown lighting. Inspired by [31], we formulate this problem as a two-stage optimization: an initial NeRF training stage and a NeRF decomposition stage. The pre-trained NeRF gives a plausible initialization for object shape, but reflectance properties and lighting are still entangled. We then train a set of neural networks to represent implicit fields of reflectance, surface normal, and lighting visibility, respectively. Fig. 1 demonstrates our idea. To avoid confusion, we will use NReF for Neural Reflectance Field hereafter.
A key challenge of decomposing a neural radiance field into a neural reflectance field is to correctly extract geometry information as priors from the pre-trained NeRF.
Figure 1. Given a set of multi-view posed images of an object with unknown illumination only (left-top), we estimate a neural reflectance field (NReF, mid-bottom) which decomposes the neural radiance field (NeRF, left-bottom) of the object into fields of albedo, BRDF, surface normals, and light visibility. NReF enables photo-realistic 3D editing tasks such as material editing and object relighting (right).
Unlike radiance fields, which generate the final renderings with volumetric integration, a reflectance field computes its rendering results only on surface points of the corresponding object. Thus, a robust and accurate surface point extraction method is required for computing shading color and geometry visibility terms. However, the current surface point extraction method based on volumetric density integration, used by most NeRF-based methods ([19], [31]), often produces surface extraction results too noisy for a robust geometry initialization, as we will show later. To alleviate this problem, we revisit their method and amend the surface point extraction process by proposing an effective strategy based on ray-casting. To support fast point querying during training, we construct an octree on a point cloud densely sampled from NeRF.
The second key challenge of high-quality neural reflectance field optimization under unknown lighting is to resolve ambiguities due to its intrinsically ill-posed nature. Previous image-based reflectance decomposition methods ([2]) have shown that adding suitable smoothness and parsimony prior terms is crucial to resolve the shading/albedo ambiguity. Our key observation is that adding the different types of priors mentioned above during training can be unified as applying different types of filters on the geometry surface. However, applying such filters is non-trivial for a neural reflectance field, as the surface is only defined with implicit functions. To address this issue, we exploit the idea of the Gaussian KD-tree ([1]) to efficiently compute a discretely sampled approximation of all prior terms, and employ a commitment loss to propagate the priors back into the implicit fields. In this way, we are able to add suitable priors for decomposing reflectance and shading and significantly improve the quality of neural reflectance field estimation.
Based on the two auxiliary tree-based data structures, we design an optimization pipeline with careful consideration of surface extraction, prior terms, and importance sampling of lighting. Our pipeline enables the estimation of high-quality neural reflectance fields with only multi-view posed images under unknown lighting as input. We validate and demonstrate the strength of our method with extensive experiments on both synthetic and real data. We also apply our method to manipulation tasks such as relighting and material editing.
To summarize, our contributions are as follows:

• A novel approach for estimating the reflectance field of 3D objects using only multi-view posed images under uncontrolled, unknown lighting.

• A new method to extract surface points from pre-trained radiance fields with reduced noise.

• A dedicatedly designed optimization pipeline that decomposes a neural radiance field into a neural reflectance field to support manipulation tasks.
2. Related Works

Inverse Rendering. The task of inverse rendering is to decompose an observed image of a given object into geometry, appearance properties, and lighting conditions, such that the components follow the physical imaging process. Since the decomposition is intrinsically an ill-posed problem, most prior approaches address it by adding strong assumptions on object shape ([2, 8, 11, 15, 16, 28]), exploiting additional information about shape or lighting ([5, 9, 24]), or designing dedicated devices for controlled capturing ([13, 18]). Our method only uses multi-view images as input and places fewer restrictions on shapes/materials.
Neural 3D Representations. Recently, the neural representation of 3D scenes has attracted considerable attention in the literature ([3, 7, 19, 22, 23]). These methods exploit multi-layer perceptrons to represent implicit fields such as signed distance functions for surfaces or volumetric radiance fields, known as Neural Fields. Our method builds upon the neural radiance field (NeRF) for 3D representation. NeRF [19] and its variants have surpassed previous state-of-the-art methods on novel view synthesis tasks; however, NeRF cannot support various editing tasks because it models radiance fields as a "black-box". Our work takes one step further towards opening this "black-box" by providing a method to decompose NeRF into shape, reflectance, and lighting, enabling editing tasks.
Some prior arts also attempt to model reflectance fields with neural networks. NeRV [24] proposed a method that estimates reflectance fields from multi-view images with known lighting. Bi et al. [4] estimate reflectance fields from images captured with a collocated camera-light setup. Our method does not require lighting conditions as prior information. NeRD [6] and PhySG [30] directly solve for reflectance fields from multi-view posed images with unknown illumination. However, neither NeRD nor PhySG takes light visibility into account, so they are unable to simulate any lighting occlusion or shadowing effects. We address this issue by modeling the light visibility field in our decomposition. The most similar work to ours is NeRFactor [31], which also decomposes a reflectance field from a pre-trained NeRF. A key drawback of NeRFactor is its limited quality: it tends to output over-smoothed normals, less disentangled albedo/shading, and degenerate specular components. Our method greatly improves the quality of the neural reflectance field by improving the surface point extraction, correctly handling dynamic importance sampling, and adding additional priors. These improvements cannot be trivially implemented without our introduction of tree-based data structures and carefully designed training strategies.
Data structures for neural representations. The octree data structure has been used in several works to accelerate training and/or rendering of neural radiance fields ([17], [29]). The Gaussian KD-Tree [1] has been used to accelerate a broad class of non-linear filters, including the bilateral, non-local means, and other related filters. Both data structures play an important role in our method during NReF training: the octree gives us the ability to query extracted surface points on the fly for computing geometric visibility terms, and the Gaussian KD-tree enables us to apply different prior terms in a unified way by filtering high-dimensional features on object surfaces.
3. Method

Our goal is to estimate a neural reflectance field (NReF), given only n multi-view posed images {I_k | k = 1...n} with unknown lighting as observations. An NReF f(x) represents the shape, light, and reflectance properties of an object at any 3D location x on its opaque surface. We parameterize the NReF with a set of multi-layer perceptron (MLP) networks and solve the NReF estimation with a 'NeRF decomposition' approach. A NeRF MLP is first trained with the same set of inputs (Section 3.1), and the initial surface geometry is extracted from it with a novel ray-casting based approach, accelerated with an octree (Section 3.2). The decomposition itself relies on a set of priors to resolve ambiguities, which are non-trivial to employ with neural implicit field representations alone. We address this issue with a Gaussian KD-tree that converts priors into surface filtering operations (Section 3.3). Finally, we introduce our multi-stage NReF decomposition pipeline with implementation details (Section 3.4).
3.1. From Radiance to Reflectance
We begin by training a Neural Radiance Field (NeRF) following the same procedure as in [19]. In NeRF, the rendered color C(r) of the camera ray r(t) = o + td is generated by querying and blending the radiance L_o(ω_v, r(t)) according to the volume density value σ(r(t)) along r(t) via

$$C(\mathbf{r}) = \int_0^{\infty} \frac{\partial T(\mathbf{r}(t))}{\partial t}\, L_o(\omega_v, \mathbf{r}(t))\, dt \tag{1}$$

where

$$T(\mathbf{r}(t)) = 1 - \exp\left(-\int_0^t \sigma(\mathbf{r}(s))\, ds\right) \tag{2}$$
Here, ω_v = d/‖d‖ is the normalized view direction, and T(r(t)) is the transmittance function. NeRF works well for view synthesis since it has already learned a reasonable shape via the volume density σ(t); however, it is not suitable for other manipulations of shading effects because reflectance and lighting are still entangled. To enable control over those factors, we formulate a decomposition problem for estimating the NReF as follows.
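For concreteness, Eq. 1 and Eq. 2 are evaluated in practice with the standard NeRF quadrature over discrete samples along the ray. The following minimal NumPy sketch is our own illustration of that discretization (variable names are ours, not from the paper); the per-segment weights approximate the increments of T from Eq. 2:

```python
import numpy as np

def render_ray(sigmas, radiances, deltas):
    """Quadrature for Eq. 1/2 along one ray (standard NeRF discretization).

    sigmas:    (N,) density sigma(r(t_i)) at the sampled points
    radiances: (N, 3) outgoing radiance L_o(omega_v, r(t_i))
    deltas:    (N,) sample spacings t_{i+1} - t_i
    """
    # alpha_i: probability that the ray terminates within segment i
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # accumulated transparency before segment i: prod_{j<i} (1 - alpha_j)
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    # w_i approximates the increment of T (Eq. 2) over segment i
    weights = trans * alphas
    color = (weights[:, None] * radiances).sum(axis=0)
    return color, weights
```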
Reflectance field formulation. The relationship of radiance, shape, reflectance, and lighting at surface point x from direction ω_v is given by the rendering equation ([12]):

$$L_o(\omega_v, \mathbf{x}) = \int_{\Omega} f_r(\omega_v, \omega_i, \mathbf{x})\, L_i(\omega_i, \mathbf{x}) \max(f_n(\mathbf{x}) \cdot \omega_i, 0)\, d\omega_i \tag{3}$$

where f_r(·) is the Bidirectional Reflectance Distribution Function (BRDF), L_i(ω_i, x) is the incident light at direction ω_i, and f_n(·) is the surface normal. We further assume light sources are far-field and decompose the lighting L_i(·) into a directional environment map L(ω_i) and a light visibility term f_v(ω_i, x):

$$L_i(\omega_i, \mathbf{x}) = f_v(\omega_i, \mathbf{x})\, L(\omega_i) \tag{4}$$
A straightforward way to estimate the NReF is to simply insert Eq. 3 into Eq. 1 and minimize the rendering loss against the image observations. However, simultaneously estimating all components of the NReF from scratch is extremely hard and unstable due to its ill-posed nature, even under known illumination conditions [24]. Fortunately, NeRF has already decomposed geometry information to some extent, and we can extract an initial surface S from it. Given this, the rendering loss R(r) can then be greatly simplified: we first query the surface point x_s and then evaluate Eq. 3 on it:

$$R(\mathbf{r}) = R(\omega_v, \mathbf{x}_s) = \|I(\mathbf{r}) - L_o(\omega_v, \mathbf{x}_s)\|_2^2 \tag{5}$$

The numerical method for approximating the integral of Eq. 3 plays a crucial role during the optimization. Previous neural reflectance field estimation methods ([31], [24]) approximate the integral with a pre-defined equirectangular map of lighting directions. However, we argue that this simple strategy is far from optimal ([25]). In particular, this sampling strategy is not only biased but also gives significantly noisy results with an affordable number of samples during training, while naively increasing the number of samples leads to unacceptable memory and time cost. We address this issue by following the standard importance sampling strategy [25] from the physically-based rendering field. The importance sampling directions are calculated based on the material roughness properties.
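As an illustration of roughness-driven importance sampling: the paper does not give its exact sampler, but GGX half-vector sampling is one standard choice in physically-based rendering, and the sketch below is our own assumption of what such a sampler looks like, not the paper's implementation:

```python
import numpy as np

def sample_ggx(normal, view_dir, roughness, rng=None):
    """Draw one incident direction via GGX half-vector importance
    sampling, a common roughness-driven sampler (our assumption;
    the paper does not specify its exact distribution).

    normal, view_dir: unit 3-vectors, view_dir pointing away from surface
    """
    rng = rng or np.random.default_rng()
    a = roughness * roughness
    u1, u2 = rng.random(), rng.random()
    # GGX-distributed half-vector in the local (tangent) frame
    phi = 2.0 * np.pi * u1
    cos_t = np.sqrt((1.0 - u2) / (1.0 + (a * a - 1.0) * u2))
    sin_t = np.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    h_local = np.array([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t])
    # orthonormal basis around the shading normal
    up = np.array([0.0, 0.0, 1.0]) if abs(normal[2]) < 0.999 else np.array([1.0, 0.0, 0.0])
    t = np.cross(up, normal)
    t /= np.linalg.norm(t)
    b = np.cross(normal, t)
    h = h_local[0] * t + h_local[1] * b + h_local[2] * normal
    # reflect the view direction about h to obtain omega_i
    omega_i = 2.0 * np.dot(view_dir, h) * h - view_dir
    return omega_i  # caller should reject samples with omega_i . normal <= 0
```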
3.2. Extracting geometry priors from NeRF

Figure 2. The improvement of surface extraction. Left: the surface extraction result with Eq. 6 produces many erratic scattered points. Middle: a slightly improved version of Eq. 6 with normalized weights still produces scattered points. Right: our surface extraction method with Eq. 7 removes almost all outlier points.
Surface points and normals. The original NeRF suggests extracting the surface point along a single ray r(t) as its expected termination:

$$\mathbf{x}_s = \int_0^{\infty} \frac{\partial T(\mathbf{r}(t))}{\partial t}\, \mathbf{r}(t)\, dt \tag{6}$$

The surface normal at x_s can be computed as the negative normalized gradient of NeRF's density output σ(t) with respect to point positions via auto-differentiation [24] [31]. In practice, however, we observe that the surface and normals derived from Eq. 6 are usually noisy and erratic, as shown in Fig. 2 and Fig. 5. The reason is that the density field from NeRF tends to 'fake' glossy surfaces by creating two or more small layers, so naively blending them along ray directions creates 'fake' floating points. A detailed analysis of the failure cases of Eq. 6 is given in the supplementary material. To alleviate this, we employ an empirical but effective strategy: we simply find the point x_s on the ray that satisfies

$$T(\mathbf{r}(s)) = \frac{T(\mathbf{r}(t_n)) + T(\mathbf{r}(t_f))}{2} \tag{7}$$

where t_n and t_f are the ray-tracing bounds. Unlike Eq. 6, which spreads floating points along the ray, extracting points with Eq. 7 forces the points to be distributed on one of the surface layers. As T(·) is a monotonic function by definition, there is always a unique solution to Eq. 7, and we find it works well in practice, as shown in Fig. 2. Given the surface point x_s, we extract its normal direction by averaging the density gradients in a small region centered at x_s, weighted by density value, to further reduce the normal noise.
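Because T(·) never decreases along the ray, the point satisfying Eq. 7 can be located with a simple bisection. The following is a minimal sketch under our own assumptions (the `transmittance(t)` callable, e.g. accumulated from octree density lookups, is hypothetical; the paper does not state which root-finder it uses):

```python
def find_surface_point(origin, direction, transmittance, t_near, t_far,
                       iters=32):
    """Solve Eq. 7 by bisection: find s with
    T(r(s)) = (T(r(t_n)) + T(r(t_f))) / 2.

    `transmittance(t)` evaluates the monotone T(r(t)) of Eq. 2 along
    this ray; bisection applies because T never decreases with t.
    """
    target = 0.5 * (transmittance(t_near) + transmittance(t_far))
    lo, hi = t_near, t_far
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if transmittance(mid) < target:
            lo = mid  # the target value lies deeper along the ray
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return origin + s * direction
```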
Light visibility. The light visibility at surface point x_s can be calculated by integrating the T(·) defined in Eq. 2 over the normal-oriented hemisphere. Since we have built an octree to store the density field around the surface points, we again utilize it to compute this integral dynamically with an importance sampling strategy during the optimization. Note that the octree not only supports light visibility queries on the fly, but also enables efficient depth estimation during rendering.
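A rough sketch of this visibility computation under our reading (with T interpreted as the accumulated opacity of Eq. 2, so per-direction visibility is 1 - T). We use plain uniform hemisphere sampling for brevity where the paper uses importance sampling, and `opacity_along` is a hypothetical helper that marches the octree-stored density field:

```python
import numpy as np

def light_visibility(x_s, normal, opacity_along, n_samples=64, rng=None):
    """Monte Carlo estimate of the visibility f_v over the
    normal-oriented hemisphere at surface point x_s.

    `opacity_along(origin, direction)` is a hypothetical helper that
    accumulates T of Eq. 2 (an opacity in [0, 1]) along a ray by
    marching the density field stored in the octree.
    """
    rng = rng or np.random.default_rng()
    total = 0.0
    for _ in range(n_samples):
        d = rng.normal(size=3)
        d /= np.linalg.norm(d)     # uniform direction on the sphere
        if d @ normal < 0.0:       # flip into the upper hemisphere
            d = -d
        total += 1.0 - opacity_along(x_s, d)
    return total / n_samples
```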
NeRF commitment. During the decomposition process, the surface normal and light visibility of the NReF are refined by the rendering loss. Yet, the predicted normal and light visibility should not deviate too much from those of NeRF. Thus, we add a NeRF commitment loss to constrain the optimized normal and visibility to stay close to NeRF's on the extracted surface points:

$$C_{f_\theta}(\mathbf{x}_s) = \|f_\theta(\mathbf{x}_s) - F(\mathbf{x}_s)\|_2^2 \tag{8}$$

where f_θ(·) and F(·) denote the corresponding components (normal or visibility) of the NReF and NeRF, respectively, and θ indicates the dependence on the network parameters of the NReF.
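Eq. 8 is a plain L2 penalty between the NReF prediction and the frozen NeRF-derived quantity at the extracted surface points. A one-line PyTorch-style sketch (names are ours, not from the paper):

```python
import torch

def nerf_commitment_loss(nref_pred, nerf_ref):
    """Eq. 8: keep the NReF's predicted normal or visibility
    f_theta(x_s) close to the reference F(x_s) extracted from the
    frozen NeRF at the surface points.

    nref_pred: (B, C) NReF MLP output at surface points x_s
    nerf_ref:  (B, C) NeRF-derived reference values (no gradient)
    """
    return ((nref_pred - nerf_ref.detach()) ** 2).sum(dim=-1).mean()
```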
3.3. Enforcing Smoothness and Parsimony

For estimating reflectance under unknown illumination, additional priors are necessary. We employ two well-known priors from the intrinsic decomposition literature [2]: the predicted normal, visibility, and BRDF should be locally smooth, and the albedo color should be globally sparse.
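To make the 'prior as filter' view concrete: a local smoothness prior amounts to pulling each predicted quantity toward a Gaussian-weighted average over nearby surface samples, which is exactly the kind of filtering the Gaussian KD-tree [1] accelerates. Below is a brute-force sketch of such a filtered target, our own illustration only; the actual method uses the Gaussian KD-tree over higher-dimensional features rather than this quadratic loop:

```python
import numpy as np

def gaussian_filtered_target(points, values, sigma=0.01):
    """Brute-force Gaussian filtering of per-point values (e.g. albedo)
    over surface samples. The Gaussian KD-tree [1] approximates the
    same weighted average far more efficiently; the O(N^2) loop here
    is for illustration only.

    points: (N, 3) surface sample positions
    values: (N, C) predicted quantity at those points
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (N, N)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)   # normalize filter weights per row
    # a smoothness prior can then penalize || values - filtered target ||
    return w @ values
```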