Higher-order Monte Carlo through cubic stratification Nicolas Chopin1 Mathieu Gerber2

2025-05-06 0 0 833.81KB 35 页 10玖币
侵权投诉
Higher-order Monte Carlo through cubic
stratification
Nicolas Chopin(1), Mathieu Gerber(2)
(1) ENSAE, Institut Polytechnique de Paris, Paris, France
(2) School of Mathematics, University of Bristol, UK
We propose two novel unbiased estimators of the integral ´[0,1]sf(u)du for
a function f, which depend on a smoothness parameter rN. The first
estimator integrates exactly the polynomials of degrees p<rand achieves
the optimal error n1/2r/s (where nis the number of evaluations of f) when
fis rtimes continuously differentiable. The second estimator is also optimal
in term of convergence rate and has the advantage to be computationally
cheaper, but it is restricted to functions that vanish on the boundary of
[0,1]s. The construction of the two estimators relies on a combination of
cubic stratification and control variates based on numerical derivatives. We
provide numerical evidence that they show good performance even for mod-
erate values of n.
1. Introduction
1.1. Background
This paper is concerned with the construction of unbiased estimators of the integral
I(f) := ´[0,1]sf(u)du based on a certain number nof evaluations of f. The motivation
for this problem is well-known. Many quantities of interest in applied mathematics may
be expressed as such an integral. Providing random, unbiased approximations present
several practical advantages. First, it greatly facilitates the assessment of the numer-
ical error, through repeated runs. Second, such independent estimates may be gener-
ated in parallel, and then may be averaged to obtain a lower variance approximation of
I(f). Third, generating unbiased estimates as plug-in replacements is of interest in vari-
ous advanced Monte Carlo methodologies, such as pseudo-marginal sampling Andrieu
and Roberts (2009), stochastic approximation Robbins and Monro (1951) and stochastic
gradient descent. Finally, random integration algorithms converge at a faster rate than
deterministic ones Novak (1988) (but note that these convergence rates correspond to
different criteria).
1
arXiv:2210.01554v2 [stat.CO] 9 Jun 2023
The most basic and well-known stochastic integration rule is the crude Monte Carlo
method, where one simulates uniformly nindependent and identically distributed variates
Ui, and returns n1Pn
i=1 f(Ui)as an estimate of I(f). Assuming that fL2([0,1]s),
the root mean square error (RMSE) of this estimator converges to zero at rate n1/2.
In this paper we consider the problem of estimating I(f)under the additional condition
that all the partial derivatives of fof order less or equal to rexist and are continuous,
or, in short, that f∈ Cr([0,1]s). Under this assumption on fit is well-known that we
can improve upon the crude Monte Carlo error rate. More precisely, for f∈ Cr([0,1]s)
the optimal convergence rate for the RMSE of an estimate b
I(f)of I(f)based on n
evaluations of fis n1/2r/s, in the sense that if g:N[0,)is such that
f∈ Cr([0,1]s), n 1,Eh|b
I(f)− I(f)|2i1/2g(n)fr
(where fris a bound on the r-th order derivatives of f, see Section 1.4 for a proper
definition) then we must have n1/2r/s/g(n) = O(1) (this result can for instance be
obtained from Propositions 1-2 given in Section 2.2.4, page 55, of Novak (1988)).
Stochastic algorithms that achieve this optimal convergence rate for f∈ Cr([0,1]s)
have been proposed e.g. in Haber (1966) for r= 1 and in Haber (1967) for r∈ {1,2}. In
Haber (1969) it is shown that if b
I(·)is a stochastic quadrature (SQ) of degree r1, that
is, if E[b
I(f)] = I(f)for all fL1([0,1]s)and P(b
I(f) = I(f)) = 1 if fis a polynomial of
degree p<r, then b
I(·)can be used to define an estimator of I(f)whose RMSE converges
to zero at rate n1/2r/s when f∈ Cr([0,1]s). In Haber (1969) a formula for a SQ of
degree r1is given for r∈ {3,4}while, for s= 1, Siegel and O’Brien (1985) provides a
SQ of degree 2r+1 for all r1. For multivariate integration problems, and an arbitrary
value of r1, a SQ of degree r1can be constructed from the integration method
proposed in Ermakov and Zolotukhin (1960). However, the algorithm proposed in this
reference requires to perform a sampling task which is so computationally expensive that
it is considered as almost intractable Patterson (1987).
A related approach is derived by Dick in Dick (2011), which achieves rate O(n1/2α+ε)
for ε > 0and a certain class of functions indexed by α(which differs from Cr([0,1]s)
even when r=). We will go back to this point and compare our approach to Dick’s
in our numerical study.
1.2. Motivation and plan
The paper is structured as follows. We introduce in Section 2 an unbiased estimator of
I(f)which has the following three appealing properties when f∈ Cr([0,1]s). First, its
RMSE converges to zero at the optimal n1/2r/s rate. Second, it integrates exactly fif
fis a polynomial of degree p<r. Third, for some constant C < and with probability
one, the absolute value of its estimation error is bounded by Cnr/s, where nr/s is the
optimal convergence rate for a deterministic integration rule (this result can for instance
be obtained from Proposition 1.3.5, page 28, of Novak (1988)). In addition, we establish
a central limit theorem (CLT) for a particular version of the proposed estimator. To the
best of our knowledge, a CLT for an estimator of I(f)having an RMSE that converges
2
at the optimal rate when f∈ Cr([0,1]s)exists only for r= 1 (see Bardenet and Hardy
(2020)).
In Section 3, we focus our attention on the estimation of I(f)when f∈ Cr
0([0,1]s),
where we define Cr
0([0,1]s)as the set of functions in Cr([0,1]s)whose partial derivatives of
order orare all equal to zero on the boundary of [0,1]s. As we explain in that section,
this set-up is particularly relevant for solving integration problems on Rs. Restricting
our attention to Cr
0([0,1]s)⊂ Cr([0,1]s)allows us to derive an estimator of I(f), referred
to as the vanishing estimator in what follows, which is computationally cheaper than the
previous estimator, while retaining its convergence properties, namely an RMSE of size
O(n1/2r/s)and an actual error of size O(nr/s)almost surely. We note that these
convergence rates are optimal for integrating a function in Cr
0([0,1]s)(again, see Sections
1.3.5 and 2.2.4 of Novak (1988)) and that an algorithm considering a similar class of
functions is proposed in Krieg and Novak (2017). The algorithm derived in this latter
reference has the advantage to achieve the optimal aforementioned convergence rates for
any rNbut its implementation at reasonable computational cost remains an open
problem.
Section 4 discusses some practical details about the proposed estimators, regarding on
how their variance may be estimated and how the order of the vanishing estimator may
be selected automatically. Section 5 presents numerical experiments which confirm that
the estimators converge at the expected rates, and show that they are already practical
for moderate values of n. Section 6 discusses future work. Proofs of certain technical
lemmas are deferred to Appendix C.
1.3. Connection with function approximation
As noted by e.g. Novak (2016), there is a strong connection between (unbiased) integra-
tion and function approximation. If one is able to construct an optimal approximation
An(f)of f∈ Cr([0,1]s), that is f− An(f)=O(nr/s)(see Novak (1988), page 36)
then one may derive the following unbiased estimate of I(f)
b
I(f) := I(An(f)) + 1
n
n
X
i=1
(f− An(f)) (Ui), Ui
iid
∼ U([0,1]s)(1)
which is also optimal, in the sense that its RMSE is O(n1/2r/s)for estimating I(f).
The (non-vanishing) estimator proposed in this paper for integrating a function f
Cr([0,1]s)is to some extent related to this idea, with An(f)a piecewise polynomial
approximation of fbased on local Taylor expansions in which the partial derivatives of
fare approximated using numerical differentiation techniques. Note however that we
use stratified random variables, rather than independent and identically distributed ones.
This makes the estimator easier to compute, and reduces its variance.
1.4. Notation regarding derivatives and Taylor expansions
Let Nbe the set of positive integers, and N0=N∪ {0}. For αNs
0, let |α|0=
sPs
j=1 1{0}(αj),|α|=Ps
i=1 α,α! = Qs
i=1 αi!,uα=Qs
i=1 uαi
ifor uRs. For
3
g∈ Cr([0,1]s)we let Dαg: [0,1]sRbe defined by
Dαg(u) = |α|
uα1
1. . . ∂uαs
s
g(u), u [0,1]s,
with the convention Dαg=gif |α|= 0, and we let gr:= maxα:|α|=rDαg.
With this notation in place, we recall that if g∈ Cr([0,1]s)then, by Taylor’s theorem,
there exists a function Rg,r : [0,1]s×[0,1]sRsuch that (Loomis and Sternberg (1968),
Section 3.17, page 191)
g(v) =
r1
X
l=0 X
α:|α|=l
(vu)αDαg(u)
α!+Rg,r(u, v),u, v [0,1]s(2)
where, for some τu,v [0,1],
Rg,r(u, v) = X
α:|α|=r
Dαg(u+τu,v(vu))
α!(vu)α.(3)
1.5. Notation related to stratification
Throughout the paper, f: [0,1]sRand s1. Our approach relies on stratifying
[0,1]sinto ksclosed hyper-cubes, k2, and performing a certain number lof evaluations
of finside each hyper-cube; see Figure 1. The total number of evaluations is therefore
something like n=lks, but with a value for lthat depends on the considered estimator
and other parameters such as r. Thus, we will index the proposed estimators by k, e.g
b
Ik(f)(or b
Ir,k(f)when it also depends on r) rather than n. We will provide the exact
expression of nalongside the definition of the considered estimator.
For cRsand k1, we use the short-hand Bk(c)for the hyper-cube Qs
i=1[ci
1/2k, ci+ 1/2k]; in other words, the ball with radius 1/2kand centre rwith respect to
the maximum norm.
For mN0let
Cm,k =2j1+ 1
2k,...,2js+ 1
2ks.t. (j1, . . . , js)∈ {−m, . . . , k +m1}s(4)
be the set of the centres of the (k+2m)shypercubes Bk(c)whose union is equal to the set
Sm,k := [m/k, 1+m/k]s. In Section 2, we will set m= 0 and recover the aforementioned
stratification; in that case, we will use the short-hand Ck:= C0,k. However, in order to
define the second (vanishing) estimator in Section 3, we shall take m0.
To each cCm,k (with m, again, fixed and determined by the context), we associate
a random variable Ucsuch that
Uc∼ U 1
2k,1
2ks.
Notice that the support of c+Ucis Bk(c).
4
Figure 1: Stratification of [0,1]swhen s= 2 and k= 5, and two evaluations are performed
in each of the ks= 25 squares. The location of the points are generated as in
Haber’s second estimator, which we discuss in Section 2.1.
2. Integration of functions in Cr([0,1]s)
2.1. Preliminaries: Haber’s estimators
In Haber (1966) Haber introduced the following estimator:
b
I1,k(f) := 1
ksX
cCk
f(c+Uc), Uc∼ U 1
2k,1
2ks(5)
based on n=ksevaluations of f, which is optimal for r= 1; i.e. its RMSE is
O(n1/21/s)provided f∈ C1([0,1]s). To establish this result, note that each term
f(c+Uc)has expectation ks´Bk(c)f(u)duand variance O(n2/s), since |f(u)f(v)|=
O(k1) = O(n1/s)for u, v Bk(c).
We note in passing that an alternative, and closely related, estimator may be obtained
by approximating fwith the piecewise constant function fndefined by
fn(u) = X
cCk
f(c)1Bk(c)(u), u [0,1]s
and using that particular fnin (1). Since ffn=O(n1/s)when f∈ C1([0,1]s),
this alternative estimator is indeed optimal for r= 1. The estimator defined in (5) is
however slightly more convenient to compute, and relies on only nevaluations (versus
2nfor the alternative estimator).
5
摘要:

Higher-orderMonteCarlothroughcubicstratificationNicolasChopin(1),MathieuGerber(2)(1)ENSAE,InstitutPolytechniquedeParis,Paris,France(2)SchoolofMathematics,UniversityofBristol,UKWeproposetwonovelunbiasedestimatorsoftheintegral´[0,1]sf(u)duforafunctionf,whichdependonasmoothnessparameterr∈N.Thefirstesti...

展开>> 收起<<
Higher-order Monte Carlo through cubic stratification Nicolas Chopin1 Mathieu Gerber2.pdf

共35页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:35 页 大小:833.81KB 格式:PDF 时间:2025-05-06

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 35
客服
关注