arXiv:2210.12861v1 [cs.LG] 23 Oct 2022
Tight relative estimation in the mean of Bernoulli random variables
Mark Huber
Abstract Given a stream of Bernoulli random variables, consider the problem of estimating the mean of
the random variable to within a specified relative error with a specified probability of failure. Until now, the
Gamma Bernoulli Approximation Scheme (GBAS) was the method that accomplished this goal using the
smallest average number of samples. In this work, a new method is introduced that is faster when the mean
is bounded away from zero. The new method uses a two-stage process together with some simple inequalities
to obtain rigorous bounds on the error probability.
1 Introduction
Suppose that $B_1, B_2, \ldots$ is a stream of independent, identically distributed (iid) random variables that have
a Bernoulli distribution. That is, for each $i$, $P(B_i = 1) = p$ and $P(B_i = 0) = 1 - p$. Consider the problem of
estimating $p$.
This is one of the oldest problems in probability, and was the inspiration for the first Law of Large Numbers
of Jacob Bernoulli [4]. There are numerous modern applications, including network analysis [2], testing
sequence association [3], testing if a population is in Hardy-Weinberg equilibrium [10], ray tracing [15], and
many others [7, 1, 13, 8, 14].
The basic algorithm simply generates a fixed number of $B_i$ values, say $n$, and then estimates $p$ using the
sample average $\hat{p}_n = (B_1 + \cdots + B_n)/n$. While both simple and unbiased, this method fails to have any
guarantee on the relative error of the estimate. Taking a random number of samples can help create an
estimate with such a guarantee.
The goal here will be, given parameters $\epsilon > 0$ and $\delta \in (0, 1)$, to find an integrable stopping time $T$ with respect
to the natural filtration and an estimate
$$\hat{p} = \hat{p}(B_1, B_2, \ldots, B_T) \qquad (1)$$
such that the relative error is greater than $\epsilon$ with probability at most $\delta$. That is,
$$P\left( \left| \frac{\hat{p}}{p} - 1 \right| > \epsilon \right) \le \delta. \qquad (2)$$
This is also known as a randomized approximation scheme, or ras for short.
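Criterion (2) is easy to test empirically. The following minimal sketch (not from the paper; the name `failure_rate` and the zero-argument stream interface are illustrative assumptions) estimates the failure probability of any estimator that consumes a Bernoulli stream:

```python
import random

def failure_rate(estimator, p, eps, trials=10_000):
    """Monte Carlo estimate of P(|p_hat/p - 1| > eps) for a stream estimator."""
    draw = lambda: 1 if random.random() < p else 0  # one Bernoulli(p) per call
    fails = 0
    for _ in range(trials):
        fails += abs(estimator(draw) / p - 1) > eps
    return fails / trials
```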
The time to find the estimate is dominated by the time needed to generate $B_1, \ldots, B_T$. Hence the running
time of the procedure is taken to be $E(T)$, the average number of samples needed by the ras.
Returning to the sample average for the moment, the variance of $\hat{p}_n$ is $v_n = p(1-p)/n$. The standard
deviation is then $\sqrt{v_n}$. To get the standard deviation to be $\epsilon p$, $n = (1-p)\epsilon^{-2} p^{-1}$ samples are necessary.
Using this together with the median-of-means technique [11, 12], it is possible to get an ras for this sample
average using
$$\Theta((1-p) p^{-1} \epsilon^{-2} \ln(\delta^{-1})) \qquad (3)$$
samples. Of course, here the Big-Theta notation ($\Theta$) is hiding the constant involved.
While the Central Limit Theorem (because it is only an asymptotic result) does not yield an ras, it does
give a rough estimate of the number of samples required. A CLT approximation indicates that about
$2(1-p) p^{-1} \epsilon^{-2} \ln(2\delta^{-1})$ samples should be necessary and sufficient to obtain an $(\epsilon, \delta)$-ras.
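As a quick worked example (illustrative only, since the CLT is asymptotic and certifies nothing), this count is straightforward to compute:

```python
import math

def clt_sample_count(p, eps, delta):
    """Back-of-envelope CLT count: 2(1-p) p^-1 eps^-2 ln(2/delta)."""
    return 2 * (1 - p) / p * eps**-2 * math.log(2 / delta)

print(clt_sample_count(0.5, 0.1, 0.05))  # about 737.8 samples
```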
This is a problem, because the number of samples needed has a factor of $(1-p)/p$, and $p$ is the very thing
being estimated! Dagum, Karp, Luby, and Ross [5] found an elegant solution to this difficulty through an
algorithm that will be called DKLR throughout this work.
Their idea was to use the Bernoulli samples to create iid geometric random variables. Say that $G \sim \text{Geo}(p)$
if for all $i \in \{1, 2, \ldots\}$, $P(G = i) = p(1-p)^{i-1}$. Then it is straightforward to see that for
$$R = \inf\{r : B_1 + \cdots + B_r = 1\}, \qquad (4)$$
it holds that $R \sim \text{Geo}(p)$.
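In code, (4) just counts Bernoulli draws up to and including the first 1. A minimal sketch (the zero-argument callable interface for the stream is an assumption):

```python
import random

def geometric_from_bernoulli(bernoulli):
    """Consume Bernoulli(p) draws until the first 1; the count R is Geo(p)."""
    r = 1
    while bernoulli() == 0:
        r += 1
    return r

p = 0.3  # used only to simulate the stream; an estimator never sees p
R = geometric_from_bernoulli(lambda: 1 if random.random() < p else 0)
```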
The mean of $R$ is $1/p$, and the variance is $(1-p)/p^2$. Hence the sample average of $k$ iid draws $R_1, \ldots, R_k$
with the same distribution as $R$,
$$\frac{R_1 + \cdots + R_k}{k}, \qquad (5)$$
has mean $1/p$ and standard deviation $(1-p)^{1/2}/[k^{1/2} p]$.
That is, the standard deviation is already of the same order as the mean being estimated! The DKLR
algorithm then runs as follows.
DKLR
1. Given $\epsilon$ and $\delta$, set $k$ equal to
$$1 + (1 + \epsilon) \cdot 4(e - 2) \epsilon^{-2} \ln(2\delta^{-1}).$$
2. Use the Bernoulli stream to generate $R_1, \ldots, R_k$ iid $\text{Geo}(p)$.
3. Return
$$\hat{p}_{\text{DKLR}} = \frac{k - 1}{R_1 + \cdots + R_k}.$$
The choice of $k$ in the first line comes from [5], and is based on their analysis of the value needed to obtain
an $(\epsilon, \delta)$-ras. The use of $k - 1$ in the numerator of the final line is a change from their algorithm, which used
$k$. The $k - 1$ used here makes this algorithm more easily comparable to the improvement on the algorithm,
GBAS, that is described later on in this section.
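A direct sketch of DKLR under the same assumed stream interface (rounding $k$ up to an integer is an assumption; the text gives $k$ as a real number):

```python
import math
import random

def dklr(bernoulli, eps, delta):
    k = math.ceil(1 + (1 + eps) * 4 * (math.e - 2) * eps**-2 * math.log(2 / delta))
    total = 0
    for _ in range(k):           # generate R_1, ..., R_k iid Geo(p)
        r = 1
        while bernoulli() == 0:
            r += 1
        total += r
    return (k - 1) / total       # the k-1 numerator variant used in this paper
```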
While this was the first ras for this problem, there are some drawbacks. First, the constant of $4(e - 2) \approx
2.873$ is not close to the constant of 2 predicted by the CLT. Second, it does not take advantage of second
order effects; that is, the tails of the geometric random variable decline in a fashion similar to normal tails,
which allows for significant second order reductions in the run time. Third, it is biased.
To solve these issues, the author introduced the Gamma Bernoulli Approximation Scheme (GBAS) [9]. This
algorithm took the reduction of Bernoullis to Geometrics one step further. An easy to show fact about
Geometric random variables is the following. Suppose that $A$ is an exponentially distributed random variable
with rate $\lambda > 0$, so that for all $a > 0$, it holds that $P(A > a) = \exp(-\lambda a)$. Write $A \sim \text{Exp}(\lambda)$.
Theorem 1.1. If $G \sim \text{Geo}(p)$ and $A_1, \ldots, A_G$ are iid exponential random variables of rate 1, then $A_1 + \cdots +
A_G \sim \text{Exp}(p)$.
(See, for instance, [9] for a proof.)
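Theorem 1.1 is also easy to check by simulation. A minimal sketch (illustrative, not a proof): one Exp(1) draw is added per Bernoulli(p) trial up to the first success, so $G \sim \text{Geo}(p)$ of them are summed, and the total should have mean $1/p$.

```python
import random

p = 0.3
def exp_via_geo():
    total = random.expovariate(1.0)   # A_1 ~ Exp(1)
    while random.random() >= p:       # Bernoulli(p) failure: keep going
        total += random.expovariate(1.0)
    return total

draws = [exp_via_geo() for _ in range(100_000)]
print(sum(draws) / len(draws))        # close to 1/p = 3.333...
```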
For $S \sim \text{Exp}(p)$, the mean of $S$ is $1/p$ and the standard deviation is also $1/p$. Moreover, for $S_1, \ldots, S_k$ iid
$\text{Exp}(p)$, $M = S_1 + \cdots + S_k$ has a gamma distribution with shape parameter $k$ and rate parameter $p$. Write
$M \sim \text{Gamma}(k, p)$.
The densities of both $M$ and $1/M$ (which has an inverse gamma distribution) are known precisely, and the
mean of $1/M$ is $p/(k-1)$. So
$$\hat{p}_{\text{GBAS}} = (k - 1)/M \qquad (6)$$
is an unbiased estimator of $p$.
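For completeness, here is the standard calculation (not spelled out in the text) behind $E(1/M) = p/(k-1)$, valid for $k > 1$:
$$E\left[\frac{1}{M}\right] = \int_0^\infty \frac{1}{m} \cdot \frac{p^k m^{k-1} e^{-pm}}{\Gamma(k)} \, dm = \frac{p \, \Gamma(k-1)}{\Gamma(k)} \int_0^\infty \frac{p^{k-1} m^{k-2} e^{-pm}}{\Gamma(k-1)} \, dm = \frac{p}{k-1},$$
since the remaining integrand is the $\text{Gamma}(k-1, p)$ density and so integrates to 1.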
In addition, gamma random variables are scalable. Multiplying a gamma distributed random variable by a
constant results in a new gamma distributed random variable whose rate is the old rate divided by the
multiplication factor. Hence
$$\frac{p}{\hat{p}_{\text{GBAS}}} = \frac{pM}{k - 1} \sim \text{Gamma}(k, k - 1). \qquad (7)$$
That is to say, the relative error of the GBAS estimate does not depend on the value $p$ that is being
estimated! This makes it a straightforward numerical calculation to find the minimum value of $k$ such that
for $X \sim \text{Gamma}(k, k - 1)$,
$$P\left( \left| \frac{1}{X} - 1 \right| > \epsilon \right) \le \delta. \qquad (8)$$
This allows the GBAS algorithm to immediately give an $(\epsilon, \delta)$-ras.
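That numerical search (step 1 of GBAS below) might look as follows; a minimal sketch assuming SciPy is available, with a simple linear scan in place of anything cleverer:

```python
from scipy.stats import gamma

def smallest_k(eps, delta):
    """Smallest k with P(X outside [1/(1+eps), 1/(1-eps)]) <= delta,
    where X ~ Gamma(shape k, rate k-1)."""
    k = 2  # need k > 1 so the rate k-1 is positive
    while True:
        X = gamma(a=k, scale=1.0 / (k - 1))
        fail = X.cdf(1.0 / (1.0 + eps)) + X.sf(1.0 / (1.0 - eps))
        if fail <= delta:
            return k
        k += 1
```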
GBAS
1. Given $\epsilon$ and $\delta$, set $k$ equal to the smallest value such that a $\text{Gamma}(k, k - 1)$ random variable lies
outside the interval $[1/(1 + \epsilon), 1/(1 - \epsilon)]$ with probability at most $\delta$.
2. Use the Bernoulli stream to generate $S_1, \ldots, S_k$ iid $\text{Exp}(p)$.
3. Return
$$\hat{p}_{\text{GBAS}} = \frac{k - 1}{S_1 + \cdots + S_k}.$$
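A sketch of GBAS under the same assumed stream interface, with each $S_i \sim \text{Exp}(p)$ built from the Bernoulli stream via Theorem 1.1 and $k$ supplied by a search like `smallest_k` above:

```python
import random

def gbas(bernoulli, k):
    m = 0.0                          # M = S_1 + ... + S_k ~ Gamma(k, p)
    for _ in range(k):
        s = random.expovariate(1.0)
        while bernoulli() == 0:      # sum Exp(1) draws across Geo(p) trials
            s += random.expovariate(1.0)
        m += s
    return (k - 1) / m               # unbiased by (6)
```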
This corrects the three deficiencies noted earlier for the DKLR algorithm. First, the value of $k$ can
be shown to be at most $2\epsilon^{-2} \ln(2\delta^{-1})$ for $\epsilon$ and $\delta$ both less than 1. This follows directly from Theorem 1 of
[6]. Second, the tails of a gamma decline much like those of a normal, and so the second order effects are
retained. Third, the result is unbiased.
Unfortunately, one new deficiency has been added. For a stream of $R_i$ that are iid $\text{Geo}(p)$, the mean of
each $R_i$ is $1/p$ while the standard deviation is the slightly smaller $(1-p)^{1/2}/p$. For $S_i$ iid $\text{Exp}(p)$, the
mean is $1/p$ while the standard deviation is exactly $1/p$.
From a Monte Carlo point of view, this means that the DKLR algorithm (if $p$ was known ahead of time)
should take about $1 - p$ times the number of samples that the GBAS algorithm does. While this does not
affect much if $p$ is close to 0, for $p$ in the middle of the interval $[0, 1]$ this can have a significant effect on the
running time. Of course, $p$ is not known ahead of time, and so there has been no way of getting this $1 - p$
factor in the run time exactly.
This paper presents, for the first time, a way of running DKLR close to optimally in order to regain a run
time factor that, for small $\epsilon$ and $\delta$, is asymptotically close to $1 - p$. The rest of the paper is
organized as follows. The next section introduces the two-stage method of running and proves the veracity
of the estimate. The following section covers how to gain most of the advantage of the $1 - p$ factor while
still obtaining an unbiased estimate. The last section then concludes.
2 The two-stage algorithm
Suppose that $U_1, U_2, \ldots$ forms an iid sequence of random variables that are uniform over $[0, 1]$. Suppose
$B_i = \mathbb{1}(U_i \le p)$, where $\mathbb{1}$ is the indicator function that evaluates to 0 if the argument is false and 1 if it is
true. Then $B_1, B_2, \ldots$ form an iid Bernoulli sequence with mean $p$. Let
$$T_k(p) = \inf\{t : \mathbb{1}(U_1 \le p) + \mathbb{1}(U_2 \le p) + \cdots + \mathbb{1}(U_t \le p) = k\}.$$
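In code, $T_k(p)$ is just the number of uniforms consumed to see $k$ successes. A minimal sketch:

```python
import random

def stopping_time(p, k):
    """T_k(p): index of the k-th uniform landing at or below p."""
    t, successes = 0, 0
    while successes < k:
        t += 1
        if random.random() <= p:
            successes += 1
    return t
```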