arXiv:2210.12861v1 [cs.LG] 23 Oct 2022
Tight relative estimation in the mean of Bernoulli random variables
Mark Huber
Abstract Given a stream of Bernoulli random variables, consider the problem of estimating the mean of
the random variable to within a specified relative error with a specified probability of failure. Until now, the
Gamma Bernoulli Approximation Scheme (GBAS) was the method that accomplished this goal using the
smallest average number of samples. In this work, a new method is introduced that is faster when the mean
is bounded away from zero. The new method uses a two-stage process together with some simple inequalities
to obtain rigorous bounds on the error probability.
1 Introduction
Suppose that $B_1, B_2, \ldots$ is a stream of independent, identically distributed (iid) random variables that have
a Bernoulli distribution. That is, for each $i$, $P(B_i = 1) = p$ and $P(B_i = 0) = 1 - p$. Consider the problem of
estimating $p$.
This is one of the oldest problems in probability, and was the inspiration for the first Law of Large Numbers
of Jacob Bernoulli [4]. There are numerous modern applications, including network analysis [2], testing
sequence association [3], testing if a population is in Hardy-Weinberg equilibrium [10], ray tracing [15], and
many others [7, 1, 13, 8, 14].
The basic algorithm simply generates a fixed number of $B_i$ values, say $n$, and then estimates $p$ using the
sample average $\hat{p}_n = (B_1 + \cdots + B_n)/n$. While both simple and unbiased, this method fails to have any
guarantee on the relative error of the estimate. Taking a random number of samples can help create an
estimate with such a guarantee.
The goal here will be, given parameters $\epsilon > 0$ and $\delta \in (0, 1)$, to find an integrable stopping time $T$ with respect
to the natural filtration and an estimate
$$\hat{p} = \hat{p}(B_1, B_2, \ldots, B_T) \qquad (1)$$
such that the relative error is greater than $\epsilon$ with probability at most $\delta$. That is,
$$P\left( \left| \frac{\hat{p}}{p} - 1 \right| > \epsilon \right) \le \delta. \qquad (2)$$
This is also known as a randomized approximation scheme, or ras for short.
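Criterion (2) is easy to test empirically. The following minimal sketch (not from the paper; the name `failure_rate` and the zero-argument stream interface are illustrative assumptions) estimates the failure probability of any estimator that consumes a Bernoulli stream:

```python
import random

def failure_rate(estimator, p, eps, trials=10_000):
    """Monte Carlo estimate of P(|p_hat/p - 1| > eps) for a stream estimator."""
    draw = lambda: 1 if random.random() < p else 0  # one Bernoulli(p) per call
    fails = 0
    for _ in range(trials):
        fails += abs(estimator(draw) / p - 1) > eps
    return fails / trials
```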
The time to find the estimate is dominated by the time needed to generate $B_1, \ldots, B_T$. Hence the running
time of the procedure is taken to be $E(T)$, the average number of samples needed by the ras.
Returning to the sample average for the moment, the variance of $\hat{p}_n$ is $v_n = p(1-p)/n$. The standard
deviation is then $\sqrt{v_n}$. To get the standard deviation to be $\epsilon p$, $n = (1-p)\epsilon^{-2} p^{-1}$ samples are necessary.
Using this together with the median-of-means technique [11, 12], it is possible to get an ras for this sample
average using
$$\Theta((1-p) p^{-1} \epsilon^{-2} \ln(\delta^{-1})) \qquad (3)$$
samples. Of course, here the Big-Theta notation ($\Theta$) is hiding the constant involved.
While the Central Limit Theorem (because it is only an asymptotic result) does not yield an ras, it does
give a rough estimate of the number of samples required. A CLT approximation indicates that about
$2(1-p) p^{-1} \epsilon^{-2} \ln(2\delta^{-1})$ samples should be necessary and sufficient to obtain an $(\epsilon, \delta)$-ras.
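As a quick worked example (illustrative only, since the CLT is asymptotic and certifies nothing), this count is straightforward to compute:

```python
import math

def clt_sample_count(p, eps, delta):
    """Back-of-envelope CLT count: 2(1-p) p^-1 eps^-2 ln(2/delta)."""
    return 2 * (1 - p) / p * eps**-2 * math.log(2 / delta)

print(clt_sample_count(0.5, 0.1, 0.05))  # about 737.8 samples
```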
This is a problem, because the number of samples needed has a factor of $(1-p)/p$, and $p$ is the very thing
being estimated! Dagum, Karp, Luby, and Ross [5] found an elegant solution to this difficulty through an
algorithm that will be called DKLR throughout this work.
Their idea was to use the Bernoulli samples to create iid geometric random variables. Say that $G \sim \text{Geo}(p)$
if for all $i \in \{1, 2, \ldots\}$, $P(G = i) = p(1-p)^{i-1}$. Then it is straightforward to see that for
$$R = \inf\{r : B_1 + \cdots + B_r = 1\}, \qquad (4)$$
it holds that $R \sim \text{Geo}(p)$.
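In code, (4) just counts Bernoulli draws up to and including the first 1. A minimal sketch (the zero-argument callable interface for the stream is an assumption):

```python
import random

def geometric_from_bernoulli(bernoulli):
    """Consume Bernoulli(p) draws until the first 1; the count R is Geo(p)."""
    r = 1
    while bernoulli() == 0:
        r += 1
    return r

p = 0.3  # used only to simulate the stream; an estimator never sees p
R = geometric_from_bernoulli(lambda: 1 if random.random() < p else 0)
```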
The mean of $R$ is $1/p$, and the variance is $(1-p)/p^2$. Hence the sample average of $k$ iid draws $R_1, \ldots, R_k$
with the same distribution as $R$,
$$\frac{R_1 + \cdots + R_k}{k}, \qquad (5)$$
has mean $1/p$ and standard deviation $(1-p)^{1/2}/[k^{1/2} p]$.
That is, the standard deviation is already of the same order as the mean being estimated! The DKLR
algorithm then runs as follows.
DKLR
1. Given $\epsilon$ and $\delta$, set $k$ equal to
$$1 + (1 + \epsilon) \cdot 4(e - 2) \epsilon^{-2} \ln(2\delta^{-1}).$$
2. Use the Bernoulli stream to generate $R_1, \ldots, R_k$ iid $\text{Geo}(p)$.
3. Return
$$\hat{p}_{\text{DKLR}} = \frac{k - 1}{R_1 + \cdots + R_k}.$$
The choice of $k$ in the first line comes from [5], and is based on their analysis of the value needed to obtain
an $(\epsilon, \delta)$-ras. The use of $k - 1$ in the numerator of the final line is a change from their algorithm, which used
$k$. The $k - 1$ used here makes this algorithm more easily comparable to the improvement on the algorithm,
GBAS, that is described later on in this section.
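A direct sketch of DKLR under the same assumed stream interface (rounding $k$ up to an integer is an assumption; the text gives $k$ as a real number):

```python
import math
import random

def dklr(bernoulli, eps, delta):
    k = math.ceil(1 + (1 + eps) * 4 * (math.e - 2) * eps**-2 * math.log(2 / delta))
    total = 0
    for _ in range(k):           # generate R_1, ..., R_k iid Geo(p)
        r = 1
        while bernoulli() == 0:
            r += 1
        total += r
    return (k - 1) / total       # the k-1 numerator variant used in this paper
```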
While this was the first ras for this problem, there are some drawbacks. First, the constant of $4(e - 2) \approx
2.873$ is not close to the constant of 2 predicted by the CLT. Second, it does not take advantage of second
order effects; that is, the tails of the geometric random variable decline in a fashion similar to normal tails,
which allows for significant second order reductions in the run time. Third, it is biased.
To solve these issues, the author introduced the Gamma Bernoulli Approximation Scheme (GBAS) [9]. This
algorithm took the reduction of Bernoullis to Geometrics one step further. An easy to show fact about
Geometric random variables is the following. Suppose that $A$ is an exponentially distributed random variable
with rate $\lambda > 0$, so that for all $a > 0$, it holds that $P(A > a) = \exp(-\lambda a)$. Write $A \sim \text{Exp}(\lambda)$.
Theorem 1.1. If $G \sim \text{Geo}(p)$ and $A_1, \ldots, A_G$ are iid exponential random variables of rate 1, then $A_1 + \cdots +
A_G \sim \text{Exp}(p)$.
(See, for instance, [9] for a proof.)
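Theorem 1.1 is also easy to check by simulation. A minimal sketch (illustrative, not a proof): one Exp(1) draw is added per Bernoulli(p) trial up to the first success, so $G \sim \text{Geo}(p)$ of them are summed, and the total should have mean $1/p$.

```python
import random

p = 0.3
def exp_via_geo():
    total = random.expovariate(1.0)   # A_1 ~ Exp(1)
    while random.random() >= p:       # Bernoulli(p) failure: keep going
        total += random.expovariate(1.0)
    return total

draws = [exp_via_geo() for _ in range(100_000)]
print(sum(draws) / len(draws))        # close to 1/p = 3.333...
```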
For $S \sim \text{Exp}(p)$, the mean of $S$ is $1/p$ and the standard deviation is also $1/p$. Moreover, for $S_1, \ldots, S_k$ iid
$\text{Exp}(p)$, $M = S_1 + \cdots + S_k$ has a gamma distribution with shape parameter $k$ and rate parameter $p$. Write
$M \sim \text{Gamma}(k, p)$.
The densities of both $M$ and $1/M$ (which has an inverse gamma distribution) are known precisely, and the
mean of $1/M$ is $p/(k-1)$. So
$$\hat{p}_{\text{GBAS}} = (k - 1)/M \qquad (6)$$
is an unbiased estimator of $p$.
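For completeness, here is the standard calculation (not spelled out in the text) behind $E(1/M) = p/(k-1)$, valid for $k > 1$:
$$E\left[\frac{1}{M}\right] = \int_0^\infty \frac{1}{m} \cdot \frac{p^k m^{k-1} e^{-pm}}{\Gamma(k)} \, dm = \frac{p \, \Gamma(k-1)}{\Gamma(k)} \int_0^\infty \frac{p^{k-1} m^{k-2} e^{-pm}}{\Gamma(k-1)} \, dm = \frac{p}{k-1},$$
since the remaining integrand is the $\text{Gamma}(k-1, p)$ density and so integrates to 1.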
In addition, gamma random variables are scalable. Multiplying a gamma distributed random variable by a
constant results in a new gamma distributed random variable whose rate is the old rate divided by the
multiplication factor. Hence
$$\frac{p}{\hat{p}_{\text{GBAS}}} = \frac{pM}{k - 1} \sim \text{Gamma}(k, k - 1). \qquad (7)$$
That is to say, the relative error of the GBAS estimate does not depend on the value $p$ that is being
estimated! This makes it a straightforward numerical calculation to find the minimum value of $k$ such that
for $X \sim \text{Gamma}(k, k - 1)$,
$$P\left( \left| \frac{1}{X} - 1 \right| > \epsilon \right) \le \delta. \qquad (8)$$
This allows the GBAS algorithm to immediately give an $(\epsilon, \delta)$-ras.
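That numerical search (step 1 of GBAS below) might look as follows; a minimal sketch assuming SciPy is available, with a simple linear scan in place of anything cleverer:

```python
from scipy.stats import gamma

def smallest_k(eps, delta):
    """Smallest k with P(X outside [1/(1+eps), 1/(1-eps)]) <= delta,
    where X ~ Gamma(shape k, rate k-1)."""
    k = 2  # need k > 1 so the rate k-1 is positive
    while True:
        X = gamma(a=k, scale=1.0 / (k - 1))
        fail = X.cdf(1.0 / (1.0 + eps)) + X.sf(1.0 / (1.0 - eps))
        if fail <= delta:
            return k
        k += 1
```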
GBAS
1. Given $\epsilon$ and $\delta$, set $k$ equal to the smallest value such that a $\text{Gamma}(k, k - 1)$ random variable lies
outside the interval $[1/(1 + \epsilon), 1/(1 - \epsilon)]$ with probability at most $\delta$.
2. Use the Bernoulli stream to generate $S_1, \ldots, S_k$ iid $\text{Exp}(p)$.
3. Return
$$\hat{p}_{\text{GBAS}} = \frac{k - 1}{S_1 + \cdots + S_k}.$$
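A sketch of GBAS under the same assumed stream interface, with each $S_i \sim \text{Exp}(p)$ built from the Bernoulli stream via Theorem 1.1 and $k$ supplied by a search like `smallest_k` above:

```python
import random

def gbas(bernoulli, k):
    m = 0.0                          # M = S_1 + ... + S_k ~ Gamma(k, p)
    for _ in range(k):
        s = random.expovariate(1.0)
        while bernoulli() == 0:      # sum Exp(1) draws across Geo(p) trials
            s += random.expovariate(1.0)
        m += s
    return (k - 1) / m               # unbiased by (6)
```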
This corrects the three deficiencies noted earlier for the DKLR algorithm. First, the value of $k$ can
be shown to be at most $2\epsilon^{-2} \ln(2\delta^{-1})$ for $\epsilon$ and $\delta$ both less than 1. This follows directly from Theorem 1 of
[6]. Second, the tails of a gamma decline much like those of a normal, and so the second order effects are
retained. Third, the result is unbiased.
Unfortunately, one new deficiency has been added. For a stream of $R_i$ that are iid $\text{Geo}(p)$, the mean of
each $R_i$ is $1/p$ while the standard deviation is the slightly smaller $(1-p)^{1/2}/p$. For $S_i$ iid $\text{Exp}(p)$, the
mean is $1/p$ while the standard deviation is exactly $1/p$.
From a Monte Carlo point of view, this means that the DKLR algorithm (if $p$ was known ahead of time)
should take about $1 - p$ times the number of samples that the GBAS algorithm does. While this does not
affect much if $p$ is close to 0, for $p$ in the middle of the interval $[0, 1]$ this can have a significant effect on the
running time. Of course, $p$ is not known ahead of time, and so there has been no way of getting this $1 - p$
factor in the run time exactly.
This paper presents, for the first time, a way of running DKLR close to optimally in order to regain a run
time factor that, for small $\epsilon$ and $\delta$, is asymptotically close to $1 - p$. The rest of the paper is
organized as follows. The next section introduces the two-stage method of running and proves the veracity
of the estimate. The following section covers how to gain most of the advantage of the $1 - p$ factor while
still obtaining an unbiased estimate. The last section then concludes.
2 The two-stage algorithm
Suppose that $U_1, U_2, \ldots$ forms an iid sequence of random variables that are uniform over $[0, 1]$. Suppose
$B_i = \mathbb{1}(U_i \le p)$, where $\mathbb{1}$ is the indicator function that evaluates to 0 if the argument is false and 1 if it is
true. Then $B_1, B_2, \ldots$ form an iid Bernoulli sequence with mean $p$. Let
$$T_k(p) = \inf\{t : \mathbb{1}(U_1 \le p) + \mathbb{1}(U_2 \le p) + \cdots + \mathbb{1}(U_t \le p) = k\}.$$
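In code, $T_k(p)$ is just the number of uniforms consumed to see $k$ successes. A minimal sketch:

```python
import random

def stopping_time(p, k):
    """T_k(p): index of the k-th uniform landing at or below p."""
    t, successes = 0, 0
    while successes < k:
        t += 1
        if random.random() <= p:
            successes += 1
    return t
```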