The Optimal Sample Size
in Crosswise Model for Sensitive Questions
Stanisław Jaworski
Wojciech Zieliński
Warsaw University of Life Sciences (Poland)
e-mail: stanislaw_jaworski@sggw.edu.pl
e-mail: wojciech_zielinski@sggw.edu.pl
Abstract
The problem considered is the estimation of the fraction of a population with a stigmatizing characteristic. In this
paper the nonrandomized response model proposed by Tian, Yu, Tang, and Geng (2007) is considered. An
exact confidence interval for this fraction is constructed. The optimal sample size for obtaining a
confidence interval of a given length is also derived.
Keywords: sensitive questions, NRR model, exact confidence interval
2010 Mathematics Subject Classification: 62F25, 62P20
1 Introduction
The problem is to estimate the percentage of a population who have "committed" socially stigmatizing "crimes"
such as corruption, tax fraud, illegal work (black market), drug use, violence against children, and others.
Mathematically, let $Y$ be a random variable such that
$$P\{Y = 1\} = \pi = 1 - P\{Y = 0\}.$$
The random variable $Y$ takes the value 1 when the answer to the sensitive question is YES and the value 0 otherwise.
The number $\pi \in (0,1)$ is the probability of a positive answer to the sensitive question, i.e. $\pi \cdot 100\%$ is the
percentage of interest. We want to estimate $\pi$; specifically, we construct a confidence
interval for $\pi$.
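Let $Y_1, \ldots, Y_n$ be a sample. The statistical model for the number of YES answers $\sum_{i=1}^{n} Y_i$ is
$$\left(\{0, 1, \ldots, n\},\ \{\mathrm{Bin}(n, \pi),\ \pi \in (0,1)\}\right),$$
where $\mathrm{Bin}(\cdot,\cdot)$ denotes the binomial distribution.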
The difficulty is that the random variables $Y_1, \ldots, Y_n$ are not observable. Answers to the sensitive
question are "hidden" by also asking a "neutral" question, which is answered YES or NO. The "neutral"
question is assumed to be independent of the sensitive question. Two questions are asked in the questionnaire,
one sensitive and one neutral, but only one answer is recorded, and the interviewer does not know which of the two
questions the interviewee answered.
The first method of obscuring the answer to a sensitive question was proposed by Warner (1965). His method
randomizes the answers. The randomization is performed by the respondent, so the interviewer does
not know the answer to the sensitive question. This model has been extended in various ways (Horvitz,
Shah, & Simmons, 1967; Greenberg, Abul-Ela, & Horvitz, 1969; Raghavarao, 1978; Franklin, 1989; Arnab,
Shangodoyin, & Arcos, 2019; Arnab, 1990, 1996; Kuk, 1990; Rueda, Cobo, & Arcos, 2015).
Corresponding author: Stanisław Jaworski, Department of Econometrics and Statistics, Warsaw University of Life
Sciences, Nowoursynowska 159, PL-02-767 Warsaw
arXiv:2210.15245v1 [stat.ME] 27 Oct 2022
Tian et al. (2007) proposed a nonrandomized response (NRR) model. Their idea is to ask two questions
simultaneously, one sensitive and one neutral. This model has been extended to other, similar approaches (Yu,
Tian, & Tang, 2008; Tan, Tian, & Tang, 2009; Tian, 2014).
Unfortunately, the problem of constructing confidence intervals for $\pi$ has rarely been considered. Moreover, the
proposed confidence intervals are asymptotic. They are not confidence intervals in the sense of Neyman (1934, p. 562):
they do not maintain the prescribed confidence level. In what follows, a finite-sample confidence interval is
proposed. Its construction is based on the distribution of the maximum likelihood estimator (MLE) of $\pi$. We consider
only the crosswise model proposed by Yu et al. (2008).
In Section 2 we present the construction of a new confidence interval for $\pi$. In Section 3 we recall
the construction of asymptotic confidence intervals and discuss the coverage probability of the presented
confidence intervals. In Section 4 we present methods of sample size selection; this section is the
core of the paper. Section 5 gives some concluding remarks.
2 Confidence interval in Crosswise Model
In the crosswise model (CM), respondents are presented with two questions simultaneously, one neutral and one
sensitive. They are instructed to report 1 only if their answers to both questions are the same, i.e. the observable
variable in this model is $Z$, where
$$Z = \begin{cases} 1, & \text{if both answers are YES or both are NO},\\ 0, & \text{otherwise.} \end{cases}$$
The number of answers 1 given by $n$ respondents may be treated as a realization of a binomial distribution with parameters
$(n, \varrho)$, where $\varrho$ is the probability of the outcome 1 of the variable $Z$. Assume that the questions asked
are independent and that the probability of the answer YES to the sensitive question is $\pi$ (and $q$ for the neutral question).
It is assumed that $q$ is known. Hence, in the CM model
$$\varrho = q\pi + (1-q)(1-\pi) = (2q-1)\pi + (1-q).$$
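The answer rule and the formula for $\varrho$ can be checked by simulation. The following is a minimal Python sketch, assuming each respondent answers the two questions independently with YES-probabilities $\pi$ and $q$; the function names (`crosswise_answer`, `rho`, `simulate_z`) and the values $\pi = 0.1$, $q = 0.2$ are hypothetical and chosen only for illustration.

```python
import random


def crosswise_answer(sensitive_yes: bool, neutral_yes: bool) -> int:
    """Observable Z: 1 if both answers agree (both YES or both NO), else 0."""
    return 1 if sensitive_yes == neutral_yes else 0


def rho(pi: float, q: float) -> float:
    """P{Z = 1} = q*pi + (1 - q)*(1 - pi) = (2q - 1)*pi + (1 - q)."""
    return (2 * q - 1) * pi + (1 - q)


def simulate_z(n: int, pi: float, q: float, seed: int = 0) -> list[int]:
    """Draw n crosswise answers; the two questions are answered independently."""
    rng = random.Random(seed)
    return [
        crosswise_answer(rng.random() < pi, rng.random() < q)
        for _ in range(n)
    ]


if __name__ == "__main__":
    pi, q = 0.1, 0.2  # hypothetical illustration values
    z = simulate_z(200_000, pi, q)
    # Empirical fraction of 1s vs. theoretical rho; both should be near 0.74.
    print(sum(z) / len(z), rho(pi, q))
```

Only $Z$ is recorded, so the interviewer never sees the sensitive answer itself; the simulation merely confirms that the fraction of 1s estimates $(2q-1)\pi + (1-q)$.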
In this model
$$\pi = \frac{\varrho - (1-q)}{2q-1}.$$
Without loss of generality we assume that $q < 0.5$.
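A short worked example may help fix ideas; the values $q = 0.2$ and $\pi = 0.1$ are hypothetical. Note that the inversion requires $q \neq 0.5$: for $q = 0.5$ we get $\varrho = 0.5$ whatever the value of $\pi$, so $\pi$ cannot be recovered.

```latex
\varrho = (2 \cdot 0.2 - 1)\cdot 0.1 + (1 - 0.2) = -0.06 + 0.8 = 0.74,
\qquad
\pi = \frac{0.74 - (1 - 0.2)}{2 \cdot 0.2 - 1} = \frac{-0.06}{-0.6} = 0.1 .
```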
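Let $Z_1, \ldots, Z_n$ be a sample. The MLE of $\varrho$ is $\hat\varrho = \frac{1}{n}\sum_{i=1}^{n} Z_i$. The distribution of $n\hat\varrho$ is $\mathrm{Bin}(n, \varrho)$.
The MLE of $\pi$ has the form
$$\hat\pi_{CM} = \max\left\{\min\left\{\frac{\hat\varrho - (1-q)}{2q-1},\ 1\right\},\ 0\right\}.$$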
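As a minimal sketch, the clipped estimator can be computed from a 0/1 sample as follows. The function name `pi_hat_cm` is hypothetical; the code assumes $q$ is known and $q \neq 0.5$.

```python
from typing import Sequence


def pi_hat_cm(z: Sequence[int], q: float) -> float:
    """Clipped MLE of pi in the crosswise model (q known, q != 0.5)."""
    if len(z) == 0:
        raise ValueError("empty sample")
    if q == 0.5:
        raise ValueError("q = 0.5 makes pi non-identifiable")
    rho_hat = sum(z) / len(z)                # MLE of rho: fraction of 1s
    raw = (rho_hat - (1 - q)) / (2 * q - 1)  # invert rho = (2q - 1) pi + (1 - q)
    return max(min(raw, 1.0), 0.0)           # project onto [0, 1]
```

For example, with $q = 0.2$ and six 1s among ten answers, $\hat\varrho = 0.6$ and $\hat\pi_{CM} = (0.6 - 0.8)/(-0.6) = 1/3$ up to floating-point rounding; with nine 1s the unclipped value is negative and the estimate is clipped to 0, while with a single 1 it exceeds 1 and is clipped to 1.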
Let $\mathrm{Bin}(\cdot, n; \varrho)$ denote the CDF of the binomial distribution with $n$ trials and probability of success $\varrho$, and let
$B(a, b; \cdot)$ denote the CDF of the beta distribution with parameters $(a, b)$.
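In the derivation of the probability mass function of $\hat\pi_{CM}$ the following known relationship will be applied: if $\xi$ is a random variable
with the binomial distribution with parameters $(n, \rho)$, then
$$P_\rho\{\xi \le x\} = \sum_{i=0}^{x} \binom{n}{i} \rho^{i} (1-\rho)^{n-i} = B(n - x,\ x + 1;\ 1 - \rho).$$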
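The identity can be verified exactly in rational arithmetic for small $n$. The sketch below is an illustration, not part of the original derivation: the helper names are hypothetical, and the beta CDF is computed only for positive integer parameters, by expanding $(1-t)^{b-1}$ with the binomial theorem and integrating term by term. That suffices here, since $a = n - x \ge 1$ and $b = x + 1 \ge 1$ for $0 \le x \le n - 1$.

```python
from fractions import Fraction
from math import comb, factorial


def binom_cdf(x: int, n: int, rho: Fraction) -> Fraction:
    """P{xi <= x} for xi ~ Bin(n, rho), summed exactly."""
    return sum(
        (comb(n, i) * rho**i * (1 - rho) ** (n - i) for i in range(x + 1)),
        Fraction(0),
    )


def beta_cdf(a: int, b: int, y: Fraction) -> Fraction:
    """Regularized incomplete beta B(a, b; y) for positive integers a, b.

    int_0^y t^(a-1) (1-t)^(b-1) dt = sum_j C(b-1, j) (-1)^j y^(a+j) / (a+j),
    divided by the beta function (a-1)! (b-1)! / (a+b-1)!.
    """
    integral = sum(
        (Fraction(comb(b - 1, j) * (-1) ** j) * y ** (a + j) / (a + j)
         for j in range(b)),
        Fraction(0),
    )
    beta_ab = Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))
    return integral / beta_ab


def check_identity(n: int, rho: Fraction) -> bool:
    """Check P{xi <= x} == B(n - x, x + 1; 1 - rho) for x = 0, ..., n - 1."""
    return all(
        binom_cdf(x, n, rho) == beta_cdf(n - x, x + 1, 1 - rho)
        for x in range(n)
    )


if __name__ == "__main__":
    print(check_identity(10, Fraction(37, 50)))
```

Exact fractions are used so that the comparison is an equality test rather than a floating-point tolerance check; for $x = n$ both sides equal 1 trivially and the beta form is undefined, so that case is skipped.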
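The probability mass function of $\hat\pi_{CM}$ is
$$
\begin{aligned}
P_\pi\{\hat\pi_{CM} = x\}
&= \begin{cases}
P_\pi\{n\hat\varrho \ge \lceil (1-q)n \rceil\}, & x = 0,\\[4pt]
\binom{n}{\lceil u \rceil}\big((2q-1)\pi + (1-q)\big)^{\lceil u \rceil}\big(1 - (2q-1)\pi - (1-q)\big)^{n - \lceil u \rceil}, & 0 < x < 1,\\[4pt]
P_\pi\{n\hat\varrho \le \lfloor qn \rfloor\}, & x = 1,
\end{cases}\\[8pt]
&= \begin{cases}
B\big(\lceil n(1-q) \rceil,\ n - \lceil n(1-q) \rceil + 1;\ (2q-1)\pi + (1-q)\big), & x = 0,\\[4pt]
\binom{n}{\lceil u \rceil}\big((2q-1)\pi + (1-q)\big)^{\lceil u \rceil}\big(1 - (2q-1)\pi - (1-q)\big)^{n - \lceil u \rceil}, & 0 < x < 1,\\[4pt]
1 - B\big(\lfloor nq \rfloor + 1,\ n - \lfloor nq \rfloor;\ (2q-1)\pi + (1-q)\big), & x = 1,
\end{cases}
\end{aligned}
$$
where $u = n\big(x(2q-1) + (1-q)\big)$. The middle case applies to the attainable values $x \in (0,1)$, i.e. those for which $u$ is an integer; $\hat\pi_{CM}$ takes no other values in $(0,1)$. Here $\lceil x \rceil$ denotes the smallest integer not smaller than $x$ and $\lfloor x \rfloor$ denotes the
greatest integer not greater than $x$.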
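The masses above can be cross-checked by brute force: enumerate every value $k = n\hat\varrho \in \{0, \ldots, n\}$, map it to its estimate, and accumulate binomial probabilities. The following Python sketch does this exactly with `fractions.Fraction`; the function names and the example values $n = 10$, $\pi = 1/10$, $q = 1/5$ are hypothetical, and $q < 1/2$ is assumed as in the text.

```python
from collections import defaultdict
from fractions import Fraction
from math import ceil, comb, floor


def pmf_pi_hat_cm(n: int, pi: Fraction, q: Fraction) -> dict[Fraction, Fraction]:
    """Exact distribution of the clipped MLE pi_hat_CM (assumes q < 1/2).

    Enumerates k = n * rho_hat in {0, ..., n}, maps each k to its estimate,
    and accumulates Bin(n, rho) probabilities with rho = (2q - 1) pi + (1 - q).
    """
    rho = (2 * q - 1) * pi + (1 - q)
    pmf = defaultdict(Fraction)  # Fraction() == 0
    for k in range(n + 1):
        raw = (Fraction(k, n) - (1 - q)) / (2 * q - 1)
        x = min(max(raw, Fraction(0)), Fraction(1))  # clip to [0, 1]
        pmf[x] += comb(n, k) * rho**k * (1 - rho) ** (n - k)
    return dict(pmf)


def tail_masses(n: int, pi: Fraction, q: Fraction) -> tuple[Fraction, Fraction]:
    """P{n rho_hat >= ceil((1-q) n)} and P{n rho_hat <= floor(q n)}."""
    rho = (2 * q - 1) * pi + (1 - q)

    def term(k: int) -> Fraction:
        return comb(n, k) * rho**k * (1 - rho) ** (n - k)

    at_zero = sum((term(k) for k in range(ceil((1 - q) * n), n + 1)), Fraction(0))
    at_one = sum((term(k) for k in range(floor(q * n) + 1)), Fraction(0))
    return at_zero, at_one


if __name__ == "__main__":
    n, pi, q = 10, Fraction(1, 10), Fraction(1, 5)  # hypothetical values
    for value, prob in sorted(pmf_pi_hat_cm(n, pi, q).items()):
        print(value, float(prob))
```

With these values the estimator takes only the values $0, 1/6, 1/3, 1/2, 2/3, 5/6, 1$, and the masses at 0 and 1 coincide with the tail probabilities $P_\pi\{n\hat\varrho \ge \lceil (1-q)n \rceil\}$ and $P_\pi\{n\hat\varrho \le \lfloor qn \rfloor\}$.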