Achievable Error Exponents for Two-Phase
Multiple Classification
Lin Zhou, Jun Diao and Lin Bai
Abstract
We revisit M-ary classification of Gutman (TIT 1989), where one is tasked to determine whether a testing sequence is generated from the same distribution as one of the M training sequences or not. Our main result is a two-phase test, its theoretical analysis and its optimality guarantee. Specifically, our two-phase test is a special case of a sequential test with only two decision time points: the first phase is a fixed-length test with a reject option, and the second phase, which does not allow a reject option, proceeds only if a reject option is decided in the first phase. To provide a theoretical guarantee for our test, we derive achievable error exponents using the method of types and derive a converse result for the optimal sequential test using the techniques recently proposed by Hsu, Li and Wang (ITW 2022) for binary classification. Analytically and numerically, we show that our two-phase test achieves the performance of an optimal sequential test with a proper choice of test parameters. In particular, similarly to the optimal sequential test, our test does not need a final reject option to achieve the optimal error exponent region, while an optimal fixed-length test needs a reject option to achieve the same region. Finally, we specialize our results to binary classification when $M = 2$ and to M-ary hypothesis testing when the ratio of the lengths of the training sequences and the testing sequence tends to infinity, so that the generating distributions can be estimated perfectly. For both specializations, our two-phase test bridges over optimal fixed-length and sequential tests in the same spirit as the results of Lalitha and Javidi for binary hypothesis testing (ISIT 2016). In summary, our results generalize the design and analysis of the two-phase test for binary hypothesis testing to broader and more practical families of statistical classification with multiple decision outcomes.
Index Terms
Hypothesis Testing, Classification, Large deviations, Two-phase test, Sequential test
I. INTRODUCTION
Binary hypothesis testing lies in the intersection of information theory and statistics with applications in various domains [4].
In this problem, one is given two distributions $P_1$ and $P_2$ and a test sequence $Y^n$. The task is to decide from which distribution the test sequence $Y^n$ is generated i.i.d., and the performance criteria are the type-I and type-II error probabilities. The Neyman-Pearson lemma [5] states that the likelihood ratio test (LRT) is optimal. When the type-I error probability is upper bounded by a constant, the Chernoff-Stein lemma [6] shows that the type-II error probability decays exponentially fast at the speed of $D(P_1\|P_2)$. Blahut [7] showed that to ensure that the type-I error probability decays exponentially fast with a speed $\lambda$, the decay rate of the type-II error probability is reduced from $D(P_1\|P_2)$ to $\min_{\tilde{P}:\,D(\tilde{P}\|P_1)\le\lambda} D(\tilde{P}\|P_2)$. Such a tradeoff can
be improved in the sequential setting where the length of the test sequence is allowed to vary but the expected length is upper
bounded. In this setting, Wald [8] showed that the sequential probability ratio test (SPRT) is optimal and achieves maximal
error exponents for both types of error probabilities simultaneously.
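To make the fixed-length tradeoff concrete, both the Chernoff-Stein exponent $D(P_1\|P_2)$ and Blahut's constrained exponent can be evaluated numerically. The following sketch (our own illustration, not from the paper) does this for Bernoulli sources via a simple grid search over the auxiliary distribution $\tilde{P}$; the function names and example distributions are assumptions for illustration.

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) between two distributions on a finite alphabet."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def blahut_exponent(p1, p2, lam, grid=10001):
    """Best type-II exponent min over {P~ : D(P~||P1) <= lam} of D(P~||P2),
    approximated for Bernoulli sources by a grid search over P~."""
    best = np.inf
    for t in np.linspace(1e-6, 1.0 - 1e-6, grid):
        p_tilde = [t, 1.0 - t]
        if kl(p_tilde, p1) <= lam:
            best = min(best, kl(p_tilde, p2))
    return best

P1, P2 = [0.3, 0.7], [0.6, 0.4]
stein = kl(P1, P2)                            # Chernoff-Stein exponent
tradeoff = blahut_exponent(P1, P2, lam=0.05)  # smaller once lam > 0
```

As $\lambda \to 0$ the constrained exponent approaches the Stein exponent, and for large $\lambda$ it drops toward zero, matching the tradeoff described above.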
The superior performance of the SPRT comes at the high complexity of the test design, where at each time point, one takes
a new test sample and determines whether to continue collecting samples or to make a decision. One might wonder whether
it is possible to achieve the performance of the SPRT while simplifying the test design to a limited number of decision time points. Lalitha and Javidi [9] answered this question affirmatively by proposing a two-phase test. Specifically, the test in [9] consists of two phases: the first phase is a fixed-length test taking $n$ samples; the second phase is another fixed-length test taking an additional $(k-1)n$ samples$^1$ for some real number $k > 1$, and it proceeds only if the first phase
This work was partially presented at IEEE ISIT 2022 [1], IEEE ICASSP 2023 [2] and IEEE ISIT 2023 [3].
The authors are with the School of Cyber Science and Technology, Beihang University, Beijing, China, 100083 (Emails: {lzhou, jundiao, l.bai}@buaa.edu.cn).
L. Zhou and L. Bai are also with the Beijing Laboratory for General Aviation Technology, Beihang University, Beijing.
This work was supported in part by the National Natural Science Foundation of China under Grants 61922010 and 62201022, in part by the Beijing Natural Science Foundation under Grants JQ20019 and 4232007, and in part by funds of Beihang University.
$^1$In the rest of the paper, for simplicity, we drop the integer constraint and use $(k-1)n$ and $kn$.
arXiv:2210.12736v2 [cs.IT] 26 May 2023
outputs a reject option, indicating that $n$ samples are not sufficient to make a reliable decision. With a proper design, the second phase proceeds with an exponentially small probability. Thus, the two-phase test is simply a sequential test with two decision time points $n$ and $kn$, and the asymptotic average sample size is $n$. Such a design renders the two-phase test more practical than a usual sequential test, which checks whether to stop after collecting each data sample. It is the ease of the test design that motivates the study of the two-phase test for hypothesis testing, analogously to the motivation for the
two-phase code of Yamamoto and Itoh [10] for channel coding with feedback.
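The two decision times of such a test can be sketched as the following control flow; `draw_sample`, `first_phase` and `second_phase` are hypothetical placeholders for illustration, not the actual tests analyzed in this paper.

```python
import random

def two_phase_test(draw_sample, first_phase, second_phase, n, k):
    """Sequential test with only two decision times, n and k*n.
    first_phase may return None (the reject option); second_phase may not."""
    samples = [draw_sample() for _ in range(n)]
    decision = first_phase(samples)
    if decision is not None:       # n samples sufficed for a reliable decision
        return decision, len(samples)
    # reject: collect the additional (k - 1)*n samples, decide with all k*n
    samples += [draw_sample() for _ in range(int((k - 1) * n))]
    return second_phase(samples), len(samples)

# Toy usage: hypothetical phase rules thresholding the sample mean.
random.seed(0)
mean = lambda s: sum(s) / len(s)
decision, used = two_phase_test(
    draw_sample=lambda: random.random(),
    first_phase=lambda s: 0 if mean(s) < 0.4 else (1 if mean(s) > 0.6 else None),
    second_phase=lambda s: 0 if mean(s) < 0.5 else 1,
    n=20, k=2)
```

The stopping time is thus either $n$ or $kn$, and when the first phase rejects only with exponentially small probability, the expected sample size is asymptotically $n$.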
The theoretical results for fixed-length and sequential tests for binary hypothesis testing have been generalized to M-ary hypothesis testing [11]–[13]. For the more practical problem where the generating distributions are unknown, Gutman [14] proposed the statistical classification framework, where one determines whether a test sequence $Y^n$ is generated from one of the unknown distributions using empirically observed statistics. Gutman proposed a distribution-free test, derived its achievable error exponent under each hypothesis and proved its optimality under the generalized Neyman-Pearson criterion. Recently, Haghifam, Tan and Khisti [15] generalized Gutman's result to the sequential setting and derived the achievable error exponents under the universal error probability constraint. The results for the binary case of [15] were further generalized by Hsu, Li and Wang [16], who derived tight results with matching achievability and converse bounds under both the expected stopping time universality constraint and the error probability universality constraint.
However, there is no publication that derives the performance of a two-phase test for the more practical problem of statistical
classification. Inspired by the results of Lalitha and Javidi [9] for binary hypothesis testing, we are interested in the following
question. Can one propose a two-phase test for statistical classification and demonstrate that the test performs close to the optimal sequential test while retaining the simple design of a fixed-length test? We answer this question affirmatively.
Our contributions are summarized as follows.
A. Main Contributions
We generalize the results of Lalitha and Javidi [9] for binary hypothesis testing to the more practical statistical classification problem, where the generating distributions under each hypothesis are unknown and there are more than two hypotheses. We propose a two-phase test using the generalized Jensen-Shannon divergence [17, Eq. (2.3)], which takes the empirical distributions of the training sequences and the testing sequence as its parameters. We derive the achievable error exponents of our test using the method of types. To provide the optimality guarantee of the two-phase test, we characterize the error exponent of an optimal sequential test under the error probability universality constraint using the techniques recently proposed by Hsu, Li and Wang for binary sequential classification [16]. This is because our two-phase test is a special case of the sequential test, and thus the converse bound for sequential classification is naturally a converse bound for our two-phase test. Analytical results and numerical simulations verify that our two-phase test achieves error exponents close to those of the optimal sequential test and bridges the performance gap between the fixed-length test [14] and the sequential test. In particular, our two-phase test does not require a reject option to achieve the optimal error exponent region of the sequential test, in stark contrast with the fixed-length test [14]. The non-asymptotic complexity of our test is at most a constant multiple of that of a fixed-length test, and the asymptotic complexity of our test is the same as that of the fixed-length test. Furthermore, we specialize our results to binary classification with $M = 2$ and compare the performance of the specialized test with the threshold-based test in our previous conference paper [1, Eqs. (14) and (15)]. Numerical results demonstrate that the specialized test in this paper is superior to the threshold-based test in the Bayesian setting.
We further discuss how the performance of our test is influenced by $\alpha$, the ratio of the lengths of the training sequences to that of the testing sequence. We demonstrate that the more training data (larger $\alpha$), the better the performance. When $\alpha \to \infty$, so that the generating distribution under each hypothesis is estimated accurately, our results specialize to M-ary hypothesis testing. Analytical results and numerical simulations show that, by tuning the design parameters, the achievable error exponents of our test approach those of either the fixed-length test [12] or the sequential test [13] in both the Neyman-Pearson and the Bayesian settings. Note that our test differs from the LRT-based test of Lalitha and Javidi [9, Eqs. (14) and (15)] when specialized to $M = 2$, but it is more amenable to generalization to the case with unknown distributions and multiple hypotheses.
B. Other Related Works
We remark that the two-phase test for hypothesis testing is closely related to the two-phase code for channel coding with feedback [18], [19] by Yamamoto and Itoh [10]. Specifically, the two-phase code in [10] includes a message phase and a confirmation phase. In the message phase, a message is transmitted using a fixed number $n$ of channel uses. After receiving the channel outputs, the receiver checks whether the message was transmitted correctly and sends either an ACK or a NACK signal to inform the transmitter. Once the NACK signal is received, the transmitter retransmits the message. Note that the ACK and NACK signals correspond to the non-reject and reject decisions of the first phase for hypothesis testing. However, the second phases of the two problems differ. In channel coding, the second phase is a retransmission of the same message. In contrast, in hypothesis testing, the second phase collects another $(k-1)n$ data samples and makes a final decision using all $kn$ data samples from both phases. Furthermore, since the objectives of the two problems are different, the analyses are rather different, and thus the theoretical analyses for channel coding with feedback [19]–[21] do not directly imply the results for hypothesis testing, regardless of whether the generating distribution is known or not.
We then provide a non-exhaustive list of related works on statistical classification. Zhou, Tan and Motani [17] analyzed the finite-sample performance of Gutman's test. Hsu and Wang [22] generalized Gutman's results to the mismatched setting, where the generating distributions of the training sequences and the test sequences deviate slightly. The Bayesian error probability of binary classification was studied by Merhav and Ziv [23] in the asymptotic case and, more recently, by Saito [24], [25] in the finite blocklength regime using Bayes codes. Gutman's results have also been generalized to multiple test sequences [26], distributed detection [27], quickest change-point detection [28], outlier hypothesis testing [29], [30] and universal sequential classification [16].
Notation
We use $\mathbb{R}$, $\mathbb{R}_+$ and $\mathbb{N}$ to denote the sets of real numbers, non-negative real numbers and natural numbers, respectively. Given any integer $a \in \mathbb{N}$, we use $[a]$ to denote the set $\{1, 2, \ldots, a\}$. Random variables and their realizations are denoted by upper case letters (e.g., $X$) and lower case letters (e.g., $x$), respectively. All sets are denoted in calligraphic font (e.g., $\mathcal{X}$). For any $N \in \mathbb{N}$, let $X^N := (X_1, \ldots, X_N)$ be a random vector of length $N$ and let $x^N = (x_1, \ldots, x_N)$ be a particular realization of $X^N$. The set of all probability distributions on a finite set $\mathcal{X}$ is denoted as $\mathcal{P}(\mathcal{X})$. We use $\mathbb{E}[\cdot]$ to denote expectation. Given a sequence $x^n \in \mathcal{X}^n$, its type or empirical distribution is defined as $\hat{T}_{x^n}(a) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}(x_i = a)$ for each $a \in \mathcal{X}$. The set of types formed from length-$n$ sequences over the alphabet $\mathcal{X}$ is denoted as $\mathcal{P}_n(\mathcal{X})$. Given any $P \in \mathcal{P}_n(\mathcal{X})$, the set of all sequences of length $n$ with type $P$, the type class, is denoted as $\mathcal{T}^n_P$.
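As a minimal illustration of the type notation $\hat{T}_{x^n}$, the following sketch (function name is our own) computes the empirical distribution of a short sequence over a three-letter alphabet:

```python
from collections import Counter

def empirical_type(xs, alphabet):
    """Type of a sequence: hat T_{x^n}(a) = (1/n) * |{i : x_i = a}|."""
    counts = Counter(xs)
    n = len(xs)
    return {a: counts[a] / n for a in alphabet}

t = empirical_type("aabca", alphabet="abc")
# t == {'a': 0.6, 'b': 0.2, 'c': 0.2}
```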
II. PROBLEM FORMULATION, FIXED-LENGTH TEST AND SEQUENTIAL TEST
A. Problem Formulation
In M-ary classification, we are given $M$ training sequences $\mathbf{X}^N := (X^N_1, \ldots, X^N_M) \in (\mathcal{X}^N)^M$ of length $N \in \mathbb{N}$, generated i.i.d. according to distinct unknown distributions $\mathbf{P} = (P_1, \ldots, P_M) \in \mathcal{P}(\mathcal{X})^M$, and a test sequence $Y^\tau$ generated i.i.d. according to one of the $M$ distributions, where $\tau$ is a random stopping time with respect to the filtration $\sigma\{\mathbf{X}^N, Y_1, \ldots, Y_n\}$. For simplicity, we assume that $N = n\alpha$ for some positive real number $\alpha$.$^2$ The task is to design a test $\Phi = (\tau, \phi_\tau)$, which consists of a random stopping time $\tau$ and a mapping $\phi_\tau: \mathcal{X}^{MN} \times \mathcal{X}^\tau \to \{\mathrm{H}_1, \mathrm{H}_2, \ldots, \mathrm{H}_M\}$, to decide among the following $M$ hypotheses:

$\mathrm{H}_j$: the test sequence $Y^\tau$ is generated i.i.d. from the same distribution as the $j$-th training sequence $X^N_j$, $j \in [M]$.

At the random stopping time $\tau$, any mapping $\phi_\tau$ partitions the sample space into $M$ disjoint regions $\{\mathcal{A}_j(\phi_\tau)\}_{j \in [M]}$, where $Y^\tau \in \mathcal{A}_j(\phi_\tau)$ favors hypothesis $\mathrm{H}_j$.

Given any test $\Phi$ and any tuple of distributions $\mathbf{P} \in \mathcal{P}(\mathcal{X})^M$, we have the following $M$ error probabilities to evaluate the performance of the test:

$$\beta_j(\Phi|\mathbf{P}) := \mathbb{P}_j\big\{\phi_\tau(\mathbf{X}^N, Y^\tau) \neq \mathrm{H}_j\big\}, \quad j \in [M], \qquad (1)$$

where for each $j \in [M]$, we define $\mathbb{P}_j\{\cdot\} := \Pr\{\cdot \,|\, \mathrm{H}_j\}$, under which $X^N_i \sim P_i$ for all $i \in [M]$ and $Y^\tau \sim P_j$. It is challenging to characterize the non-asymptotic performance of error probabilities with a finite sample size. As a compromise, to be consistent
2In subsequent analysis, without loss of generality, we ignore the integer requirement and drop the ceiling operation for simplicity.
with the literature, one usually derives error exponents, i.e., the decay rates of error probabilities. For M-ary classification, we are interested in the following error exponents:

$$E_j(\Phi|\mathbf{P}) := \liminf_{n \to \infty} \frac{-\log \beta_j(\Phi|\mathbf{P})}{\mathbb{E}_j[\tau]}, \qquad (2)$$

and the Bayesian error exponent:

$$E_{\mathrm{Bayesian}}(\Phi|\mathbf{P}) := \liminf_{n \to \infty} \frac{-\log \sum_{j=1}^{M} \pi_j \beta_j(\Phi|\mathbf{P})}{\mathbb{E}[\tau]}, \qquad (3)$$

where $\pi_j \in (0,1)$ is the prior probability of hypothesis $\mathrm{H}_j$ for each $j \in [M]$, such that $\sum_{j=1}^{M} \pi_j = 1$. Note that in both (2) and (3), the random stopping time $\tau$ is a function of $n$ through the filtration $\sigma\{Y_1, \ldots, Y_n\}$, and thus the definitions are valid.
Fix any integer $n \in \mathbb{N}$. When the random stopping time $\tau$ is fixed as a constant such that $\tau = n$, the test is called a fixed-length test; when the random stopping time $\tau$ satisfies $\limsup_{n \to \infty} \frac{\mathbb{E}[\tau]}{n} \le 1$, the test is called a sequential test. The achievable error exponents of a fixed-length test were derived by Gutman [14]. We generalize the achievability and converse results of the sequential test for binary classification [16] to M-ary classification.
B. Fixed-length Test
We first recall the results of the fixed-length test by Gutman [14]. To present Gutman's test, we need the following measure of distributions. Given any two distributions $(P, Q) \in \mathcal{P}(\mathcal{X})^2$ and any positive real number $\alpha \in \mathbb{R}_+$, the generalized Jensen-Shannon divergence is defined as

$$\mathrm{GJS}(P, Q, \alpha) := \alpha D\Big(P \,\Big\|\, \frac{\alpha P + Q}{1 + \alpha}\Big) + D\Big(Q \,\Big\|\, \frac{\alpha P + Q}{1 + \alpha}\Big). \qquad (4)$$

Then define the set

$$\mathcal{M}_{\mathrm{dis}} := \{(i, j) \in [M]^2 : i \neq j\}. \qquad (5)$$
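Definition (4) can be evaluated directly; the sketch below (helper names are our own) also checks two basic properties: $\mathrm{GJS}(P, P, \alpha) = 0$, and symmetry in its first two arguments at $\alpha = 1$, where GJS reduces to twice the usual Jensen-Shannon divergence.

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) on a finite alphabet."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def gjs(P, Q, alpha):
    """GJS(P, Q, alpha) = alpha*D(P||M) + D(Q||M), M = (alpha*P + Q)/(1+alpha)."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    mix = (alpha * P + Q) / (1.0 + alpha)
    return alpha * kl(P, mix) + kl(Q, mix)

P, Q = [0.8, 0.2], [0.3, 0.7]
assert abs(gjs(P, P, 2.0)) < 1e-12                   # zero when the types agree
assert abs(gjs(P, Q, 1.0) - gjs(Q, P, 1.0)) < 1e-12  # symmetric at alpha = 1
```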
For any positive real number $\lambda \in \mathbb{R}_+$, Gutman's test $\Phi^{(M)}_{\mathrm{Gut}} = \big(n, \phi^{(M)}_{n,\mathrm{Gut}}\big)$ has a fixed stopping time $\tau = n$ and proceeds as follows using the training and test sequences $(\mathbf{X}^N, Y^n)$:

$$\phi^{(M)}_{n,\mathrm{Gut}}(\mathbf{X}^N, Y^n) = \begin{cases} \mathrm{H}_1 & \text{if } \mathrm{GJS}\big(\hat{T}_{X^N_i}, \hat{T}_{Y^n}, \alpha\big) \ge \lambda, \ \forall\, i \in [2, M], \\ \mathrm{H}_i & \text{if } \mathrm{GJS}\big(\hat{T}_{X^N_i}, \hat{T}_{Y^n}, \alpha\big) < \lambda \text{ for some } i \in [M] \\ & \text{and } \mathrm{GJS}\big(\hat{T}_{X^N_t}, \hat{T}_{Y^n}, \alpha\big) \ge \lambda, \ \forall\, t \in [M] \setminus \{i\}, \\ \mathrm{H}_{\mathrm{r}} & \text{otherwise}, \end{cases} \qquad (6)$$
where $\mathrm{H}_{\mathrm{r}}$ denotes the reject option, which indicates that a reliable decision could not be made.
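Decision rule (6) can be sketched numerically as follows. This is a self-contained illustration with our own function names; the return value 0 encodes the reject option $\mathrm{H}_{\mathrm{r}}$, and the inputs are the types (empirical distributions) of the training and test sequences.

```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def gjs(P, Q, alpha):
    mix = (alpha * np.asarray(P, dtype=float)
           + np.asarray(Q, dtype=float)) / (1.0 + alpha)
    return alpha * kl(P, mix) + kl(Q, mix)

def gutman_test(train_types, test_type, alpha, lam):
    """Rule (6): return j for H_j (1-based) or 0 for the reject option H_r."""
    scores = [gjs(T, test_type, alpha) for T in train_types]
    if all(s >= lam for s in scores[1:]):
        return 1                 # every class other than 1 looks far: H_1
    below = [i for i, s in enumerate(scores) if s < lam]
    if len(below) == 1:
        return below[0] + 1      # a unique class looks close: H_i
    return 0                     # no reliable decision: reject

T1, T2 = [0.8, 0.2], [0.2, 0.8]
assert gutman_test([T1, T2], [0.75, 0.25], alpha=2.0, lam=0.05) == 1
assert gutman_test([T1, T2], [0.5, 0.5], alpha=2.0, lam=0.2) == 0
```

In the first call the test type is close to $\hat{T}_{X^N_1}$ and far from $\hat{T}_{X^N_2}$, so $\mathrm{H}_1$ is declared; in the second the test type sits between both training types, so the test rejects.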
Gutman proved the following result.

Theorem 1. For any tuple of distributions $\mathbf{P} \in \mathcal{P}(\mathcal{X})^M$ and any $\lambda \in \mathbb{R}_+$, Gutman's test satisfies

$$E_j\big(\Phi^{(M)}_{\mathrm{Gut}} \,\big|\, \mathbf{P}\big) \ge \lambda, \quad j \in [M]. \qquad (7)$$

Furthermore, among all fixed-length tests that achieve the same exponent as Gutman's test, Gutman's test has the smallest rejection region.
Note that the largest achievable exponent $\lambda \in \mathbb{R}_+$ of Gutman's test such that the rejection probability vanishes as $n \to \infty$ is

$$\bar{\lambda} = \min_{(i,j) \in \mathcal{M}_{\mathrm{dis}}} \mathrm{GJS}(P_i, P_j, \alpha).$$
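The threshold $\bar{\lambda}$ can likewise be evaluated numerically; the self-contained sketch below (our own illustration) minimizes the GJS divergence over all ordered pairs of distinct generating distributions, i.e., over $\mathcal{M}_{\mathrm{dis}}$.

```python
import itertools
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def gjs(P, Q, alpha):
    mix = (alpha * np.asarray(P, dtype=float)
           + np.asarray(Q, dtype=float)) / (1.0 + alpha)
    return alpha * kl(P, mix) + kl(Q, mix)

def lambda_bar(dists, alpha):
    """bar-lambda = min over ordered pairs (i, j), i != j, of GJS(P_i, P_j, alpha)."""
    return min(gjs(dists[i], dists[j], alpha)
               for i, j in itertools.permutations(range(len(dists)), 2))

dists = [[0.8, 0.2], [0.2, 0.8], [0.5, 0.5]]
lb = lambda_bar(dists, alpha=1.0)  # dominated by the two closest distributions
```

As expected, $\bar{\lambda}$ is determined by the hardest-to-distinguish pair of generating distributions.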
C. Sequential Test
To present the result for sequential classification, we need the following definitions. Given β(0,1) and any sequential test
Φ, we say that Φsatisfies the universality constraint on the error probability with βif for all tuple of distributions ˜
P∈ PM(X),
max
j[M]βj|˜
P)β. (8)