
B. Other Related Works
We remark that the two-phase test for hypothesis testing is closely related to the two-phase code for channel coding with
feedback [18], [19] by Yamamoto and Itoh [10]. Specifically, the two-phase code in [10] consists of a message phase and a
confirmation phase. In the message phase, a message is transmitted using a fixed number $n$ of channel uses. After receiving the
channel outputs, the receiver checks whether the message was transmitted correctly and sends either an ACK or a NACK
signal to inform the transmitter. Once a NACK signal is received, the transmitter retransmits the message. Note that the
ACK and NACK signals correspond to the non-reject and reject decisions of the first phase for hypothesis testing. However,
we would like to note that the second phases of the two problems differ. In channel coding, the second phase is a retransmission
of the same message. In contrast, in hypothesis testing, the second phase collects another $(k-1)n$ data samples and makes
a final decision using all $kn$ data samples from both phases. Furthermore, since the objectives of the two problems are different,
the analyses are rather different, and thus the theoretical analyses for channel coding with feedback [19]–[21] do not directly
imply the results for hypothesis testing, regardless of whether the generating distribution is known or not.
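To make the contrast concrete, the following Python sketch illustrates the control flow of the two-phase test described above. The helpers `draw_samples`, `first_phase_decision`, and `final_decision` are hypothetical placeholders for a concrete test (e.g., a Gutman-type test); only the phase structure is taken from the discussion.

```python
import numpy as np

def two_phase_test(draw_samples, first_phase_decision, final_decision, n, k):
    """Sketch of the two-phase test structure (hypothetical helper API).

    Phase 1 uses n samples and either keeps a tentative decision
    (analogous to an ACK in Yamamoto-Itoh coding) or rejects it
    (analogous to a NACK). On a reject, phase 2 collects another
    (k - 1) * n samples and decides using all k * n samples.
    """
    phase1 = draw_samples(n)                        # first-phase samples
    decision, reject = first_phase_decision(phase1)
    if not reject:                                  # "ACK": stop after n samples
        return decision, n
    phase2 = draw_samples((k - 1) * n)              # second-phase samples
    full = np.concatenate([phase1, phase2])         # all k * n samples
    return final_decision(full), k * n
```

Unlike the channel-coding second phase, the second call above does not repeat the first phase: it enlarges the sample and invokes a different decision rule on the pooled data.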
We next provide a non-exhaustive list of related works on statistical classification. Zhou, Tan and Motani [17] analyzed the
finite sample performance of Gutman's test. Hsu and Wang [22] generalized Gutman's results to the mismatched setting where
the generating distributions of the training sequences and the test sequences deviate slightly. The Bayesian error probability of
binary classification was studied by Merhav and Ziv [23] in the asymptotic case and more recently by Saito [24], [25] in the finite
blocklength regime using Bayes codes. Gutman's results have also been generalized to multiple test sequences [26], distributed detection [27],
quickest change-point detection [28], outlier hypothesis testing [29], [30], and universal sequential classification [16].
Notation
We use $\mathbb{R}$, $\mathbb{R}_+$, and $\mathbb{N}$ to denote the sets of real numbers, non-negative real numbers, and natural numbers, respectively. Given any
integer $a \in \mathbb{N}$, we use $[a]$ to denote the set $\{1, 2, \ldots, a\}$. Random variables and their realizations are denoted by upper case variables
(e.g., $X$) and lower case variables (e.g., $x$), respectively. All sets are denoted in calligraphic font (e.g., $\mathcal{X}$). For any $N \in \mathbb{N}$,
let $X^N := (X_1, \ldots, X_N)$ be a random vector of length $N$ and let $x^N = (x_1, \ldots, x_N)$ be a particular realization of $X^N$. The
set of all probability distributions on a finite set $\mathcal{X}$ is denoted as $\mathcal{P}(\mathcal{X})$. We use $\mathbb{E}[\cdot]$ to denote expectation. Given a sequence
$x^n \in \mathcal{X}^n$, the type or empirical distribution is defined as $\hat{T}_{x^n}(a) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{x_i = a\}$ for all $a \in \mathcal{X}$. The set of types formed from
length-$n$ sequences with alphabet $\mathcal{X}$ is denoted as $\mathcal{P}_n(\mathcal{X})$. Given any $P \in \mathcal{P}_n(\mathcal{X})$, the set of all sequences of length $n$ with
type $P$, the type class, is denoted as $\mathcal{T}_P^n$.
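As a quick illustration of the type notation, the following Python snippet computes the empirical distribution $\hat{T}_{x^n}$ of a sequence over a finite alphabet; the function name is ours.

```python
from collections import Counter

def empirical_distribution(x, alphabet):
    """Type of x: T_hat(a) = (1/n) * |{i : x_i = a}| for each a in the alphabet."""
    n = len(x)
    counts = Counter(x)  # Counter returns 0 for symbols absent from x
    return {a: counts[a] / n for a in alphabet}

# The sequence (0, 1, 1, 2) over alphabet {0, 1, 2} has type (0.25, 0.5, 0.25).
print(empirical_distribution([0, 1, 1, 2], alphabet=[0, 1, 2]))
```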
II. PROBLEM FORMULATION, FIXED-LENGTH TEST AND SEQUENTIAL TEST
A. Problem Formulation
In $M$-ary classification, we are given $M$ training sequences $\mathbf{X}^N := (X_1^N, \ldots, X_M^N) \in (\mathcal{X}^N)^M$ of length $N \in \mathbb{N}$ generated
i.i.d. according to distinct unknown distributions $\mathbf{P} = (P_1, \ldots, P_M) \in \mathcal{P}(\mathcal{X})^M$ and a test sequence $Y^\tau$ generated i.i.d.
according to one of the $M$ distributions, where $\tau$ is a random stopping time with respect to the filtration $\sigma\{\mathbf{X}^N, Y_1, \ldots, Y_n\}$.
For simplicity, we assume that $N = \lceil n\alpha \rceil$.² The task is to design a test $\Phi = (\tau, \phi_\tau)$, which consists of a random stopping
time $\tau$ and a mapping $\phi_\tau: \mathcal{X}^{MN} \times \mathcal{X}^\tau \to \{\mathrm{H}_1, \mathrm{H}_2, \ldots, \mathrm{H}_M\}$, to decide among the following $M$ hypotheses:
• $\mathrm{H}_j$: the test sequence $Y^\tau$ is generated i.i.d. from the same distribution as the $j$th training sequence $X_j^N$, $j \in [M]$.
At the random stopping time $\tau$, any mapping $\phi_\tau$ partitions the sample space into $M$ disjoint regions $\{\mathcal{A}_j(\phi_\tau)\}_{j \in [M]}$, where
$Y^\tau \in \mathcal{A}_j(\phi_\tau)$ favors hypothesis $\mathrm{H}_j$.
Given any test $\Phi$ and any tuple of distributions $\mathbf{P} \in \mathcal{P}(\mathcal{X})^M$, we have the following $M$ error probabilities to evaluate the
performance of the test:
$$\beta_j(\Phi|\mathbf{P}) := \mathbb{P}_j\left\{\phi_\tau(\mathbf{X}^N, Y^\tau) \neq \mathrm{H}_j\right\}, \quad j \in [M], \qquad (1)$$
where for each $j \in [M]$, we define $\mathbb{P}_j\{\cdot\} := \Pr\{\cdot\,|\,\mathrm{H}_j\}$, under which $X_i^N \sim P_i$ for all $i \in [M]$ and $Y^\tau \sim P_j$.
It is challenging to characterize the non-asymptotic performance of the error probabilities with a finite sample size. As a compromise, to be consistent
²In subsequent analysis, without loss of generality, we ignore the integer requirement and drop the ceiling operation for simplicity.