Extending Conformal Prediction to Hidden Markov Models with Exact Validity
via de Finetti’s Theorem for Markov Chains
Buddhika Nettasinghe^1, Samrat Chatterjee^2, Ramakrishna Tipireddy^3, Mahantesh Halappanavar^2
Abstract

Conformal prediction is a widely used method to quantify the uncertainty of a classifier under the assumption of exchangeability (e.g., IID data). We generalize conformal prediction to the Hidden Markov Model (HMM) framework, where the assumption of exchangeability is not valid. The key idea of the proposed method is to partition the non-exchangeable Markovian data from the HMM into exchangeable blocks by exploiting de Finetti's Theorem for Markov Chains discovered by Diaconis and Freedman (1980). The permutations of the exchangeable blocks are viewed as randomizations of the observed Markovian data from the HMM. The proposed method provably retains all desirable theoretical guarantees offered by the classical conformal prediction framework in both exchangeable and Markovian settings. In particular, while the lack of exchangeability introduced by Markovian samples violates a crucial assumption of classical conformal prediction, the proposed method views it as an advantage that can be exploited to further improve performance. Detailed numerical and empirical results that complement the theoretical conclusions are provided to illustrate the practical feasibility of the proposed method.
1. Introduction
This paper extends the conformal prediction framework proposed in (Vovk et al., 2005) to the Hidden Markov Model (HMM) framework in a manner that provably preserves all its desirable theoretical guarantees. In particular, we focus on the following problem.
^1 Tippie College of Business, University of Iowa, Iowa City, IA 52242, USA. ^2 Pacific Northwest National Laboratory, Richland, WA 99354, USA. ^3 Palo Alto Networks, Santa Clara, CA 95054, USA. Correspondence to: Buddhika Nettasinghe <buddhika-nettasinghe@uiowa.edu>.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).
Problem of quantifying the uncertainty of an unknown HMM: Let $\{X_t\}_{t \ge 1}$ be a Markov chain taking values in the space $\mathcal{X} = \{1, 2, \ldots, n\}$ that is observed only through the memoryless observations $\{Y_t\}_{t \ge 1}$ taking values in the space $\mathcal{Y} = \{1, 2, \ldots, m\}$. The transition and observation probability matrices, given by

$$P_{ij} = P\{X_{t+1} = j \mid X_t = i\}, \quad i, j \in \mathcal{X}$$
$$B_{ij} = P\{Y_t = j \mid X_t = i\}, \quad i \in \mathcal{X},\ j \in \mathcal{Y}, \qquad (1)$$

are unknown and sampled from an unknown distribution $\mu$. Given a fully observed sequence $\{(X_t, Y_t)\}_{t=1}^{T}$, a sequence of observations $\{Y_t\}_{t=T+1}^{T+T_1}$ and a miscoverage level $\alpha \in [0, 1]$, our aim is to construct a set of sequences $C_{1-\alpha} \subseteq \mathcal{X}^{T_1}$ such that,

$$P\left\{ \{X_t\}_{t=T+1}^{T+T_1} \in C_{1-\alpha} \right\} \ge 1 - \alpha. \qquad (2)$$
Currently available solution and its limitations: The conformal prediction framework presented in (Vovk et al., 2005) provides an elegant solution to the above problem for the special case where the sequence of states $\{X_t\}_{t=1}^{T+T_1}$ is exchangeable.^1 More specifically, conformal prediction (reviewed in Sec. 2.1) exploits exchangeability of the distribution to permute the data and construct confidence sets. However, the conformal prediction framework is not directly applicable to our problem, since Markov processes are not generally exchangeable. To the best of our knowledge, a general solution to the above problem that provably achieves the coverage guarantee (2) is not currently available.
Main contributions: We generalize the original conformal prediction framework proposed in (Vovk et al., 2005) to the non-exchangeable Markovian setting in a principled manner that provably preserves all theoretical guarantees offered by the original framework. The key idea behind the proposed method is to view the process $\{(X_t, Y_t)\}_{t=1}^{T+T_1}$ as a mixture of Markov chains with respect to the mixing measure $\mu$ from which the parameters are sampled, and then to use a block-wise permutation scheme under which a mixture of Markov chains is exchangeable. The proposed block-wise permutation method is inspired by the notion of partial exchangeability and de Finetti's Theorem for Markov chains presented in (Diaconis & Freedman, 1980). The proposed method can be used as a wrapper for any algorithm based on the HMM framework (filtering, prediction, smoothing) and yields the coverage guarantee (2) for any finite length of the time series. The performance of the proposed method is illustrated with detailed numerical experiments as well as empirical results based on multiple real-world datasets.

^1 A sequence of random variables $X_1, X_2, \ldots$ is exchangeable if, for any finite permutation $\pi : \{1, \ldots, T\} \to \{1, \ldots, T\}$, $P\{X_1 = x_1, X_2 = x_2, \ldots, X_T = x_T\} = P\{X_{\pi(1)} = x_1, X_{\pi(2)} = x_2, \ldots, X_{\pi(T)} = x_T\}$, i.e., the joint probability distribution is invariant under finite permutations.

arXiv:2210.02271v4 [stat.ME] 22 May 2023
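To make the mixture-of-Markov-chains view concrete, the following sketch first draws a transition matrix from a mixing measure and then runs the chain with that matrix; the Dirichlet-on-rows mixing measure is a purely hypothetical choice for illustration, not the paper's $\mu$:

```python
import numpy as np

def sample_mixture_of_markov_chains(T, n, rng):
    """Draw one path of length T from a mixture of Markov chains.

    Illustrative mixing measure: each row of the transition matrix P
    is an independent Dirichlet(1, ..., 1) draw, i.e., uniform on the
    probability simplex. Conditional on P, the path is Markov.
    """
    P = rng.dirichlet(np.ones(n), size=n)  # P ~ mu, one row at a time
    x = np.empty(T, dtype=int)
    x[0] = rng.integers(n)                 # uniform initial state (assumption)
    for t in range(1, T):
        x[t] = rng.choice(n, p=P[x[t - 1]])
    return x, P

rng = np.random.default_rng(3)
x, P = sample_mixture_of_markov_chains(T=50, n=3, rng=rng)
```

Marginally over the draw of $P$, such a path is not Markov; it is this mixture structure that the block-wise permutation scheme exploits.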
Motivation: A solution to the above problem is useful in high-stakes settings where the hidden underlying states of a Markov process need to be inferred to a given confidence level using only the noisy measurements and a past calibration (i.e., training) sequence. Examples of such high-stakes settings include inferring life-threatening health events via noisy wearable sensors used for home-based monitoring of patients considered in (Uddin, 2019; Forkan & Khalil, 2017), predicting vehicle movements in automated navigation systems considered in (Yuan et al., 2018), human safety systems in hazardous work environments considered in (Petković et al., 2019; Rashid & Behzadan, 2018), and predicting movements in the stock market for making investment decisions considered in (Hassan & Nath, 2005). Further examples of application settings are discussed in (Gupta et al., 2016; Batu et al., 2004).
2. Preliminaries and Related Work
In this section, we discuss results from the literature that are closely related to our work. In particular, we briefly review the original conformal prediction framework for exchangeable processes proposed in (Vovk et al., 2005) and some of its recent extensions.
2.1. Conformal Prediction for Exchangeable Data
The classical conformal prediction framework proposed in (Vovk et al., 2005) applies to exchangeable data. In particular, conformal prediction has been widely utilized to quantify the uncertainty of classifiers that deal with independently and identically distributed (IID) processes. Let us first briefly review how classical conformal prediction works in the exchangeable setting.^2
Conformal prediction algorithm proposed in (Vovk et al., 2005): Assume that we are given an exchangeable (e.g., IID) sequence $\{(X_t, Y_t)\}_{t=1}^{T}$. For the next observation $Y_{T+1}$, we aim to generate a set $C_{1-\alpha}$ which contains the unknown underlying state $X_{T+1}$ with confidence $1-\alpha$. In other words, we are considering the main problem stated in Sec. 1 when the process is exchangeable and $T_1 = 1$. In order to implement conformal prediction, we first need to identify a conformity score function $\sigma : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ which quantifies the agreement between the state $X \in \mathcal{X}$ and the observation $Y \in \mathcal{Y}$: larger $\sigma(X, Y)$ indicates disagreement while smaller $\sigma(X, Y)$ indicates agreement. The conformity score function $\sigma(\cdot, \cdot)$ could be based on a given pre-trained classifier (e.g., $1 - \sigma(X, Y)$ could be the $X$-th element of the softmax output of a neural network for observation $Y$), or it can also be based on any classifier derived from the calibration sequence $\{(X_t, Y_t)\}_{t=1}^{T}$. Next, at time $t = T+1$, assume $X_{T+1} = i$ and calculate

$$\hat{q}(i) = \frac{1}{T+1} \sum_{t=1}^{T+1} \mathbf{1}\left(\sigma(X_t, Y_t) \le \sigma(X_{T+1}, Y_{T+1})\right) \qquad (3)$$

which is the quantile of $\sigma(i, Y_{T+1})$ among the conformity scores of the observation sequence $\{(X_t, Y_t)\}_{t=1}^{T}$ together with $(i, Y_{T+1})$. After calculating $\hat{q}(i)$ for each state $i \in \mathcal{X}$, the confidence set for the unknown state $X_{T+1}$ corresponding to the observation $Y_{T+1}$ is constructed as,

$$C_{1-\alpha} = \left\{ i \in \mathcal{X} : (T+1)\,\hat{q}(i) \le \lceil (1-\alpha)(T+1) \rceil \right\}. \qquad (4)$$

In other words, each state $i \in \mathcal{X}$ for which the conformity score $\sigma(i, Y_{T+1})$ is within the smallest $\lceil (1-\alpha)(T+1) \rceil$ among the $\{\sigma(X_t, Y_t)\}_{t=1}^{T}$ is included in the set $C_{1-\alpha}$. The constructed confidence set $C_{1-\alpha}$ is guaranteed to contain the true unknown underlying state $X_{T+1}$ with probability $1-\alpha$, i.e.,

$$P\{X_{T+1} \in C_{1-\alpha}\} \ge 1 - \alpha. \qquad (5)$$

^2 We refer the reader to (Shafer & Vovk, 2008; Angelopoulos et al., 2023) for detailed tutorial introductions to the classical conformal prediction framework.
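Equations (3)–(4) translate directly into code. The sketch below assumes the conformity scores have already been computed (the numeric values are illustrative), and forms the confidence set by the rank test of (4):

```python
import math
import numpy as np

def conformal_set(scores_cal, score_test_per_state, alpha):
    """Classical conformal confidence set per equations (3)-(4).

    scores_cal: conformity scores sigma(X_t, Y_t), t = 1..T, on the
        calibration sequence.
    score_test_per_state: sigma(i, Y_{T+1}) for each candidate state i.
    Returns states i with (T+1) * qhat(i) <= ceil((1 - alpha)(T+1)).
    """
    T = len(scores_cal)
    threshold = math.ceil((1 - alpha) * (T + 1))
    conf_set = []
    for i, s in enumerate(score_test_per_state):
        # (T+1) * qhat(i): count of scores <= sigma(i, Y_{T+1}),
        # including the test score itself (hence the +1)
        rank = int(np.sum(np.asarray(scores_cal) <= s)) + 1
        if rank <= threshold:
            conf_set.append(i)
    return conf_set

# Illustrative example: T = 10 calibration scores, 3 candidate states
scores_cal = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.9]
conf_set = conformal_set(scores_cal, [0.15, 0.55, 0.95], alpha=0.2)  # -> [0, 1]
```

Here $\lceil 0.8 \times 11 \rceil = 9$, so states whose test score ranks 9th or lower among the eleven scores are kept; the third candidate ranks 11th and is excluded.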
The role of exchangeability in conformal prediction: The confidence bound (5) is guaranteed to be satisfied by the classical conformal prediction framework due to the exchangeability of the sequence $\{(X_t, Y_t)\}_{t=1}^{T+1}$. To understand this, observe that,

$$P\{X_{T+1} \in C_{1-\alpha}\} = P\left\{ \hat{q}(X_{T+1}) \le \frac{\lceil (1-\alpha)(T+1) \rceil}{T+1} \right\} = P\left\{ \sum_{t=1}^{T+1} \mathbf{1}\left(\sigma(X_t, Y_t) \le \sigma(X_{T+1}, Y_{T+1})\right) \le \lceil (1-\alpha)(T+1) \rceil \right\} \qquad (6)$$

which follows from (3), (4). Thus, $P\{X_{T+1} \in C_{1-\alpha}\}$ is equal to the probability that the rank of $\sigma(X_{T+1}, Y_{T+1})$ (among $\sigma(X_t, Y_t),\ t = 1, 2, \ldots, T$) is less than or equal to $\lceil (1-\alpha)(T+1) \rceil$. Due to the exchangeability of the sequence $\{(X_t, Y_t)\}_{t=1}^{T+1}$, the rank of $\sigma(X_{T+1}, Y_{T+1})$ could be any integer from 1 to $T+1$ with equal probability, implying that, for $k = 1, \ldots, T+1$,

$$P\left\{ \sum_{t=1}^{T+1} \mathbf{1}\left(\sigma(X_t, Y_t) \le \sigma(X_{T+1}, Y_{T+1})\right) \le k \right\} = \frac{k}{T+1}. \qquad (7)$$

Then (7) and (6) yield the coverage guarantee in (5).
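The rank-uniformity statement (7) is easy to check numerically. The sketch below draws IID (hence exchangeable) scores, assumed continuous so that ties have probability zero, and verifies that the rank of the last score is uniform on $\{1, \ldots, T+1\}$:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 9                       # T + 1 = 10 exchangeable scores per trial
trials = 20000
rank_counts = np.zeros(T + 1, dtype=int)
for _ in range(trials):
    s = rng.random(T + 1)               # IID scores, hence exchangeable
    rank = int(np.sum(s <= s[-1]))      # rank of the (T+1)-th score
    rank_counts[rank - 1] += 1

freq = rank_counts / trials             # each entry should be near 1/(T+1)
```

Each empirical frequency concentrates around $1/(T+1) = 0.1$, exactly the uniform rank distribution that (7) asserts.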
Therefore, exchangeability is the crucial assumption in the classical conformal prediction framework. As such, classical conformal prediction does not guarantee the desired coverage (5) in non-exchangeable settings such as the HMM setting that we are dealing with.
2.2. Relaxing the Assumption of Exchangeability in
Conformal Prediction
Several works in the literature have aimed to generalize the classical conformal prediction framework summarized in Sec. 2.1 to non-exchangeable settings. Below, we briefly discuss those works that are most relevant to ours.^3
Our work is in particular motivated by the approach presented in (Chernozhukov et al., 2018), which proposed a method to extend the conformal prediction framework to time series data via a randomization method that accounts for potential temporal dependencies in the data. The key idea is to construct an algebraic group of block-wise permutations (instead of the element-wise permutations used in classical conformal prediction) such that each permutation in that group is likely to preserve the potential temporal dependencies. Extending the work in (Chernozhukov et al., 2018) further, (Xu & Xie, 2021) proposed to derive prediction intervals using an ensemble of bootstrapped estimators to avoid having to split the data into blocks. However, when the exchangeability assumption fails, the approaches presented in (Chernozhukov et al., 2018; Xu & Xie, 2021) are only approximately valid (i.e., not guaranteed to satisfy the bound (2)). In contrast, the aim of our work is to devise a method that is guaranteed to satisfy the bound (2) in any unknown HMM. In particular, the approach we present is based on constructing a block-wise permutation (which exploits de Finetti's theorem for Markov chains reviewed in Sec. 3.1) that adapts to the observed sequence $\{(X_t, Y_t)\}_{t=1}^{T}$ in a manner that guarantees the exact exchangeability of the blocks. As a consequence, the method that we propose is exactly valid in the sense that it is guaranteed to achieve the bound (2).
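As a generic illustration of the block-wise idea (not the paper's adaptive scheme, which is constructed in Sec. 3), the sketch below splits a sequence into fixed-length blocks and permutes the blocks while preserving within-block order, so that local temporal dependence survives the randomization:

```python
import numpy as np

def blockwise_permute(seq, block_len, rng):
    """Permute a sequence at the level of contiguous blocks.

    Unlike an element-wise permutation, the order *within* each block
    (and hence local temporal dependence) is preserved; only the order
    of the blocks themselves is randomized.
    """
    seq = np.asarray(seq)
    n_blocks = len(seq) // block_len
    blocks = [seq[b * block_len:(b + 1) * block_len] for b in range(n_blocks)]
    tail = seq[n_blocks * block_len:]      # leftover elements stay at the end
    order = rng.permutation(n_blocks)
    return np.concatenate([blocks[b] for b in order] + [tail])

rng = np.random.default_rng(2)
out = blockwise_permute(np.arange(12), block_len=3, rng=rng)
```

The fixed block length here is the simplification: the method proposed in this paper instead chooses data-adaptive blocks that are provably exchangeable under the mixture-of-Markov-chains structure.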
In another direction, (Cherubin & Nouretdinov, 2016; Stankeviciute et al., 2021) proposed to apply the classical conformal prediction algorithm as a solution when multiple independent calibration sequences are available. However, our problem (stated in Sec. 1) assumes that only one realized sequence is available, and since a Markov chain is not exchangeable in general, the classical conformal prediction framework is not applicable to our setting.

^3 We refer the reader to (Fontana et al., 2023) for a comprehensive review of more versions of the classical conformal prediction framework.
Making conformal prediction robust to changes in the underlying distributions has also received significant attention in the literature on generalizing conformal prediction. When the distributions of calibration states and test states are both exchangeable but different from each other, (Tibshirani et al., 2019) proposed an approach based on re-weighting the calibration data with a likelihood ratio (of test and training distributions). (Cauchois et al., 2020; Gibbs & Candes, 2021; Barber et al., 2022) proposed methods for even more general settings, such as an arbitrary number of changes in both the state and observation distributions. In contrast to (Tibshirani et al., 2019; Cauchois et al., 2020; Gibbs & Candes, 2021; Barber et al., 2022), our aim is to extend the conformal prediction framework specifically to HMMs with exact validity (instead of approximate validity) for any sequence length. Additionally, our approach is based on finding exchangeable blocks in the Markovian data, whereas (Tibshirani et al., 2019; Cauchois et al., 2020; Gibbs & Candes, 2021; Barber et al., 2022) utilize methods such as online updates to reflect the difference between empirically achieved confidence and target confidence, non-uniform weighting of calibration data, etc.
3. Quantifying the Uncertainty in Hidden
Markov Models via Conformal Prediction
In this section, we first review the notions of mixtures of Markov chains and partial exchangeability. We then exploit a characterization of partial exchangeability in terms of mixtures of Markov chains presented in (Diaconis & Freedman, 1980) to extend the classical conformal prediction framework to Hidden Markov Models in a manner that provably preserves all its key theoretical guarantees.
3.1. Mixtures of Markov Chains and Partial
Exchangeability
This subsection provides a brief review of the main result of (Diaconis & Freedman, 1980), which characterizes a mixture of Markov chains as a partially exchangeable process.
Formally, a process $X_1, X_2, \ldots$ is a mixture of Markov chains if there exists a probability measure $\mu$ on the space $\mathcal{P}$ of all $n \times n$ stochastic matrices (for the state space $\mathcal{X} = \{1, 2, \ldots, n\}$) such that,

$$P\{X_t = x_t \text{ for } 1 \le t \le T\} = \int_{\mathcal{P}} \prod_{t=1}^{T-1} P_{x_t x_{t+1}} \, \mu(dP) \qquad (8)$$

for any sequence of states $x_1, x_2, \ldots, x_T \in \mathcal{X}$. (Di-