Extending Conformal Prediction to Hidden Markov Models with Exact Validity
via de Finetti’s Theorem for Markov Chains
Buddhika Nettasinghe^1, Samrat Chatterjee^2, Ramakrishna Tipireddy^3, Mahantesh Halappanavar^2
Abstract

Conformal prediction is a widely used method to quantify the uncertainty of a classifier under the assumption of exchangeability (e.g., IID data). We generalize conformal prediction to the Hidden Markov Model (HMM) framework, where the assumption of exchangeability is not valid. The key idea of the proposed method is to partition the non-exchangeable Markovian data from the HMM into exchangeable blocks by exploiting de Finetti's Theorem for Markov Chains discovered by Diaconis and Freedman (1980). The permutations of the exchangeable blocks are viewed as randomizations of the observed Markovian data from the HMM. The proposed method provably retains all desirable theoretical guarantees offered by the classical conformal prediction framework in both exchangeable and Markovian settings. In particular, while the lack of exchangeability introduced by Markovian samples violates a crucial assumption of classical conformal prediction, the proposed method views it as an advantage that can be exploited to further improve performance. Detailed numerical and empirical results that complement the theoretical conclusions are provided to illustrate the practical feasibility of the proposed method.
1. Introduction
This paper extends the conformal prediction framework proposed in (Vovk et al., 2005) to the Hidden Markov Model (HMM) framework in a manner that provably preserves all its desirable theoretical guarantees. In particular, we focus on the following problem.
^1 Tippie College of Business, University of Iowa, Iowa City, IA 52242, USA. ^2 Pacific Northwest National Laboratory, Richland, WA 99354, USA. ^3 Palo Alto Networks, Santa Clara, CA 95054, USA. Correspondence to: Buddhika Nettasinghe <buddhika-nettasinghe@uiowa.edu>.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).
Problem of quantifying the uncertainty of an unknown HMM: Let $\{X_t\}_{t \ge 1}$ be a Markov chain taking values in the space $\mathcal{X} = \{1, 2, \ldots, n\}$ that is observed only through the memoryless observations $\{Y_t\}_{t \ge 1}$ taking values in the space $\mathcal{Y} = \{1, 2, \ldots, m\}$. The transition and observation probability matrices, given by

$$P_{ij} = P\{X_{t+1} = j \mid X_t = i\}, \quad i, j \in \mathcal{X}$$
$$B_{ij} = P\{Y_t = j \mid X_t = i\}, \quad i \in \mathcal{X},\ j \in \mathcal{Y}, \qquad (1)$$

are unknown and sampled from an unknown distribution $\mu$. Given a fully observed sequence $\{(X_t, Y_t)\}_{t=1}^{T}$, a sequence of observations $\{Y_t\}_{t=T+1}^{T+T_1}$ and a miscoverage level $\alpha \in [0, 1]$, our aim is to construct a set of sequences $C_{1-\alpha} \subseteq \mathcal{X}^{T_1}$ such that,

$$P\left\{ \{X_t\}_{t=T+1}^{T+T_1} \in C_{1-\alpha} \right\} \ge 1 - \alpha. \qquad (2)$$
Currently available solution and its limitations: The conformal prediction framework presented in (Vovk et al., 2005) provides an elegant solution to the above problem for the special case where the sequence of states $\{X_t\}_{t=1}^{T+T_1}$ is exchangeable.^1 More specifically, conformal prediction (reviewed in Sec. 2.1) exploits exchangeability of the distribution to permute the data and construct confidence sets. However, the conformal prediction framework is not directly applicable to our problem, since Markov processes are not generally exchangeable. To the best of our knowledge, a general solution to the above problem that provably achieves the coverage guarantee (2) is not currently available.
Main contributions: We generalize the original conformal prediction framework proposed in (Vovk et al., 2005) to the non-exchangeable Markovian setting in a principled manner that provably preserves all theoretical guarantees offered by the original framework. The key idea behind the proposed method is to view the process $\{(X_t, Y_t)\}_{t=1}^{T+T_1}$ as a mixture of Markov chains with respect to the mixing measure $\mu$ from which the parameters are sampled, and then to use a block-wise permutation scheme under which a mixture of Markov chains is exchangeable. The proposed block-wise permutation method is inspired by the notion of partial exchangeability and de Finetti's Theorem for Markov chains presented in (Diaconis & Freedman, 1980). The proposed method can be used as a wrapper for any algorithm based on the HMM framework (filtering, prediction, smoothing) and yields the coverage guarantee (2) for any finite length of the time series. The performance of the proposed method is illustrated with detailed numerical experiments as well as empirical results based on multiple real-world datasets.

^1 A sequence of random variables $X_1, X_2, \ldots$ is exchangeable if, for any finite permutation $\pi : \{1, \ldots, T\} \to \{1, \ldots, T\}$, $P\{X_1 = x_1, X_2 = x_2, \ldots, X_T = x_T\} = P\{X_{\pi(1)} = x_1, X_{\pi(2)} = x_2, \ldots, X_{\pi(T)} = x_T\}$, i.e., the joint probability distribution is invariant under finite permutations.

arXiv:2210.02271v4 [stat.ME] 22 May 2023
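To make the mixture-of-Markov-chains view concrete, the following sketch first draws a transition matrix from a mixing measure and then runs the chain with that matrix; the Dirichlet-on-rows mixing measure is a purely hypothetical choice for illustration, not the paper's $\mu$:

```python
import numpy as np

def sample_mixture_of_markov_chains(T, n, rng):
    """Draw one path of length T from a mixture of Markov chains.

    Illustrative mixing measure: each row of the transition matrix P
    is an independent Dirichlet(1, ..., 1) draw, i.e., uniform on the
    probability simplex. Conditional on P, the path is Markov.
    """
    P = rng.dirichlet(np.ones(n), size=n)  # P ~ mu, one row at a time
    x = np.empty(T, dtype=int)
    x[0] = rng.integers(n)                 # uniform initial state (assumption)
    for t in range(1, T):
        x[t] = rng.choice(n, p=P[x[t - 1]])
    return x, P

rng = np.random.default_rng(3)
x, P = sample_mixture_of_markov_chains(T=50, n=3, rng=rng)
```

Marginally over the draw of $P$, such a path is not Markov; it is this mixture structure that the block-wise permutation scheme exploits.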
Motivation: A solution to the above problem is useful in high-stakes settings where the hidden underlying states of a Markov process need to be inferred to a given confidence level using only the noisy measurements and a past calibration (i.e., training) sequence. Examples of such high-stakes settings include inferring life-threatening health events via noisy wearable sensors used for home-based monitoring of patients considered in (Uddin, 2019; Forkan & Khalil, 2017), predicting vehicle movements in automated navigation systems considered in (Yuan et al., 2018), human safety systems in hazardous work environments considered in (Petković et al., 2019; Rashid & Behzadan, 2018), and predicting movements in the stock market for making investment decisions considered in (Hassan & Nath, 2005). Further examples of application settings are discussed in (Gupta et al., 2016; Batu et al., 2004).
2. Preliminaries and Related Work
In this section, we discuss results from the literature that are closely related to our work. In particular, we briefly review the original conformal prediction framework for exchangeable processes proposed in (Vovk et al., 2005) and some of its recent extensions.
2.1. Conformal Prediction for Exchangeable Data
The classical conformal prediction framework proposed in (Vovk et al., 2005) applies to exchangeable data. In particular, conformal prediction has been widely utilized to quantify the uncertainty of classifiers that deal with independently and identically distributed (IID) processes. Let us first briefly review how classical conformal prediction works in the exchangeable setting.^2
Conformal prediction algorithm proposed in (Vovk et al., 2005): Assume that we are given an exchangeable (e.g., IID) sequence $\{(X_t, Y_t)\}_{t=1}^{T}$. For the next observation $Y_{T+1}$, we aim to generate a set $C_{1-\alpha}$ which contains the unknown underlying state $X_{T+1}$ with confidence $1-\alpha$. In other words, we are considering the main problem stated in Sec. 1 when the process is exchangeable and $T_1 = 1$. In order to implement conformal prediction, we first need to identify a conformity score function $\sigma : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ which quantifies the agreement between the state $X \in \mathcal{X}$ and the observation $Y \in \mathcal{Y}$: larger $\sigma(X, Y)$ indicates disagreement while smaller $\sigma(X, Y)$ indicates agreement. The conformity score function $\sigma(\cdot, \cdot)$ could be based on a given pre-trained classifier (e.g., $1 - \sigma(X, Y)$ could be the $X$-th element of the softmax output of a neural network for observation $Y$), or it can also be based on any classifier derived from the calibration sequence $\{(X_t, Y_t)\}_{t=1}^{T}$. Next, at time $t = T+1$, assume $X_{T+1} = i$ and calculate

$$\hat{q}(i) = \frac{1}{T+1} \sum_{t=1}^{T+1} \mathbf{1}\left(\sigma(X_t, Y_t) \le \sigma(X_{T+1}, Y_{T+1})\right) \qquad (3)$$

which is the quantile of $\sigma(i, Y_{T+1})$ among the conformity scores of the observation sequence $\{(X_t, Y_t)\}_{t=1}^{T}$ together with $(i, Y_{T+1})$. After calculating $\hat{q}(i)$ for each state $i \in \mathcal{X}$, the confidence set for the unknown state $X_{T+1}$ corresponding to the observation $Y_{T+1}$ is constructed as,

$$C_{1-\alpha} = \left\{ i \in \mathcal{X} : (T+1)\,\hat{q}(i) \le \lceil (1-\alpha)(T+1) \rceil \right\}. \qquad (4)$$

In other words, each state $i \in \mathcal{X}$ for which the conformity score $\sigma(i, Y_{T+1})$ is within the smallest $\lceil (1-\alpha)(T+1) \rceil$ among the $\{\sigma(X_t, Y_t)\}_{t=1}^{T}$ is included in the set $C_{1-\alpha}$. The constructed confidence set $C_{1-\alpha}$ is guaranteed to contain the true unknown underlying state $X_{T+1}$ with probability $1-\alpha$, i.e.,

$$P\{X_{T+1} \in C_{1-\alpha}\} \ge 1 - \alpha. \qquad (5)$$

^2 We refer the reader to (Shafer & Vovk, 2008; Angelopoulos et al., 2023) for detailed tutorial introductions to the classical conformal prediction framework.
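Equations (3)–(4) translate directly into code. The sketch below assumes the conformity scores have already been computed (the numeric values are illustrative), and forms the confidence set by the rank test of (4):

```python
import math
import numpy as np

def conformal_set(scores_cal, score_test_per_state, alpha):
    """Classical conformal confidence set per equations (3)-(4).

    scores_cal: conformity scores sigma(X_t, Y_t), t = 1..T, on the
        calibration sequence.
    score_test_per_state: sigma(i, Y_{T+1}) for each candidate state i.
    Returns states i with (T+1) * qhat(i) <= ceil((1 - alpha)(T+1)).
    """
    T = len(scores_cal)
    threshold = math.ceil((1 - alpha) * (T + 1))
    conf_set = []
    for i, s in enumerate(score_test_per_state):
        # (T+1) * qhat(i): count of scores <= sigma(i, Y_{T+1}),
        # including the test score itself (hence the +1)
        rank = int(np.sum(np.asarray(scores_cal) <= s)) + 1
        if rank <= threshold:
            conf_set.append(i)
    return conf_set

# Illustrative example: T = 10 calibration scores, 3 candidate states
scores_cal = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.9]
conf_set = conformal_set(scores_cal, [0.15, 0.55, 0.95], alpha=0.2)  # -> [0, 1]
```

Here $\lceil 0.8 \times 11 \rceil = 9$, so states whose test score ranks 9th or lower among the eleven scores are kept; the third candidate ranks 11th and is excluded.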
The role of exchangeability in conformal prediction: The confidence bound (5) is guaranteed to be satisfied by the classical conformal prediction framework due to the exchangeability of the sequence $\{(X_t, Y_t)\}_{t=1}^{T+1}$. To understand this, observe that,

$$P\{X_{T+1} \in C_{1-\alpha}\} = P\left\{ \hat{q}(X_{T+1}) \le \frac{\lceil (1-\alpha)(T+1) \rceil}{T+1} \right\} = P\left\{ \sum_{t=1}^{T+1} \mathbf{1}\left(\sigma(X_t, Y_t) \le \sigma(X_{T+1}, Y_{T+1})\right) \le \lceil (1-\alpha)(T+1) \rceil \right\} \qquad (6)$$

which follows from (3), (4). Thus, $P\{X_{T+1} \in C_{1-\alpha}\}$ is equal to the probability that the rank of $\sigma(X_{T+1}, Y_{T+1})$ (among $\sigma(X_t, Y_t),\ t = 1, 2, \ldots, T$) is less than or equal to $\lceil (1-\alpha)(T+1) \rceil$. Due to the exchangeability of the sequence $\{(X_t, Y_t)\}_{t=1}^{T+1}$, the rank of $\sigma(X_{T+1}, Y_{T+1})$ could be any integer from 1 to $T+1$ with equal probability, implying that, for $k = 1, \ldots, T+1$,

$$P\left\{ \sum_{t=1}^{T+1} \mathbf{1}\left(\sigma(X_t, Y_t) \le \sigma(X_{T+1}, Y_{T+1})\right) \le k \right\} = \frac{k}{T+1}. \qquad (7)$$

Then (7) and (6) yield the coverage guarantee in (5).
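The rank-uniformity statement (7) is easy to check numerically. The sketch below draws IID (hence exchangeable) scores, assumed continuous so that ties have probability zero, and verifies that the rank of the last score is uniform on $\{1, \ldots, T+1\}$:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 9                       # T + 1 = 10 exchangeable scores per trial
trials = 20000
rank_counts = np.zeros(T + 1, dtype=int)
for _ in range(trials):
    s = rng.random(T + 1)               # IID scores, hence exchangeable
    rank = int(np.sum(s <= s[-1]))      # rank of the (T+1)-th score
    rank_counts[rank - 1] += 1

freq = rank_counts / trials             # each entry should be near 1/(T+1)
```

Each empirical frequency concentrates around $1/(T+1) = 0.1$, exactly the uniform rank distribution that (7) asserts.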
Therefore, exchangeability is the crucial assumption in the classical conformal prediction framework. As such, classical conformal prediction does not guarantee the desired coverage (5) in non-exchangeable settings such as the HMM setting that we are dealing with.
2.2. Relaxing the Assumption of Exchangeability in
Conformal Prediction
Several works in the literature have aimed to generalize the classical conformal prediction framework summarized in Sec. 2.1 to non-exchangeable settings. Below, we briefly discuss those works that are most relevant to ours.^3
Our work is in particular motivated by the approach presented in (Chernozhukov et al., 2018), which proposed a method to extend the conformal prediction framework to time series data via a randomization method that accounts for potential temporal dependencies in the data. The key idea is to construct an algebraic group of block-wise permutations (instead of the element-wise permutations used in classical conformal prediction) such that each permutation in that group is likely to preserve the potential temporal dependencies. Extending the work in (Chernozhukov et al., 2018) further, (Xu & Xie, 2021) proposed to derive prediction intervals using an ensemble of bootstrapped estimators to avoid having to split the data into blocks. However, when the exchangeability assumption fails, the approaches presented in (Chernozhukov et al., 2018; Xu & Xie, 2021) are only approximately valid (i.e., not guaranteed to satisfy the bound (2)). In contrast, the aim of our work is to devise a method that is guaranteed to satisfy the bound (2) in any unknown HMM. In particular, the approach we present is based on constructing a block-wise permutation (which exploits de Finetti's theorem for Markov chains reviewed in Sec. 3.1) that adapts to the observed sequence $\{(X_t, Y_t)\}_{t=1}^{T}$ in a manner that guarantees the exact exchangeability of the blocks. As a consequence, the method that we propose is exactly valid in the sense that it is guaranteed to achieve the bound (2).
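As a generic illustration of the block-wise idea (not the paper's adaptive scheme, which is constructed in Sec. 3), the sketch below splits a sequence into fixed-length blocks and permutes the blocks while preserving within-block order, so that local temporal dependence survives the randomization:

```python
import numpy as np

def blockwise_permute(seq, block_len, rng):
    """Permute a sequence at the level of contiguous blocks.

    Unlike an element-wise permutation, the order *within* each block
    (and hence local temporal dependence) is preserved; only the order
    of the blocks themselves is randomized.
    """
    seq = np.asarray(seq)
    n_blocks = len(seq) // block_len
    blocks = [seq[b * block_len:(b + 1) * block_len] for b in range(n_blocks)]
    tail = seq[n_blocks * block_len:]      # leftover elements stay at the end
    order = rng.permutation(n_blocks)
    return np.concatenate([blocks[b] for b in order] + [tail])

rng = np.random.default_rng(2)
out = blockwise_permute(np.arange(12), block_len=3, rng=rng)
```

The fixed block length here is the simplification: the method proposed in this paper instead chooses data-adaptive blocks that are provably exchangeable under the mixture-of-Markov-chains structure.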
In another direction, (Cherubin & Nouretdinov, 2016; Stankeviciute et al., 2021) proposed to apply the classical conformal prediction algorithm as a solution when multiple independent calibration sequences are available. However, our problem (stated in Sec. 1) assumes that only one realized sequence is available, and since a Markov chain is not exchangeable in general, the classical conformal prediction framework is not applicable to our setting.

^3 We refer the reader to (Fontana et al., 2023) for a comprehensive review of more versions of the classical conformal prediction framework.
Making conformal prediction robust to changes in the underlying distributions has also received significant attention in the literature on generalizing conformal prediction. When the distributions of calibration states and test states are both exchangeable but different from each other, (Tibshirani et al., 2019) proposed an approach based on re-weighting the calibration data with a likelihood ratio (of test and training distributions). (Cauchois et al., 2020; Gibbs & Candes, 2021; Barber et al., 2022) proposed methods for even more general settings, such as an arbitrary number of changes in both the state and observation distributions. In contrast to (Tibshirani et al., 2019; Cauchois et al., 2020; Gibbs & Candes, 2021; Barber et al., 2022), our aim is to extend the conformal prediction framework specifically to HMMs with exact validity (instead of approximate validity) for any sequence length. Additionally, our approach is based on finding exchangeable blocks in the Markovian data, whereas (Tibshirani et al., 2019; Cauchois et al., 2020; Gibbs & Candes, 2021; Barber et al., 2022) utilize methods such as online updates to reflect the difference between empirically achieved confidence and target confidence, non-uniform weighting of calibration data, etc.
3. Quantifying the Uncertainty in Hidden
Markov Models via Conformal Prediction
In this section, we first review the notions of mixtures of Markov chains and partial exchangeability. We then exploit a characterization of partial exchangeability in terms of mixtures of Markov chains presented in (Diaconis & Freedman, 1980) to extend the classical conformal prediction framework to Hidden Markov Models in a manner that provably preserves all its key theoretical guarantees.
3.1. Mixtures of Markov Chains and Partial
Exchangeability
This subsection provides a brief review of the main result of (Diaconis & Freedman, 1980), which characterizes a mixture of Markov chains as a partially exchangeable process.
Formally, a process $X_1, X_2, \ldots$ is a mixture of Markov chains if there exists a probability measure $\mu$ on the space $\mathcal{P}$ of all $n \times n$ stochastic matrices (for the state space $\mathcal{X} = \{1, 2, \ldots, n\}$) such that,

$$P\{X_t = x_t \text{ for } 1 \le t \le T\} = \int_{\mathcal{P}} \prod_{t=1}^{T-1} P_{x_t x_{t+1}} \, \mu(dP) \qquad (8)$$

for any sequence of states $x_1, x_2, \ldots, x_T \in \mathcal{X}$. (Di-