long multivariate time series with large gaps and generate ac-
curate explanations pinpointing the clinically meaningful data
points. The hierarchical model comprises a kernelized local attention module and a recurrent layer: the attention module captures local patterns while reducing the size of the intermediate representations, and the recurrent layer learns the long-term progression dynamics. To make the model end-to-end interpretable, we design a linear approximating network, parallel to the recurrent module, that locally models the behavior of the recurrent module.
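To make the hierarchy concrete, the following is a minimal sketch of such a forward pass in PyTorch. All names, dimensions, and the window size are illustrative assumptions rather than the exact SEHM architecture; in particular, a standard multi-head attention stands in for the kernelized local attention.

```python
import torch
import torch.nn as nn

# A minimal sketch of the hierarchical design described above. All names,
# dimensions, and the window size are illustrative assumptions; a standard
# multi-head attention stands in for the kernelized local attention.
class HierarchicalSketch(nn.Module):
    def __init__(self, n_features=16, d_model=64, window=32):
        super().__init__()
        self.window = window
        self.embed = nn.Linear(n_features, d_model)
        self.local_attn = nn.MultiheadAttention(d_model, num_heads=4,
                                                batch_first=True)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        # Linear branch parallel to the RNN, used at explanation time to
        # locally approximate the RNN's behavior.
        self.linear_approx = nn.Linear(d_model, d_model)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):                   # x: (batch, time, features)
        b, t, _ = x.shape                   # assumes t % window == 0
        h = self.embed(x)
        # Attend within each local window, then pool to one vector per
        # window; this shrinks the sequence fed to the recurrent layer.
        h = h.view(b * (t // self.window), self.window, -1)
        h, _ = self.local_attn(h, h, h)
        h = h.mean(dim=1).view(b, t // self.window, -1)
        out, _ = self.rnn(h)                # long-term progression dynamics
        approx = self.linear_approx(h)      # parallel linear approximation
        return self.head(out[:, -1]), approx
```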
We evaluate SEHM on an extensive dataset from a major research hospital, with experiments on predicting three postoperative complications, and on the High time Resolution ICU Dataset (HiRID) [16], with experiments on predicting circulatory failure. In the evaluation, we show that SEHM outperforms other state-of-the-art models in predictive performance. We also demonstrate that the proposed model achieves better computational efficiency, an advantage in supporting clinical decisions for perioperative care. We evaluate the model's interpretability through both quantitative evaluation on the datasets and clinician reviews of exemplar surgical cases. Results suggest that SEHM is better than existing model interpretation approaches at identifying data points of potential clinical importance in the input time series.
The main contributions of our work are four-fold: (1) we present a novel hierarchical model with kernelized local attention to effectively learn representations from intraoperative time series; (2) we significantly improve the computational efficiency of the hierarchical model by reducing the size of the intermediate representation passed to the recurrent layer; (3) we propose a linear approximating network to model the behavior of the RNN module, which can be integrated with the kernelized local attention to establish an end-to-end interpretable model with three guaranteed theoretical properties; (4) we evaluate SEHM with experiments from both computational and clinical perspectives and demonstrate the end-to-end interpretability of SEHM on large datasets with multiple predictive outcomes.
II. RELATED WORK
In this section, we review the literature from three perspec-
tives: A) models designed for handling long sequential data,
B) techniques for handling missing values in time series, and
C) model interpretation techniques and self-explaining models.
Traditional RNN models are widely used for learning with sequential data. However, they are ineffective on long sequences due to the vanishing gradient issue and the computation cost of recurrent operations. Temporal convolutional networks (TCNs), e.g., WaveNet [17], can capture long-range temporal dependencies via dilated causal convolutions. More recent work suggests that TCNs outperform RNNs in various prediction problems on sequential data, particularly when the input sequences are long [18]. However, TCN models rely on a deep hierarchy of dilated causal convolutions to achieve large receptive fields, and such a deep hierarchy, namely a large stack of layers, incurs significant
computation cost for inference at run time. Efficient attention
models adapted from Transformer [6] have been proposed
recently for learning representations from long sequential data,
which mainly focus on replacing the quadratic dot-product
attention calculation with more efficient operations [19], [20].
In this work, SEHM builds on previous insights and introduces
a hierarchical model that integrates kernelized local attention
and RNN. Kernelized local attention captures important local
patterns and reduces the size of the intermediate representation, while the higher-level RNN learns the long-term dynamics.
As a result, SEHM can achieve better predictive performance
and computational efficiency when learning and inferring from
long multivariate intraoperative time series.
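As a concrete illustration of the kernelized attention idea, the sketch below replaces the quadratic softmax attention with a feature-map factorization in the style of [19], [20]. The feature map and shapes are illustrative assumptions, and SEHM applies the idea within local windows rather than over the full sequence.

```python
import numpy as np

# Sketch of kernelized (linear) attention: softmax(Q K^T) V is replaced by
# phi(Q) (phi(K)^T V), avoiding the t x t attention matrix entirely.
def phi(x):
    # A simple positive feature map (relu(x) + 1); an illustrative choice.
    return np.maximum(x, 0.0) + 1.0

def kernelized_attention(Q, K, V):
    # Q, K: (t, d); V: (t, d_v)
    q, k = phi(Q), phi(K)
    kv = k.T @ V                        # (d, d_v), cost O(t * d * d_v)
    z = q @ k.sum(axis=0)               # per-position normalizer, shape (t,)
    return (q @ kv) / z[:, None]        # (t, d_v), linear in t

t, d = 1024, 32
Q, K, V = (np.random.randn(t, d) for _ in range(3))
out = kernelized_attention(Q, K, V)     # never materializes the t x t matrix
```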
Missing values are prevalent in clinical data. They pose challenges for predicting clinical outcomes but also carry predictive information. Standalone imputation models [21]–[23] impute missing values at the preprocessing stage, which prevents models from exploiting the predictive information associated with gaps. Recently, researchers
introduced imputation approaches that can be integrated with
predictive models in an end-to-end manner. RNN-based im-
putation models, such as GRU-D [24] and BRITS [25],
demonstrate better performance when learning on sequential
data with missing values. However, the recurrent nature of
these models makes it difficult to perform imputation and
predictions on long sequences. An alternative to imputation
is to treat data with missing values as irregularly sampled
time series. In this direction, models like multi-task Gaussian
process RNN (MGP-RNN) [26] and neural ordinary differential equation (ODE) based RNNs [27] have been proposed to accommodate the irregularity by creating evenly-sampled latent
values. However, these models are computationally prohibitive
for long sequences as they either operate with a very large
covariance matrix or forward intermediate values to an ODE
solver numerous times. We note that the aforementioned imputation approaches are not suitable for handling the large gaps common in intraoperative time series, because the uncertainty about missing values grows with the time elapsed since the last observation. Moreover, the large gaps in intraoperative time series may themselves carry information about the surgery. In the design of the kernelized local attention, we overcome this issue by exploiting the locality of the attention and representing missing values with 0s. This design encodes the gap information and thereby helps capture the clinical information associated with gaps.
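The following is a minimal sketch of this zero-filling scheme; the shapes and names are illustrative assumptions. Because missing entries are set to 0 rather than imputed, the pattern of zeros inside each local attention window itself carries the gap information.

```python
import numpy as np

# Represent missing values as 0s instead of imputing them; within a local
# window, the run of zeros encodes where a gap occurs and how long it is.
def zero_fill(x):
    # x: (time, variables), with NaN marking missing observations
    mask = np.isnan(x)
    return np.where(mask, 0.0, x), mask

x = np.array([[1.2, np.nan],
              [np.nan, np.nan],         # a gap spanning all variables
              [0.9, 4.1]])
filled, mask = zero_fill(x)
```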
Several approaches have been proposed for interpreting the predictions made by machine learning models, including model-agnostic approaches and feature attribution approaches designed for deep models. Model-agnostic explanation approaches, such as LIME [7] and SHAP [8], provide general frameworks that work across different models while treating them as black boxes. There are also feature attribution approaches designed for interpreting neural networks [9], [10], [28], [29].
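For intuition, the sketch below shows the local-surrogate idea underlying approaches such as LIME: perturb the input around one instance, query the model as a black box, and fit a proximity-weighted linear model whose coefficients act as the explanation. The black-box function and all hyperparameters are illustrative assumptions, not LIME's exact procedure.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Sketch of a local linear surrogate: sample around one instance, query the
# black-box model, and fit a proximity-weighted linear model. The black box
# and all hyperparameters here are illustrative assumptions.
def local_linear_explanation(black_box, x, n_samples=500, sigma=0.5):
    # x: (n_features,) instance to explain
    rng = np.random.default_rng(0)
    perturbed = x + sigma * rng.standard_normal((n_samples, x.size))
    preds = black_box(perturbed)                # black-box queries only
    # Weight samples by proximity to x (an RBF kernel on distance).
    dists = np.linalg.norm(perturbed - x, axis=1)
    weights = np.exp(-(dists ** 2) / (2 * sigma ** 2))
    surrogate = Ridge(alpha=1.0).fit(perturbed, preds, sample_weight=weights)
    return surrogate.coef_                      # per-feature attribution

# Example with a toy black box:
f = lambda X: X @ np.array([2.0, -1.0, 0.0]) + 0.5
coefs = local_linear_explanation(f, np.array([1.0, 0.0, 2.0]))
```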
Deep models are not always black boxes: when properly designed, attention models can be explainable by themselves. Self-explaining models allow predictions to be interpreted