Self-explaining Hierarchical Model for
Intraoperative Time Series
Dingwen Li, Bing Xue, Christopher King†, Bradley Fritz†, Michael Avidan†, Joanna Abraham†, Chenyang Lu
McKelvey School of Engineering, Washington University in St. Louis
†School of Medicine, Washington University in St. Louis
{dingwenli, xuebing, christopherking, bafritz, avidanm, joannaa, lu}@wustl.edu
Abstract—Major postoperative complications are devastating
to surgical patients. Some of these complications are potentially
preventable via early predictions based on intraoperative data.
However, intraoperative data comprise long and fine-grained
multivariate time series, prohibiting the effective learning of
accurate models. The large gaps associated with clinical events
and protocols are usually ignored. Moreover, deep models
generally lack transparency, yet interpretability is crucial for assisting
clinicians in planning for and delivering postoperative care and timely
interventions. Towards this end,
we propose a hierarchical model combining the strength of both
attention and recurrent models for intraoperative time series.
We further develop an explanation module for the hierarchical
model to interpret the predictions by providing contributions of
intraoperative data in a fine-grained manner. Experiments on a
large dataset of 111,888 surgeries with multiple outcomes and an
external high-resolution ICU dataset show that our model can
achieve strong predictive performance (i.e., high accuracy) and
offer robust interpretations (i.e., high transparency) for predicted
outcomes based on intraoperative time series.
I. INTRODUCTION
Major postoperative complications are devastating to sur-
gical patients with increased mortality risk, need for care,
length of postoperative hospital stay and costs of care [1],
[2]. With massive electronic intraoperative data and recent
advances in machine learning, some of these complications are
potentially preventable via early predictions [3]. Intraoperative
data comprise long and fine-grained multivariate time series,
such as vital signs and medications. For example, Figure 1
visualizes the intraoperative data collected for a surgical case,
which lasts longer than 600 minutes at a sampling rate of up to one
sample per minute. Furthermore, there are large gaps consisting
of many consecutive missing values. These gaps are often
associated with the surgical procedure or clinical events that
require different variables to be monitored at different stages
of the surgery.
It is challenging to learn effective representations from the
long time series as modeling latent patterns from high temporal
complexity is hard. Recurrent neural networks (RNNs) have
been widely used for learning dynamics from the sequential
input. However, hundreds of time steps prohibit RNNs from
learning accurate representations due to the vanishing gradient
issue. A common approach to tackle the long input sequence
for RNNs is to add convolutional layers before the recurrent
The last author is the corresponding author.
Fig. 1: An example of long intraoperative time series with
large gaps. Blue dots represent measurements collected from
a surgical case.
layers [4], [5]. However, introducing a stack of convolutional
layers before the recurrent layers increases model complexity,
which again leads to vanishing gradients. Another alternative to RNNs
is the attention approach. Attention, e.g., the Transformer [6], can
capture salient data patterns by skipping recurrent connections,
thus avoiding the vanishing gradient issue brought by the long-
term dependencies. Nevertheless, pure attention models cannot
exploit long-term progression patterns of intraoperative time
series, which are informative given the nature of physiological
changes during the operation.
Another challenge of learning with intraoperative time series
is associated with the large data gaps commonly observed in
intraoperative time series. While imputation techniques have
been investigated extensively to estimate missing values, they
cannot preserve the information carried by the data gaps.
The information may be exploited by predictive models given
their potential association with surgical procedures and clinical
events.
Furthermore, the interpretability of machine learning models,
i.e., explaining which input variables contribute to the predicted
outcomes and how, is crucial to clinicians. A good
explanation helps clinicians understand the risk factors, thus
knowing how to plan for and deliver postoperative care and
timely interventions. Despite the invention of model-agnostic
explanation methods [7], [8], attribution methods tailored for
deep models [9], [10], and self-explaining models [4], [11]–
[15], it remains challenging to generate accurate explanations
identifying important data segments in fine-grained time series
that can benefit clinicians and medical research.
In this paper, we propose a novel Self-Explaining
Hierarchical Model (SEHM) to learn representations from
long multivariate time series with large gaps and generate ac-
curate explanations pinpointing the clinically meaningful data
points. The hierarchical model comprises a kernelized local
attention and a recurrent layer, which effectively 1) captures
local patterns while reducing the size of the intermediate repre-
sentations via the attention and 2) learns long-term progression
dynamics via the recurrent module. To make the model end-to-
end interpretable, we design a linear approximating network
parallel to the recurrent module that models the behavior of a
recurrent module locally.
We evaluate SEHM on an extensive dataset from a major
research hospital, predicting three postoperative complications,
and on the High time Resolution ICU
Dataset (HiRID) [16], predicting circulatory failure. In the
evaluation, we show SEHM outperforms other state-of-the-
art models in predictive performance. We also demonstrate that
the proposed model achieves better computational efficiency,
which would be an advantage in supporting clinical decisions
for perioperative care. We evaluate the model interpretabil-
ity through both quantitative evaluation on the dataset and
clinician reviews of exemplar surgical cases. Results suggest
the advantage of SEHM over existing model interpretation
approaches in identifying data samples in the input time series
with potential clinical importance.
The main contributions of our work are four-fold: (1) we
present a novel hierarchical model with kernelized local at-
tention to effectively learn representations from intraoperative
time series; (2) we significantly improve the computational
efficiency of the hierarchical model by reducing the size of
the intermediate learned representation fed to the recurrent layer; (3)
we propose a linear approximating network to model the
behavior of the RNN module, which can be integrated with
the kernelized local attention to establish an end-to-end inter-
pretable model with three theoretical properties guaranteed; (4)
we evaluate SEHM with experiments from both computational
as well as clinical perspectives and demonstrate the end-to-
end interpretability of SEHM on large datasets with multiple
predictive outcomes.
II. RELATED WORK
In this section, we review the literature from three perspec-
tives: A) models designed for handling long sequential data,
B) techniques for handling missing values in time series, and
C) model interpretation techniques and self-explaining models.
Traditional RNN models are widely used for learning with
sequential data. However, they are ineffective when dealing
with long sequential data due to the vanishing gradient issue
and computation cost of recurrent operations. Temporal con-
volutional network (TCN), e.g., WaveNet [17], can capture
long-range temporal dependencies via dilated causal convo-
lutions. A more recent work suggests that TCN outperforms
RNN in various prediction problems based on sequential data,
particularly when the input sequences are long [18]. However,
TCN models rely on a deep hierarchy of dilated causal
convolutions to achieve large receptive fields. Deep
hierarchy, namely a large stack of layers, incurs significant
computation cost for inference at run time. Efficient attention
models adapted from Transformer [6] have been proposed
recently for learning representations from long sequential data,
which mainly focus on replacing the quadratic dot-product
attention calculation with more efficient operations [19], [20].
In this work, SEHM builds on previous insights and introduces
a hierarchical model that integrates kernelized local attention
and RNN. Kernelized local attention captures important local
patterns and reduces the size of intermediate representation,
while the higher-level RNN model learns long-term dynamics.
As a result, SEHM can achieve better predictive performance
and computational efficiency when learning and inferring from
long multivariate intraoperative time series.
Missing values are prevalent in clinical data. They provide
both challenges and information for predicting clinical out-
comes. Standalone imputation models [21]–[23] impute miss-
ing values at the preprocessing stage. However, imputation in
the preprocessing stage prevents models from exploiting pre-
dictive information associated with gaps. Recently, researchers
introduced imputation approaches that can be integrated with
predictive models in an end-to-end manner. RNN-based im-
putation models, such as GRU-D [24] and BRITS [25],
demonstrate better performance when learning on sequential
data with missing values. However, the recurrent nature of
these models makes it difficult to perform imputation and
predictions on long sequences. An alternative to imputation
is to treat data with missing values as irregularly sampled
time series. In this direction, models like multi-task Gaussian
process RNN (MGP-RNN) [26] and neural ordinary differential
equations (ODE) based RNN [27] have been proposed to ac-
commodate the irregularity by creating evenly-sampled latent
values. However, these models are computationally prohibitive
for long sequences as they either operate with a very large
covariance matrix or forward intermediate values to an ODE
solver numerous times. We note that the aforementioned
imputation approaches are not suitable for handling large
gaps in time series that are common in intraoperative data,
because uncertainty in missing values grows with the time
elapsed from the last observed data. Moreover, the large
gaps in intraoperative time series may themselves carry information
about the surgery. In the design of the kernelized local attention, we
overcome this issue by exploiting the locality structure
and using 0s to represent the missing values. This
design encodes the gap information, which helps capture
clinical information associated with the gaps.
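As a small illustration of this design choice (a hedged sketch with hypothetical helper and variable names, not the authors' preprocessing code), an irregularly observed vital sign can be placed on a per-minute grid with 0s in the unobserved minutes, so that the position and length of each gap remain visible to the downstream attention module:

```python
import numpy as np

def to_minute_grid(times_min, values, duration_min):
    """Place sparse observations on a per-minute grid; unobserved minutes stay 0."""
    grid = np.zeros(duration_min, dtype=float)           # 0 encodes "not measured"
    idx = np.clip(np.asarray(times_min, dtype=int), 0, duration_min - 1)
    grid[idx] = values                                    # last observation per minute wins
    return grid

# A variable observed only during two phases of a 600-minute case.
obs_t = [5, 6, 7, 400, 401, 402]
obs_v = [72.0, 74.0, 73.0, 90.0, 92.0, 91.0]
series = to_minute_grid(obs_t, obs_v, duration_min=600)
# The ~390-minute run of zeros between the two phases preserves the gap itself,
# which the kernelized local attention can associate with the surgical workflow.
```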
Several approaches have been proposed for interpreting
the predictions made by machine learning models, including
model-agnostic approaches and feature attribution approaches
designed for deep models. Model-agnostic explanation ap-
proaches, such as LIME [7] and SHAP [8], provide gen-
eral frameworks for different models while treating them
as black-box models. There are also feature attribution ap-
proaches designed for interpreting neural networks [9], [10],
[28], [29]. Deep models are not always black boxes: when
properly designed, attention models can be explainable by
themselves. Self-explaining models allow predictions to be interpreted
using attention matrices directly [4], [11]–[14]. In particular,
RAIM [4], HiTANet [11] and STAM [14] are self-explaining
attention models designed for interpreting clinical outcome
predictions. Alvarez-Melis et al. propose self-explaining neu-
ral networks (SENN) [15] that have relevance parametrizers
for interpretability, which can be optimized jointly with the
classification objective. However, these self-explaining deep
models are not interpretable end-to-end. In the aforementioned
models, the explanations are generated for concept bases [15]
or intermediate representations [4], [11], [14], instead of raw
inputs. These concept bases [15] or intermediate representations
[4], [11], [14] do not necessarily reflect the contributions
of the raw inputs to the predicted outcomes, because of the non-linear
transformation from the raw inputs to the concept bases or
intermediate representations.
In contrast, our SEHM is specifically designed to provide
end-to-end interpretability by generating decomposed data
contribution matrices associated with raw inputs in a linear
way. SEHM also comes with theoretical properties guarantee-
ing the quality of interpretability, which are not covered by
the existing self-explaining models. We note that end-to-end
interpretability is crucial for clinical applications as clinicians
usually need to review the original clinical data to interpret
the predictions.
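To illustrate what such a linear, per-input contribution decomposition looks like (a toy sketch in the spirit of self-explaining models such as SENN [15]; the coefficient matrix here is a hypothetical stand-in, not SEHM's exact construction, which is developed in Section III):

```python
import numpy as np

rng = np.random.default_rng(0)

T, D = 600, 16                         # raw intraoperative series: T minutes, D variables
x = rng.standard_normal((T, D))        # raw input
theta = rng.standard_normal((T, D))    # input-dependent linear coefficients
                                       # (in SEHM these would come from the attention
                                       # weights and the linear approximating network)

contribution = theta * x               # per-time-step, per-variable contribution matrix
score = contribution.sum()             # the prediction score decomposes linearly

# A clinician can then review, e.g., the ten most influential minutes of one variable.
top_minutes = np.argsort(-np.abs(contribution[:, 0]))[:10]
print(score, top_minutes)
```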
III. SELF-EXPLAINING HIERARCHICAL MODEL
SEHM comprises three key components: 1) kernelized local
attention that captures important local patterns, preserves
information about the data gaps, and reduces the computational
complexity; 2) a recurrent layer that learns the long-term
dynamics; 3) a linear approximating network for interpret-
ing the recurrent layer locally. As shown in Figure 2, the
input high-resolution time series first pass through multiple
kernelized local attention modules in parallel, whose outputs
are concatenated into an intermediate output via multi-head
operations. The intermediate output is fed into both the
recurrent layer and the linear approximating network.
The cross-entropy loss and the approximation loss are used for
the classification task and for interpreting the RNN, respectively.
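As a rough illustration of this data flow (a minimal sketch under assumed layer sizes, with plain softmax attention standing in for the kernelized local attention; not the authors' implementation), the forward pass and the two losses can be organized as follows:

```python
import torch
import torch.nn as nn

class SEHMSketch(nn.Module):
    """Schematic SEHM forward pass: local attention -> RNN, with a parallel
    linear approximating network trained to mimic the RNN's behavior."""

    def __init__(self, n_features, window, d_model=64, n_heads=4, n_classes=2):
        super().__init__()
        self.window = window                              # neighborhood size C
        self.proj = nn.Linear(n_features, d_model)
        # Stand-in for the kernelized local attention (plain softmax attention here).
        self.local_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)
        # Linear approximating network: interprets the RNN path locally.
        self.approx = nn.Linear(d_model, n_classes)

    def forward(self, x):                                 # x: (B, T, D), T = L * C
        B, T, _ = x.shape
        L = T // self.window
        z = self.proj(x).reshape(B * L, self.window, -1)  # windows of length C
        z, _ = self.local_attn(z, z, z)                   # attend within each window
        z = z.mean(dim=1).reshape(B, L, -1)               # one summary per window
        h, _ = self.rnn(z)                                # long-term dynamics over L steps
        logits = self.head(h[:, -1])                      # prediction from last state
        approx_logits = self.approx(z).sum(dim=1)         # linear surrogate of the RNN path
        return logits, approx_logits

model = SEHMSketch(n_features=10, window=30)
x = torch.randn(4, 600, 10)                               # 600-minute cases, 10 variables
logits, approx_logits = model(x)
ce = nn.functional.cross_entropy(logits, torch.randint(0, 2, (4,)))
approx = nn.functional.mse_loss(approx_logits, logits.detach())   # approximation loss
loss = ce + approx
```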
A. Kernelized Local Attention
High-resolution clinical time series, such as intraoperative
time series, usually have a length of over one hundred minutes.
Such long sequences are challenging for traditional deep models,
e.g., recurrent neural networks and attention mechanisms, due to
the computational complexity and the vanishing gradient problem.
In order to effectively and efficiently learn useful representations
from high-resolution clinical time series, we propose
a kernelized local attention that exploits short-term patterns
within a temporal neighborhood via the locality structure and
reduces the notorious quadratic complexity of dot-product
attention to linear via kernelization.
Fig. 2: The overview of the Self-explaining Hierarchical Model (SEHM) with multi-head kernelized local attention and linear approximating network.
Assume we have a two-dimensional multivariate time-series
input $x \in \mathbb{R}^{T \times D}$. In order to compute attention over local
neighbors, we reshape the input into a three-dimensional tensor
$\tilde{x} \in \mathbb{R}^{L \times C \times D}$, such that $T = L \times C$. This essentially restricts
the attention weights to attend to neighbors within a window of size $C$
and outputs $L$ computed attentions. The benefits are two-fold.
On one hand, self-attention allows each time step to interact
with all its neighbors, which significantly reduces the informa-
tion decay compared to RNN models. On the other hand, the
attention weights can be associated with each neighboring time
step, which allows direct interpretation of which time steps
contribute most to the final outcomes. The attention matrix can
be formulated via a positive-definite kernel $\kappa(q_i, k_j)$, where
$q_i$ and $k_j$ are the $i$-th vector in the query matrix and the $j$-th vector
in the key matrix computed from the localized expression of the input
$\tilde{x}$. We define the kernelized attention as an expectation over
an inner product of a randomized feature map $\phi: \mathbb{R}^D \rightarrow \mathbb{R}^R_+$
for $R > 0$:
$$\kappa(q_i, k_j) = \mathbb{E}_{\omega \sim \mathcal{D}}[\phi(q_i)^T \phi(k_j)] \qquad (1)$$
the attention can be formulated as a weighted sum over the
latent dimension (usually the temporal dimension):
ai=
C
X
j=1
κ(qi, kj)
PC
j0=1 κ(qi, kj0)vj=
E[φ(qi)TPC
j=1 φ(kj)vj]
E[φ(qi)TPC
j0=1 φ(kj0)] .
(2)
After reordering the products and reusing $\sum_{j=1}^{C} \phi(k_j) v_j^T$ and
$\sum_{j'=1}^{C} \phi(k_{j'})$ for each $i$, the time and memory complexity
can be reduced to $O(C)$ [20], [30]. Based on the kernel view,
the Transformer's softmax function of $Q^T K$ can be approximated
by kernel functions of randomized feature maps [20],
[30]. In particular, the kernel function in Eq. (2) unbiasedly
approximates the exponential of the dot product in softmax
attention by drawing feature vectors from a zero-mean Gaussian
distribution $\omega \sim \mathcal{N}(0, I_D)$:
$$\exp(q_i^T k_j) = \mathbb{E}_{\omega \sim \mathcal{N}(0, I_D)}[\phi(q_i)^T \phi(k_j)], \quad \text{s.t. } \phi(z) = \exp\Big(\omega^T z - \frac{\|z\|_2^2}{2}\Big), \; z = q_i \text{ or } k_j. \qquad (3)$$
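To make the mechanics concrete, here is a minimal numpy sketch of the reshaping and of the linearized attention within one window (illustrative only: the window size, the number of random features R, and the use of the raw window as queries, keys, and values are assumptions, and the 1/sqrt(R) normalization cancels in the ratio of Eq. (2)):

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Reshape a (T, D) case into L windows of C neighboring minutes ---
T, D, C = 600, 16, 30                       # assumed case length, variables, window size
L = T // C
x = rng.standard_normal((T, D))
x_tilde = x.reshape(L, C, D)                # attention is computed within each window

# --- Random feature map of Eq. (3): phi(z)_r = exp(omega_r^T z - ||z||^2 / 2) ---
R = 256                                     # number of Monte Carlo features (assumed)
omega = rng.standard_normal((R, D))         # omega_r ~ N(0, I_D)

def phi(Z):
    """Apply the positive random feature map row-wise: (C, D) -> (C, R)."""
    return np.exp(Z @ omega.T - 0.5 * np.sum(Z * Z, axis=1, keepdims=True)) / np.sqrt(R)

# --- Linearized attention of Eq. (2) for one window ---
window = 0.1 * x_tilde[0]                   # small scale keeps the estimator well-behaved
Q, K, V = window, window, window            # in SEHM these would be learned projections
phi_Q, phi_K = phi(Q), phi(K)
kv = phi_K.T @ V                            # sum_j phi(k_j) v_j^T, reused for every query i
z = phi_K.sum(axis=0)                       # sum_j phi(k_j), reused for every query i
attn_linear = (phi_Q @ kv) / (phi_Q @ z)[:, None]        # O(C) time and memory

# Reference: exact softmax attention with the quadratic C x C score matrix.
scores = np.exp(Q @ K.T)
attn_softmax = (scores / scores.sum(axis=1, keepdims=True)) @ V
print(np.abs(attn_linear - attn_softmax).max())          # shrinks as R grows
```

Because the two sums over $j$ are computed once and reused for every query, the cost inside each window grows linearly with $C$ rather than quadratically.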