
To the best of our knowledge, none of the existing works has
fully considered irregularity in multimodal representation
learning.
We observe three major drawbacks in existing work on irregular multimodal EHR modeling. 1) MISTS models perform inconsistently. Although numerous MISTS models have been proposed to tackle irregularity (Lipton et al., 2016; Shukla & Marlin, 2019; 2021; Zhang et al., 2021b; Horn et al., 2020; Rubanova et al., 2019), none of them consistently outperforms the others. Even among temporal discretization-based embedding (TDE) methods, which transform MISTS into regular time representations that can interface with deep neural networks for regular time series, including hand-crafted imputation (Lipton et al., 2016) and learned interpolation (Shukla & Marlin, 2019; 2021), there is no clearly superior approach. 2) Irregularity in clinical notes is not well handled. Most existing works (Golmaei & Luo, 2021; Mahbub et al., 2022) directly concatenate all clinical notes of each patient and ignore the note-taking time information. Although Zhang et al. (2020) propose an LSTM variant that models time decay among clinical notes, the approach relies on only a few trainable parameters, which may limit its modeling capacity. 3) Existing works ignore irregularity in multimodal fusion. Deznabi et al. (2021) and Yang et al. (2021) demonstrate the effectiveness of combining time series and clinical notes for medical prediction tasks; however, these works operate on multimodal data without considering irregularity. Their fusion strategies may not fully integrate irregular time information into the multimodal representations, which can be essential for prediction performance in real-world scenarios.
Our Contributions. To tackle the aforementioned issues, we separately model irregularity in MISTS and in irregularly taken clinical notes, and we further integrate the two modalities across temporal steps, so as to provide strong medical predictions based on the complicated irregular temporal patterns and multimodal structure of EHRs. Specifically, we first show that different TDE methods for handling MISTS are complementary for medical prediction by introducing a gating mechanism that combines different TDE embeddings specific to each patient. Second, we cast note representations together with their note-taking times as MISTS and leverage a time attention mechanism (Shukla & Marlin, 2021) to model the irregularity in each dimension of the note representations. Finally, we incorporate irregularity into the multimodal representations by adopting a fusion method that interleaves self-attention and cross-attention (Vaswani et al., 2017) to integrate multimodal knowledge across temporal steps. To the best of our knowledge, this is the first unified system that fully considers irregularity to improve medical predictions, not only within every single modality but also in multimodal fusion. Our approach outperforms baselines in both the single-modality and multimodal fusion settings, with notable relative F1 improvements of 6.5%, 3.6%, and 4.3% for MISTS, clinical notes, and multimodal fusion, respectively. A comprehensive ablation study demonstrates that tackling irregularity in each single modality benefits not only that modality but also multimodal fusion. We also show that modeling long sequences of clinical notes further improves medical prediction performance.
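To make the fusion step concrete, the following PyTorch-style sketch shows one possible fusion layer that interleaves self-attention and cross-attention over time-aligned representations of the two modalities. It is a minimal illustration under assumed names and shapes (FusionLayer, d_model, a shared set of temporal steps), not our exact implementation.

# Minimal sketch (assumed names, not the paper's code): one fusion layer that
# interleaves self-attention and cross-attention between time-series and
# clinical-note representations aligned to a common set of temporal steps.
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.self_attn_ts = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.self_attn_txt = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_ts_from_txt = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_txt_from_ts = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_ts = nn.LayerNorm(d_model)
        self.norm_txt = nn.LayerNorm(d_model)

    def forward(self, ts: torch.Tensor, txt: torch.Tensor):
        # ts, txt: (batch, n_steps, d_model) representations of each modality.
        # 1) Self-attention within each modality across temporal steps.
        ts = self.norm_ts(ts + self.self_attn_ts(ts, ts, ts)[0])
        txt = self.norm_txt(txt + self.self_attn_txt(txt, txt, txt)[0])
        # 2) Cross-attention: each modality queries the other modality.
        ts = ts + self.cross_ts_from_txt(ts, txt, txt)[0]
        txt = txt + self.cross_txt_from_ts(txt, ts, ts)[0]
        return ts, txt

if __name__ == "__main__":
    layer = FusionLayer()
    ts = torch.randn(2, 24, 128)   # e.g., 24 reference time steps per patient
    txt = torch.randn(2, 24, 128)
    fused_ts, fused_txt = layer(ts, txt)
    print(fused_ts.shape, fused_txt.shape)

In this sketch, each modality first attends over its own temporal steps and then queries the other modality, so irregular temporal context carried by one modality can inform the other at every step.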
2. Related Work
Multivariate irregularly sampled time series (MISTS). MISTS refer to observations of each variable that are acquired at irregular time intervals and that can have misaligned observation times across different variables (Zerveas et al., 2021). GRU-D (Che et al., 2018) captures temporal dependencies by decaying the hidden states of gated recurrent units. SeFT (Horn et al., 2020) represents MISTS as a set of observations and processes them with differentiable set function learning. ODE-RNN (Rubanova et al., 2019) uses latent neural ordinary differential equations (Chen et al., 2018) to specify hidden-state dynamics and updates the RNN hidden state at each new observation. RAINDROP (Zhang et al., 2021b) models MISTS as separate sensor graphs and leverages graph neural networks to learn the dependencies among variables. These approaches model irregular temporal dependencies in MISTS from different perspectives through specialized designs. TDE methods are another family of approaches for handling MISTS: they convert MISTS into fixed-dimensional feature spaces and feed the resulting regular time representations into deep neural models designed for regular time series. Imputation methods (Lipton et al., 2016; Harutyunyan et al., 2019; McDermott et al., 2021) are straightforward TDE methods that discretize MISTS into regular time series with hand-crafted missing-value imputation, but they ignore the irregularity of the raw data. To fill this gap, Shukla & Marlin (2019) present interpolation-prediction networks (IP-Nets), which interpolate MISTS at a set of regular reference points via a kernel function with learned parameters. Shukla & Marlin (2021) further present a time attention mechanism with time embeddings to learn interpolation representations. However, learned interpolation strategies do not always outperform simple imputation methods, possibly because of complicated data sampling patterns (Horn et al., 2020). Inspired by Mixture-of-Experts (MoE) (Shazeer et al., 2017; Jacobs et al., 1991), which maintains a set of experts (neural networks) and combines them per input via a gating mechanism, we leverage different TDE methods as submodules and integrate hand-crafted imputation embeddings into learned interpolation embeddings to improve medical predictions.
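As a rough illustration of this gating idea, the sketch below mixes a hand-crafted imputation embedding with a learned interpolation embedding of the same MISTS input through a per-patient, per-step gate; the name TDEGate and the simple two-layer gate are hypothetical stand-ins, not our exact architecture.

# Minimal sketch (hypothetical names, not the paper's implementation): an
# MoE-style gate that mixes two TDE embeddings of the same MISTS input.
import torch
import torch.nn as nn

class TDEGate(nn.Module):
    def __init__(self, d_model: int = 128):
        super().__init__()
        # The gate sees both TDE embeddings and outputs a mixing weight in (0, 1).
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model),
                                  nn.ReLU(),
                                  nn.Linear(d_model, 1),
                                  nn.Sigmoid())

    def forward(self, imp_emb: torch.Tensor, interp_emb: torch.Tensor):
        # imp_emb, interp_emb: (batch, n_steps, d_model) regular time
        # representations produced by two different TDE submodules.
        g = self.gate(torch.cat([imp_emb, interp_emb], dim=-1))  # (batch, n_steps, 1)
        return g * imp_emb + (1.0 - g) * interp_emb

if __name__ == "__main__":
    gate = TDEGate()
    imputed = torch.randn(2, 24, 128)       # e.g., from hand-crafted imputation + encoder
    interpolated = torch.randn(2, 24, 128)  # e.g., from learned time-attention interpolation
    print(gate(imputed, interpolated).shape)

A larger gate value keeps more of the imputation embedding at a given step, so the mixture can adapt to each patient's sampling pattern.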
Irregular clinical notes modeling. Golmaei & Luo (2021) and Mahbub et al. (2022) concatenate each patient’s clinical