Fusing Modalities by Multiplexed Graph Neural
Networks for Outcome Prediction in Tuberculosis
Niharika S. D’Souza1, Hongzhi Wang1, Andrea Giovannini2, Antonio
Foncubierta-Rodriguez2, Kristen L. Beck1, Orest Boyko3, and Tanveer
Syeda-Mahmood1
1IBM Research Almaden, San Jose, CA, USA
2IBM Research, Zurich, Switzerland
3Department of Radiology, VA Southern Nevada Healthcare System, NV, USA
Abstract.
In a complex disease such as tuberculosis, the evidence for
the disease and its evolution may be present in multiple modalities such
as clinical, genomic, or imaging data. Effective patient-tailored outcome
prediction and therapeutic guidance will require fusing evidence from these
modalities. Such multimodal fusion is difficult since the evidence for the
disease may not be uniform across all modalities, not all modality features
may be relevant, or not all modalities may be present for all patients. All
these nuances make simple methods of early, late, or intermediate fusion
of features inadequate for outcome prediction. In this paper, we present a
novel fusion framework using multiplexed graphs and derive a new graph
neural network for learning from such graphs. Specifically, the framework
allows modalities to be represented through their targeted encodings,
and models their relationship explicitly via multiplexed graphs derived
from salient features in a combined latent space. We present results showing
that our proposed method outperforms state-of-the-art methods of fusing
modalities for multi-outcome prediction on a large Tuberculosis (TB) dataset.
Keywords: Multimodal Fusion · Graph Neural Networks · Multiplex Graphs · Imaging Data · Genomic Data · Clinical Data
1 Introduction
Tuberculosis (TB) is one of the most common infectious diseases worldwide [18].
Although the mortality rate caused by TB has declined in recent years, single-
and multi-drug resistance has become a major threat to quick and effective TB
treatment. Studies have revealed that predicting the outcome of a treatment depends on many patient-specific factors, for which the collection of multimodal data covering clinical, genomic, and imaging information about the patient has become essential [17]. However, it is not clear what information is best captured
in each modality and how best to combine them. For example, genomic data can
reveal the genetic underpinnings of drug resistance and identify the genes/mutations that confer it [16]. Although imaging data from X-ray or CT can
show statistical differences for drug resistance, they alone may be insufficient
to differentiate multi-drug-resistant TB from drug-sensitive TB [29]. Thus, it is
important to develop methods that allow simultaneous extraction of relevant
information from multiple modalities, as well as ways to combine them in an
optimal fashion, leading to better outcome prediction.
Recent efforts to study the fusion problem for outcome prediction in TB have
focused on a single outcome such as treatment failure or used only a limited
number of modalities such as clinical and demographic data [20,1]. In this paper,
we take a comprehensive approach by treating the outcome prediction problem
as a multiclass classification for multiple possible outcomes through multimodal
fusion. We leverage more extensive modalities beyond clinical, imaging, or genomic
data, including novel features extracted via advanced analysis of protein domains
from genomic sequence as well as deep learning-derived features from CT images
as shown in Fig. 2(a). Specifically, we develop a novel fusion framework using
multiplexed graphs to capture the information from modalities and derive a new
graph neural network for learning from such graphs. The framework represents
modalities through their targeted encodings, and models their relationship via
multiplexed graphs derived from projections in a latent space.
Existing approaches often infer matrix or tensor encodings from individual
modalities [13], combined with early, late, or intermediate fusion [24,2,26] of
the individual representations. Example applications include CCA for speaker
identification [19], autoencoders for video analytics [25], transformers for VQA [10],
etc. In contrast, our approach allows each modality to retain its individuality
while still participating in explicit relationships with features from other
modalities through the multiplexed framework. Specifically, we design our framework
to explicitly model relationships within and across modality features via a self-
supervised multi-graph construction, and we design a novel graph neural network for
reasoning over these feature dependencies via structured message-passing walks.
We present results showing that, by relaxing the fusion constraints through
the multiplex formulation, our method outperforms state-of-the-art methods of
multimodal fusion in the context of multi-outcome prediction for TB treatments.
2 A Graph Based Multimodal Fusion Framework
As alluded to earlier, exploring various facets of cross-modal interactions is at
the heart of the multimodal fusion problem. To this end, we propose to utilize
the representation learning theory of multiplexed graphs to develop a generalized
framework for multimodal fusion. A multiplexed graph [3] is a type of multigraph
in which the nodes are grouped into multiple planes, each representing an
individual edge-type. The information captured within a plane is multiplexed to
other planes through diagonal connections as shown in Fig. 1. Mathematically, we
define a multiplexed graph as $\mathcal{G}_{Mplex} = (\mathcal{V}_{Mplex}, \mathcal{E}_{Mplex})$, where $|\mathcal{V}_{Mplex}| = |\mathcal{V}| \times K$ and $\mathcal{E}_{Mplex} = \{(i, j) \in \mathcal{V}_{Mplex} \times \mathcal{V}_{Mplex}\}$. There are $K$ distinct types of edges which can link two given nodes. Analogous to ordinary graphs, we have $K$ adjacency matrices $\mathbf{A}^{(k)} \in \mathbb{R}^{P \times P}$, where $P = |\mathcal{V}|$, each summarizing the connectivity information given by the edge-type $k$. The elements of these matrices are binary: $\mathbf{A}^{(k)}[m, n] = 1$ if there is an edge of type $k$ between nodes $m, n \in \mathcal{V}$.

Fig. 1. Graph Based Multimodal Fusion for Outcome Prediction. Blue Box: Incoming modality features are concatenated into a feature vector (of size P = 396) and projected into a common latent space (of size K = 32). Salient activations in the latent space are used to form the planes of the multiplexed graph. Green Box: The multiplexed GNN uses message passing walks to combine latent concepts for inference.
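One common way to make this structure concrete is to stack the $K$ planes into a single $(PK) \times (PK)$ supra-adjacency matrix, with the $\mathbf{A}^{(k)}$ on the block diagonal and identity blocks encoding the diagonal inter-planar connections. The sketch below is only an illustration of this bookkeeping, not the authors' implementation; the function name `build_supra_adjacency` and the unit coupling weight are assumptions introduced here for illustration.

```python
import numpy as np

def build_supra_adjacency(planes, coupling=1.0):
    """Stack K per-plane adjacency matrices A^(k) (each P x P) into a
    (P*K) x (P*K) supra-adjacency matrix.

    Intra-planar edges occupy the K diagonal blocks; the off-diagonal
    blocks contain scaled identities, i.e. each supra-node is linked to
    its own copies in every other plane (the diagonal connections)."""
    K = len(planes)
    P = planes[0].shape[0]
    supra = np.zeros((P * K, P * K))
    for k, A_k in enumerate(planes):
        # intra-planar connectivity for edge-type k
        supra[k * P:(k + 1) * P, k * P:(k + 1) * P] = A_k
    for k in range(K):
        for l in range(K):
            if k != l:
                # inter-planar (diagonal) coupling between copies of the same node
                supra[k * P:(k + 1) * P, l * P:(l + 1) * P] = coupling * np.eye(P)
    return supra

# Toy usage: K = 3 planes over P = 5 nodes with random symmetric binary edges.
rng = np.random.default_rng(0)
planes = []
for _ in range(3):
    upper = np.triu((rng.random((5, 5)) < 0.3).astype(float), 1)
    planes.append(upper + upper.T)           # symmetric, no self-loops
print(build_supra_adjacency(planes).shape)   # (15, 15)
```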
Multimodal Graph Representation Learning: While the multiplexed graph
has been used for various modeling purposes in the literature [12,5,6,15], we propose
to use it for multimodal fusion of imaging, genomic and clinical data for outcome
prediction in TB. We adopt the construction shown in the Blue Box in Fig. 1
to produce the multiplexed graph from the individual modality features. First,
domain specific autoencoders (d-AE) are used to convert each modality into a
compact feature space that can provide good reconstruction using Mean Squared
Error (MSE). To capture feature dependencies across modalities, the concatenated
features are brought to a common low dimensional subspace through a common
autoencoder (c-AE) trained to reconstruct the concatenated features. Each latent
dimension of the autoencoder captures an abstract aspect of the multimodal
fusion problem, e.g. features projected to be salient in the same latent dimension
are likely to form meaningful joint patterns for a specific task, and form a
“conceptual” plane of the multiplexed graph. The $|\mathcal{V}_{Mplex}|$ “supra-nodes” of $\mathcal{G}_{Mplex}$ are produced by creating copies of features (i.e. nodes) across the planes. The
edges between nodes in each plane represent features whose projections in the
respective latent dimensions were salient (see section 3.1 for details). Further, each
plane is endowed with its own topology and is a proxy for the correlation between
features across the corresponding latent dimension. This procedure helps model
the interactions between the various modality features in a principled fashion.
We thus connect supra-nodes within a plane to each other via the intra-planar
adjacency matrix $\mathbf{A}^{(k)}$, allowing us to traverse the multi-graph according to the edge-type $k$
. We also connect each supra-node with its own copy in other planes
via diagonal connections, allowing for inter-planar traversal.
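To make the plane construction above concrete, the following is a minimal sketch. It assumes a simple rule in which the top-$m$ features with the largest-magnitude projection into a latent dimension are fully connected within that plane; the exact saliency criterion is deferred to Section 3.1, and the function name `planes_from_saliency`, the `top_m` value, and the random stand-in for the c-AE projections are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def planes_from_saliency(saliency, top_m=10):
    """Form K binary intra-planar adjacency matrices from a P x K saliency map.

    saliency[p, k] scores how strongly feature p contributes to latent
    dimension k (e.g. the magnitude of its c-AE projection). Within each
    latent dimension, the top_m most salient features are fully connected
    to one another, giving one P x P conceptual plane per dimension."""
    P, K = saliency.shape
    planes = []
    for k in range(K):
        salient = np.argsort(-np.abs(saliency[:, k]))[:top_m]
        A_k = np.zeros((P, P))
        for m in salient:
            for n in salient:
                if m != n:
                    A_k[m, n] = 1.0
        planes.append(A_k)
    return planes

# Toy usage with the paper's stated sizes: P = 396 concatenated features, K = 32 latent dims.
rng = np.random.default_rng(1)
saliency = rng.standard_normal((396, 32))    # stand-in for the c-AE encoder projections
planes = planes_from_saliency(saliency, top_m=12)
print(len(planes), planes[0].shape)           # 32 planes, each 396 x 396
```

Stacking these planes with identity diagonal couplings, as in the earlier sketch, would then yield the supra-graph over which the multiplexed GNN passes messages.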