Fusing Modalities by Multiplexed Graph Neural
Networks for Outcome Prediction in Tuberculosis
Niharika S. D’Souza1, Hongzhi Wang1, Andrea Giovannini2, Antonio
Foncubierta-Rodriguez2, Kristen L. Beck1, Orest Boyko3, and Tanveer
Syeda-Mahmood1
1IBM Research Almaden, San Jose, CA, USA
2IBM Research, Zurich, Switzerland
3Department of Radiology, VA Southern Nevada Healthcare System, NV, USA
Abstract.
In a complex disease such as tuberculosis, the evidence for
the disease and its evolution may be present in multiple modalities such
as clinical, genomic, or imaging data. Effective patient-tailored outcome
prediction and therapeutic guidance will require fusing evidence from these
modalities. Such multimodal fusion is difficult since the evidence for the
disease may not be uniform across all modalities, not all modality features
may be relevant, or not all modalities may be present for all patients. All
these nuances make simple methods of early, late, or intermediate fusion
of features inadequate for outcome prediction. In this paper, we present a
novel fusion framework using multiplexed graphs and derive a new graph
neural network for learning from such graphs. Specifically, the framework
allows modalities to be represented through their targeted encodings,
and models their relationship explicitly via multiplexed graphs derived
from salient features in a combined latent space. We present results showing
that our proposed method outperforms state-of-the-art methods of fusing
modalities for multi-outcome prediction on a large Tuberculosis (TB) dataset.
Keywords: Multimodal Fusion · Graph Neural Networks · Multiplex Graphs · Imaging Data · Genomic Data · Clinical Data
1 Introduction
Tuberculosis (TB) is one of the most common infectious diseases worldwide [18].
Although the mortality rate caused by TB has declined in recent years, single-
and multi-drug resistance has become a major threat to quick and effective TB
treatment. Studies have revealed that predicting the outcome of a treatment depends on many patient-specific factors, for which the collection of multimodal data covering clinical, genomic, and imaging information about the patient has become essential [17]. However, it is not clear what information is best captured
in each modality and how best to combine them. For example, genomic data can
reveal the genetic underpinnings of drug resistance and identify the genes/mutations that confer it [16]. Although imaging data from X-ray or CT can
show statistical differences for drug resistance, they alone may be insufficient
to differentiate multi-drug-resistant TB from drug-sensitive TB [29]. Thus, it is
important to develop methods that allow simultaneous extraction of relevant
information from multiple modalities, as well as ways to combine them in an
optimal fashion, leading to better outcome prediction.
Recent efforts to study the fusion problem for outcome prediction in TB have
focused on a single outcome such as treatment failure or used only a limited
number of modalities such as clinical and demographic data [20,1]. In this paper,
we take a comprehensive approach by treating the outcome prediction problem
as a multiclass classification for multiple possible outcomes through multimodal
fusion. We leverage more extensive modalities beyond clinical, imaging, or genomic
data, including novel features extracted via advanced analysis of protein domains
from genomic sequence as well as deep learning-derived features from CT images
as shown in Fig. 2(a). Specifically, we develop a novel fusion framework using
multiplexed graphs to capture the information from modalities and derive a new
graph neural network for learning from such graphs. The framework represents
modalities through their targeted encodings, and models their relationship via
multiplexed graphs derived from projections in a latent space.
Existing approaches often infer matrix or tensor encodings from individual
modalities [13], combined with early, late, or intermediate fusion [24,2,26] of
the individual representations. Example applications include CCA for speaker
identification [19], autoencoders for video analytics [25], transformers for VQA [10],
etc. In contrast, our approach allows each modality to retain its individuality
while still participating in explicit relationships with features from other
modalities through the multiplexed framework. Specifically, we design our framework
to explicitly model relationships within and across modality features via a self-
supervised multi-graph construction, and we design a novel graph neural network for
reasoning over these feature dependencies via structured message-passing walks.
We present results showing that, by relaxing the fusion constraints through
the multiplex formulation, our method outperforms state-of-the-art methods of
multimodal fusion in the context of multi-outcome prediction for TB treatments.
2 A Graph Based Multimodal Fusion Framework
As alluded to earlier, exploring various facets of cross-modal interactions is at
the heart of the multimodal fusion problem. To this end, we propose to utilize
the representation learning theory of multiplexed graphs to develop a generalized
framework for multimodal fusion. A multiplexed graph [3] is a type of multigraph
in which the nodes are grouped into multiple planes, each representing an
individual edge-type. The information captured within a plane is multiplexed to
other planes through diagonal connections as shown in Fig. 1. Mathematically, we
define a multiplexed graph as $\mathcal{G}_{Mplex} = (\mathcal{V}_{Mplex}, \mathcal{E}_{Mplex})$, where $|\mathcal{V}_{Mplex}| = |\mathcal{V}| \times K$ and $\mathcal{E}_{Mplex} = \{(i, j) \in \mathcal{V}_{Mplex} \times \mathcal{V}_{Mplex}\}$. There are $K$ distinct types of edges which can link two given nodes. Analogous to ordinary graphs, we have $K$ adjacency matrices $\mathbf{A}^{(k)} \in \mathbb{R}^{P \times P}$, where $P = |\mathcal{V}|$, each summarizing the connectivity information given by the edge-type $k$. The elements of these matrices are binary: $\mathbf{A}^{(k)}[m, n] = 1$ if there is an edge of type $k$ between nodes $m, n \in \mathcal{V}$.

Fig. 1. Graph Based Multimodal Fusion for Outcome Prediction. Blue Box: Incoming modality features are concatenated into a feature vector (of size P = 396) and projected into a common latent space (of size K = 32). Salient activations in the latent space are used to form the planes of the multiplexed graph. Green Box: The multiplexed GNN uses message passing walks to combine latent concepts for inference.
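One common way to make this structure concrete is to stack the $K$ planes into a single $(PK) \times (PK)$ supra-adjacency matrix, with the $\mathbf{A}^{(k)}$ on the block diagonal and identity blocks encoding the diagonal inter-planar connections. The sketch below is only an illustration of this bookkeeping, not the authors' implementation; the function name `build_supra_adjacency` and the unit coupling weight are assumptions introduced here for illustration.

```python
import numpy as np

def build_supra_adjacency(planes, coupling=1.0):
    """Stack K per-plane adjacency matrices A^(k) (each P x P) into a
    (P*K) x (P*K) supra-adjacency matrix.

    Intra-planar edges occupy the K diagonal blocks; the off-diagonal
    blocks contain scaled identities, i.e. each supra-node is linked to
    its own copies in every other plane (the diagonal connections)."""
    K = len(planes)
    P = planes[0].shape[0]
    supra = np.zeros((P * K, P * K))
    for k, A_k in enumerate(planes):
        # intra-planar connectivity for edge-type k
        supra[k * P:(k + 1) * P, k * P:(k + 1) * P] = A_k
    for k in range(K):
        for l in range(K):
            if k != l:
                # inter-planar (diagonal) coupling between copies of the same node
                supra[k * P:(k + 1) * P, l * P:(l + 1) * P] = coupling * np.eye(P)
    return supra

# Toy usage: K = 3 planes over P = 5 nodes with random symmetric binary edges.
rng = np.random.default_rng(0)
planes = []
for _ in range(3):
    upper = np.triu((rng.random((5, 5)) < 0.3).astype(float), 1)
    planes.append(upper + upper.T)           # symmetric, no self-loops
print(build_supra_adjacency(planes).shape)   # (15, 15)
```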
Multimodal Graph Representation Learning: While the multiplexed graph
has been used for various modeling purposes in the literature [12,5,6,15], we propose
to use it for multimodal fusion of imaging, genomic and clinical data for outcome
prediction in TB. We adopt the construction shown in the Blue Box in Fig. 1
to produce the multiplexed graph from the individual modality features. First,
domain specific autoencoders (d-AE) are used to convert each modality into a
compact feature space that can provide good reconstruction using Mean Squared
Error (MSE). To capture feature dependencies across modalities, the concatenated
features are brought to a common low dimensional subspace through a common
autoencoder (c-AE) trained to reconstruct the concatenated features. Each latent
dimension of the autoencoder captures an abstract aspect of the multimodal
fusion problem, e.g. features projected to be salient in the same latent dimension
are likely to form meaningful joint patterns for a specific task, and form a
“conceptual” plane of the multiplexed graph. The $|\mathcal{V}_{Mplex}|$ “supra-nodes” of $\mathcal{G}_{Mplex}$ are produced by creating copies of features (i.e. nodes) across the planes. The
edges between nodes in each plane represent features whose projections in the
respective latent dimensions were salient (see section 3.1 for details). Further, each
plane is endowed with its own topology and is a proxy for the correlation between
features across the corresponding latent dimension. This procedure helps model
the interactions between the various modality features in a principled fashion.
We thus connect supra-nodes within a plane to each other via the intra-planar
adjacency matrix $\mathbf{A}^{(k)}$, allowing us to traverse the multi-graph according to the edge-type $k$
. We also connect each supra-node with its own copy in other planes
via diagonal connections, allowing for inter-planar traversal.
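To make the plane construction above concrete, the following is a minimal sketch. It assumes a simple rule in which the top-$m$ features with the largest-magnitude projection into a latent dimension are fully connected within that plane; the exact saliency criterion is deferred to Section 3.1, and the function name `planes_from_saliency`, the `top_m` value, and the random stand-in for the c-AE projections are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def planes_from_saliency(saliency, top_m=10):
    """Form K binary intra-planar adjacency matrices from a P x K saliency map.

    saliency[p, k] scores how strongly feature p contributes to latent
    dimension k (e.g. the magnitude of its c-AE projection). Within each
    latent dimension, the top_m most salient features are fully connected
    to one another, giving one P x P conceptual plane per dimension."""
    P, K = saliency.shape
    planes = []
    for k in range(K):
        salient = np.argsort(-np.abs(saliency[:, k]))[:top_m]
        A_k = np.zeros((P, P))
        for m in salient:
            for n in salient:
                if m != n:
                    A_k[m, n] = 1.0
        planes.append(A_k)
    return planes

# Toy usage with the paper's stated sizes: P = 396 concatenated features, K = 32 latent dims.
rng = np.random.default_rng(1)
saliency = rng.standard_normal((396, 32))    # stand-in for the c-AE encoder projections
planes = planes_from_saliency(saliency, top_m=12)
print(len(planes), planes[0].shape)           # 32 planes, each 396 x 396
```

Stacking these planes with identity diagonal couplings, as in the earlier sketch, would then yield the supra-graph over which the multiplexed GNN passes messages.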