BRAIN NETWORK TRANSFORMER
Xuan Kan1, Wei Dai2, Hejie Cui1, Zilong Zhang3, Ying Guo1, Carl Yang1
1Emory University  2Stanford University  3University of International Business and Economics
{xuan.kan,hejie.cui,yguo2,j.carlyang}@emory.edu
dvd.ai@stanford.edu 201957020@uibe.edu.cn
Abstract
Human brains are commonly modeled as networks of Regions of Interest (ROIs) and their connections for the understanding of brain functions and mental disorders. Recently, Transformer-based models have been studied over different types of data, including graphs, and have been shown to bring broad performance gains. In this work, we study Transformer-based models for brain network analysis. Driven by the unique properties of the data, we model brain networks as graphs with nodes of fixed size and order, which allows us to (1) use connection profiles as node features to provide natural and low-cost positional information and (2) learn pairwise connection strengths among ROIs with efficient attention weights across individuals that are predictive towards downstream analysis tasks. Moreover, we propose an ORTHONORMAL CLUSTERING READOUT operation based on self-supervised soft clustering and orthonormal projection. This design accounts for the underlying functional modules that determine similar behaviors among groups of ROIs, leading to distinguishable cluster-aware node embeddings and informative graph embeddings. Finally, we re-standardize the evaluation pipeline on ABIDE, the only publicly available large-scale brain network dataset, to enable meaningful comparison of different models. Experimental results show clear improvements of our proposed BRAIN NETWORK TRANSFORMER on both the public ABIDE and our restricted ABCD datasets. The implementation is available at https://github.com/Wayfear/BrainNetworkTransformer.
1 Introduction
Brain network analysis has been an intriguing pursuit for neuroscientists to understand human brain organizations and predict clinical outcomes [50, 59, 58, 5, 18, 27, 52, 29, 58, 28, 41, 44, 31]. Among various neuroimaging modalities, functional Magnetic Resonance Imaging (fMRI) is one of the most commonly used for brain network construction, where the nodes are defined as Regions of Interest (ROIs) given an atlas, and the edges are calculated as pairwise correlations between the blood-oxygen-level-dependent (BOLD) signal series extracted from each region [54, 53, 59, 16]. Researchers observe that some regions can co-activate or co-deactivate simultaneously when performing cognition-related tasks such as action, language, and vision. Based on this pattern, brain regions can be classified into diverse functional modules to analyze diseases towards their diagnosis, progression understanding, and treatment.
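To make the construction concrete, below is a minimal NumPy sketch (not the authors' preprocessing pipeline) of how such a correlation-based brain network is typically built from ROI-averaged BOLD time series; the array shapes and variable names are illustrative assumptions.

```python
import numpy as np

def build_brain_network(bold: np.ndarray) -> np.ndarray:
    """Build a correlation-based brain network.

    bold: array of shape (V, T) holding the ROI-averaged BOLD time series
          for V regions over T time points (shapes are assumed here).
    Returns the V x V adjacency matrix of pairwise Pearson correlations.
    """
    # np.corrcoef treats each row as one variable (one ROI's time course).
    return np.corrcoef(bold)

# Example with synthetic data: 100 ROIs, 200 time points.
rng = np.random.default_rng(0)
X = build_brain_network(rng.standard_normal((100, 200)))
print(X.shape)                        # (100, 100)
print(np.allclose(np.diag(X), 1.0))   # self-correlations are 1
```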
Nowadays, Transformer-based models have led to tremendous success in various downstream tasks across fields including natural language processing [56, 17] and computer vision [20, 10, 55]. Recent efforts have also emerged to apply Transformer-based designs to graph representation learning. GAT [57] first adapts the attention mechanism to graph neural networks (GNNs) but only considers the local structures of neighboring nodes. Graph Transformer [21] injects edge information into the attention mechanism and leverages the eigenvectors of each node as positional embeddings. SAN [40] further enhances the positional embeddings by considering both eigenvalues and eigenvectors and improves the attention mechanism by extending the attention from local to global structures.
Graphormer [64], which achieves first place on the quantum prediction track of the OGB Large-Scale Challenge [30], designs unique mechanisms for molecule graphs such as centrality encoding to enhance node features and spatial/edge encoding to adapt attention scores.
However, brain networks have several unique traits that make directly applying existing graph Transformer models impractical. First, one of the simplest and most frequently used methods to construct a brain network in the neuroimaging community is via pairwise correlations between BOLD time courses from two ROIs [43, 35, 13, 63, 69]. This impedes designs like centrality, spatial, and edge encoding, because each node in the brain network has the same degree and connects to every other node by a single hop. Second, in previous graph Transformer models, eigenvalues and eigenvectors are commonly used as positional embeddings because they can provide identity and positional information for each node [15, 26]. Nevertheless, in brain networks, the connection profile, defined as each node's corresponding row in the brain network adjacency matrix, is recognized as the most effective node feature [13]. This node feature naturally encodes both structural and positional information, making the aforementioned positional embedding design based on eigenvalues and eigenvectors redundant. The third challenge is scalability. Typically, the numbers of nodes and edges in molecule graphs are less than 50 and 2,500, respectively. However, for brain networks, the node number is generally around 100 to 400, while the edge number can be up to 160,000. Therefore, operations like the generation of all edge features in existing graph Transformer models can be time-consuming, if not infeasible.
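As a quick sanity check on these counts (our own arithmetic, not a figure from the cited work): a brain network is stored as a dense weighted adjacency matrix, so the number of pairwise connections grows quadratically with the number of ROIs,

$$V = 400 \;\Rightarrow\; V^2 = 160{,}000 \text{ matrix entries}, \qquad \binom{V}{2} = \frac{400 \cdot 399}{2} = 79{,}800 \text{ distinct ROI pairs}.$$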
Figure 1: Illustration of the motivations behind ORTHONORMAL CLUSTERING READOUT. (a) Node features projected to a 3D space with PCA; colors indicate functional modules. (b) Orthonormal bases can make nodes that are indistinguishable under non-orthonormal bases easily distinguishable.
In this work, we propose to develop BRAIN NETWORK TRANSFORMER (BRAINNETTF), which leverages the unique properties of brain network data to fully unleash the power of Transformer-based models for brain network analysis. Specifically, motivated by previous findings on effective GNN designs for brain networks [13], we propose to use connection profiles as the initial node features. Empirical analysis shows that connection profiles naturally provide positional features for Transformer-based models and avoid the costly computation of eigenvalues or eigenvectors. Moreover, recent work demonstrates that GNNs trained on learnable graph structures can achieve superior effectiveness and explainability [35]. Inspired by this insight, we propose to learn fully pairwise attention weights with Transformer-based models, which resembles the process of learning predictive brain network structures towards downstream tasks.
One step further, when GNNs are used for brain network analysis, a graph-level embedding needs to be generated through a readout function based on the learned node embeddings [37, 43, 13]. As shown in Figure 1(a), a property of brain networks is that brain regions (nodes) belonging to the same functional modules often share similar behaviors regarding activations and deactivations in response to various stimulations [7]. Unfortunately, the current labeling of functional modules is rather empirical and far from accurate. For example, [3] provides more than 100 different functional module organizations based on hierarchical clustering. In order to leverage the natural functions of brain regions without the limitation of inaccurate functional module labels, we design a new global pooling operator, ORTHONORMAL CLUSTERING READOUT, where the graph-level embeddings are pooled from clusters of functionally similar nodes through soft clustering with orthonormal projection. Specifically, we first devise a self-supervised mechanism based on [60] to jointly assign soft clusters to brain regions while learning their individual embeddings. To further facilitate the learning of clusters and embeddings, we design an orthonormal projection and theoretically prove its effectiveness in distinguishing embeddings across clusters, thus obtaining expressive graph-level embeddings after the global pooling, as illustrated in Figure 1(b).
Finally, the lack of open-access datasets has been a non-negligible challenge for brain network analysis. The strict access restrictions and the complicated extraction/preprocessing of brain networks from fMRI data limit the development of machine learning models for brain network analysis. Specifically, among all the large-scale publicly available fMRI datasets in the literature, ABIDE [6] is the only one that provides extracted brain networks fully accessible without permission requirements. However, ABIDE is aggregated from 17 international sites with different scanners and acquisition parameters. This inter-site variability conceals the truly meaningful inter-group differences, which is reflected in unstable training performance and a significant gap between validation and testing performance in practice. To address these limitations, we propose to apply a stratified sampling method in the dataset splitting process and standardize a fair evaluation pipeline for meaningful model comparison on the ABIDE dataset. Our extensive experiments on this public ABIDE dataset and a restricted ABCD dataset [8] show significant improvements brought by our proposed BRAIN NETWORK TRANSFORMER.
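The following is a minimal sketch of the kind of stratified split described above, using scikit-learn; stratifying jointly on collection site and diagnostic label, as well as the 70/10/20 ratios, are illustrative assumptions rather than the exact protocol used in our pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def stratified_split(indices, sites, labels, seed=42):
    """Split subject indices into train/val/test (70/10/20), stratified on the
    joint (site, label) group so each split keeps a similar mix of acquisition
    sites and diagnostic classes."""
    strata = [f"{s}_{y}" for s, y in zip(sites, labels)]
    train_idx, rest_idx, _, rest_strata = train_test_split(
        indices, strata, test_size=0.3, stratify=strata, random_state=seed)
    val_idx, test_idx = train_test_split(
        rest_idx, test_size=2 / 3, stratify=rest_strata, random_state=seed)
    return train_idx, val_idx, test_idx

# Example with dummy metadata for 100 subjects from 4 sites and 2 classes.
rng = np.random.default_rng(0)
idx = np.arange(100)
tr, va, te = stratified_split(idx, rng.integers(0, 4, 100), rng.integers(0, 2, 100))
print(len(tr), len(va), len(te))  # roughly 70, 10, 20
```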
2 Background and Related Work
2.1 GNNs for Brain Network Analysis
Recently, emerging attention has been devoted to the generalization of GNN-based models to brain network analysis [42, 2]. GroupINN [62] utilizes a grouping-based layer to provide explainability and reduce the model size. BrainGNN [43] designs ROI-aware GNNs to leverage the functional information in brain networks and uses a special pooling operator to select the most crucial nodes. IBGNN [14] proposes an interpretable framework to analyze disorder-specific ROIs and prominent connections. In addition, FBNetGen [35] considers the learnable generation of brain networks and explores the explainability of the generated brain networks towards downstream tasks. Another benchmark paper [13] systematically studies the effectiveness of various GNN designs over brain network data. Different from other work focusing on static brain networks, STAGIN [39] utilizes GNNs with spatio-temporal attention to model dynamic brain networks extracted from fMRI data.
2.2 Graph Transformer
The graph Transformer has recently attracted much research interest due to its outstanding performance in graph representation learning. Graph Transformer [21] first injects edge information into the attention mechanism and leverages the eigenvectors as positional embeddings. SAN [40] enhances the positional embeddings and improves the attention mechanism by emphasizing neighboring nodes while incorporating global information. Graphormer [64] designs unique mechanisms for molecule graphs and achieves the SOTA performance. Besides, a fine-grained attention mechanism is developed for node classification [68]. Also, the Transformer is extended to larger-scale heterogeneous graphs with a particular sampling algorithm in HGT [32]. EGT [33] further employs edge augmentation to assist global self-attention. In addition, LSPE [22] leverages learnable structural and positional encodings to improve GNNs' representation power, and GRPE [49] enhances the design of encoding relative node position information in the Transformer.
3 BRAIN NETWORK TRANSFORMER
3.1 Problem Definition
In brain network analysis, given a brain network $X \in \mathbb{R}^{V \times V}$, where $V$ is the number of nodes (ROIs), the model aims to make a prediction indicating biological sex, presence of a disease, or other properties of the brain subject. The overall framework of our proposed BRAIN NETWORK TRANSFORMER is shown in Figure 2, which is mainly composed of two components, an $L$-layer attention module MHSA and a graph pooling operator OCREAD. Specifically, in the first component of MHSA, the model learns attention-enhanced node features $Z^{L}$ through a non-linear mapping $X \rightarrow Z^{L} \in \mathbb{R}^{V \times V}$. Then the second component of OCREAD compresses the enhanced node embeddings $Z^{L}$ to graph-level embeddings $Z^{G} \in \mathbb{R}^{K \times V}$, where $K$ is a hyperparameter representing the number of clusters. $Z^{G}$ is then flattened and passed to a multi-layer perceptron for graph-level predictions. The whole training process is supervised with the cross-entropy loss.
Figure 2: The overall framework of our proposed BRAIN NETWORK TRANSFORMER. (Figure: ROI-wise fMRI time series yield the brain network $X$; $\times L$ layers of multi-head scaled dot-product attention with linear projections and concatenation produce $Z^{L}$; OCREAD with cluster centers $E$ and soft assignment $P$ produces $Z^{G}$, which is flattened and fed to an FCN to predict, e.g., biological sex or autism spectrum disorder.)
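To connect the pieces in Figure 2, here is a condensed PyTorch sketch of the overall forward pass; a standard Transformer encoder (which adds feed-forward and normalization sublayers) stands in for the plain MHSA module, the readout is injected so the OCREAD sketch given in Section 3.3 can be plugged in, and the two-layer MLP head with 256 hidden units is an illustrative choice rather than the exact released architecture.

```python
import torch
import torch.nn as nn

class BrainNetTFSketch(nn.Module):
    """Connection-profile features -> L-layer attention -> readout -> flatten -> MLP.

    `readout` is any callable mapping (batch, V, V) node embeddings to a
    (batch, K, V) graph embedding; the OCREAD operator is the intended choice."""
    def __init__(self, num_rois, num_clusters, readout, num_classes=2,
                 num_layers=2, num_heads=4):
        super().__init__()
        # num_rois must be divisible by num_heads for the stand-in encoder.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=num_rois, nhead=num_heads,
                                       batch_first=True),
            num_layers=num_layers)
        self.readout = readout
        self.head = nn.Sequential(
            nn.Linear(num_clusters * num_rois, 256), nn.ReLU(),
            nn.Linear(256, num_classes))

    def forward(self, x):
        # x: (batch, V, V) brain networks; rows double as connection-profile features.
        z = self.encoder(x)              # (batch, V, V) enhanced node embeddings Z^L
        zg = self.readout(z)             # (batch, K, V) graph-level embedding Z^G
        return self.head(zg.flatten(1))  # (batch, num_classes) logits

# Quick wiring check with a placeholder readout that just keeps the first K rows.
model = BrainNetTFSketch(num_rois=100, num_clusters=10,
                         readout=lambda z: z[:, :10, :])
print(model(torch.randn(4, 100, 100)).shape)  # torch.Size([4, 2])
```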
3.2 Multi-Head Self-Attention Module (MHSA)
To develop a powerful Transformer-based model suitable for brain networks, two fundamental
designs, the positional embedding and attention mechanism, need to be reconsidered to fit the natural
properties of brain network data. In existing graph transformer models, the positional information
is usually encoded via eigendecomposition, while the attention mechanism often combines node
positions with existing edges to calculate the attention scores. However, for the dense (often fully
connected) graphs of brain networks, eigendecomposition is rather costly, and the existence of edges
is hardly informative.
ROI node features on brain networks naturally contain sufficient positional information, making positional embeddings based on eigendecomposition redundant. Previous work on brain network analysis has shown that the connection profile $X_{i\cdot}$ for node $i$, defined as the corresponding row of each node in the edge weight matrix $X$, always achieves superior performance over alternatives such as node identities, degrees, or eigenvector-based embeddings [43, 35, 13]. With this node feature initialization, the self-connection weight $x_{ii}$ on the diagonal is always equal to one, which encodes sufficient information to determine the position of each node in a fully connected graph based on the given brain atlas. To verify this insight, we also empirically compare the performance of the original connection profile with two variants concatenated with additional positional information, i.e., connection profile w/ identity feature and connection profile w/ eigen feature. The results indeed show no benefit brought by the additional computations (cf. Appendix B). As for the attention mechanism, previous work [13] has empirically demonstrated that integrating edge weights into the attention score calculation can significantly degrade the effectiveness of attention on complete graphs, while the generation of edge-wise embeddings can be unaffordable given the large number of edges in brain networks. On the other hand, the existence of edges provides no useful information for the computation of attention scores either, because all edges simply exist in complete graphs.
Based on the observations above, we design the basic BRAIN NETWORK TRANSFORMER by
(1) adopting the connection profile as initial node features and eliminating any extra positional
embeddings and (2) adopting the vanilla pair-wise attention mechanism without using edge weights
or relative position information to learn a single attention score for each edge in the complete graph.
Formally, we leverage an $L$-layer non-linear mapping module, namely Multi-Head Self-Attention (MHSA), to generate more expressive node features $Z^{L} = \mathrm{MHSA}(X) \in \mathbb{R}^{V \times V}$. For each layer $l$, the output $Z^{l}$ is obtained by

$$Z^{l} = \Big( \big\Vert_{m=1}^{M} h^{l,m} \Big) W_{O}^{l}, \qquad h^{l,m} = \operatorname{Softmax}\!\left( \frac{ W_{Q}^{l,m} Z^{l-1} \big( W_{K}^{l,m} Z^{l-1} \big)^{\top} }{ \sqrt{d_{K}^{l,m}} } \right) W_{V}^{l,m} Z^{l-1}, \tag{1}$$

where $Z^{0} = X$, $\Vert$ is the concatenation operator, $M$ is the number of heads, $l$ is the layer index, $W_{O}^{l}$, $W_{Q}^{l,m}$, $W_{K}^{l,m}$, $W_{V}^{l,m}$ are learnable model parameters, and $d_{K}^{l,m}$ is the first dimension of $W_{K}^{l,m}$.
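For concreteness, below is a minimal PyTorch sketch of one such layer, written in the more common row-vector convention ($Q = Z W_{Q}$, equivalent to Eq. (1) up to transposition) and with each head's key dimension set to $V/M$; this is an illustrative re-implementation under those assumptions, not the released code.

```python
import math
import torch
import torch.nn as nn

class MHSALayer(nn.Module):
    """One layer of Eq. (1): multi-head self-attention over the V ROI nodes,
    mapping (batch, V, V) node features to (batch, V, V)."""
    def __init__(self, num_rois: int, num_heads: int = 4):
        super().__init__()
        assert num_rois % num_heads == 0, "num_rois must be divisible by num_heads"
        self.num_heads, self.d_k = num_heads, num_rois // num_heads
        self.w_q = nn.Linear(num_rois, num_rois, bias=False)  # W_Q, all heads stacked
        self.w_k = nn.Linear(num_rois, num_rois, bias=False)  # W_K
        self.w_v = nn.Linear(num_rois, num_rois, bias=False)  # W_V
        self.w_o = nn.Linear(num_rois, num_rois, bias=False)  # W_O

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        b, v, _ = z.shape                 # z is Z^{l-1}; for the first layer, z = X
        def split(t):                     # (batch, V, V) -> (batch, heads, V, d_k)
            return t.view(b, v, self.num_heads, self.d_k).transpose(1, 2)
        q, k, val = split(self.w_q(z)), split(self.w_k(z)), split(self.w_v(z))
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.d_k), dim=-1)
        heads = attn @ val                               # h^{l,m} for every head m
        concat = heads.transpose(1, 2).reshape(b, v, v)  # concatenate the M heads
        return self.w_o(concat)                          # Z^l, shape (batch, V, V)
```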
3.3 ORTHONORMAL CLUSTERING READOUT (OCREAD)
The readout function is an essential component for learning graph-level representations for brain network analysis (e.g., classification); it maps a set of learned node-level embeddings to a graph-level embedding. $\operatorname{Mean}(\cdot)$, $\operatorname{Sum}(\cdot)$ and $\operatorname{Max}(\cdot)$ are the most commonly used readout functions for GNNs. Xu et al. [61] show that GNNs equipped with the $\operatorname{Sum}(\cdot)$ readout have the same discriminative power as the Weisfeiler-Lehman test. Zhang et al. [66] propose a sort pooling to generate the graph-level representation by sorting the final node representations. Ju et al. [34] present a layer-wise readout by extending the node information aggregated from the last layer of GNNs to all layers. However, none of the existing readout functions leverages the property of brain networks that nodes in the same functional modules tend to have similar behaviors and clustered representations, as shown in Figure 1(a). To address this deficiency, we design a novel readout function to take advantage of the modular-level similarities between ROIs in brain networks, where nodes are softly assigned to well-chosen clusters through an unsupervised process.
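For reference, the standard readouts mentioned above reduce the node dimension with a single tensor operation; a tiny sketch with assumed shapes:

```python
import torch

z = torch.randn(8, 100, 100)         # (batch, V, V) node embeddings, shapes assumed
mean_readout = z.mean(dim=1)          # (batch, V)
sum_readout = z.sum(dim=1)            # (batch, V)
max_readout = z.max(dim=1).values     # (batch, V)
```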
Formally, given $K$ cluster centers, each with $V$ dimensions, $E \in \mathbb{R}^{K \times V}$, a Softmax projection operator is used to calculate the probability $P_{ik}$ of assigning node $i$ to cluster $k$,

$$P_{ik} = \frac{ e^{\langle Z^{L}_{i\cdot},\, E_{k\cdot} \rangle} }{ \sum_{k'=1}^{K} e^{\langle Z^{L}_{i\cdot},\, E_{k'\cdot} \rangle} }, \tag{2}$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product and $Z^{L}$ is the learned set of node embeddings from the last layer of the MHSA module. With this computed soft assignment $P \in \mathbb{R}^{V \times K}$, the original learned node representation $Z^{L}$ can be aggregated under the guidance of the soft cluster information, where the graph-level embedding $Z^{G}$ is obtained by $Z^{G} = P^{\top} Z^{L}$.
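A compact PyTorch sketch of this readout, combining the soft assignment of Eq. (2) with the aggregation $Z^{G} = P^{\top} Z^{L}$; batching and variable names are illustrative.

```python
import torch

def ocread(z: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """ORTHONORMAL CLUSTERING READOUT: soft-assign nodes to K cluster centers
    (Eq. (2)) and pool node embeddings per cluster.

    z:       (batch, V, V) node embeddings Z^L from the MHSA module.
    centers: (K, V) cluster-center matrix E, ideally with orthonormal rows.
    Returns the graph-level embedding Z^G with shape (batch, K, V)."""
    logits = z @ centers.t()            # inner products <Z^L_i, E_k>, (batch, V, K)
    p = torch.softmax(logits, dim=-1)   # soft assignment P, each row sums to 1
    return p.transpose(1, 2) @ z        # Z^G = P^T Z^L, (batch, K, V)
```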
However, jointly learning node embeddings and clusters without ground-truth cluster labels is difficult. To obtain a representative soft assignment $P$, the initialization of the $K$ cluster centers $E$ is critical and should be designed delicately. To this end, we leverage the observation illustrated in Figure 1(b), where orthonormal embeddings can improve the clustering of nodes in brain networks w.r.t. the functional modules underlying brain regions.
Orthonormal Initialization. To initialize a group of orthonormal bases as cluster centers, we first adopt the Xavier uniform initialization [25] to initialize $K$ random centers, each with $V$ dimensions, $C \in \mathbb{R}^{K \times V}$. Then, we apply the Gram-Schmidt process to obtain the orthonormal bases $E$, where

$$u_{k} = C_{k\cdot} - \sum_{j=1}^{k-1} \frac{ \langle u_{j}, C_{k\cdot} \rangle }{ \langle u_{j}, u_{j} \rangle }\, u_{j}, \qquad E_{k\cdot} = \frac{u_{k}}{ \Vert u_{k} \Vert }. \tag{3}$$
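A small sketch of this initialization, Xavier-uniform random centers followed by classical Gram-Schmidt as in Eq. (3); it assumes $K \le V$ and is meant as an illustration rather than the exact released routine.

```python
import torch
import torch.nn as nn

def orthonormal_centers(num_clusters: int, num_rois: int) -> torch.Tensor:
    """Initialize K orthonormal cluster centers E in R^{K x V} (requires K <= V)."""
    c = torch.empty(num_clusters, num_rois)
    nn.init.xavier_uniform_(c)                  # random centers C
    basis = []
    for k in range(num_clusters):
        u = c[k].clone()
        for prev in basis:                      # subtract projections onto earlier bases;
            u = u - (prev @ c[k]) * prev        # prev is already unit-norm
        basis.append(u / u.norm())              # E_k = u_k / ||u_k||
    return torch.stack(basis)                   # rows are orthonormal

# Sanity check: E E^T should be (close to) the identity matrix.
E = orthonormal_centers(10, 100)
print(torch.allclose(E @ E.t(), torch.eye(10), atol=1e-4))  # True
```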
In the next section, we theoretically prove the advantage of this orthonormal initialization.
3.3.1 Theoretical Justifications
In OCREAD, proper cluster centers can generate higher-quality soft assignments and enlarge the difference between the $P$ of different classes. Previous studies [51, 46] showed the advantages of orthogonal initialization of DNN model parameters. However, none of them proves whether it is an ideal strategy for obtaining the cluster centers. We provide two justifications from the perspective of statistics as follows.
Firstly, to discern the features of different nodes, we would expect a larger discrepancy among their similarity probabilities indicated by the readout. One way to measure the discrepancy is the variance of $P$ for each feature. Let $\bar{P} \equiv 1/K$ denote the mean of any discrete probability distribution with $K$ values. The variance of $P$ measures the difference between $P$ and $\bar{P}$. We average it over the feature vector space: if the result is small, then there is a large tendency that different $P$'s approach $\bar{P}$ and hence cannot be discerned easily. Specifically, the following theorem holds for our Softmax projection in Eq. (2):
Theorem 3.1. For arbitrary $r > 0$, let $B_{r} = \{ Z \in \mathbb{R}^{V} : \Vert Z \Vert \le r \}$ denote the round ball of radius $r$ centered at the origin, with $Z$ being feature vectors. Let $V_{r}$ be the volume of $B_{r}$. The variance of the Softmax projection averaged over $B_{r}$,

$$\frac{1}{V_{r}} \int_{B_{r}} \sum_{k=1}^{K} \left( \frac{ e^{\langle Z, E_{k\cdot} \rangle} }{ \sum_{k'=1}^{K} e^{\langle Z, E_{k'\cdot} \rangle} } - \frac{1}{K} \right)^{2} \mathrm{d}Z, \tag{4}$$

attains its maximum when $E$ is orthonormal.
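As an informal numerical illustration of Theorem 3.1 (not part of the proof), one can Monte-Carlo-estimate the averaged variance in Eq. (4) and compare orthonormal centers against deliberately correlated, non-orthonormal ones; the dimensions and radius below are arbitrary assumptions, and the orthonormal choice typically yields the visibly larger value.

```python
import torch

def avg_softmax_variance(e: torch.Tensor, radius: float = 3.0, n: int = 100_000) -> float:
    """Monte-Carlo estimate of Eq. (4) with Z drawn uniformly from the ball B_r."""
    k, v = e.shape
    direction = torch.randn(n, v)
    direction = direction / direction.norm(dim=1, keepdim=True)   # uniform on the sphere
    r = radius * torch.rand(n, 1) ** (1.0 / v)                    # uniform radius within B_r
    z = direction * r
    p = torch.softmax(z @ e.t(), dim=-1)                          # Softmax projection, (n, K)
    return ((p - 1.0 / k) ** 2).sum(dim=1).mean().item()

K, V = 4, 16
q, _ = torch.linalg.qr(torch.randn(V, K))               # Q has orthonormal columns
e_orth = q.t()                                           # (K, V), orthonormal rows
e_corr = torch.randn(1, V) + 0.3 * torch.randn(K, V)     # highly correlated centers
e_corr = e_corr / e_corr.norm(dim=1, keepdim=True)       # unit norm but far from orthogonal
print(avg_softmax_variance(e_orth), avg_softmax_variance(e_corr))
```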