BRAIN NETWORK TRANSFORMER
Xuan Kan1, Wei Dai2, Hejie Cui1, Zilong Zhang3, Ying Guo1, Carl Yang1
1Emory University  2Stanford University  3University of International Business and Economics
{xuan.kan,hejie.cui,yguo2,j.carlyang}@emory.edu
dvd.ai@stanford.edu 201957020@uibe.edu.cn
Abstract
Human brains are commonly modeled as networks of Regions of Interest (ROIs) and their connections for the understanding of brain functions and mental disorders. Recently, Transformer-based models have been studied over different types of data, including graphs, and have been shown to bring broad performance gains. In this work, we study Transformer-based models for brain network analysis. Driven by the unique properties of the data, we model brain networks as graphs with nodes of fixed size and order, which allows us to (1) use connection profiles as node features to provide natural and low-cost positional information and (2) learn pairwise connection strengths among ROIs with efficient attention weights across individuals that are predictive towards downstream analysis tasks. Moreover, we propose an ORTHONORMAL CLUSTERING READOUT operation based on self-supervised soft clustering and orthonormal projection. This design accounts for the underlying functional modules that determine similar behaviors among groups of ROIs, leading to distinguishable cluster-aware node embeddings and informative graph embeddings. Finally, we re-standardize the evaluation pipeline on ABIDE, the only publicly available large-scale brain network dataset, to enable meaningful comparison of different models. Experimental results show clear improvements of our proposed BRAIN NETWORK TRANSFORMER on both the public ABIDE and our restricted ABCD datasets. The implementation is available at https://github.com/Wayfear/BrainNetworkTransformer.
1 Introduction
Brain network analysis has been an intriguing pursuit for neuroscientists to understand human brain organizations and predict clinical outcomes [50, 59, 58, 5, 18, 27, 52, 29, 58, 28, 41, 44, 31]. Among various neuroimaging modalities, functional Magnetic Resonance Imaging (fMRI) is one of the most commonly used for brain network construction, where the nodes are defined as Regions of Interest (ROIs) given an atlas, and the edges are calculated as pairwise correlations between the blood-oxygen-level-dependent (BOLD) signal series extracted from each region [54, 53, 59, 16]. Researchers observe that some regions can co-activate or co-deactivate simultaneously when performing cognition-related tasks such as action, language, and vision. Based on this pattern, brain regions can be classified into diverse functional modules to analyze diseases towards their diagnosis, progression understanding, and treatment.
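To make the construction concrete, below is a minimal NumPy sketch (not the authors' preprocessing pipeline) of how such a correlation-based brain network is typically built from ROI-averaged BOLD time series; the array shapes and variable names are illustrative assumptions.

```python
import numpy as np

def build_brain_network(bold: np.ndarray) -> np.ndarray:
    """Build a correlation-based brain network.

    bold: array of shape (V, T) holding the ROI-averaged BOLD time series
          for V regions over T time points (shapes are assumed here).
    Returns the V x V adjacency matrix of pairwise Pearson correlations.
    """
    # np.corrcoef treats each row as one variable (one ROI's time course).
    return np.corrcoef(bold)

# Example with synthetic data: 100 ROIs, 200 time points.
rng = np.random.default_rng(0)
X = build_brain_network(rng.standard_normal((100, 200)))
print(X.shape)                        # (100, 100)
print(np.allclose(np.diag(X), 1.0))   # self-correlations are 1
```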
Nowadays, Transformer-based models have led to tremendous success in various downstream tasks across fields including natural language processing [56, 17] and computer vision [20, 10, 55]. Recent efforts have also emerged to apply Transformer-based designs to graph representation learning. GAT [57] first adapts the attention mechanism to graph neural networks (GNNs) but only considers the local structures of neighboring nodes. Graph Transformer [21] injects edge information into the attention mechanism and leverages the eigenvectors of each node as positional embeddings. SAN [40] further enhances the positional embeddings by considering both eigenvalues and eigenvectors and improves the attention mechanism by extending the attention from local to global structures.
Graphormer [64], which achieves first place on the quantum prediction track of the OGB Large-Scale Challenge [30], designs unique mechanisms for molecule graphs such as centrality encoding to enhance node features and spatial/edge encoding to adapt attention scores.
However, brain networks have several unique traits that make directly applying existing graph Transformer models impractical. First, one of the simplest and most frequently used methods to construct a brain network in the neuroimaging community is via pairwise correlations between BOLD time courses from two ROIs [43, 35, 13, 63, 69]. This impedes designs like centrality, spatial, and edge encoding, because each node in the brain network has the same degree and connects to every other node by a single hop. Second, in previous graph Transformer models, eigenvalues and eigenvectors are commonly used as positional embeddings because they can provide identity and positional information for each node [15, 26]. Nevertheless, in brain networks, the connection profile, defined as each node's corresponding row in the brain network adjacency matrix, is recognized as the most effective node feature [13]. This node feature naturally encodes both structural and positional information, making the aforementioned positional embedding design based on eigenvalues and eigenvectors redundant. The third challenge is scalability. Typically, the numbers of nodes and edges in molecule graphs are less than 50 and 2,500, respectively. However, for brain networks, the node number is generally around 100 to 400, while the edge number can be up to 160,000. Therefore, operations like the generation of all edge features in existing graph Transformer models can be time-consuming, if not infeasible.
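As a quick sanity check on these counts (our own arithmetic, not a figure from the cited work): a brain network is stored as a dense weighted adjacency matrix, so the number of pairwise connections grows quadratically with the number of ROIs,

$$V = 400 \;\Rightarrow\; V^2 = 160{,}000 \text{ matrix entries}, \qquad \binom{V}{2} = \frac{400 \cdot 399}{2} = 79{,}800 \text{ distinct ROI pairs}.$$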
Figure 1: Illustration of the motivations behind ORTHONORMAL CLUSTERING READOUT. (a) Node features projected to a 3D space with PCA; colors indicate functional modules. (b) Orthonormal bases can make nodes that are indistinguishable under non-orthonormal bases easily distinguishable.
In this work, we propose to develop BRAIN NETWORK TRANSFORMER (BRAINNETTF), which leverages the unique properties of brain network data to fully unleash the power of Transformer-based models for brain network analysis. Specifically, motivated by previous findings on effective GNN designs for brain networks [13], we propose to use connection profiles as the initial node features. Empirical analysis shows that connection profiles naturally provide positional features for Transformer-based models and avoid the costly computation of eigenvalues or eigenvectors. Moreover, recent work demonstrates that GNNs trained on learnable graph structures can achieve superior effectiveness and explainability [35]. Inspired by this insight, we propose to learn fully pairwise attention weights with Transformer-based models, which resembles the process of learning predictive brain network structures towards downstream tasks.
One step further, when GNNs are used for brain network analysis, a graph-level embedding needs to be generated through a readout function based on the learned node embeddings [37, 43, 13]. As shown in Figure 1(a), a property of brain networks is that brain regions (nodes) belonging to the same functional modules often share similar behaviors regarding activations and deactivations in response to various stimulations [7]. Unfortunately, the current labeling of functional modules is rather empirical and far from accurate. For example, [3] provides more than 100 different functional module organizations based on hierarchical clustering. In order to leverage the natural functions of brain regions without the limitation of inaccurate functional module labels, we design a new global pooling operator, ORTHONORMAL CLUSTERING READOUT, where the graph-level embeddings are pooled from clusters of functionally similar nodes through soft clustering with orthonormal projection. Specifically, we first devise a self-supervised mechanism based on [60] to jointly assign soft clusters to brain regions while learning their individual embeddings. To further facilitate the learning of clusters and embeddings, we design an orthonormal projection and theoretically prove its effectiveness in distinguishing embeddings across clusters, thus obtaining expressive graph-level embeddings after the global pooling, as illustrated in Figure 1(b).
Finally, the lack of open-access datasets has been a non-negligible challenge for brain network analysis. The strict access restrictions and the complicated extraction/preprocessing of brain networks from fMRI data limit the development of machine learning models for brain network analysis. Specifically, among all the large-scale publicly available fMRI datasets in the literature, ABIDE [6] is the only one that provides extracted brain networks fully accessible without permission requirements. However, ABIDE is aggregated from 17 international sites with different scanners and acquisition parameters. This inter-site variability conceals the truly meaningful inter-group differences, which is reflected in unstable training performance and a significant gap between validation and testing performance in practice. To address these limitations, we propose to apply a stratified sampling method in the dataset splitting process and standardize a fair evaluation pipeline for meaningful model comparison on the ABIDE dataset. Our extensive experiments on this public ABIDE dataset and a restricted ABCD dataset [8] show significant improvements brought by our proposed BRAIN NETWORK TRANSFORMER.
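The following is a minimal sketch of the kind of stratified split described above, using scikit-learn; stratifying jointly on collection site and diagnostic label, as well as the 70/10/20 ratios, are illustrative assumptions rather than the exact protocol used in our pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def stratified_split(indices, sites, labels, seed=42):
    """Split subject indices into train/val/test (70/10/20), stratified on the
    joint (site, label) group so each split keeps a similar mix of acquisition
    sites and diagnostic classes."""
    strata = [f"{s}_{y}" for s, y in zip(sites, labels)]
    train_idx, rest_idx, _, rest_strata = train_test_split(
        indices, strata, test_size=0.3, stratify=strata, random_state=seed)
    val_idx, test_idx = train_test_split(
        rest_idx, test_size=2 / 3, stratify=rest_strata, random_state=seed)
    return train_idx, val_idx, test_idx

# Example with dummy metadata for 100 subjects from 4 sites and 2 classes.
rng = np.random.default_rng(0)
idx = np.arange(100)
tr, va, te = stratified_split(idx, rng.integers(0, 4, 100), rng.integers(0, 2, 100))
print(len(tr), len(va), len(te))  # roughly 70, 10, 20
```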
2 Background and Related Work
2.1 GNNs for Brain Network Analysis
Recently, emerging attention has been devoted to the generalization of GNN-based models to brain network analysis [42, 2]. GroupINN [62] utilizes a grouping-based layer to provide explainability and reduce the model size. BrainGNN [43] designs ROI-aware GNNs to leverage the functional information in brain networks and uses a special pooling operator to select the most crucial nodes. IBGNN [14] proposes an interpretable framework to analyze disorder-specific ROIs and prominent connections. In addition, FBNetGen [35] considers the learnable generation of brain networks and explores the explainability of the generated brain networks towards downstream tasks. Another benchmark paper [13] systematically studies the effectiveness of various GNN designs over brain network data. Different from other work focusing on static brain networks, STAGIN [39] utilizes GNNs with spatio-temporal attention to model dynamic brain networks extracted from fMRI data.
2.2 Graph Transformer
The graph Transformer has recently attracted much research interest due to its outstanding performance in graph representation learning. Graph Transformer [21] first injects edge information into the attention mechanism and leverages the eigenvectors as positional embeddings. SAN [40] enhances the positional embeddings and improves the attention mechanism by emphasizing neighboring nodes while incorporating global information. Graphormer [64] designs unique mechanisms for molecule graphs and achieves the SOTA performance. Besides, a fine-grained attention mechanism is developed for node classification [68]. Also, the Transformer is extended to larger-scale heterogeneous graphs with a particular sampling algorithm in HGT [32]. EGT [33] further employs edge augmentation to assist global self-attention. In addition, LSPE [22] leverages learnable structural and positional encodings to improve GNNs' representation power, and GRPE [49] enhances the design of encoding relative node position information in the Transformer.
3 BRAIN NETWORK TRANSFORMER
3.1 Problem Definition
In brain network analysis, given a brain network $X \in \mathbb{R}^{V \times V}$, where $V$ is the number of nodes (ROIs), the model aims to make a prediction indicating biological sex, presence of a disease, or other properties of the brain subject. The overall framework of our proposed BRAIN NETWORK TRANSFORMER is shown in Figure 2, which is mainly composed of two components, an $L$-layer attention module MHSA and a graph pooling operator OCREAD. Specifically, in the first component of MHSA, the model learns attention-enhanced node features $Z^{L}$ through a non-linear mapping $X \rightarrow Z^{L} \in \mathbb{R}^{V \times V}$. Then the second component of OCREAD compresses the enhanced node embeddings $Z^{L}$ to graph-level embeddings $Z^{G} \in \mathbb{R}^{K \times V}$, where $K$ is a hyperparameter representing the number of clusters. $Z^{G}$ is then flattened and passed to a multi-layer perceptron for graph-level predictions. The whole training process is supervised with the cross-entropy loss.
Figure 2: The overall framework of our proposed BRAIN NETWORK TRANSFORMER. (Figure: ROI-wise fMRI time series yield the brain network $X$; $\times L$ layers of multi-head scaled dot-product attention with linear projections and concatenation produce $Z^{L}$; OCREAD with cluster centers $E$ and soft assignment $P$ produces $Z^{G}$, which is flattened and fed to an FCN to predict, e.g., biological sex or autism spectrum disorder.)
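To connect the pieces in Figure 2, here is a condensed PyTorch sketch of the overall forward pass; a standard Transformer encoder (which adds feed-forward and normalization sublayers) stands in for the plain MHSA module, the readout is injected so the OCREAD sketch given in Section 3.3 can be plugged in, and the two-layer MLP head with 256 hidden units is an illustrative choice rather than the exact released architecture.

```python
import torch
import torch.nn as nn

class BrainNetTFSketch(nn.Module):
    """Connection-profile features -> L-layer attention -> readout -> flatten -> MLP.

    `readout` is any callable mapping (batch, V, V) node embeddings to a
    (batch, K, V) graph embedding; the OCREAD operator is the intended choice."""
    def __init__(self, num_rois, num_clusters, readout, num_classes=2,
                 num_layers=2, num_heads=4):
        super().__init__()
        # num_rois must be divisible by num_heads for the stand-in encoder.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=num_rois, nhead=num_heads,
                                       batch_first=True),
            num_layers=num_layers)
        self.readout = readout
        self.head = nn.Sequential(
            nn.Linear(num_clusters * num_rois, 256), nn.ReLU(),
            nn.Linear(256, num_classes))

    def forward(self, x):
        # x: (batch, V, V) brain networks; rows double as connection-profile features.
        z = self.encoder(x)              # (batch, V, V) enhanced node embeddings Z^L
        zg = self.readout(z)             # (batch, K, V) graph-level embedding Z^G
        return self.head(zg.flatten(1))  # (batch, num_classes) logits

# Quick wiring check with a placeholder readout that just keeps the first K rows.
model = BrainNetTFSketch(num_rois=100, num_clusters=10,
                         readout=lambda z: z[:, :10, :])
print(model(torch.randn(4, 100, 100)).shape)  # torch.Size([4, 2])
```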
3.2 Multi-Head Self-Attention Module (MHSA)
To develop a powerful Transformer-based model suitable for brain networks, two fundamental
designs, the positional embedding and attention mechanism, need to be reconsidered to fit the natural
properties of brain network data. In existing graph transformer models, the positional information
is usually encoded via eigendecomposition, while the attention mechanism often combines node
positions with existing edges to calculate the attention scores. However, for the dense (often fully
connected) graphs of brain networks, eigendecomposition is rather costly, and the existence of edges
is hardly informative.
ROI node features on brain networks naturally contain sufficient positional information, making positional embeddings based on eigendecomposition redundant. Previous work on brain network analysis has shown that the connection profile $X_{i\cdot}$ for node $i$, defined as the corresponding row of each node in the edge weight matrix $X$, always achieves superior performance over alternatives such as node identities, degrees, or eigenvector-based embeddings [43, 35, 13]. With this node feature initialization, the self-connection weight $x_{ii}$ on the diagonal is always equal to one, which encodes sufficient information to determine the position of each node in a fully connected graph based on the given brain atlas. To verify this insight, we also empirically compare the performance of the original connection profile with two variants concatenated with additional positional information, i.e., connection profile w/ identity feature and connection profile w/ eigen feature. The results indeed show no benefit brought by the additional computations (cf. Appendix B). As for the attention mechanism, previous work [13] has empirically demonstrated that integrating edge weights into the attention score calculation can significantly degrade the effectiveness of attention on complete graphs, while the generation of edge-wise embeddings can be unaffordable given the large number of edges in brain networks. On the other hand, the existence of edges provides no useful information for the computation of attention scores either, because all edges simply exist in complete graphs.
Based on the observations above, we design the basic BRAIN NETWORK TRANSFORMER by
(1) adopting the connection profile as initial node features and eliminating any extra positional
embeddings and (2) adopting the vanilla pair-wise attention mechanism without using edge weights
or relative position information to learn a single attention score for each edge in the complete graph.
Formally, we leverage an $L$-layer non-linear mapping module, namely Multi-Head Self-Attention (MHSA), to generate more expressive node features $Z^{L} = \mathrm{MHSA}(X) \in \mathbb{R}^{V \times V}$. For each layer $l$, the output $Z^{l}$ is obtained by

$$Z^{l} = \Big( \big\Vert_{m=1}^{M} h^{l,m} \Big) W_{O}^{l}, \qquad h^{l,m} = \operatorname{Softmax}\!\left( \frac{ W_{Q}^{l,m} Z^{l-1} \big( W_{K}^{l,m} Z^{l-1} \big)^{\top} }{ \sqrt{d_{K}^{l,m}} } \right) W_{V}^{l,m} Z^{l-1}, \tag{1}$$

where $Z^{0} = X$, $\Vert$ is the concatenation operator, $M$ is the number of heads, $l$ is the layer index, $W_{O}^{l}$, $W_{Q}^{l,m}$, $W_{K}^{l,m}$, $W_{V}^{l,m}$ are learnable model parameters, and $d_{K}^{l,m}$ is the first dimension of $W_{K}^{l,m}$.
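For concreteness, below is a minimal PyTorch sketch of one such layer, written in the more common row-vector convention ($Q = Z W_{Q}$, equivalent to Eq. (1) up to transposition) and with each head's key dimension set to $V/M$; this is an illustrative re-implementation under those assumptions, not the released code.

```python
import math
import torch
import torch.nn as nn

class MHSALayer(nn.Module):
    """One layer of Eq. (1): multi-head self-attention over the V ROI nodes,
    mapping (batch, V, V) node features to (batch, V, V)."""
    def __init__(self, num_rois: int, num_heads: int = 4):
        super().__init__()
        assert num_rois % num_heads == 0, "num_rois must be divisible by num_heads"
        self.num_heads, self.d_k = num_heads, num_rois // num_heads
        self.w_q = nn.Linear(num_rois, num_rois, bias=False)  # W_Q, all heads stacked
        self.w_k = nn.Linear(num_rois, num_rois, bias=False)  # W_K
        self.w_v = nn.Linear(num_rois, num_rois, bias=False)  # W_V
        self.w_o = nn.Linear(num_rois, num_rois, bias=False)  # W_O

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        b, v, _ = z.shape                 # z is Z^{l-1}; for the first layer, z = X
        def split(t):                     # (batch, V, V) -> (batch, heads, V, d_k)
            return t.view(b, v, self.num_heads, self.d_k).transpose(1, 2)
        q, k, val = split(self.w_q(z)), split(self.w_k(z)), split(self.w_v(z))
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.d_k), dim=-1)
        heads = attn @ val                               # h^{l,m} for every head m
        concat = heads.transpose(1, 2).reshape(b, v, v)  # concatenate the M heads
        return self.w_o(concat)                          # Z^l, shape (batch, V, V)
```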
3.3 ORTHONORMAL CLUSTERING READOUT (OCREAD)
The readout function is an essential component for learning graph-level representations for brain network analysis (e.g., classification); it maps a set of learned node-level embeddings to a graph-level embedding. $\operatorname{Mean}(\cdot)$, $\operatorname{Sum}(\cdot)$ and $\operatorname{Max}(\cdot)$ are the most commonly used readout functions for GNNs. Xu et al. [61] show that GNNs equipped with the $\operatorname{Sum}(\cdot)$ readout have the same discriminative power as the Weisfeiler-Lehman test. Zhang et al. [66] propose a sort pooling to generate the graph-level representation by sorting the final node representations. Ju et al. [34] present a layer-wise readout by extending the node information aggregated from the last layer of GNNs to all layers. However, none of the existing readout functions leverages the property of brain networks that nodes in the same functional modules tend to have similar behaviors and clustered representations, as shown in Figure 1(a). To address this deficiency, we design a novel readout function to take advantage of the modular-level similarities between ROIs in brain networks, where nodes are softly assigned to well-chosen clusters through an unsupervised process.
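For reference, the standard readouts mentioned above reduce the node dimension with a single tensor operation; a tiny sketch with assumed shapes:

```python
import torch

z = torch.randn(8, 100, 100)         # (batch, V, V) node embeddings, shapes assumed
mean_readout = z.mean(dim=1)          # (batch, V)
sum_readout = z.sum(dim=1)            # (batch, V)
max_readout = z.max(dim=1).values     # (batch, V)
```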
Formally, given $K$ cluster centers, each with $V$ dimensions, $E \in \mathbb{R}^{K \times V}$, a Softmax projection operator is used to calculate the probability $P_{ik}$ of assigning node $i$ to cluster $k$,

$$P_{ik} = \frac{ e^{\langle Z^{L}_{i\cdot},\, E_{k\cdot} \rangle} }{ \sum_{k'=1}^{K} e^{\langle Z^{L}_{i\cdot},\, E_{k'\cdot} \rangle} }, \tag{2}$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product and $Z^{L}$ is the learned set of node embeddings from the last layer of the MHSA module. With this computed soft assignment $P \in \mathbb{R}^{V \times K}$, the original learned node representation $Z^{L}$ can be aggregated under the guidance of the soft cluster information, where the graph-level embedding $Z^{G}$ is obtained by $Z^{G} = P^{\top} Z^{L}$.
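A compact PyTorch sketch of this readout, combining the soft assignment of Eq. (2) with the aggregation $Z^{G} = P^{\top} Z^{L}$; batching and variable names are illustrative.

```python
import torch

def ocread(z: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """ORTHONORMAL CLUSTERING READOUT: soft-assign nodes to K cluster centers
    (Eq. (2)) and pool node embeddings per cluster.

    z:       (batch, V, V) node embeddings Z^L from the MHSA module.
    centers: (K, V) cluster-center matrix E, ideally with orthonormal rows.
    Returns the graph-level embedding Z^G with shape (batch, K, V)."""
    logits = z @ centers.t()            # inner products <Z^L_i, E_k>, (batch, V, K)
    p = torch.softmax(logits, dim=-1)   # soft assignment P, each row sums to 1
    return p.transpose(1, 2) @ z        # Z^G = P^T Z^L, (batch, K, V)
```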
However, jointly learning node embeddings and clusters without ground-truth cluster labels is difficult. To obtain a representative soft assignment $P$, the initialization of the $K$ cluster centers $E$ is critical and should be designed delicately. To this end, we leverage the observation illustrated in Figure 1(b), where orthonormal embeddings can improve the clustering of nodes in brain networks w.r.t. the functional modules underlying brain regions.
Orthonormal Initialization. To initialize a group of orthonormal bases as cluster centers, we first adopt the Xavier uniform initialization [25] to initialize $K$ random centers, each with $V$ dimensions, $C \in \mathbb{R}^{K \times V}$. Then, we apply the Gram-Schmidt process to obtain the orthonormal bases $E$, where

$$u_{k} = C_{k\cdot} - \sum_{j=1}^{k-1} \frac{ \langle u_{j}, C_{k\cdot} \rangle }{ \langle u_{j}, u_{j} \rangle }\, u_{j}, \qquad E_{k\cdot} = \frac{u_{k}}{ \Vert u_{k} \Vert }. \tag{3}$$
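A small sketch of this initialization, Xavier-uniform random centers followed by classical Gram-Schmidt as in Eq. (3); it assumes $K \le V$ and is meant as an illustration rather than the exact released routine.

```python
import torch
import torch.nn as nn

def orthonormal_centers(num_clusters: int, num_rois: int) -> torch.Tensor:
    """Initialize K orthonormal cluster centers E in R^{K x V} (requires K <= V)."""
    c = torch.empty(num_clusters, num_rois)
    nn.init.xavier_uniform_(c)                  # random centers C
    basis = []
    for k in range(num_clusters):
        u = c[k].clone()
        for prev in basis:                      # subtract projections onto earlier bases;
            u = u - (prev @ c[k]) * prev        # prev is already unit-norm
        basis.append(u / u.norm())              # E_k = u_k / ||u_k||
    return torch.stack(basis)                   # rows are orthonormal

# Sanity check: E E^T should be (close to) the identity matrix.
E = orthonormal_centers(10, 100)
print(torch.allclose(E @ E.t(), torch.eye(10), atol=1e-4))  # True
```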
In the next section, we theoretically prove the advantage of this orthonormal initialization.
3.3.1 Theoretical Justifications
In OCREAD, proper cluster centers can generate higher-quality soft assignments and enlarge the difference between the $P$ of different classes. Previous studies [51, 46] showed the advantages of orthogonal initialization of DNN model parameters. However, none of them proves whether it is an ideal strategy for obtaining the cluster centers. We provide two justifications from the perspective of statistics as follows.
Firstly, to discern the features of different nodes, we would expect a larger discrepancy among their similarity probabilities indicated by the readout. One way to measure the discrepancy is the variance of $P$ for each feature. Let $\bar{P} \equiv 1/K$ denote the mean of any discrete probability distribution with $K$ values. The variance of $P$ measures the difference between $P$ and $\bar{P}$. We average it over the feature vector space: if the result is small, then there is a large tendency that different $P$'s approach $\bar{P}$ and hence cannot be discerned easily. Specifically, the following theorem holds for our Softmax projection in Eq. (2):
Theorem 3.1. For arbitrary $r > 0$, let $B_{r} = \{ Z \in \mathbb{R}^{V} : \Vert Z \Vert \le r \}$ denote the round ball of radius $r$ centered at the origin, with $Z$ being feature vectors. Let $V_{r}$ be the volume of $B_{r}$. The variance of the Softmax projection averaged over $B_{r}$,

$$\frac{1}{V_{r}} \int_{B_{r}} \sum_{k=1}^{K} \left( \frac{ e^{\langle Z, E_{k\cdot} \rangle} }{ \sum_{k'=1}^{K} e^{\langle Z, E_{k'\cdot} \rangle} } - \frac{1}{K} \right)^{2} \mathrm{d}Z, \tag{4}$$

attains its maximum when $E$ is orthonormal.
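As an informal numerical illustration of Theorem 3.1 (not part of the proof), one can Monte-Carlo-estimate the averaged variance in Eq. (4) and compare orthonormal centers against deliberately correlated, non-orthonormal ones; the dimensions and radius below are arbitrary assumptions, and the orthonormal choice typically yields the visibly larger value.

```python
import torch

def avg_softmax_variance(e: torch.Tensor, radius: float = 3.0, n: int = 100_000) -> float:
    """Monte-Carlo estimate of Eq. (4) with Z drawn uniformly from the ball B_r."""
    k, v = e.shape
    direction = torch.randn(n, v)
    direction = direction / direction.norm(dim=1, keepdim=True)   # uniform on the sphere
    r = radius * torch.rand(n, 1) ** (1.0 / v)                    # uniform radius within B_r
    z = direction * r
    p = torch.softmax(z @ e.t(), dim=-1)                          # Softmax projection, (n, K)
    return ((p - 1.0 / k) ** 2).sum(dim=1).mean().item()

K, V = 4, 16
q, _ = torch.linalg.qr(torch.randn(V, K))               # Q has orthonormal columns
e_orth = q.t()                                           # (K, V), orthonormal rows
e_corr = torch.randn(1, V) + 0.3 * torch.randn(K, V)     # highly correlated centers
e_corr = e_corr / e_corr.norm(dim=1, keepdim=True)       # unit norm but far from orthogonal
print(avg_softmax_variance(e_orth), avg_softmax_variance(e_corr))
```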