
Graphormer [64], which achieved first place on the quantum prediction track of the OGB Large-Scale Challenge [30], designs unique mechanisms for molecule graphs, such as centrality encoding to enhance node features and spatial/edge encoding to adapt attention scores.
However, brain networks have several unique traits that make directly applying existing graph Transformer models impractical. First, one of the simplest and most frequently used methods to construct a brain network in the neuroimaging community is via pairwise correlations between BOLD time courses from two ROIs [43, 35, 13, 63, 69]. This impedes designs like centrality, spatial, and edge encodings, because each node in the brain network has the same degree and connects to every other node within a single hop. Second, in previous graph Transformer models, eigenvalues and eigenvectors are commonly used as positional embeddings because they provide identity and positional information for each node [15, 26]. Nevertheless, in brain networks, the connection profile, defined as each node's corresponding row in the brain network adjacency matrix, is recognized as the most effective node feature [13]. This node feature naturally encodes both structural and positional information, making the aforementioned positional embedding design based on eigenvalues and eigenvectors redundant. The third challenge is scalability. Typically, the numbers of nodes and edges in molecule graphs are below 50 and 2,500, respectively, whereas brain networks generally contain around 100 to 400 nodes and up to 160,000 edges. Therefore, operations such as generating all edge features in existing graph Transformer models can be time-consuming, if not infeasible.
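To make the correlation-based construction concrete, below is a minimal NumPy sketch (not the authors' code); the array shapes and the `build_brain_network` helper are illustrative assumptions.

```python
import numpy as np

def build_brain_network(bold: np.ndarray) -> np.ndarray:
    """Pairwise Pearson correlation between ROI time courses.

    bold: hypothetical array of shape (n_rois, n_timepoints),
    one BOLD series per ROI. Returns a fully connected
    (n_rois, n_rois) weighted adjacency matrix.
    """
    return np.corrcoef(bold)

rng = np.random.default_rng(0)
bold = rng.standard_normal((200, 1024))  # e.g., 200 ROIs, 1024 time points
adj = build_brain_network(bold)
# Every node has the same degree (the graph is fully connected and one-hop),
# and the "connection profile" of node i is simply row i of this matrix.
connection_profiles = adj
```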
Figure 1: Illustration of the motivations behind ORTHONORMAL CLUSTERING READOUT. (a) Node features projected to a 3D space with PCA; colors indicate functional modules. (b) Orthonormal bases can make nodes that are indistinguishable under non-orthonormal bases easily distinguishable.
In this work, we propose BRAIN NETWORK TRANSFORMER (BRAINNETTF), which leverages the unique properties of brain network data to fully unleash the power of Transformer-based models for brain network analysis. Specifically, motivated by previous findings on effective GNN designs for brain networks [13], we adopt connection profiles as the initial node features. Empirical analysis shows that connection profiles naturally provide positional features for Transformer-based models and avoid the costly computation of eigenvalues or eigenvectors. Moreover, recent work demonstrates that GNNs trained on learnable graph structures can achieve superior effectiveness and explainability [35]. Inspired by this insight, we propose to learn fully pairwise attention weights with Transformer-based models, which resembles learning predictive brain network structures for downstream tasks.
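As a rough illustration of these two design choices, the following PyTorch sketch feeds connection profiles into a standard Transformer encoder so that attention learns fully pairwise node-node weights; the layer sizes and the `proj` lifting layer are assumptions for exposition, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

n_rois, d_model = 200, 256

encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
proj = nn.Linear(n_rois, d_model)  # lift n_rois-dim connection profiles to d_model

adj = torch.randn(8, n_rois, n_rois)  # batch of 8 brain networks
x = proj(adj)                         # connection profiles as node features
node_emb = encoder(x)                 # (8, n_rois, d_model)
# No eigenvector positional embedding is added: each node's connection
# profile already carries its structural and positional identity.
```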
One step further, when GNNs are used for brain network analysis, a graph-level embedding needs to be generated from the learned node embeddings through a readout function [37, 43, 13]. As shown in Figure 1(a), a key property of brain networks is that brain regions (nodes) belonging to the same functional modules often share similar behaviors regarding activations and deactivations in response to various stimulations [7]. Unfortunately, the current labeling of functional modules is rather empirical and far from accurate. For example, [3] provides more than 100 different functional module organizations based on hierarchical clustering. To leverage the natural functions of brain regions without being limited by inaccurate functional module labels, we design a new global pooling operator, ORTHONORMAL CLUSTERING READOUT, where graph-level embeddings are pooled from clusters of functionally similar nodes through soft clustering with orthonormal projection. Specifically, we first devise a self-supervised mechanism based on [60] to jointly assign soft clusters to brain regions while learning their individual embeddings. To further facilitate the learning of clusters and embeddings, we design an orthonormal projection and theoretically prove its effectiveness in distinguishing embeddings across clusters, thus obtaining expressive graph-level embeddings after global pooling, as illustrated in Figure 1(b).
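The following PyTorch sketch illustrates one plausible shape of such a readout: learnable cluster centers kept orthonormal via QR, soft assignment of nodes to clusters, and pooling within clusters. The class name, the QR re-orthonormalization, and all hyperparameters are our illustrative assumptions; the paper's self-supervised assignment mechanism is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OrthonormalClusteringReadout(nn.Module):
    """Sketch of soft clustering with orthonormal cluster centers."""
    def __init__(self, d_model: int, n_clusters: int):
        super().__init__()
        self.centers = nn.Parameter(torch.empty(n_clusters, d_model))
        nn.init.orthogonal_(self.centers)  # orthonormal initialization

    def forward(self, node_emb: torch.Tensor) -> torch.Tensor:
        # Re-orthonormalize centers via QR so they remain an orthonormal basis.
        q, _ = torch.linalg.qr(self.centers.t())  # (d_model, n_clusters)
        centers = q.t()                           # (n_clusters, d_model)
        # Soft-assign each node to clusters by similarity to each center.
        assign = F.softmax(node_emb @ centers.t(), dim=-1)  # (B, N, K)
        # Pool node embeddings within each soft cluster, then flatten.
        pooled = assign.transpose(1, 2) @ node_emb          # (B, K, d_model)
        return pooled.flatten(start_dim=1)                  # (B, K * d_model)

readout = OrthonormalClusteringReadout(d_model=256, n_clusters=10)
graph_emb = readout(torch.randn(8, 200, 256))  # -> shape (8, 2560)
```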